[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null
[ https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-10495: --- Attachment: (was: HIVE-10495.1.patch) Hive index creation code throws NPE if index table is null -- Key: HIVE-10495 URL: https://issues.apache.org/jira/browse/HIVE-10495 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Bing Li Assignee: Bing Li The stack trace would be:
{noformat}
Caused by: java.lang.NullPointerException
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
 at java.lang.reflect.Method.invoke(Method.java:611)
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
 at $Proxy9.add_index(Unknown Source)
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
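The NPE above comes from dereferencing a null index-table object inside add_index. The shape of the fix can be sketched with a defensive check; the types and method below are hypothetical stand-ins for illustration, not the real Thrift Index/Table classes or the actual HiveMetaStore code:

```java
// Hypothetical sketch (not the actual HiveMetaStore code): validate the
// index-table argument before dereferencing it, instead of hitting an NPE
// deep inside the handler.
public class IndexValidation {
    /** Minimal stand-in for the Thrift Table object. */
    public static final class Table {
        final String name;
        public Table(String name) { this.name = name; }
    }

    /** Returns the table name, or throws a descriptive exception when the
     *  index table is missing -- rather than a bare NullPointerException. */
    public static String indexTableName(Table indexTable) {
        if (indexTable == null) {
            throw new IllegalArgumentException(
                "Index table must not be null when creating an index");
        }
        return indexTable.name;
    }
}
```

With this kind of validation, a caller that passes a null index table gets a descriptive metastore error instead of a raw NullPointerException surfacing from the handler.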
[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533955#comment-14533955 ] Jesus Camacho Rodriguez commented on HIVE-9069: --- [~mmokhtar], it seems the plan is right and the predicates are correctly pushed to the sources. Could you let me know which predicates should still be pushed down but are not? Thanks
Simplify filter predicates for CBO -- Key: HIVE-9069 URL: https://issues.apache.org/jira/browse/HIVE-9069 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Jesus Camacho Rodriguez Fix For: 0.14.1
Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. Looks like this is still an issue, some of the filters can be pushed down to the scan.
{code}
set hive.cbo.enable=true
set hive.stats.fetch.column.stats=true
set hive.exec.dynamic.partition.mode=nonstrict
set hive.tez.auto.reducer.parallelism=true
set hive.auto.convert.join.noconditionaltask.size=32000
set hive.exec.reducers.bytes.per.reducer=1
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
set hive.support.concurrency=false
set hive.tez.exec.print.summary=true

explain
select substr(r_reason_desc,1,20) as r
      ,avg(ws_quantity) wq
      ,avg(wr_refunded_cash) ref
      ,avg(wr_fee) fee
from web_sales, web_returns, web_page, customer_demographics cd1,
     customer_demographics cd2, customer_address, date_dim, reason
where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
  and web_sales.ws_item_sk = web_returns.wr_item_sk
  and web_sales.ws_order_number = web_returns.wr_order_number
  and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
  and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk
  and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
  and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
  and reason.r_reason_sk = web_returns.wr_reason_sk
  and ( ( cd1.cd_marital_status = 'M' and cd1.cd_marital_status = cd2.cd_marital_status
          and cd1.cd_education_status = '4 yr Degree'
          and cd1.cd_education_status = cd2.cd_education_status
          and ws_sales_price between 100.00 and 150.00 )
     or ( cd1.cd_marital_status = 'D' and cd1.cd_marital_status = cd2.cd_marital_status
          and cd1.cd_education_status = 'Primary'
          and cd1.cd_education_status = cd2.cd_education_status
          and ws_sales_price between 50.00 and 100.00 )
     or ( cd1.cd_marital_status = 'U' and cd1.cd_marital_status = cd2.cd_marital_status
          and cd1.cd_education_status = 'Advanced Degree'
          and cd1.cd_education_status = cd2.cd_education_status
          and ws_sales_price between 150.00 and 200.00 ) )
  and ( ( ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM')
          and ws_net_profit between 100 and 200 )
     or ( ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN')
          and ws_net_profit between 150 and 300 )
     or ( ca_country = 'United States' and ca_state in ('WI', 'MO', 'WV')
          and ws_net_profit between 50 and 250 ) )
group by r_reason_desc
order by r, wq, ref, fee
limit 100
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 9 <- Map 1 (BROADCAST_EDGE)
        Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
        Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
        Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
        Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
        Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
        Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
      DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: web_page
                  filterExpr: wp_web_page_sk is not null (type: boolean)
                  Statistics: Num rows: 4602 Data size: 2696178 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: wp_web_page_sk is not null (type: boolean)
                    Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: wp_web_page_sk (type: int)
{code}
[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533971#comment-14533971 ] Mostafa Mokhtar commented on HIVE-9069: --- [~jcamachorodriguez] Check web_sales for instance; it has the following predicates, all of which can be pushed down to the scan as a PPD or filter. The same thing applies for customer_demographics cd1 and customer_address: customer_demographics cd1 doesn't get any filter pushed down, while customer_address gets {code} ca_country = 'United States' {code} pushed. {code} and ( ( ws_sales_price between 100.00 and 150.00 ) or ( ws_sales_price between 50.00 and 100.00 ) or ( ws_sales_price between 150.00 and 200.00 ) ) and ( ( ws_net_profit between 100 and 200 ) or ( ws_net_profit between 150 and 300 ) or ( ws_net_profit between 50 and 250 ) ) {code} Simplify filter predicates for CBO -- Key: HIVE-9069 URL: https://issues.apache.org/jira/browse/HIVE-9069 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Jesus Camacho Rodriguez Fix For: 0.14.1 Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. Looks like this is still an issue, some of the filters can be pushed down to the scan.
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534035#comment-14534035 ] Lefty Leverenz commented on HIVE-9736: -- Doc note: This adds configuration parameter *hive.authprovider.hdfs.liststatus.batch.size* to HiveConf.java, so it needs to be documented in the wiki (for whatever release it ends up in). * [Configuration Properties -- Authentication/Authorization | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Authentication/Authorization] StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
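The batching idea above can be sketched independently of HDFS: split the partition directories into chunks of the new hive.authprovider.hdfs.liststatus.batch.size parameter and issue one listing call per chunk instead of one per directory. The helper below is a hypothetical illustration, not the actual patch (which works against Hadoop's FileSystem API, whose listStatus also accepts a Path[]):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batching idea: group paths into chunks of
// hive.authprovider.hdfs.liststatus.batch.size, so each chunk becomes a
// single namenode call rather than one call per partition directory.
public class PathBatcher {
    /** Splits paths into consecutive batches of at most batchSize entries. */
    public static List<List<String>> batch(List<String> paths, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += batchSize) {
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(
                paths.subList(i, Math.min(i + batchSize, paths.size()))));
        }
        return batches;
    }
}
```

For a partition with thousands of directories, the number of namenode round trips drops from the directory count to roughly directoryCount / batchSize.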
[jira] [Commented] (HIVE-9456) Make Hive support unicode with MSSQL as Metastore backend
[ https://issues.apache.org/jira/browse/HIVE-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534092#comment-14534092 ] Lefty Leverenz commented on HIVE-9456: -- Does this need documentation? Make Hive support unicode with MSSQL as Metastore backend - Key: HIVE-9456 URL: https://issues.apache.org/jira/browse/HIVE-9456 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 1.2.0 Attachments: HIVE-9456.1.patch, HIVE-9456.2.patch, HIVE-9456.3.patch, HIVE-9456.branch-1.2.patch There are significant issues when Hive uses MSSQL as metastore backend to support unicode, since MSSQL handles varchar and nvarchar datatypes differently. Hive 0.14 metastore mssql script DDL was using varchar as datatype, which can't handle multi-bytes/unicode characters, e.g., Chinese chars. This JIRA is going to track implementation of unicode support in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10588) implement hashCode method for HWISessionItem
[ https://issues.apache.org/jira/browse/HIVE-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534187#comment-14534187 ] Hive QA commented on HIVE-10588: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731033/HIVE-10588.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8919 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3808/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3808/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3808/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731033 - PreCommit-HIVE-TRUNK-Build implement hashCode method for HWISessionItem Key: HIVE-10588 URL: https://issues.apache.org/jira/browse/HIVE-10588 Project: Hive Issue Type: Improvement Components: Web UI Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10588.1.patch, rb33796.patch HWISessionItem overwrites equals method but not hashCode method. 
It violates the Java contract: if two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. Currently the equals and compareTo methods use sessionName in their implementation. sessionName.hashCode() can be used in HWISessionItem.hashCode as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
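The fix described in the report can be sketched as follows; the class and field names mirror the JIRA text (sessionName), but this is an illustrative stand-in, not the actual HWISessionItem source:

```java
import java.util.Objects;

// Hedged sketch of the fix: a session item whose equals() compares
// sessionName must derive hashCode() from sessionName too, so the
// equals/hashCode contract holds.
public class SessionItem implements Comparable<SessionItem> {
    private final String sessionName;

    public SessionItem(String sessionName) { this.sessionName = sessionName; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        return Objects.equals(sessionName, ((SessionItem) other).sessionName);
    }

    @Override
    public int hashCode() {
        // Same field as equals(): equal items now hash identically.
        return Objects.hashCode(sessionName);
    }

    @Override
    public int compareTo(SessionItem o) {
        return sessionName.compareTo(o.sessionName);
    }
}
```

Without the hashCode override, two equal items could land in different HashMap buckets, which is exactly the contract violation the issue describes.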
[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9736: - Labels: TODOC1.2 (was: ) StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10621) serde typeinfo equals methods are not symmetric
[ https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534003#comment-14534003 ] Alexander Pivovarov commented on HIVE-10621: testCliDriver_encryption_insert_partition_static failed in many recent builds. So, the tests look good. serde typeinfo equals methods are not symmetric --- Key: HIVE-10621 URL: https://issues.apache.org/jira/browse/HIVE-10621 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10621.1.patch, rb33880.patch A correct equals method implementation should start with
{code}
if (this == other) {
  return true;
}
if (other == null || getClass() != other.getClass()) {
  return false;
}
{code}
The DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, and HiveDecimalWritable equals method implementations start with
{code}
if (other == null || !(other instanceof class_name)) {
  return false;
}
{code}
- First of all, the check for null is redundant.
- The second issue is that the {{other instanceof class_name}} check is not symmetric. The contract of equals() implies that a.equals(b) is true if and only if b.equals(a) is true. The current implementation violates this contract, e.g. DecimalTypeInfo instanceof PrimitiveTypeInfo is true but PrimitiveTypeInfo instanceof DecimalTypeInfo is false.
See more details here: http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric -- This message was sent by Atlassian JIRA (v6.3.4#6332)
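The asymmetry is easy to reproduce with two small stand-in classes (hypothetical names, mirroring the PrimitiveTypeInfo/DecimalTypeInfo relationship; this is not the actual Hive serde code):

```java
// Self-contained demonstration of the asymmetry: a subclass instance
// "equals" a superclass instance via instanceof, but not vice versa.
public class EqualsSymmetry {
    public static class BaseInfo {
        final String typeName;
        public BaseInfo(String typeName) { this.typeName = typeName; }
        @Override public boolean equals(Object other) {
            // Buggy pattern from the JIRA: instanceof check is not symmetric.
            if (!(other instanceof BaseInfo)) return false;
            return typeName.equals(((BaseInfo) other).typeName);
        }
        @Override public int hashCode() { return typeName.hashCode(); }
    }

    public static class DecimalInfo extends BaseInfo {
        final int precision;
        public DecimalInfo(String typeName, int precision) {
            super(typeName);
            this.precision = precision;
        }
        @Override public boolean equals(Object other) {
            if (!(other instanceof DecimalInfo)) return false;
            DecimalInfo d = (DecimalInfo) other;
            return typeName.equals(d.typeName) && precision == d.precision;
        }
        @Override public int hashCode() { return 31 * typeName.hashCode() + precision; }
    }
}
```

Here base.equals(decimal) is true while decimal.equals(base) is false, which is exactly the contract violation described above; switching both classes to a getClass() comparison, as the patch suggests, restores symmetry.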
[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534098#comment-14534098 ] Jesus Camacho Rodriguez commented on HIVE-9069: --- The case for the condition {{ca_country='United States'}} is a bit different, since it can get effectively pushed down. What you are proposing for {{ws_sales_price}} would actually be a reduction, right? Pushing the predicates down, but still leaving the filter with the condition on top. For instance, you could push to the scan the condition {noformat} ws_sales_price between 50.00 and 200.00 {noformat} but you still need to leave the other conditions in the tree in order to keep the correct semantics, e.g. {noformat} ( cd1.cd_marital_status = 'M' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = '4 yr Degree' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 100.00 and 150.00 ) or ... {noformat} Simplify filter predicates for CBO -- Key: HIVE-9069 URL: https://issues.apache.org/jira/browse/HIVE-9069 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Jesus Camacho Rodriguez Fix For: 0.14.1 Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. Looks like this is still an issue, some of the filters can be pushed down to the scan.
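The reduction discussed in this thread can be illustrated with a small helper that computes the enclosing range of a disjunction of BETWEEN predicates. This is a hypothetical sketch of the idea, not Hive's actual predicate-simplification code:

```java
// Hedged sketch: from a disjunction of BETWEEN predicates on one column,
// derive a single enclosing range that can be pushed to the scan while the
// original disjunction stays in the filter above it (semantics preserved).
public class RangeHull {
    /** ranges[i] = {lo, hi}; returns the enclosing {min lo, max hi}. */
    public static double[] hull(double[][] ranges) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double[] r : ranges) {
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        return new double[] { lo, hi };
    }
}
```

For the ws_sales_price branches [100.00, 150.00], [50.00, 100.00] and [150.00, 200.00], the hull is [50.00, 200.00], i.e. the pushable predicate ws_sales_price between 50.00 and 200.00, while the full disjunction remains in the filter on top of the scan.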
[jira] [Updated] (HIVE-10627) Queries fail with Failed to breakup Windowing invocations into Groups
[ https://issues.apache.org/jira/browse/HIVE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10627: --- Attachment: HIVE-10627.02.patch New patch addresses comments by [~jpullokkaran] Queries fail with Failed to breakup Windowing invocations into Groups - Key: HIVE-10627 URL: https://issues.apache.org/jira/browse/HIVE-10627 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-10627.01.patch, HIVE-10627.01.patch, HIVE-10627.02.patch, HIVE-10627.patch TPC-DS query 51 fails with "Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies."
{code}
explain
WITH web_v1 as (
  select ws_item_sk item_sk, d_date, sum(ws_sales_price),
         sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date
           rows between unbounded preceding and current row) cume_sales
  from web_sales, date_dim
  where ws_sold_date_sk=d_date_sk and d_month_seq between 1193 and 1193+11
    and ws_item_sk is not NULL
  group by ws_item_sk, d_date),
store_v1 as (
  select ss_item_sk item_sk, d_date, sum(ss_sales_price),
         sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date
           rows between unbounded preceding and current row) cume_sales
  from store_sales, date_dim
  where ss_sold_date_sk=d_date_sk and d_month_seq between 1193 and 1193+11
    and ss_item_sk is not NULL
  group by ss_item_sk, d_date)
select * from
 (select item_sk, d_date, web_sales, store_sales,
         max(web_sales) over (partition by item_sk order by d_date
           rows between unbounded preceding and current row) web_cumulative,
         max(store_sales) over (partition by item_sk order by d_date
           rows between unbounded preceding and current row) store_cumulative
  from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk
              ,case when web.d_date is not null then web.d_date else store.d_date end d_date
              ,web.cume_sales web_sales
              ,store.cume_sales store_sales
        from web_v1 web full outer join store_v1 store
          on (web.item_sk = store.item_sk and web.d_date = store.d_date)
       )x
 )y
where web_cumulative > store_cumulative
order by item_sk, d_date
limit 100;
{code}
Exception:
{code}
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column reference '$f2'
 at org.apache.hadoop.hive.ql.parse.WindowingComponentizer.next(WindowingComponentizer.java:94)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genWindowingPlan(SemanticAnalyzer.java:11538)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8514)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8472)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9304)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9592)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)
 at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)
{code}
[jira] [Commented] (HIVE-10628) Incorrect result when vectorized native mapjoin is enabled using null safe operators <=>
[ https://issues.apache.org/jira/browse/HIVE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534352#comment-14534352 ] Hive QA commented on HIVE-10628: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731058/HIVE-10628.01.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3809/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3809/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3809/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12731058 - PreCommit-HIVE-TRUNK-Build Incorrect result when vectorized native mapjoin is enabled using null safe operators <=> Key: HIVE-10628 URL: https://issues.apache.org/jira/browse/HIVE-10628 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10628.01.patch Incorrect results for this query: {noformat} select count(*) from store_sales ss join store_returns sr on (sr.sr_item_sk <=> ss.ss_item_sk and sr.sr_customer_sk <=> ss.ss_customer_sk and sr.sr_item_sk <=> ss.ss_item_sk) where ss.ss_net_paid > 1000; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: HIVE-10190.12.patch CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, HIVE-10190.10.patch, HIVE-10190.11.patch, HIVE-10190.12.patch
{code}
public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
  String astTree = ast.toStringTree();
  // if any of following tokens are present in AST, bail out
  String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE };
  for (String token : tokens) {
    if (astTree.contains(token)) {
      return false;
    }
  }
  return true;
}
{code}
This is an issue for a SQL query which is bigger in AST form than in text (~700kb). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
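One way to avoid materializing a ~700kb string is to test token membership while walking the tree, so the scan stops at the first hit and never builds an intermediate string. The sketch below uses a generic node type for illustration; the real ASTNode API differs, and this is not the actual patch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical alternative to stringifying the whole AST: walk the tree and
// compare tokens directly, so cost stays proportional to the node count and
// the walk short-circuits on the first unsupported token found.
public class AstScan {
    public static final class Node {
        final String token;
        final List<Node> children;
        public Node(String token, Node... children) {
            this.token = token;
            this.children = Arrays.asList(children);
        }
    }

    /** True if any node in the tree carries one of the unsupported tokens. */
    public static boolean containsToken(Node root, Set<String> unsupported) {
        if (unsupported.contains(root.token)) return true;
        for (Node child : root.children) {
            if (containsToken(child, unsupported)) return true;
        }
        return false;
    }
}
```

A toStringTree()-based contains() must serialize every node before searching; the walk above touches each node at most once and allocates nothing per node.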
[jira] [Commented] (HIVE-10255) Parquet PPD support TIMESTAMP
[ https://issues.apache.org/jira/browse/HIVE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534711#comment-14534711 ] Sergio Peña commented on HIVE-10255: Is there a way to detect which data type the Parquet file is using for Timestamp (int96 or timestamp_millis), and use a specific leaf filter for it? I am just thinking about whether we can support older versions of Parquet. Parquet PPD support TIMESTAMP - Key: HIVE-10255 URL: https://issues.apache.org/jira/browse/HIVE-10255 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-10255-parquet.1.patch, HIVE-10255-parquet.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10253) Parquet PPD support DATE
[ https://issues.apache.org/jira/browse/HIVE-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534719#comment-14534719 ] Sergio Peña commented on HIVE-10253: +1 Parquet PPD support DATE Key: HIVE-10253 URL: https://issues.apache.org/jira/browse/HIVE-10253 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-10253-parquet.patch, HIVE-10253.patch Hive should handle the DATE data type when generating and pushing the predicate to Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10624) Update the initial script to make beeline bucked cli as default and allow user choose old hive cli by env
[ https://issues.apache.org/jira/browse/HIVE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534647#comment-14534647 ] Hive QA commented on HIVE-10624: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731079/HIVE-10624.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8919 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3811/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3811/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3811/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731079 - PreCommit-HIVE-TRUNK-Build Update the initial script to make beeline bucked cli as default and allow user choose old hive cli by env - Key: HIVE-10624 URL: https://issues.apache.org/jira/browse/HIVE-10624 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10624.patch As discussed in the dev-list, we should update the script to make new beeline bucked cli default and allow user to change to old cli by environment variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10256) Filter row groups based on the block statistics in Parquet
[ https://issues.apache.org/jira/browse/HIVE-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534715#comment-14534715 ] Sergio Peña commented on HIVE-10256: Is this method name correct {{recordReader.getFiltedBlocks()}} ? Isn't getFilteredBlocks? Filter row groups based on the block statistics in Parquet -- Key: HIVE-10256 URL: https://issues.apache.org/jira/browse/HIVE-10256 Project: Hive Issue Type: Sub-task Reporter: Dong Chen Assignee: Dong Chen Attachments: HIVE-10256-parquet.patch In Parquet PPD, the not matched row groups should be eliminated. See {{TestOrcSplitElimination}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10639) create SHA1 UDF
[ https://issues.apache.org/jira/browse/HIVE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10639: --- Attachment: HIVE-10639.2.patch patch #2 - removed copyBytes operation which should improve performance create SHA1 UDF --- Key: HIVE-10639 URL: https://issues.apache.org/jira/browse/HIVE-10639 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10639.1.patch, HIVE-10639.2.patch Calculates an SHA-1 160-bit checksum for the string and binary, as described in RFC 3174 (Secure Hash Algorithm). The value is returned as a string of 40 hex digits, or NULL if the argument was NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
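The SHA-1 semantics described above (40 hex digits, NULL argument yields NULL) can be sketched with the JDK's built-in MessageDigest. The class and method names below are illustrative only, not the names used in the attached patch:

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch of the SHA1 UDF contract: 40 hex digits, or null for null input.
public class Sha1Sketch {
    public static String sha1Hex(byte[] input) {
        if (input == null) {
            return null; // NULL argument yields NULL, per the issue description
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(input);
            // Left-pad to 40 hex characters in case of leading zero bytes.
            return String.format("%040x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is mandated by the JDK spec", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("abc".getBytes()));
    }
}
```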
[jira] [Commented] (HIVE-8696) HCatClientHMSImpl doesn't use a Retrying-HiveMetastoreClient.
[ https://issues.apache.org/jira/browse/HIVE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534863#comment-14534863 ] Thiruvel Thirumoolan commented on HIVE-8696: Thanks Sushanth! HCatClientHMSImpl doesn't use a Retrying-HiveMetastoreClient. - Key: HIVE-8696 URL: https://issues.apache.org/jira/browse/HIVE-8696 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.1 Reporter: Mithun Radhakrishnan Assignee: Thiruvel Thirumoolan Fix For: 1.2.0 Attachments: HIVE-8696.1.patch, HIVE-8696.2.patch, HIVE-8696.3.patch, HIVE-8696.4.patch, HIVE-8696.5.patch, HIVE-8696.poc.patch The HCatClientHMSImpl doesn't use a RetryingHiveMetastoreClient. Users of the HCatClient API that log in through keytabs will fail without retry, when their TGTs expire. The fix is inbound. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534736#comment-14534736 ] Mostafa Mokhtar commented on HIVE-9069: --- [~jcamachorodriguez] Yes, this is correct. Simplify filter predicates for CBO -- Key: HIVE-9069 URL: https://issues.apache.org/jira/browse/HIVE-9069 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Jesus Camacho Rodriguez Fix For: 0.14.1 Simplify predicates for disjunctive predicates so that can get pushed down to the scan. Looks like this is still an issue, some of the filters can be pushed down to the scan. {code} set hive.cbo.enable=true set hive.stats.fetch.column.stats=true set hive.exec.dynamic.partition.mode=nonstrict set hive.tez.auto.reducer.parallelism=true set hive.auto.convert.join.noconditionaltask.size=32000 set hive.exec.reducers.bytes.per.reducer=1 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager set hive.support.concurrency=false set hive.tez.exec.print.summary=true explain select substr(r_reason_desc,1,20) as r ,avg(ws_quantity) wq ,avg(wr_refunded_cash) ref ,avg(wr_fee) fee from web_sales, web_returns, web_page, customer_demographics cd1, customer_demographics cd2, customer_address, date_dim, reason where web_sales.ws_web_page_sk = web_page.wp_web_page_sk and web_sales.ws_item_sk = web_returns.wr_item_sk and web_sales.ws_order_number = web_returns.wr_order_number and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998 and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk and reason.r_reason_sk = web_returns.wr_reason_sk and ( ( cd1.cd_marital_status = 'M' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = '4 yr Degree' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 
100.00 and 150.00 ) or ( cd1.cd_marital_status = 'D' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = 'Primary' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 50.00 and 100.00 ) or ( cd1.cd_marital_status = 'U' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = 'Advanced Degree' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 150.00 and 200.00 ) ) and ( ( ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM') and ws_net_profit between 100 and 200 ) or ( ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN') and ws_net_profit between 150 and 300 ) or ( ca_country = 'United States' and ca_state in ('WI', 'MO', 'WV') and ws_net_profit between 50 and 250 ) ) group by r_reason_desc order by r, wq, ref, fee limit 100 OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 9 - Map 1 (BROADCAST_EDGE) Reducer 3 - Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE) Reducer 4 - Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) Reducer 5 - Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE) Reducer 6 - Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE) Reducer 7 - Reducer 6 (SIMPLE_EDGE) Reducer 8 - Reducer 7 (SIMPLE_EDGE) DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1 Vertices: Map 1 Map Operator Tree: TableScan alias: web_page filterExpr: wp_web_page_sk is not null (type: boolean) Statistics: Num rows: 4602 Data size: 2696178 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: wp_web_page_sk is not null (type: boolean) Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: wp_web_page_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 4602 Data size: 18408 Basic stats: COMPLETE Column stats: COMPLETE Reduce
[jira] [Commented] (HIVE-10641) create CRC32 UDF
[ https://issues.apache.org/jira/browse/HIVE-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534869#comment-14534869 ] Hive QA commented on HIVE-10641: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731095/HIVE-10641.1.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8914 tests executed *Failed tests:* {noformat} TestSparkClient - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3812/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3812/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3812/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731095 - PreCommit-HIVE-TRUNK-Build create CRC32 UDF Key: HIVE-10641 URL: https://issues.apache.org/jira/browse/HIVE-10641 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10641.1.patch CRC32 computes a cyclic redundancy check value for string or binary argument and returns bigint value. 
The result is NULL if the argument is NULL. MySQL has similar function https://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_crc32 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
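The described CRC32 behavior (bigint checksum, NULL in yields NULL out, same algorithm as the linked MySQL function) maps directly onto the JDK's java.util.zip.CRC32; this is a sketch of the contract, not the patch itself:

```java
import java.util.zip.CRC32;

// Sketch of the CRC32 UDF contract: unsigned 32-bit checksum as a bigint, null for null input.
public class Crc32Sketch {
    public static Long crc32(byte[] input) {
        if (input == null) {
            return null; // NULL argument yields NULL
        }
        CRC32 crc = new CRC32();
        crc.update(input, 0, input.length);
        return crc.getValue(); // unsigned 32-bit value, fits Hive's bigint
    }

    public static void main(String[] args) {
        System.out.println(crc32("MySQL".getBytes())); // same value as MySQL's CRC32('MySQL')
    }
}
```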
[jira] [Commented] (HIVE-10580) Fix impossible cast in GenericUDF.getConstantLongValue
[ https://issues.apache.org/jira/browse/HIVE-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534767#comment-14534767 ] Ashutosh Chauhan commented on HIVE-10580: - This method is actually not used at all. We can just remove it. Fix impossible cast in GenericUDF.getConstantLongValue -- Key: HIVE-10580 URL: https://issues.apache.org/jira/browse/HIVE-10580 Project: Hive Issue Type: Bug Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10580.1.patch line 548-549 {code} if (constValue instanceof IntWritable) { v = ((LongWritable) constValue).get(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
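The quoted lines test `instanceof IntWritable` but cast to `LongWritable`, so the branch can never succeed — it throws ClassCastException whenever it is taken. A type-consistent version would cast to the type it just checked. The sketch below uses minimal stand-ins for the Hadoop writable classes so the fix is self-contained:

```java
public class CastFixSketch {
    // Minimal stand-ins for the Hadoop writables, only to make the bug reproducible here.
    static class IntWritable { final int v; IntWritable(int v) { this.v = v; } int get() { return v; } }
    static class LongWritable { final long v; LongWritable(long v) { this.v = v; } long get() { return v; } }

    // The buggy code cast an IntWritable to LongWritable; the fix casts to the checked type.
    public static long constantToLong(Object constValue) {
        if (constValue instanceof IntWritable) {
            return ((IntWritable) constValue).get();
        } else if (constValue instanceof LongWritable) {
            return ((LongWritable) constValue).get();
        }
        throw new IllegalArgumentException("unsupported constant type");
    }

    public static void main(String[] args) {
        System.out.println(constantToLong(new IntWritable(42)));
    }
}
```

As Ashutosh notes, the method is unused, so deleting it outright is the simpler resolution.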
[jira] [Commented] (HIVE-10656) Beeline set var=value not carrying over to queries
[ https://issues.apache.org/jira/browse/HIVE-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535085#comment-14535085 ] Reuben Kuhnert commented on HIVE-10656: --- This appears to be a problem with variable ambiguity: {code} set key=value {code} expands to: {code} set hiveconf:key=value {code} however, {code} select * from ${key} {code} expands to: {code} select* from ${hiveconf:key} {code} The question is basically, should we allow users to enter ambiguous properties, and if so should the {{key}} default to {{hiveconf:key}} or {{hivevar:key}}? Beeline set var=value not carrying over to queries -- Key: HIVE-10656 URL: https://issues.apache.org/jira/browse/HIVE-10656 Project: Hive Issue Type: Bug Reporter: Reuben Kuhnert Priority: Minor After performing a {{set name=value}} I would expect that the variable name would carry over to all locations within the session. It appears to work when querying the value via {{set;}}, but not when trying to do actual sql statements. Example: {code} 0: jdbc:hive2://localhost:1 set foo; +--+--+ | set| +--+--+ | foo=bar | +--+--+ 1 row selected (0.932 seconds) 0: jdbc:hive2://localhost:1 select * from ${foo}; Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'bar' (state=42S02,code=10001) 0: jdbc:hive2://localhost:1 show tables; ++--+ | tab_name | ++--+ | my | | purchases | ++--+ 2 rows selected (0.437 seconds) 0: jdbc:hive2://localhost:1 set foo=my; No rows affected (0.017 seconds) 0: jdbc:hive2://localhost:1 set foo; +-+--+ | set | +-+--+ | foo=my | +-+--+ 1 row selected (0.02 seconds) 0: jdbc:hive2://localhost:1 select * from ${foo}; select * from ${foo}; Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'bar' (state=42S02,code=10001) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
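The namespace ambiguity in the comment above can be illustrated with a toy substitution resolver — `set key=value` lands in the hiveconf namespace, but a bare `${key}` reference does not say which namespace to read from. Everything below is a hypothetical sketch, not Beeline's actual substitution code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy resolver: bare ${name} references fall back to hiveconf:name, then hivevar:name.
// Which namespace should win for a bare name is exactly the open question in the issue.
public class VarSubstSketch {
    private static final Pattern REF = Pattern.compile("\\$\\{(.+?)\\}");

    public static String substitute(String sql, Map<String, String> vars) {
        Matcher m = REF.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String val = vars.get(name);
            if (val == null) val = vars.get("hiveconf:" + name);
            if (val == null) val = vars.get("hivevar:" + name);
            // Unresolved references are left verbatim.
            m.appendReplacement(out, Matcher.quoteReplacement(val == null ? m.group(0) : val));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("hiveconf:foo", "my"); // what `set foo=my` stores
        System.out.println(substitute("select * from ${foo}", vars));
    }
}
```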
[jira] [Updated] (HIVE-10643) Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following)
[ https://issues.apache.org/jira/browse/HIVE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10643: Attachment: HIVE-10643.patch Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following) --- Key: HIVE-10643 URL: https://issues.apache.org/jira/browse/HIVE-10643 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10643.patch The functionality should not be affected. Instead of passing 2 numbers (1 for # of preceding rows and 1 for # of following rows), we will pass WindowFrameDef object around. In the following subtasks, it will be used for the cases of {{rows between x preceding and y preceding}} and {{rows between x following and y following}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10651) ORC file footer cache should be bounded
[ https://issues.apache.org/jira/browse/HIVE-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535357#comment-14535357 ] Sergey Shelukhin commented on HIVE-10651: - This /might/ also affect LLAP when running w/o IO elevator. ORC file footer cache should be bounded --- Key: HIVE-10651 URL: https://issues.apache.org/jira/browse/HIVE-10651 Project: Hive Issue Type: Bug Affects Versions: 1.3.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-10651.1.patch ORC's file footer cache is currently unbounded and is a soft reference cache. The cache size got from config is used to set initial capacity. We should bound the cache from growing too big and to get a predictable performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
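One standard way to bound a footer-style cache is an access-ordered LinkedHashMap that evicts the least-recently-used entry once a fixed capacity is exceeded. This is a generic sketch of the idea, not the data structure the attached patch actually uses:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU-bounded map: capacity is a hard bound rather than just an initial capacity.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the LRU entry once the bound is exceeded
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a.orc", "footerA");
        cache.put("b.orc", "footerB");
        cache.put("c.orc", "footerC"); // evicts the least-recently-used entry
        System.out.println(cache.keySet());
    }
}
```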
[jira] [Updated] (HIVE-10643) Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following)
[ https://issues.apache.org/jira/browse/HIVE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10643: Attachment: (was: HIVE-10643.patch) Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following) --- Key: HIVE-10643 URL: https://issues.apache.org/jira/browse/HIVE-10643 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10643.patch The functionality should not be affected. Instead of passing 2 numbers (1 for # of preceding rows and 1 for # of following rows), we will pass WindowFrameDef object around. In the following subtasks, it will be used for the cases of {{rows between x preceding and y preceding}} and {{rows between x following and y following}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10657) Remove copyBytes operation from MD5 UDF
[ https://issues.apache.org/jira/browse/HIVE-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10657: --- Description: Current MD5 UDF implementation uses Apache Commons DigestUtils.md5Hex method to get md5 hex. DigestUtils does not provide md5Hex method with signature (byte[], start, length). This is why copyBytes method was added to UDFMd5 to get bytes[] from BytesWritable. To avoid copying bytes from BytesWritable to new byte array we can use java MessageDigest API directly. MessageDigest has method update(byte[], start, length) was: Current implementation uses Apache Commons DigestUtils.md5Hex method to get md5 hex. DigestUtils does not provide md5Hex method with signature (byte[], start, length). This is why copyBytes method was added to get bytes[] from BytesWritable. To avoid copying bytes from BytesWritable to new byte array we can use java MessageDigest API directly. MessageDigest has method update(byte[], start, length) Remove copyBytes operation from MD5 UDF --- Key: HIVE-10657 URL: https://issues.apache.org/jira/browse/HIVE-10657 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Current MD5 UDF implementation uses Apache Commons DigestUtils.md5Hex method to get md5 hex. DigestUtils does not provide md5Hex method with signature (byte[], start, length). This is why copyBytes method was added to UDFMd5 to get bytes[] from BytesWritable. To avoid copying bytes from BytesWritable to new byte array we can use java MessageDigest API directly. MessageDigest has method update(byte[], start, length) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10563) MiniTezCliDriver tests ordering issues
[ https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10563: - Attachment: HIVE-10563.4.patch uploading the rebased patch. Thanks Hari MiniTezCliDriver tests ordering issues -- Key: HIVE-10563 URL: https://issues.apache.org/jira/browse/HIVE-10563 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10563.1.patch, HIVE-10563.2.patch, HIVE-10563.3.patch, HIVE-10563.4.patch There are a bunch of tests related to TestMiniTezCliDriver which gives ordering issues when run on Centos/Windows/OSX -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10657) Remove copyBytes operation from MD5 UDF
[ https://issues.apache.org/jira/browse/HIVE-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10657: --- Attachment: HIVE-10657.1.patch patch #1 Remove copyBytes operation from MD5 UDF --- Key: HIVE-10657 URL: https://issues.apache.org/jira/browse/HIVE-10657 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10657.1.patch Current MD5 UDF implementation uses Apache Commons DigestUtils.md5Hex method to get md5 hex. DigestUtils does not provide md5Hex method with signature (byte[], start, length). This is why copyBytes method was added to UDFMd5 to get bytes[] from BytesWritable. To avoid copying bytes from BytesWritable to new byte array we can use java MessageDigest API directly. MessageDigest has method update(byte[], start, length) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
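The copy-free approach the description refers to — MessageDigest.update(byte[], start, length) digesting a slice of a backing buffer directly, the way BytesWritable exposes its bytes — can be sketched as follows (method name and buffer layout are illustrative):

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Digests a slice of a backing array without first copying it into a new byte[].
public class Md5SliceSketch {
    public static String md5Hex(byte[] buf, int start, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(buf, start, length); // no intermediate copyBytes() needed
            return String.format("%032x", new BigInteger(1, md.digest()));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is mandated by the JDK spec", e);
        }
    }

    public static void main(String[] args) {
        byte[] backing = "xxabcxx".getBytes(); // pretend only bytes 2..4 are the valid payload
        System.out.println(md5Hex(backing, 2, 3)); // digest of "abc" only
    }
}
```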
[jira] [Commented] (HIVE-10626) Spark paln need to be updated [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535343#comment-14535343 ] Hive QA commented on HIVE-10626: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12730949/HIVE-10626.2-spark.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8721 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more 
- did not produce a TEST-*.xml file TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/850/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/850/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-850/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12730949 - PreCommit-HIVE-SPARK-Build Spark paln need to be updated [Spark Branch] Key: HIVE-10626 URL: https://issues.apache.org/jira/browse/HIVE-10626 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-10626-spark.patch, HIVE-10626.1-spark.patch, HIVE-10626.2-spark.patch [HIVE-8858] basic patch was committed, latest patch need to be committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10643) Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following)
[ https://issues.apache.org/jira/browse/HIVE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10643: Attachment: HIVE-10643.patch Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following) --- Key: HIVE-10643 URL: https://issues.apache.org/jira/browse/HIVE-10643 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Attachments: HIVE-10643.patch The functionality should not be affected. Instead of passing 2 numbers (1 for # of preceding rows and 1 for # of following rows), we will pass WindowFrameDef object around. In the following subtasks, it will be used for the cases of {{rows between x preceding and y preceding}} and {{rows between x following and y following}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10591: - Attachment: HIVE-10591.3.patch Support limited integer type promotion in ORC - Key: HIVE-10591 URL: https://issues.apache.org/jira/browse/HIVE-10591 Project: Hive Issue Type: New Feature Affects Versions: 1.3.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch, HIVE-10591.2.patch, HIVE-10591.3.patch, HIVE-10591.3.patch, HIVE-10591.3.patch ORC currently does not support schema-on-read. If we alter an ORC table with 'int' type to 'bigint' and if we query the altered table ClassCastException will be thrown as the schema on read from table descriptor will expect LongWritable whereas ORC will return IntWritable based on file schema stored within ORC file. OrcSerde currently doesn't do any type conversions or type promotions for performance reasons in inner loop. Since smallints, ints and bigints are stored in the same way in ORC, it will be possible be allow such type promotions without hurting performance. Following type promotions can be supported without any casting smallint - int smallint - bigint int - bigint Tinyint promotion is not possible without casting as tinyints are stored using RLE byte writer whereas smallints, ints and bigints are stored using RLE integer writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases
[ https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-10542: -- Fix Version/s: 1.3.0 1.2.0 Full outer joins in tez produce incorrect results in certain cases -- Key: HIVE-10542 URL: https://issues.apache.org/jira/browse/HIVE-10542 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Blocker Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10542.1.patch, HIVE-10542.2.patch, HIVE-10542.3.patch, HIVE-10542.4.patch, HIVE-10542.5.patch, HIVE-10542.6.patch, HIVE-10542.7.patch, HIVE-10542.8.patch, HIVE-10542.9.patch If there is no records for one of the tables in the full outer join, we do not read the other input and end up not producing rows which we should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10568) Select count(distinct()) can have more optimal execution plan
[ https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10568: Attachment: HIVE-10568.2.patch Addressed review comments. Select count(distinct()) can have more optimal execution plan - Key: HIVE-10568 URL: https://issues.apache.org/jira/browse/HIVE-10568 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.1.0 Reporter: Mostafa Mokhtar Assignee: Ashutosh Chauhan Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, HIVE-10568.patch, HIVE-10568.patch {code:sql} select count(distinct ss_ticket_number) from store_sales; {code} can be rewritten as {code:sql} select count(1) from (select distinct ss_ticket_number from store_sales) a; {code} which may run upto 3x faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
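The rewrite in the description is an algebraic identity: count(distinct x) equals counting the rows of the deduplicated subquery. The same identity, sketched over an in-memory collection rather than a Hive table:

```java
import java.util.Arrays;
import java.util.List;

// Both forms below compute the same answer; the rewrite moves deduplication
// into a separate step (the subquery) followed by a plain count.
public class CountDistinctSketch {
    // original form: select count(distinct ss_ticket_number) ...
    public static long countDistinct(List<Integer> ticketNumbers) {
        return ticketNumbers.stream().distinct().count();
    }

    // rewritten form: select count(1) from (select distinct ss_ticket_number ...) a
    public static long countOverDistinctSubquery(List<Integer> ticketNumbers) {
        return ticketNumbers.stream().distinct().mapToLong(x -> 1L).sum();
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 2, 3, 3, 3);
        System.out.println(countDistinct(rows) + " == " + countOverDistinctSubquery(rows));
    }
}
```

The speedup comes from the execution plan, not the arithmetic: the deduplicating subquery can be distributed across reducers, where a single count(distinct) aggregation cannot.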
[jira] [Updated] (HIVE-10394) LLAP: Notify AM of pre-emption
[ https://issues.apache.org/jira/browse/HIVE-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10394: - Attachment: (was: HIVE-10394.1.patch) LLAP: Notify AM of pre-emption -- Key: HIVE-10394 URL: https://issues.apache.org/jira/browse/HIVE-10394 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Pre-empted tasks should be notified to AM as killed/interrupted by system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10660) Fix typo in Type.getType(TTypeId) exception message
[ https://issues.apache.org/jira/browse/HIVE-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Witt updated HIVE-10660: --- Attachment: HIVE-10660.patch Fix typo in Type.getType(TTypeId) exception message --- Key: HIVE-10660 URL: https://issues.apache.org/jira/browse/HIVE-10660 Project: Hive Issue Type: Bug Reporter: Keegan Witt Assignee: Keegan Witt Priority: Trivial Attachments: HIVE-10660.patch {{org.apache.hive.service.cli.Type.getType(org.apache.hive.service.cli.thrift.TTypeId)}} throws an _IllegalArgumentException_ whose message misspells 'Unrecognized'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10658) ACID operation expose encrypted data
[ https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535431#comment-14535431 ] Sergio Peña commented on HIVE-10658: Doesn't ACID get the scratch directory from the Context object? The {{SemanticAnalyzer.getMetaData()}} gets the encrypted or /tmp directory from getStagingDirectoryPathname() and sets the value to the Context. This might help. See the line {{Path stagingPath = getStagingDirectoryPathname(qb);}} ACID operation expose encrypted data Key: HIVE-10658 URL: https://issues.apache.org/jira/browse/HIVE-10658 Project: Hive Issue Type: Sub-task Reporter: Eugene Koifman Insert/Update/Delete operations all use temporary tables. the data in temp tables is stored under the hive.exec.scratchdir which is not usually encrypted. This is a similar issue to using scratchdir for staging query results -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535528#comment-14535528 ] Sushanth Sowmyan commented on HIVE-9736: I did not find this in the precommit queue, so I've manually added it in now : build#3815 should test this. StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9544) Error dropping fully qualified partitioned table - Internal error processing get_partition_names
[ https://issues.apache.org/jira/browse/HIVE-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535530#comment-14535530 ] Dipankar commented on HIVE-9544: Alternate way of doing this is : hive -hiveconf schema=mydb -e 'drop table ${hiveconf:schema}.my_table_name' I.e .. pass the schema/database name as hive conf. Error dropping fully qualified partitioned table - Internal error processing get_partition_names Key: HIVE-9544 URL: https://issues.apache.org/jira/browse/HIVE-9544 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Environment: HDP 2.2 Reporter: Hari Sekhon Priority: Minor When attempting to drop a partitioned table using a fully qualified name I get this error: {code} hive -e 'drop table myDB.my_table_name;' Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.TApplicationException: Internal error processing get_partition_names {code} It succeeds if I instead do: {code}hive -e 'use myDB; drop table my_table_name;'{code} Regards, Hari Sekhon http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535528#comment-14535528 ] Sushanth Sowmyan edited comment on HIVE-9736 at 5/8/15 9:06 PM: I did not find this in the precommit queue, so I've manually added it in now : build#3833 should test this. was (Author: sushanth): I did not find this in the precommit queue, so I've manually added it in now : build#3815 should test this. StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10394) LLAP: Notify AM of pre-emption
[ https://issues.apache.org/jira/browse/HIVE-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10394: - Attachment: HIVE-10394.1.patch LLAP: Notify AM of pre-emption -- Key: HIVE-10394 URL: https://issues.apache.org/jira/browse/HIVE-10394 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10394.1.patch Pre-empted tasks should be notified to AM as killed/interrupted by system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-6424) webhcat.jar no longer includes webhcat-lo4j.properties
[ https://issues.apache.org/jira/browse/HIVE-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman resolved HIVE-6424. -- Resolution: Implemented Assignee: Eugene Koifman The same changes as in the attached patch are already present in the codebase. webhcat.jar no longer includes webhcat-lo4j.properties -- Key: HIVE-6424 URL: https://issues.apache.org/jira/browse/HIVE-6424 Project: Hive Issue Type: Bug Components: Build Infrastructure, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: hive6424.patch Before the Maven switch, webhcat-log4j.properties and webhcat-default.xml were at the root of hive-webhcat-0.13.0-SNAPSHOT.jar. They are no longer there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10651) ORC file footer cache should be bounded
[ https://issues.apache.org/jira/browse/HIVE-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535371#comment-14535371 ] Sergey Shelukhin commented on HIVE-10651: - +1 ORC file footer cache should be bounded --- Key: HIVE-10651 URL: https://issues.apache.org/jira/browse/HIVE-10651 Project: Hive Issue Type: Bug Affects Versions: 1.3.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-10651.1.patch ORC's file footer cache is currently unbounded and is a soft-reference cache. The cache size obtained from config is only used to set the initial capacity. We should bound the cache to keep it from growing too big and to get predictable performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
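For context, one minimal way to bound such a cache is an access-ordered LinkedHashMap that evicts the least-recently-used entry once a fixed capacity is exceeded. This is a sketch of the general technique only; the attached patch may well use a different cache implementation (e.g. a cache library with a maximum-size setting).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative bounded LRU cache: accessOrder = true makes iteration
// order reflect recency of access, and removeEldestEntry evicts the
// least-recently-used entry when the capacity is exceeded.
public class BoundedFooterCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public BoundedFooterCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true for LRU behavior
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

Unlike a soft-reference cache, whose size is at the mercy of the garbage collector, this gives a hard, predictable upper bound on retained footers.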
[jira] [Commented] (HIVE-10548) Remove dependency to s3 repository in root pom
[ https://issues.apache.org/jira/browse/HIVE-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535482#comment-14535482 ] Hive QA commented on HIVE-10548: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731098/HIVE-10548.2.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8915 tests executed *Failed tests:* {noformat} TestSchedulerQueue - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3813/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3813/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3813/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731098 - PreCommit-HIVE-TRUNK-Build Remove dependency to s3 repository in root pom -- Key: HIVE-10548 URL: https://issues.apache.org/jira/browse/HIVE-10548 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Szehon Ho Assignee: Chengxiang Li Attachments: HIVE-10548.2.patch, HIVE-10548.2.patch, HIVE-10548.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10394) LLAP: Notify AM of pre-emption
[ https://issues.apache.org/jira/browse/HIVE-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10394: - Attachment: (was: HIVE-10394.1.patch) LLAP: Notify AM of pre-emption -- Key: HIVE-10394 URL: https://issues.apache.org/jira/browse/HIVE-10394 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Pre-empted tasks should be notified to AM as killed/interrupted by system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10394) LLAP: Notify AM of pre-emption
[ https://issues.apache.org/jira/browse/HIVE-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10394: - Attachment: HIVE-10394.1.patch The notification to AM is yet to be hooked up. This patch currently adds pre-emption of requests that are already in wait queue. LLAP: Notify AM of pre-emption -- Key: HIVE-10394 URL: https://issues.apache.org/jira/browse/HIVE-10394 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10394.1.patch Pre-empted tasks should be notified to AM as killed/interrupted by system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10660) Fix typo in Type.getType(TTypeId) exception message
[ https://issues.apache.org/jira/browse/HIVE-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Witt updated HIVE-10660: --- Description: {{org.apache.hive.service.cli.Type.getType(org.apache.hive.service.cli.thrift.TTypeId)}} throws an _IllegalArgumentException_ with 'Unrecognized' misspelled as 'Unregonized'. (was: {{org.apache.hive.service.cli.Type.getType(org.apache.hive.service.cli.thrift.TTypeId)}} throws an _IllegalArgumentException_ with 'Unrecognized' misspelled as 'Unrecognized'.) Fix typo in Type.getType(TTypeId) exception message --- Key: HIVE-10660 URL: https://issues.apache.org/jira/browse/HIVE-10660 Project: Hive Issue Type: Bug Reporter: Keegan Witt Assignee: Keegan Witt Priority: Trivial Attachments: HIVE-10660.patch {{org.apache.hive.service.cli.Type.getType(org.apache.hive.service.cli.thrift.TTypeId)}} throws an _IllegalArgumentException_ with 'Unrecognized' misspelled as 'Unregonized'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10568) Select count(distinct()) can have more optimal execution plan
[ https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535603#comment-14535603 ] Laljo John Pullokkaran commented on HIVE-10568: --- +1 Select count(distinct()) can have more optimal execution plan - Key: HIVE-10568 URL: https://issues.apache.org/jira/browse/HIVE-10568 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.1.0 Reporter: Mostafa Mokhtar Assignee: Ashutosh Chauhan Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, HIVE-10568.patch, HIVE-10568.patch {code:sql} select count(distinct ss_ticket_number) from store_sales; {code} can be rewritten as {code:sql} select count(1) from (select distinct ss_ticket_number from store_sales) a; {code} which may run up to 3x faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10639) create SHA1 UDF
[ https://issues.apache.org/jira/browse/HIVE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534462#comment-14534462 ] Hive QA commented on HIVE-10639: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731070/HIVE-10639.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8914 tests executed *Failed tests:* {noformat} TestSparkClient - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3810/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3810/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3810/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731070 - PreCommit-HIVE-TRUNK-Build create SHA1 UDF --- Key: HIVE-10639 URL: https://issues.apache.org/jira/browse/HIVE-10639 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10639.1.patch Calculates an SHA-1 160-bit checksum for the string and binary, as described in RFC 3174 (Secure Hash Algorithm). The value is returned as a string of 40 hex digits, or NULL if the argument was NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
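The described semantics can be sketched in plain Java. This is an illustration of the behavior the issue asks for (SHA-1 per RFC 3174, 40 lowercase hex digits, NULL in gives NULL out), not the code of the attached patch; the class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Sketch {
    // SHA-1 of the input bytes, returned as 40 lowercase hex digits,
    // or null for null input (mirroring the UDF's NULL handling).
    public static String sha1(byte[] input) {
        if (input == null) {
            return null;
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(input);
            StringBuilder hex = new StringBuilder(40);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

The RFC 3174 test vector "abc" hashes to a9993e364706816aba3e25717850c26c9cd0d89d, which is handy for sanity-checking an implementation.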
[jira] [Updated] (HIVE-10640) Vectorized query with NULL constant throws Unsuported vector output type: void error
[ https://issues.apache.org/jira/browse/HIVE-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10640: Fix Version/s: (was: 1.2.0) Vectorized query with NULL constant throws Unsuported vector output type: void error --- Key: HIVE-10640 URL: https://issues.apache.org/jira/browse/HIVE-10640 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 This query from join_nullsafe.q when vectorized throws Unsuported vector output type: void during execution... {noformat} select * from myinput1 a join myinput1 b on a.key=b.value AND a.key is NULL; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535750#comment-14535750 ] Sushanth Sowmyan commented on HIVE-10463: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables --- Key: HIVE-10463 URL: https://issues.apache.org/jira/browse/HIVE-10463 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Laljo John Pullokkaran This happens when return path is on. To reproduce the Exception, take the following excerpt from auto_sortmerge_join_10.q: {noformat} set hive.enforce.bucketing = true; set hive.enforce.sorting = true; set hive.exec.reducers.max = 1; CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS; insert overwrite table tbl1 select * from src where key < 10; {noformat} It produces the following Exception: {noformat} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150) ... 
14 more Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383) ... 22 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10412) CBO : Calculate join selectivity when computing HiveJoin cost
[ https://issues.apache.org/jira/browse/HIVE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10412: Fix Version/s: (was: 1.2.0) CBO : Calculate join selectivity when computing HiveJoin cost - Key: HIVE-10412 URL: https://issues.apache.org/jira/browse/HIVE-10412 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran This is from TPC-DS Q7 Because we don't compute the selectivity of sub-expression in a HiveJoin we assume that selective and non-selective joins have the similar cost. {code} select i_item_id, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, item where store_sales.ss_item_sk = item.i_item_sk and store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and cd_gender = 'F' and cd_marital_status = 'W' and cd_education_status = 'Primary' group by i_item_id order by i_item_id limit 100 {code} Cardinality {code} item 462,000 customer_demographics 1,920,800 store_sales 82,510,879,939 {code} NDVs {code} item.i_item_sk 439501 customer_demographics.cd_demo_sk 1835839 store_sales.ss_cdemo_sk 1835839 {code} From the logs {code} 2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for: HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 io}]) HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]]) HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3]) HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))]) 
HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]]) 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io} 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 rows, 2.1362E11 cpu, 1.07207098E7 io} 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(78)) - MapJoin selected 2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for: HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.2511341939E10 rows, 2.1362E11 cpu, 1.07207098E7 io}]) HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]]) HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3]) HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]]) 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.25108951834E10 rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io} 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 io} {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect
[ https://issues.apache.org/jira/browse/HIVE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535752#comment-14535752 ] Sushanth Sowmyan commented on HIVE-10415: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. hive.start.cleanup.scratchdir configuration is not taking effect Key: HIVE-10415 URL: https://issues.apache.org/jira/browse/HIVE-10415 Project: Hive Issue Type: Bug Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-10415.patch This configuration hive.start.cleanup.scratchdir is not taking effect -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10412) CBO : Calculate join selectivity when computing HiveJoin cost
[ https://issues.apache.org/jira/browse/HIVE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535756#comment-14535756 ] Sushanth Sowmyan commented on HIVE-10412: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. CBO : Calculate join selectivity when computing HiveJoin cost - Key: HIVE-10412 URL: https://issues.apache.org/jira/browse/HIVE-10412 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran This is from TPC-DS Q7 Because we don't compute the selectivity of sub-expression in a HiveJoin we assume that selective and non-selective joins have the similar cost. {code} select i_item_id, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, item where store_sales.ss_item_sk = item.i_item_sk and store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and cd_gender = 'F' and cd_marital_status = 'W' and cd_education_status = 'Primary' group by i_item_id order by i_item_id limit 100 {code} Cardinality {code} item 462,000 customer_demographics 1,920,800 store_sales 82,510,879,939 {code} NDVs {code} item.i_item_sk 439501 customer_demographics.cd_demo_sk 1835839 store_sales.ss_cdemo_sk 1835839 {code} From the logs {code} 2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for: HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 io}]) HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]]) HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], 
cd_education_status=[$3]) HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]]) 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io} 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 rows, 2.1362E11 cpu, 1.07207098E7 io} 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(78)) - MapJoin selected 2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for: HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.2511341939E10 rows, 2.1362E11 cpu, 1.07207098E7 io}]) HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]]) HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3]) HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]]) 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.25108951834E10 rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io} 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 io} {code} 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10304: Fix Version/s: (was: 1.2.0) Add deprecation message to HiveCLI -- Key: HIVE-10304 URL: https://issues.apache.org/jira/browse/HIVE-10304 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC1.2 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch As Beeline is now the recommended command line tool to Hive, we should add a message to HiveCLI to indicate that it is deprecated and redirect them to Beeline. This is not suggesting to remove HiveCLI for now, but just a helpful direction for user to know the direction to focus attention in Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9012) Not able to move and populate the data fully on to the table when the scratch directory is on S3
[ https://issues.apache.org/jira/browse/HIVE-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9012: --- Fix Version/s: (was: 0.13.1) Not able to move and populate the data fully on to the table when the scratch directory is on S3 Key: HIVE-9012 URL: https://issues.apache.org/jira/browse/HIVE-9012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Environment: Amazon AMI and S3 as storage service Reporter: Kolluru Som Shekhar Sharma Priority: Blocker Original Estimate: 504h Remaining Estimate: 504h I have set hive.exec.scratchDir to point to a directory on S3, and the external table is on S3. I ran a simple query which extracts the key value pairs from a JSON string without any WHERE clause, and the amount of data is ~500GB. The query ran fine, but when it is trying to move the data from the scratch directory it doesn't complete, so I need to kill the process and manually move the data. The data size in the scratch directory was nearly ~550GB. I tried the same scenario with less data and a WHERE clause; it completed successfully and the data also gets populated in the table. I checked the size in the table and in the scratch directory. The data in the table was showing 2MB and the data in the scratch directory is 48.6GB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9842) Enable session/operation timeout by default in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9842: --- Fix Version/s: (was: 1.2.0) Enable session/operation timeout by default in HiveServer2 -- Key: HIVE-9842 URL: https://issues.apache.org/jira/browse/HIVE-9842 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.2.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-9842.1.patch HIVE-5799 introduced a session/operation timeout which cleans up abandoned session and op handles. Currently, the default is set to no-op. We should set it to some reasonable value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9842) Enable session/operation timeout by default in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535778#comment-14535778 ] Sushanth Sowmyan commented on HIVE-9842: Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. Enable session/operation timeout by default in HiveServer2 -- Key: HIVE-9842 URL: https://issues.apache.org/jira/browse/HIVE-9842 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 1.2.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-9842.1.patch HIVE-5799 introduced a session/operation timeout which cleans up abandoned session and op handles. Currently, the default is set to no-op. We should set it to some reasonable value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8218) function registry shared across sessions in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-8218: - Release Note: (was: This should be fixed by HIVE-2573) function registry shared across sessions in HiveServer2 --- Key: HIVE-8218 URL: https://issues.apache.org/jira/browse/HIVE-8218 Project: Hive Issue Type: Bug Components: HiveServer2, UDF Reporter: Thejas M Nair FunctionRegistry.mFunctions is static. That means that in the HS2 case, all users will have the same set of valid UDFs. I.e., adding/deleting a temporary function by one user would affect the namespace of other users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8218) function registry shared across sessions in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535786#comment-14535786 ] Jason Dere commented on HIVE-8218: -- This should be fixed by HIVE-2573 function registry shared across sessions in HiveServer2 --- Key: HIVE-8218 URL: https://issues.apache.org/jira/browse/HIVE-8218 Project: Hive Issue Type: Bug Components: HiveServer2, UDF Reporter: Thejas M Nair FunctionRegistry.mFunctions is static. That means that in the HS2 case, all users will have the same set of valid UDFs. I.e., adding/deleting a temporary function by one user would affect the namespace of other users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
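The isolation problem can be illustrated with a toy sketch (hypothetical names, not Hive's actual classes): a static map is visible to every session, so one user's temporary function registration or removal affects everyone; a per-session map keeps the namespaces separate.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only: contrasts a static, process-wide function map
// (what a static mFunctions behaves like) with a per-session map.
public class TempFunctionScope {
    // Visible to every "session" in the process, like a static registry.
    static final Map<String, String> SHARED = new HashMap<>();

    // Private to each session instance.
    private final Map<String, String> perSession = new HashMap<>();

    public static void registerShared(String name, String className) {
        SHARED.put(name, className);
    }

    public static boolean sharedResolves(String name) {
        return SHARED.containsKey(name);
    }

    public void registerTemporary(String name, String className) {
        perSession.put(name, className);
    }

    public boolean resolves(String name) {
        return perSession.containsKey(name);
    }
}
```

With the per-session map, a temporary function registered in one session is simply invisible to another, which is the behavior HS2 needs.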
[jira] [Updated] (HIVE-10107) Union All : Vertex missing stats resulting in OOM and in-efficient plans
[ https://issues.apache.org/jira/browse/HIVE-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10107: -- Assignee: Pengcheng Xiong Union All : Vertex missing stats resulting in OOM and in-efficient plans Key: HIVE-10107 URL: https://issues.apache.org/jira/browse/HIVE-10107 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Reducer Vertices sending data to a Union all edge are missing statistics and as a result we either use very few reducers in the UNION ALL edge or decide to broadcast the results of UNION ALL. Query {code} select count(*) rowcount from (select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number union all select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales c, store_returns d where c.ss_item_sk = d.sr_item_sk and c.ss_ticket_number = d.sr_ticket_number) t group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number having rowcount > 1; {code} Plan snippet {code} Edges: Reducer 2 - Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS) Reducer 4 - Union 3 (SIMPLE_EDGE) Reducer 7 - Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS) Reducer 4 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int) mode: mergepartial outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col3 > 1) (type: boolean) Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE Select Operator expressions: _col3 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 0 Data 
size: 0 Basic stats: NONE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Reducer 7 Reduce Operator Tree: Merge Join Operator condition map: Inner Join 0 to 1 keys: 0 ss_item_sk (type: int), ss_ticket_number (type: int) 1 sr_item_sk (type: int), sr_ticket_number (type: int) outputColumnNames: _col1, _col6, _col8, _col27, _col34 Filter Operator predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean) Select Operator expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int) outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: count() keys: _col2 (type: int), _col0 (type: int), _col1 (type: int) mode: hash outputColumnNames: _col0, _col1, _col2, _col3 Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int) sort order: +++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int) value expressions: _col3 (type: bigint) {code} The full explain plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 - Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS) Reducer 4 - Union 3 (SIMPLE_EDGE) Reducer 7 - Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS) DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7 Vertices: Map 1 Map Operator Tree: TableScan
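The practical consequence of the zero-size statistics above can be illustrated with a rough model of how a reducer count is derived from estimated input size: when the vertex reports no data, the estimate collapses to a single reducer. This is an illustrative sketch, not Hive's actual implementation; the parameter names only echo hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max.

```python
def estimate_reducers(input_size_bytes, bytes_per_reducer, max_reducers):
    """Rough model of reducer estimation: ceil(size / bytes-per-reducer),
    clamped to [1, max_reducers]. Missing/zero stats degenerate to 1."""
    if input_size_bytes <= 0:
        # Missing statistics on the UNION ALL edge look like zero input,
        # so we fall back to a single reducer regardless of real data size.
        return 1
    reducers = -(-input_size_bytes // bytes_per_reducer)  # ceiling division
    return max(1, min(reducers, max_reducers))
```

With 10 GB of real input and a 256 MB-per-reducer target this model would pick 40 reducers, but with stats reporting 0 bytes it picks 1 — the "very few reducers" symptom described in the report.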
[jira] [Commented] (HIVE-10629) Dropping table in an encrypted zone does not drop warehouse directory
[ https://issues.apache.org/jira/browse/HIVE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535858#comment-14535858 ] Eugene Koifman commented on HIVE-10629: --- Commit for HIVE-9264 includes a number of DROP TABLE statements in .q files. The default value for fs.trash.interval is 0, i.e. trash is disabled, and we don't seem to override it in unit tests. Dropping table in an encrypted zone does not drop warehouse directory - Key: HIVE-10629 URL: https://issues.apache.org/jira/browse/HIVE-10629 Project: Hive Issue Type: Sub-task Components: Security Reporter: Deepesh Khandelwal Assignee: Eugene Koifman Drop table in an encrypted zone removes the table but not its data. The client sees the following on Hive CLI: {noformat} hive> drop table testtbl; OK Time taken: 0.158 seconds {noformat} On the Hive Metastore log the following error is thrown: {noformat} 2015-05-05 08:55:27,665 ERROR [pool-6-thread-142]: hive.log (MetaStoreUtils.java:logAndThrowMetaException(1200)) - Got exception: java.io.IOException Failed to move to trash: hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl java.io.IOException: Failed to move to trash: hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl at org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160) at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:114) at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:95) at org.apache.hadoop.hive.shims.Hadoop23Shims.moveToAppropriateTrash(Hadoop23Shims.java:270) at org.apache.hadoop.hive.metastore.HiveMetaStoreFsImpl.deleteDir(HiveMetaStoreFsImpl.java:47) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:229) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.deleteTableData(HiveMetaStore.java:1584) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1552) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1705) at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) at com.sun.proxy.$Proxy13.drop_table_with_environment_context(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:9256) {noformat} The client should surface the error and perhaps fail the drop table call. To delete the table data one currently has to use {{drop table testtbl purge}}, which removes the table data permanently, skipping the trash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
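The failure mode described above can be summarized as a toy model of the drop-table data path: moving a directory to the trash crosses an encryption-zone boundary, which HDFS rejects, while {{purge}} bypasses the trash entirely. This is an illustrative sketch under those assumptions, not the metastore's real logic; all names here are hypothetical.

```python
def drop_table_data(path, purge=False, in_encryption_zone=False,
                    trash_in_same_zone=False):
    """Toy model of dropping table data. Trash moves across encryption-zone
    boundaries fail; 'purge' deletes permanently and skips the trash."""
    if purge:
        return "deleted permanently"  # trash is bypassed, so no zone issue
    if in_encryption_zone and not trash_in_same_zone:
        # Mirrors the IOException surfaced in the metastore log above.
        raise IOError("Failed to move to trash: " + path)
    return "moved to trash"
```

In this model, only `purge` (or a per-zone trash location) lets the data be removed from inside an encryption zone.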
[jira] [Commented] (HIVE-10325) Remove ExprNodeNullEvaluator
[ https://issues.apache.org/jira/browse/HIVE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536126#comment-14536126 ] Hive QA commented on HIVE-10325: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731244/HIVE-10325.2.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8919 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore org.apache.hive.jdbc.TestSSL.testSSLFetchHttp org.apache.hive.service.cli.thrift.TestThriftHttpCLIService.testAdditionalHttpHeaders org.apache.hive.spark.client.TestSparkClient.testSyncRpc {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3818/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3818/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3818/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731244 - PreCommit-HIVE-TRUNK-Build Remove ExprNodeNullEvaluator Key: HIVE-10325 URL: https://issues.apache.org/jira/browse/HIVE-10325 Project: Hive Issue Type: Task Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10325.1.patch, HIVE-10325.2.patch, HIVE-10325.patch since its purpose can instead be served by ExprNodeConstantEvaluator. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10565: Fix Version/s: (was: 1.2.0) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, HIVE-10565.06.patch, HIVE-10565.07.patch, HIVE-10565.08.patch, HIVE-10565.09.patch, HIVE-10565.091.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535723#comment-14535723 ] Sushanth Sowmyan commented on HIVE-10565: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, HIVE-10565.06.patch, HIVE-10565.07.patch, HIVE-10565.08.patch, HIVE-10565.09.patch, HIVE-10565.091.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10463: Fix Version/s: (was: 1.2.0) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables --- Key: HIVE-10463 URL: https://issues.apache.org/jira/browse/HIVE-10463 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Laljo John Pullokkaran This occurs when return path is on. To reproduce the Exception, take the following excerpt from auto_sortmerge_join_10.q: {noformat} set hive.enforce.bucketing = true; set hive.enforce.sorting = true; set hive.exec.reducers.max = 1; CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS; insert overwrite table tbl1 select * from src where key < 10; {noformat} It produces the following Exception: {noformat} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150) ... 
14 more Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383) ... 22 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect
[ https://issues.apache.org/jira/browse/HIVE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10415: Fix Version/s: (was: 1.2.0) hive.start.cleanup.scratchdir configuration is not taking effect Key: HIVE-10415 URL: https://issues.apache.org/jira/browse/HIVE-10415 Project: Hive Issue Type: Bug Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-10415.patch This configuration hive.start.cleanup.scratchdir is not taking effect -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535759#comment-14535759 ] Sushanth Sowmyan commented on HIVE-10304: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. Add deprecation message to HiveCLI -- Key: HIVE-10304 URL: https://issues.apache.org/jira/browse/HIVE-10304 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC1.2 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch As Beeline is now the recommended command-line tool for Hive, we should add a message to HiveCLI to indicate that it is deprecated and redirect users to Beeline. This is not a suggestion to remove HiveCLI for now, but just a helpful pointer so users know to focus their attention on Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10194) CBO (Calcite Return Path): Equi join followed by theta join produces a cross product
[ https://issues.apache.org/jira/browse/HIVE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10194: Fix Version/s: (was: 1.2.0) CBO (Calcite Return Path): Equi join followed by theta join produces a cross product Key: HIVE-10194 URL: https://issues.apache.org/jira/browse/HIVE-10194 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Query {code} SELECT count(distinct ws_order_number) as order_count, sum(ws_ext_ship_cost) as total_shipping_cost, sum(ws_net_profit) as total_net_profit FROM web_sales ws1 JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk) JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk) JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk) LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number FROM web_sales ws2 JOIN web_sales ws3 ON (ws2.ws_order_number = ws3.ws_order_number) WHERE ws2.ws_warehouse_sk <> ws3.ws_warehouse_sk ) ws_wh1 ON (ws1.ws_order_number = ws_wh1.ws_order_number) LEFT OUTER JOIN web_returns wr1 ON (ws1.ws_order_number = wr1.wr_order_number) WHERE d.d_date between '1999-05-01' and '1999-07-01' and ca.ca_state = 'TX' and s.web_company_name = 'pri' and wr1.wr_order_number is null limit 100 {code} Plan {code} OK Time taken: 0.23 seconds Warning: Map Join MAPJOIN[83][bigTable=ws1] in task 'Map 2' is a cross product OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 2 <- Map 1 (BROADCAST_EDGE) Map 8 <- Reducer 4 (BROADCAST_EDGE) Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 4 <- Map 10 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) Reducer 9 <- Map 8 (SIMPLE_EDGE) DagName: mmokhtar_20150402132417_1bc8688b-59a0-4909-82a4-b9d386065bbd:3 Vertices: Map 1 Map Operator Tree: TableScan alias: ws1 filterExpr: (((ws_ship_addr_sk = ws_order_number) and (ws_ship_date_sk 
ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: boolean) Statistics: Num rows: 143966864 Data size: 33110363004 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((ws_ship_addr_sk = ws_order_number) and (ws_ship_date_sk ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: boolean) Statistics: Num rows: 71974471 Data size: 1151483592 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ws_ship_addr_sk (type: int) outputColumnNames: _col1 Statistics: Num rows: 71974471 Data size: 287862044 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator sort order: Statistics: Num rows: 71974471 Data size: 287862044 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: int) Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: wr1 Statistics: Num rows: 13749816 Data size: 2585240312 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: wr_order_number (type: int) sort order: + Map-reduce partition columns: wr_order_number (type: int) Statistics: Num rows: 13749816 Data size: 2585240312 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: ws1 Statistics: Num rows: 143966864 Data size: 33110363004 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 1 outputColumnNames: _col1 input vertices: 0 Map 1 Statistics: Num rows: 5180969438964472 Data size: 20723877755857888 Basic stats: COMPLETE Column stats: COMPLETE Select
[jira] [Commented] (HIVE-10194) CBO (Calcite Return Path): Equi join followed by theta join produces a cross product
[ https://issues.apache.org/jira/browse/HIVE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535764#comment-14535764 ] Sushanth Sowmyan commented on HIVE-10194: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. CBO (Calcite Return Path): Equi join followed by theta join produces a cross product Key: HIVE-10194 URL: https://issues.apache.org/jira/browse/HIVE-10194 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Query {code} SELECT count(distinct ws_order_number) as order_count, sum(ws_ext_ship_cost) as total_shipping_cost, sum(ws_net_profit) as total_net_profit FROM web_sales ws1 JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk) JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk) JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk) LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number FROM web_sales ws2 JOIN web_sales ws3 ON (ws2.ws_order_number = ws3.ws_order_number) WHERE ws2.ws_warehouse_sk <> ws3.ws_warehouse_sk ) ws_wh1 ON (ws1.ws_order_number = ws_wh1.ws_order_number) LEFT OUTER JOIN web_returns wr1 ON (ws1.ws_order_number = wr1.wr_order_number) WHERE d.d_date between '1999-05-01' and '1999-07-01' and ca.ca_state = 'TX' and s.web_company_name = 'pri' and wr1.wr_order_number is null limit 100 {code} Plan {code} OK Time taken: 0.23 seconds Warning: Map Join MAPJOIN[83][bigTable=ws1] in task 'Map 2' is a cross product OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 2 <- Map 1 (BROADCAST_EDGE) Map 8 <- Reducer 4 (BROADCAST_EDGE) Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 4 <- Map 10 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) Reducer 9 <- Map 8 (SIMPLE_EDGE) DagName: mmokhtar_20150402132417_1bc8688b-59a0-4909-82a4-b9d386065bbd:3 Vertices: Map 1 Map 
Operator Tree: TableScan alias: ws1 filterExpr: (((ws_ship_addr_sk = ws_order_number) and (ws_ship_date_sk ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: boolean) Statistics: Num rows: 143966864 Data size: 33110363004 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((ws_ship_addr_sk = ws_order_number) and (ws_ship_date_sk ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: boolean) Statistics: Num rows: 71974471 Data size: 1151483592 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ws_ship_addr_sk (type: int) outputColumnNames: _col1 Statistics: Num rows: 71974471 Data size: 287862044 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator sort order: Statistics: Num rows: 71974471 Data size: 287862044 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: int) Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: wr1 Statistics: Num rows: 13749816 Data size: 2585240312 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: wr_order_number (type: int) sort order: + Map-reduce partition columns: wr_order_number (type: int) Statistics: Num rows: 13749816 Data size: 2585240312 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: ws1 Statistics: Num rows: 143966864 Data size: 33110363004 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 1 outputColumnNames: _col1 input vertices: 0 Map 1 Statistics: Num rows:
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10165: Fix Version/s: (was: 1.2.0) Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-10165.0.patch h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. 
Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. Implementation Our changes do not break the existing API contracts. Instead our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows: * We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. 
Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. * The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface. * Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a new abstract base class
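The mutation model the proposal describes — rows carrying a retained record identifier so that later updates and deletes can target them — can be sketched as follows. This is a hypothetical illustration of the interface shape, not the actual patch; the class and method names only echo the proposed {{RecordMutator}}.

```python
class RecordMutator:
    """Illustrative sketch of a mutation counterpart to a write-only writer:
    insert returns a record identifier (standing in for ROW_ID /
    RecordIdentifier) that the caller retains for later update/delete."""
    def __init__(self):
        self.rows = {}      # record_id -> row object
        self._next_id = 0
    def insert(self, row):
        rid = self._next_id
        self._next_id += 1
        self.rows[rid] = row
        return rid          # merge processes keep this for future mutations
    def update(self, record_id, row):
        self.rows[record_id] = row
    def delete(self, record_id):
        del self.rows[record_id]
```

The point of the sketch is the contract: the identifier handed back at insert time is what makes a later update or delete addressable, which is exactly why the merge processes above retain each record's {{ROW_ID}}.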
[jira] [Resolved] (HIVE-8218) function registry shared across sessions in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere resolved HIVE-8218. -- Resolution: Duplicate Release Note: This should be fixed by HIVE-2573 function registry shared across sessions in HiveServer2 --- Key: HIVE-8218 URL: https://issues.apache.org/jira/browse/HIVE-8218 Project: Hive Issue Type: Bug Components: HiveServer2, UDF Reporter: Thejas M Nair FunctionRegistry.mFunctions is static. That means that in the HS2 case, all users will have the same set of valid UDFs. I.e., adding/deleting a temporary function by one user would affect the namespace of other users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
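The static-registry problem reported above, and the per-session isolation the duplicate fix introduces, can be modeled in a few lines. This is an illustrative sketch, not Hive's FunctionRegistry code; the class names are hypothetical.

```python
class SharedRegistry:
    """Models a static function map: the dict lives on the class, so every
    'session' (instance) reads and writes the same namespace."""
    functions = {}  # class attribute, shared across all instances
    def add_temp(self, name, fn):
        SharedRegistry.functions[name] = fn
    def lookup(self, name):
        return SharedRegistry.functions.get(name)

class SessionRegistry(SharedRegistry):
    """Per-session overlay: temporary functions live on the instance and
    shadow the shared map, so one user's temp UDF is invisible to others."""
    def __init__(self):
        self.local = {}
    def add_temp(self, name, fn):
        self.local[name] = fn
    def lookup(self, name):
        return self.local.get(name) or SharedRegistry.functions.get(name)
```

With `SharedRegistry`, a temporary function added in one session is visible (and deletable) in every other session — the bug described here; `SessionRegistry` keeps each session's additions private.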
[jira] [Updated] (HIVE-10659) Beeline commands which contains semi-colon as a non-command terminator will fail
[ https://issues.apache.org/jira/browse/HIVE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10659: - Attachment: HIVE-10659.1.patch cc-ing [~thejas] / [~sushanth] for review. I tested this with the initially committed patch for HIVE-7018 and it seems to resolve the issue that we found in HIVE-10614. Once this patch goes in, we should be able to get HIVE-7018 in as well. Thanks Hari Beeline commands which contains semi-colon as a non-command terminator will fail Key: HIVE-10659 URL: https://issues.apache.org/jira/browse/HIVE-10659 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10659.1.patch Consider using beeline to connect to MySQL and run commands involving stored procedures. MySQL stored procedures have the semi-colon as the statement terminator. Since this coincides with beeline's only available command terminator, the semi-colon, beeline will not be able to execute the original command successfully. The above scenario can happen when Hive SchemaTool is used to upgrade a MySQL metastore db whose script contains a stored procedure (as the one introduced initially by HIVE-7018). As of now, we cannot have any stored procedures as part of MySQL scripts because schemaTool uses beeline as the jdbc client to connect to MySQL. This is a serious limitation and needs to be fixed by providing an option for beeline to not use ; as the command delimiter and to process the entire line sent to it as a single command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
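The delimiter clash described above is easy to demonstrate with a toy splitter: naive splitting on `;` breaks a stored-procedure body into fragments, while a "treat the entire line as one command" mode (as the issue proposes) keeps it intact. This is an illustrative sketch, not beeline's parser; the option name is hypothetical.

```python
def split_commands(script, entire_line_as_command=False):
    """Toy model of command splitting: either split on ';' (beeline's
    only terminator today) or pass the whole line through as one command."""
    if entire_line_as_command:
        return [script]
    # Naive split: mangles any statement whose body itself contains ';'.
    return [c for c in (s.strip() for s in script.split(";")) if c]

# A MySQL stored procedure whose body contains statement terminators.
proc = "CREATE PROCEDURE p() BEGIN SELECT 1; SELECT 2; END"
```

Here `split_commands(proc)` yields three fragments, none of which is a valid statement on its own, whereas the whole-line mode hands MySQL the intact procedure definition.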
[jira] [Assigned] (HIVE-10629) Dropping table in an encrypted zone does not drop warehouse directory
[ https://issues.apache.org/jira/browse/HIVE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-10629: - Assignee: Eugene Koifman Dropping table in an encrypted zone does not drop warehouse directory - Key: HIVE-10629 URL: https://issues.apache.org/jira/browse/HIVE-10629 Project: Hive Issue Type: Sub-task Components: Security Reporter: Deepesh Khandelwal Assignee: Eugene Koifman Drop table in an encrypted zone removes the table but not its data. The client sees the following on Hive CLI: {noformat} hive> drop table testtbl; OK Time taken: 0.158 seconds {noformat} On the Hive Metastore log the following error is thrown: {noformat} 2015-05-05 08:55:27,665 ERROR [pool-6-thread-142]: hive.log (MetaStoreUtils.java:logAndThrowMetaException(1200)) - Got exception: java.io.IOException Failed to move to trash: hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl java.io.IOException: Failed to move to trash: hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl at org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160) at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:114) at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:95) at org.apache.hadoop.hive.shims.Hadoop23Shims.moveToAppropriateTrash(Hadoop23Shims.java:270) at org.apache.hadoop.hive.metastore.HiveMetaStoreFsImpl.deleteDir(HiveMetaStoreFsImpl.java:47) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:229) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.deleteTableData(HiveMetaStore.java:1584) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1552) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1705) at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) at com.sun.proxy.$Proxy13.drop_table_with_environment_context(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:9256) {noformat} The client should surface the error and perhaps fail the drop table call. To delete the table data one currently has to use {{drop table testtbl purge}}, which removes the table data permanently, skipping the trash. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535870#comment-14535870 ] Sushanth Sowmyan commented on HIVE-10609: - Per discussion with Mostafa, I'll add this to the tentative list for 1.2 - i.e., it will not be considered a release blocker for 1.2.0, but if it gets done before the RC process ends, we will include it in the next RC being built, and include it for 1.2.0. Otherwise, it will make it in a stabilization 1.2.1 release. I'll update the fix version of this bug with the appropriate version at that time. Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN 
date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status <> cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as 
b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk =
[jira] [Commented] (HIVE-9730) make sure logging is never called when not needed in perf-sensitive places
[ https://issues.apache.org/jira/browse/HIVE-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535885#comment-14535885 ] Sushanth Sowmyan commented on HIVE-9730: Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. make sure logging is never called when not needed in perf-sensitive places -- Key: HIVE-9730 URL: https://issues.apache.org/jira/browse/HIVE-9730 Project: Hive Issue Type: Improvement Components: Logging Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-9730.patch, log4j-llap.png log4j logging has really inefficient serialization !log4j-llap.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9730) make sure logging is never called when not needed in perf-sensitive places
[ https://issues.apache.org/jira/browse/HIVE-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9730: --- Fix Version/s: (was: 1.2.0) make sure logging is never called when not needed in perf-sensitive places -- Key: HIVE-9730 URL: https://issues.apache.org/jira/browse/HIVE-9730 Project: Hive Issue Type: Improvement Components: Logging Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-9730.patch, log4j-llap.png log4j logging has really inefficient serialization !log4j-llap.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9736: --- Affects Version/s: 1.2.0 StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Affects Versions: 1.2.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10566) LLAP: Vector row extraction allocates new extractors per process method call instead of just once
[ https://issues.apache.org/jira/browse/HIVE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10566: Fix Version/s: (was: 1.2.0) LLAP: Vector row extraction allocates new extractors per process method call instead of just once - Key: HIVE-10566 URL: https://issues.apache.org/jira/browse/HIVE-10566 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Extractors for unused columns (common for tables with many columns) are created for each batch instead of just once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10609: Fix Version/s: (was: 1.2.0) Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN 
household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status <> cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and 
store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk =
[jira] [Commented] (HIVE-10628) Incorrect result when vectorized native mapjoin is enabled using null safe operators <=>
[ https://issues.apache.org/jira/browse/HIVE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535719#comment-14535719 ] Sushanth Sowmyan commented on HIVE-10628: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. Incorrect result when vectorized native mapjoin is enabled using null safe operators <=> Key: HIVE-10628 URL: https://issues.apache.org/jira/browse/HIVE-10628 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Attachments: HIVE-10628.01.patch Incorrect results for this query: {noformat} select count(*) from store_sales ss join store_returns sr on (sr.sr_item_sk <=> ss.ss_item_sk and sr.sr_customer_sk <=> ss.ss_customer_sk and sr.sr_item_sk <=> ss.ss_item_sk) where ss.ss_net_paid > 1000; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10628) Incorrect result when vectorized native mapjoin is enabled using null safe operators <=>
[ https://issues.apache.org/jira/browse/HIVE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10628: Fix Version/s: (was: 1.2.0) Incorrect result when vectorized native mapjoin is enabled using null safe operators <=> Key: HIVE-10628 URL: https://issues.apache.org/jira/browse/HIVE-10628 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Attachments: HIVE-10628.01.patch Incorrect results for this query: {noformat} select count(*) from store_sales ss join store_returns sr on (sr.sr_item_sk <=> ss.ss_item_sk and sr.sr_customer_sk <=> ss.ss_customer_sk and sr.sr_item_sk <=> ss.ss_item_sk) where ss.ss_net_paid > 1000; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
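For context, the null safe equality operator referenced in the title treats two NULLs as equal, unlike plain equality. A minimal sketch of the semantic difference (constant expressions only, nothing specific to this bug):

{code:sql}
-- Plain equality: NULL = NULL evaluates to NULL, so rows where either
-- side is NULL never match a join condition.
SELECT 1 = 1, NULL = NULL;       -- true, NULL

-- Null safe equality: NULL <=> NULL evaluates to true, so NULL keys on
-- both sides of a join are matched to each other.
SELECT 1 <=> 1, NULL <=> NULL;   -- true, true
{code}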
[jira] [Commented] (HIVE-10640) Vectorized query with NULL constant throws Unsuported vector output type: void error
[ https://issues.apache.org/jira/browse/HIVE-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535718#comment-14535718 ] Sushanth Sowmyan commented on HIVE-10640: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. Vectorized query with NULL constant throws Unsuported vector output type: void error --- Key: HIVE-10640 URL: https://issues.apache.org/jira/browse/HIVE-10640 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 This query from join_nullsafe.q when vectorized throws Unsuported vector output type: void during execution... {noformat} select * from myinput1 a join myinput1 b on a.key<=>b.value AND a.key is NULL; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10566) LLAP: Vector row extraction allocates new extractors per process method call instead of just once
[ https://issues.apache.org/jira/browse/HIVE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535722#comment-14535722 ] Sushanth Sowmyan commented on HIVE-10566: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. LLAP: Vector row extraction allocates new extractors per process method call instead of just once - Key: HIVE-10566 URL: https://issues.apache.org/jira/browse/HIVE-10566 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Extractors for unused columns (common for tables with many columns) are created for each batch instead of just once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535768#comment-14535768 ] Sushanth Sowmyan commented on HIVE-10165: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. (On an added note, this is almost a perfect example of how developers should file jiras, I think. [~leftylev], do you suppose we can link to this jira from the HowToContribute page?) Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-10165.0.patch h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. 
I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. Implementation Our changes do not break the existing API contracts. Instead our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows: * We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. 
Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. * The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle
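At the SQL level, the mutations the proposed streaming API would apply correspond to ACID statements against a transactional table. A sketch under that assumption (table and column names are hypothetical, not from the patch):

{code:sql}
-- Requires a transactional table, e.g. bucketed ORC created with
-- TBLPROPERTIES ('transactional'='true') and an ACID-capable
-- transaction manager configured.
INSERT INTO facts VALUES (3, 7);            -- mutation: insert
UPDATE facts SET amount = 42 WHERE id = 1;  -- mutation: update
DELETE FROM facts WHERE id = 2;             -- mutation: delete
{code}

The API-level difference is that the merge process supplies each record's {{RecordIdentifier}} directly instead of having Hive resolve rows via a WHERE clause.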
[jira] [Updated] (HIVE-10115) HS2 running on a Kerberized cluster should offer Kerberos(GSSAPI) and Delegation token(DIGEST) when alternate authentication is enabled
[ https://issues.apache.org/jira/browse/HIVE-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10115: Fix Version/s: (was: 1.2.0) HS2 running on a Kerberized cluster should offer Kerberos(GSSAPI) and Delegation token(DIGEST) when alternate authentication is enabled --- Key: HIVE-10115 URL: https://issues.apache.org/jira/browse/HIVE-10115 Project: Hive Issue Type: Improvement Components: Authentication Affects Versions: 1.1.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Labels: patch Attachments: HIVE-10115.0.patch In a Kerberized cluster when alternate authentication is enabled on HS2, it should also accept Kerberos Authentication. The reason this is important is because when we enable LDAP authentication HS2 stops accepting delegation token authentication. So we are forced to enter username passwords in the oozie configuration. The whole idea of SASL is that multiple authentication mechanism can be offered. If we disable Kerberos(GSSAPI) and delegation token (DIGEST) authentication when we enable LDAP authentication, this defeats SASL purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10121) Implement a hive --service udflint command to check UDF jars for common shading mistakes
[ https://issues.apache.org/jira/browse/HIVE-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10121: Fix Version/s: (was: 1.2.0) Implement a hive --service udflint command to check UDF jars for common shading mistakes Key: HIVE-10121 URL: https://issues.apache.org/jira/browse/HIVE-10121 Project: Hive Issue Type: New Feature Components: UDF Reporter: Gopal V Assignee: Abdelrahman Shettia Attachments: HIVE-10121.1.patch, HIVE-10121.2.patch, bad_udfs.out, bad_udfs_verbose.out, good_udfs.out, good_udfs_verbose.out Several SerDe and UDF jars tend to shade in various parts of the dependencies including hadoop-common or guava without relocation. Implement a simple udflint tool which automates some part of the class path and shaded resources audit process required when upgrading a hive install from an old version to a new one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10157) Make use of the timed version of getDagStatus in TezJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10157: Fix Version/s: (was: 1.2.0) Make use of the timed version of getDagStatus in TezJobMonitor -- Key: HIVE-10157 URL: https://issues.apache.org/jira/browse/HIVE-10157 Project: Hive Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10157) Make use of the timed version of getDagStatus in TezJobMonitor
[ https://issues.apache.org/jira/browse/HIVE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535770#comment-14535770 ] Sushanth Sowmyan commented on HIVE-10157: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. Make use of the timed version of getDagStatus in TezJobMonitor -- Key: HIVE-10157 URL: https://issues.apache.org/jira/browse/HIVE-10157 Project: Hive Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10661) LLAP: investigate why GC with IO elevator disabled is so bad
[ https://issues.apache.org/jira/browse/HIVE-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10661: Description: Examples of running same query (Q1) on experimental setup, with Parallel GC, 12 times. Time, DAG name, DAG time, GC time counter. GC time counter on LLAP seems relatively reliable. Note that non-IO jobs are also much slower during some time. It may not be explained entirely by GC, I am investigating it now. Running io and non-io on the same cluster w/o restarting produces these problems also only on non-IO runs I may look at this later, after main GC tuning, but for now I decided to give up on this since elevator will be on by default when using LLAP. {noformat} $ cat io-dag.csv 2015-05-08 12:10:57,695,dag_1429683757595_0843_1,71142,953216 2015-05-08 12:11:41,769,dag_1429683757595_0843_2,43144,844430 2015-05-08 12:12:22,335,dag_1429683757595_0843_3,39828,866538 2015-05-08 12:13:01,327,dag_1429683757595_0843_4,38213,822179 2015-05-08 12:13:39,610,dag_1429683757595_0843_5,37513,863968 2015-05-08 12:14:19,293,dag_1429683757595_0843_6,38320,913591 2015-05-08 12:14:58,500,dag_1429683757595_0843_7,38587,972450 2015-05-08 12:15:39,017,dag_1429683757595_0843_8,39845,1085598 2015-05-08 12:16:19,708,dag_1429683757595_0843_9,39979,1165559 2015-05-08 12:17:03,174,dag_1429683757595_0843_10,42713,1447033 2015-05-08 12:17:47,557,dag_1429683757595_0843_11,43670,1454114 2015-05-08 12:18:31,440,dag_1429683757595_0843_12,43178,1380477 $ cat noio-dag.csv 2015-05-08 11:44:05,846,dag_1429683757595_0841_1,60740,1643276 2015-05-08 11:44:55,761,dag_1429683757595_0841_2,48984,1590546 2015-05-08 11:45:48,978,dag_1429683757595_0841_3,52353,1765823 2015-05-08 11:46:44,810,dag_1429683757595_0841_4,54930,1831224 2015-05-08 11:47:47,368,dag_1429683757595_0841_5,61677,2068089 2015-05-08 11:49:05,235,dag_1429683757595_0841_6,76725,2416709 2015-05-08 11:51:56,998,dag_1429683757595_0841_7,170575,3250698 2015-05-08 
11:58:16,728,dag_1429683757595_0841_8,377732,5541900 2015-05-08 12:03:17,344,dag_1429683757595_0841_9,298682,1844769 2015-05-08 12:05:23,267,dag_1429683757595_0841_10,124954,1331763 2015-05-08 12:06:35,650,dag_1429683757595_0841_11,71350,1703387 2015-05-08 12:07:42,599,dag_1429683757595_0841_12,66143,1724482 {noformat} was: Examples of running same query (Q1) on experimental setup, with Parallel GC, 12 times. Time, DAG name, DAG time, GC time counter. GC time counter on LLAP seems relatively reliable. Note that non-IO jobs are also much slower during some time. It may not be explained entirely by GC, I am investigating it now. I may look at this later, after main GC tuning, but for now I decided to give up on this since elevator will be on by default when using LLAP. {noformat} $ cat io-dag.csv 2015-05-08 12:10:57,695,dag_1429683757595_0843_1,71142,953216 2015-05-08 12:11:41,769,dag_1429683757595_0843_2,43144,844430 2015-05-08 12:12:22,335,dag_1429683757595_0843_3,39828,866538 2015-05-08 12:13:01,327,dag_1429683757595_0843_4,38213,822179 2015-05-08 12:13:39,610,dag_1429683757595_0843_5,37513,863968 2015-05-08 12:14:19,293,dag_1429683757595_0843_6,38320,913591 2015-05-08 12:14:58,500,dag_1429683757595_0843_7,38587,972450 2015-05-08 12:15:39,017,dag_1429683757595_0843_8,39845,1085598 2015-05-08 12:16:19,708,dag_1429683757595_0843_9,39979,1165559 2015-05-08 12:17:03,174,dag_1429683757595_0843_10,42713,1447033 2015-05-08 12:17:47,557,dag_1429683757595_0843_11,43670,1454114 2015-05-08 12:18:31,440,dag_1429683757595_0843_12,43178,1380477 $ cat noio-dag.csv 2015-05-08 11:44:05,846,dag_1429683757595_0841_1,60740,1643276 2015-05-08 11:44:55,761,dag_1429683757595_0841_2,48984,1590546 2015-05-08 11:45:48,978,dag_1429683757595_0841_3,52353,1765823 2015-05-08 11:46:44,810,dag_1429683757595_0841_4,54930,1831224 2015-05-08 11:47:47,368,dag_1429683757595_0841_5,61677,2068089 2015-05-08 11:49:05,235,dag_1429683757595_0841_6,76725,2416709 2015-05-08 
11:51:56,998,dag_1429683757595_0841_7,170575,3250698 2015-05-08 11:58:16,728,dag_1429683757595_0841_8,377732,5541900 2015-05-08 12:03:17,344,dag_1429683757595_0841_9,298682,1844769 2015-05-08 12:05:23,267,dag_1429683757595_0841_10,124954,1331763 2015-05-08 12:06:35,650,dag_1429683757595_0841_11,71350,1703387 2015-05-08 12:07:42,599,dag_1429683757595_0841_12,66143,1724482 {noformat} LLAP: investigate why GC with IO elevator disabled is so bad Key: HIVE-10661 URL: https://issues.apache.org/jira/browse/HIVE-10661 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Prasanth Jayachandran Examples of running same query (Q1) on experimental setup, with Parallel GC, 12 times. Time, DAG name, DAG time, GC time counter. GC time counter on LLAP seems relatively reliable. Note that non-IO jobs are also
[jira] [Updated] (HIVE-9828) Semantic analyzer does not capture view parent entity for tables referred in view with union all
[ https://issues.apache.org/jira/browse/HIVE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9828: --- Fix Version/s: (was: 1.2.0) Semantic analyzer does not capture view parent entity for tables referred in view with union all - Key: HIVE-9828 URL: https://issues.apache.org/jira/browse/HIVE-9828 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 1.1.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-9828.1-npf.patch, HIVE-9828.1-npf.patch, HIVE-9828.2.patch Hive compiler adds tables used in a view definition in the input entity list, with the view as parent entity for the table. In case of a view with union all query, this is not being done properly. For example, {noformat} create view view1 as select t.id from (select tab1.id from db.tab1 union all select tab2.id from db.tab2 ) t; {noformat} This query will capture tab1 and tab2 as read entity without view1 as parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9828) Semantic analyzer does not capture view parent entity for tables referred in view with union all
[ https://issues.apache.org/jira/browse/HIVE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535880#comment-14535880 ] Sushanth Sowmyan commented on HIVE-9828: Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. Semantic analyzer does not capture view parent entity for tables referred in view with union all - Key: HIVE-9828 URL: https://issues.apache.org/jira/browse/HIVE-9828 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 1.1.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-9828.1-npf.patch, HIVE-9828.1-npf.patch, HIVE-9828.2.patch Hive compiler adds tables used in a view definition in the input entity list, with the view as parent entity for the table. In case of a view with union all query, this is not being done properly. For example, {noformat} create view view1 as select t.id from (select tab1.id from db.tab1 union all select tab2.id from db.tab2 ) t; {noformat} This query will capture tab1 and tab2 as read entity without view1 as parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.
[ https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9736: --- Fix Version/s: (was: 1.2.0) StorageBasedAuthProvider should batch namenode-calls where possible. Key: HIVE-9736 URL: https://issues.apache.org/jira/browse/HIVE-9736 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Labels: TODOC1.2 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch Consider a table partitioned by 2 keys (dt, region). Say a dt partition could have 1 associated regions. Consider that the user does: {code:sql} ALTER TABLE my_table DROP PARTITION (dt='20150101'); {code} As things stand now, {{StorageBasedAuthProvider}} will make individual {{DistributedFileSystem.listStatus()}} calls for each partition-directory, and authorize each one separately. It'd be faster to batch the calls, and examine multiple FileStatus objects at once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
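The batching proposed in HIVE-9736 can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Hive's actual StorageBasedAuthProvider code: the class and method names are invented, and real partition layouts are resolved through the metastore rather than string manipulation. The idea is to group partition directories by their parent so that a single listStatus(parent) call against the namenode covers every sibling partition, instead of issuing one RPC per directory.

```java
import java.util.*;

// Hypothetical sketch of batching namenode calls for authorization.
// Grouping partition paths by parent directory lets one
// listStatus(parent) RPC return the FileStatus of every sibling
// partition, which can then be authorized from the single result.
public class BatchedAuthSketch {

    // Group partition paths by their parent directory; each map key
    // corresponds to one listStatus() call instead of one per path.
    public static Map<String, List<String>> batchByParent(List<String> partitionPaths) {
        Map<String, List<String>> byParent = new LinkedHashMap<>();
        for (String p : partitionPaths) {
            int slash = p.lastIndexOf('/');
            String parent = slash <= 0 ? "/" : p.substring(0, slash);
            byParent.computeIfAbsent(parent, k -> new ArrayList<>()).add(p);
        }
        return byParent;
    }

    public static void main(String[] args) {
        // All regions of dt=20150101 share one parent directory, so
        // dropping the dt partition needs one namenode call, not three.
        List<String> parts = Arrays.asList(
            "/warehouse/my_table/dt=20150101/region=us",
            "/warehouse/my_table/dt=20150101/region=eu",
            "/warehouse/my_table/dt=20150101/region=apac");
        System.out.println(batchByParent(parts).size()); // prints 1
    }
}
```

The trade-off is that listStatus on the parent may return children that are not partitions of interest, so the caller still filters the returned FileStatus objects; the win is amortizing the per-RPC round-trip cost.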
[jira] [Commented] (HIVE-9713) CBO : inefficient join order created for left join outer condition
[ https://issues.apache.org/jira/browse/HIVE-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535887#comment-14535887 ] Sushanth Sowmyan commented on HIVE-9713: Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. (Is this another candidate for 1.2.1?) CBO : inefficient join order created for left join outer condition -- Key: HIVE-9713 URL: https://issues.apache.org/jira/browse/HIVE-9713 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran For the query below, which is a subset of TPC-DS Query 80, CBO joins catalog_sales with catalog_returns first although the cardinality estimate of the join is relatively high. catalog_sales should be joined with the selective dimension tables first. {code} select cp_catalog_page_id as catalog_page_id, sum(cs_ext_sales_price) as sales, sum(coalesce(cr_return_amount, 0)) as returns, sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit from catalog_sales left outer join catalog_returns on (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), date_dim, catalog_page, item, promotion where cs_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and cs_catalog_page_sk = cp_catalog_page_sk and cs_item_sk = i_item_sk and i_current_price > 50 and cs_promo_sk = p_promo_sk and p_channel_tv = 'N' group by cp_catalog_page_id {code} Logical plan from CBO debug logs {code} 2015-02-17 22:34:04,577 DEBUG [main]: parse.CalcitePlanner (CalcitePlanner.java:apply(743)) - Plan After Join Reordering: HiveProject(catalog_page_id=[$0], sales=[$1], returns=[$2], profit=[$3]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1395 HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[sum($2)], agg#2=[sum($3)]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1393 
HiveProject($f0=[$14], $f1=[$5], $f2=[coalesce($9, 0)], $f3=[-($6, coalesce($10, 0))]): rowcount = 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1391 HiveJoin(condition=[=($3, $17)], joinType=[inner]): rowcount = 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1508 HiveJoin(condition=[=($2, $15)], joinType=[inner]): rowcount = 2.737172304450524E8, cumulative cost = {8.252425594517495E15 rows, 0.0 cpu, 0.0 io}, id = 1506 HiveJoin(condition=[=($1, $13)], joinType=[inner]): rowcount = 8.211516913351573E8, cumulative cost = {8.252424773349804E15 rows, 0.0 cpu, 0.0 io}, id = 1504 HiveJoin(condition=[=($0, $11)], joinType=[inner]): rowcount = 1.1296953399027347E11, cumulative cost = {8.252311803804096E15 rows, 0.0 cpu, 0.0 io}, id = 1418 HiveJoin(condition=[AND(=($2, $7), =($4, $8))], joinType=[left]): rowcount = 8.252311488455487E15, cumulative cost = {3.15348608E8 rows, 0.0 cpu, 0.0 io}, id = 1413 HiveProject(cs_sold_date_sk=[$0], cs_catalog_page_sk=[$12], cs_item_sk=[$15], cs_promo_sk=[$16], cs_order_number=[$17], cs_ext_sales_price=[$23], cs_net_profit=[$33]): rowcount = 2.86549727E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1324 HiveTableScan(table=[[tpcds_bin_orc_200.catalog_sales]]): rowcount = 2.86549727E8, cumulative cost = {0}, id = 1136 HiveProject(cr_item_sk=[$2], cr_order_number=[$16], cr_return_amount=[$18], cr_net_loss=[$26]): rowcount = 2.8798881E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1327 HiveTableScan(table=[[tpcds_bin_orc_200.catalog_returns]]): rowcount = 2.8798881E7, cumulative cost = {0}, id = 1137 HiveProject(d_date_sk=[$0], d_date=[$2]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1371 HiveFilter(condition=[between(false, $2, CAST('1998-08-04'):DATE, CAST('1998-09-04'):DATE)]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1369 HiveTableScan(table=[[tpcds_bin_orc_200.date_dim]]): rowcount = 
73049.0, cumulative cost = {0}, id = 1138 HiveProject(cp_catalog_page_sk=[$0], cp_catalog_page_id=[$1]): rowcount = 11718.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1375 HiveTableScan(table=[[tpcds_bin_orc_200.catalog_page]]): rowcount = 11718.0, cumulative cost = {0}, id = 1139 HiveProject(i_item_sk=[$0], i_current_price=[$5]): rowcount =
[jira] [Commented] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled
[ https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535890#comment-14535890 ] Sushanth Sowmyan commented on HIVE-9695: Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. Redundant filter operator in reducer Vertex when CBO is disabled Key: HIVE-9695 URL: https://issues.apache.org/jira/browse/HIVE-9695 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran There is a redundant filter operator in the reducer Vertex when CBO is disabled. Query {code} select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales a, store_returns b, store where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number and ss_sold_date_sk between 2450816 and 2451500 and sr_returned_date_sk between 2450816 and 2451500 and s_store_sk = ss_store_sk; {code} Plan snippet {code} Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean) {code} Full plan with CBO disabled {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 - Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 (SIMPLE_EDGE) DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13 Vertices: Map 1 Map Operator Tree: TableScan alias: b filterExpr: ((sr_item_sk is not null and sr_ticket_number is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: boolean) Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean) Statistics: 
Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: sr_item_sk (type: int), sr_ticket_number (type: int) sort order: ++ Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int) Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE value expressions: sr_returned_date_sk (type: int) Execution mode: vectorized Map 3 Map Operator Tree: TableScan alias: store filterExpr: s_store_sk is not null (type: boolean) Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: s_store_sk is not null (type: boolean) Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: s_store_sk (type: int) sort order: + Map-reduce partition columns: s_store_sk (type: int) Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: a filterExpr: (((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean) Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean) Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: ss_item_sk (type: int), ss_ticket_number (type: int) sort order: ++
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535732#comment-14535732 ] Mostafa Mokhtar commented on HIVE-10609: [~sushanth] This fix is needed to keep queries from crashing. Please include in 1.2.0 Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk 
= cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status <> cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM 
store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN
[jira] [Updated] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins
[ https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10533: Fix Version/s: (was: 1.2.0) CBO (Calcite Return Path): Join to MultiJoin support for outer joins Key: HIVE-10533 URL: https://issues.apache.org/jira/browse/HIVE-10533 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez CBO return path: auto_join7.q can be used to reproduce the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10557) CBO : Support reference to alias in queries
[ https://issues.apache.org/jira/browse/HIVE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10557: Fix Version/s: (was: 1.2.0) CBO : Support reference to alias in queries Key: HIVE-10557 URL: https://issues.apache.org/jira/browse/HIVE-10557 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Priority: Minor Query {code:sql} explain select count(*) rowcount from (select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number and ss_sold_date_sk between 2450816 and 2451500 and sr_returned_date_sk between 2450816 and 2451500 union all select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales c, store_returns d where c.ss_item_sk = d.sr_item_sk and c.ss_ticket_number = d.sr_ticket_number and ss_sold_date_sk between 2450816 and 2451500 and sr_returned_date_sk between 2450816 and 2451500) t group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number having rowcount > 1 {code} Exception {code} 15/04/30 04:44:21 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO. org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Encountered Select alias 'rowcount' in having clause 'rowcount > 1' This non standard behavior is not supported with cbo on. Turn off cbo for these queries. 
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.validateNoHavingReferenceToAlias(CalcitePlanner.java:2888) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBHavingLogicalPlan(CalcitePlanner.java:2828) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:2738) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:804) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:765) at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:604) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:242) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10015) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:205) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins
[ https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535730#comment-14535730 ] Sushanth Sowmyan commented on HIVE-10533: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. CBO (Calcite Return Path): Join to MultiJoin support for outer joins Key: HIVE-10533 URL: https://issues.apache.org/jira/browse/HIVE-10533 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez CBO return path: auto_join7.q can be used to reproduce the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10557) CBO : Support reference to alias in queries
[ https://issues.apache.org/jira/browse/HIVE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535726#comment-14535726 ] Sushanth Sowmyan commented on HIVE-10557: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. CBO : Support reference to alias in queries Key: HIVE-10557 URL: https://issues.apache.org/jira/browse/HIVE-10557 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Priority: Minor Query {code:sql} explain select count(*) rowcount from (select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number and ss_sold_date_sk between 2450816 and 2451500 and sr_returned_date_sk between 2450816 and 2451500 union all select ss_item_sk, ss_ticket_number, ss_store_sk from store_sales c, store_returns d where c.ss_item_sk = d.sr_item_sk and c.ss_ticket_number = d.sr_ticket_number and ss_sold_date_sk between 2450816 and 2451500 and sr_returned_date_sk between 2450816 and 2451500) t group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number having rowcount > 1 {code} Exception {code} 15/04/30 04:44:21 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO. org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Encountered Select alias 'rowcount' in having clause 'rowcount > 1' This non standard behavior is not supported with cbo on. Turn off cbo for these queries. 
at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.validateNoHavingReferenceToAlias(CalcitePlanner.java:2888) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBHavingLogicalPlan(CalcitePlanner.java:2828) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:2738) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:804) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:765) at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:604) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:242) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10015) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:205) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10479) Empty tabAlias in columnInfo which triggers PPD
[ https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535745#comment-14535745 ] Sushanth Sowmyan commented on HIVE-10479: - Removing fix version of 1.2.0 in preparation of release, since this is not a blocker for 1.2.0. Empty tabAlias in columnInfo which triggers PPD --- Key: HIVE-10479 URL: https://issues.apache.org/jira/browse/HIVE-10479 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Attachments: HIVE-10479.patch in ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when aliases contains empty string and key is an empty string too, it assumes that aliases contains key. This will trigger incorrect PPD. To reproduce it, apply the HIVE-10455 and run cbo_subq_notin.q. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
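The faulty containment check described in HIVE-10479 can be illustrated with a minimal, self-contained sketch. The class and method names here are hypothetical, not the actual OpProcFactory code: the point is only that when the alias set contains the empty string and the lookup key is also empty, contains() reports a spurious match, so the check must reject empty keys before consulting the set.

```java
import java.util.*;

// Hypothetical illustration of the empty-tabAlias pitfall: a bare
// set-containment check treats "" matching "" as a real alias hit,
// which would wrongly allow a predicate to be pushed down.
public class EmptyAliasSketch {

    // The problematic pattern: "" in the set matches a "" key.
    public static boolean matchesBuggy(Set<String> aliases, String key) {
        return aliases.contains(key); // false positive when both are empty
    }

    // A guarded variant: an empty (or null) key never counts as a match.
    public static boolean matchesGuarded(Set<String> aliases, String key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        return aliases.contains(key);
    }

    public static void main(String[] args) {
        Set<String> aliases = new HashSet<>(Arrays.asList("", "a"));
        System.out.println(matchesBuggy(aliases, ""));   // true (spurious)
        System.out.println(matchesGuarded(aliases, "")); // false
    }
}
```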