[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause
[ https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16885: --- Attachment: HIVE-16885.03.patch > Non-equi Joins: Filter clauses should be pushed into the ON clause > -- > > Key: HIVE-16885 > URL: https://issues.apache.org/jira/browse/HIVE-16885 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16885.01.patch, HIVE-16885.02.patch, > HIVE-16885.03.patch, HIVE-16885.patch > > > FIL_24 -> MAPJOIN_23 > {code} > hive> explain select * from part where p_size > (select max(p_size) from > part group by p_type); > Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 3 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_26] > Select Operator [SEL_25] (rows=110 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_24] (rows=110 width=625) > predicate:(_col5 > _col9) > Map Join Operator [MAPJOIN_23] (rows=330 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] > <-Reducer 3 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_21] > Select Operator [SEL_20] (rows=165 width=4) > Output:["_col0"] > Group By Operator [GBY_19] (rows=165 width=109) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_18] > PartitionCols:_col0 > Group By Operator [GBY_17] (rows=14190 width=109) > > Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type > Select Operator [SEL_16] (rows=2 width=109) > Output:["p_type","p_size"] > TableScan [TS_2] (rows=2 
width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Select Operator [SEL_22] (rows=2 width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > TableScan [TS_0] (rows=2 width=621) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
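In the plan above, FIL_24 applies the non-equi predicate (_col5 > _col9) only after MAPJOIN_23 has materialized the full cross product. The gain from pushing the filter into the ON clause can be sketched with a toy nested-loop join (illustrative only; the class and method names below are made up, not Hive's optimizer code):

```java
import java.util.ArrayList;
import java.util.List;

public class FilterPushdownSketch {
    // Shape of the plan above: cross product first (MAPJOIN_23), then a
    // separate filter (FIL_24) over every materialized pair.
    static List<int[]> joinThenFilter(int[] left, int[] right) {
        List<int[]> cross = new ArrayList<>();
        for (int l : left)
            for (int r : right)
                cross.add(new int[]{l, r});          // |L| * |R| rows buffered
        List<int[]> result = new ArrayList<>();
        for (int[] row : cross)
            if (row[0] > row[1]) result.add(row);    // predicate applied late
        return result;
    }

    // Same predicate evaluated inside the join, as if it were in the ON
    // clause: non-matching pairs are dropped before being materialized.
    static List<int[]> filterInsideJoin(int[] left, int[] right) {
        List<int[]> result = new ArrayList<>();
        for (int l : left)
            for (int r : right)
                if (l > r) result.add(new int[]{l, r});
        return result;
    }
}
```

Both methods return the same rows; the difference is only the size of the intermediate state, which is what pushing the clause into the join aims to shrink.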
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Attachment: HIVE-16589.0994.patch > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, > HIVE-16589.099.patch, HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more of the classes with which we vectorize AVG, in preparation for > fully supporting AVG GroupBy. In particular, add the PARTIAL2 and FINAL > GroupBy modes that take the AVG struct as input, and the COMPLETE mode that > takes the original data and produces the full aggregation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Attachment: (was: HIVE-16589.0994.patch) > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.099.patch, > HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more of the classes with which we vectorize AVG, in preparation for > fully supporting AVG GroupBy. In particular, add the PARTIAL2 and FINAL > GroupBy modes that take the AVG struct as input, and the COMPLETE mode that > takes the original data and produces the full aggregation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057052#comment-16057052 ] Hive QA commented on HIVE-16929: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873771/HIVE-16929.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10841 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=220) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5705/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5705/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-5705/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873771 - PreCommit-HIVE-Build > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > > Add a configuration item "hive.aux.udf.package.name.list" that lists package > names to scan: classes found under those packages in the jars in the > $HIVE_HOME/auxlib/ directory are registered as permanent (invariant) > functions. > For example, > {code:java} > <property> > <name>hive.aux.udf.package.name.list</name> > <value>com.sample.udf,com.test.udf</value> > </property> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
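The package-matching rule implied by the configuration above can be sketched as follows (a hypothetical helper, not the attached patch; the class and method names are invented):

```java
public class AuxUdfPackageFilter {
    // Returns true when className lives directly under one of the packages
    // configured in hive.aux.udf.package.name.list (comma-separated).
    static boolean matches(String className, String packageListValue) {
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) return false;                 // default package: skip
        String pkg = className.substring(0, lastDot);
        for (String configured : packageListValue.split(",")) {
            if (pkg.equals(configured.trim())) return true;
        }
        return false;
    }
}
```

With the example value above, a class named com.sample.udf.MyUpper (hypothetical) would be registered, while classes outside the listed packages would be ignored.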
[jira] [Commented] (HIVE-16918) Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp
[ https://issues.apache.org/jira/browse/HIVE-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057031#comment-16057031 ] anishek commented on HIVE-16918: I think HIVE_IN_TEST is definitely a better indicator. However, I was thinking that any method on the pfile implementation should first check for "HIVE_IN_TEST", and everywhere else we can just do the pfile/file scheme check. This way we won't be using HIVE_IN_TEST in various classes, as it will be limited to only the ProxyFileSystem class, and we use pfile as a regular scheme everywhere. What do you think? > Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp > -- > > Key: HIVE-16918 > URL: https://issues.apache.org/jira/browse/HIVE-16918 > Project: Hive > Issue Type: Bug > Components: repl >Affects Versions: 3.0.0 >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-16918.2.patch, HIVE-16918.patch > > > With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp. > This, however, is incorrect for copying _metadata generated from a temporary > scratch directory to hdfs. We need to change that so that it routes to using > a regular CopyTask. The issue with using distcp for this is that distcp > launches another job, which may be queued on another machine that does not > have access to this file:// uri. Distcp should only ever be used when > copying from non-local filesystems. > Also, in the spirit of following up HIVE-16686, we missed adding "-pb" as a > default for invocations of distcp from hive. Adding that in. This would not > be necessary if HADOOP-8143 had made it in, but until it goes in, we need it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
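The decision rule under discussion can be sketched as a check on the source URI's scheme (an assumed illustration, not the actual ReplCopyTask or ProxyFileSystem code; the class and method names are made up):

```java
import java.net.URI;

public class CopyStrategySketch {
    // DistCp launches a separate job that may run on another machine, so it
    // must never be handed a file:// URI it cannot resolve; a plain copy
    // handles local sources such as the _metadata scratch file.
    static boolean shouldUseDistCp(URI source, boolean hiveInTest) {
        String scheme = source.getScheme();
        if (scheme == null || "file".equals(scheme)) return false;
        // pfile is the test-only ProxyFileSystem scheme; treat it as local.
        if (hiveInTest && "pfile".equals(scheme)) return false;
        return true;  // e.g. hdfs:// sources go through distcp
    }
}
```

Under this sketch, the HIVE_IN_TEST special case stays in one place, matching the comment's suggestion of confining it to the ProxyFileSystem handling.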
[jira] [Commented] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057012#comment-16057012 ] Hive QA commented on HIVE-16929: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873771/HIVE-16929.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10841 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5704/testReport Console output: 
https://builds.apache.org/job/PreCommit-HIVE-Build/5704/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5704/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873771 - PreCommit-HIVE-Build > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > > Add a configuration item "hive.aux.udf.package.name.list" that lists package > names to scan: classes found under those packages in the jars in the > $HIVE_HOME/auxlib/ directory are registered as permanent (invariant) > functions. > For example, > {code:java} > <property> > <name>hive.aux.udf.package.name.list</name> > <value>com.sample.udf,com.test.udf</value> > </property> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS
[ https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057007#comment-16057007 ] Rui Li commented on HIVE-16840: --- Besides, it's better to add a separate optimizer for this optimization. SetSparkReducerParallelism is only intended to set parallelism for RSes. > Investigate the performance of order by limit in HoS > > > Key: HIVE-16840 > URL: https://issues.apache.org/jira/browse/HIVE-16840 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16840.patch > > > We found that on 1TB data of TPC-DS, q17 of TPC-DS hung. > {code} > select i_item_id >,i_item_desc >,s_state >,count(ss_quantity) as store_sales_quantitycount >,avg(ss_quantity) as store_sales_quantityave >,stddev_samp(ss_quantity) as store_sales_quantitystdev >,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov >,count(sr_return_quantity) as store_returns_quantitycount >,avg(sr_return_quantity) as store_returns_quantityave >,stddev_samp(sr_return_quantity) as store_returns_quantitystdev >,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as > store_returns_quantitycov >,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) > as catalog_sales_quantityave >,stddev_samp(cs_quantity)/avg(cs_quantity) as > catalog_sales_quantitystdev >,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov > from store_sales > ,store_returns > ,catalog_sales > ,date_dim d1 > ,date_dim d2 > ,date_dim d3 > ,store > ,item > where d1.d_quarter_name = '2000Q1' >and d1.d_date_sk = store_sales.ss_sold_date_sk >and item.i_item_sk = store_sales.ss_item_sk >and store.s_store_sk = store_sales.ss_store_sk >and store_sales.ss_customer_sk = store_returns.sr_customer_sk >and store_sales.ss_item_sk = store_returns.sr_item_sk >and store_sales.ss_ticket_number = store_returns.sr_ticket_number >and store_returns.sr_returned_date_sk = d2.d_date_sk >and 
d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') >and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk >and store_returns.sr_item_sk = catalog_sales.cs_item_sk >and catalog_sales.cs_sold_date_sk = d3.d_date_sk >and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') > group by i_item_id > ,i_item_desc > ,s_state > order by i_item_id > ,i_item_desc > ,s_state > limit 100; > {code} > The script hung because we use only 1 task to implement the sort. > {code} > STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Reducer 10 <- Reducer 9 (SORT, 1) > Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 > (PARTITION-LEVEL SORT, 889) > Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 > (PARTITION-LEVEL SORT, 1009) > Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 > (PARTITION-LEVEL SORT, 683) > Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 > (PARTITION-LEVEL SORT, 751) > Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 > (PARTITION-LEVEL SORT, 826) > Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 > (PARTITION-LEVEL SORT, 909) > Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 > (PARTITION-LEVEL SORT, 1001) > Reducer 9 <- Reducer 8 (GROUP, 2) > {code} > The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use > 1 task to ensure correctness, but the performance is poor. > The reason we use 1 task to implement order by limit is > [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
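A common way to avoid funnelling every row through the single SORT reducer for order-by-limit is to let each parallel partition keep only its local top-N and then merge the small candidate sets in one final task. A self-contained sketch of that idea (illustrative only, not Hive's SetSparkReducerParallelism):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TopKSketch {
    // Each of K partitions sorts locally and keeps its top-n rows (this part
    // can run in parallel); the final single task sorts only K*n candidates
    // instead of the entire dataset, while still returning the correct limit.
    static List<Integer> parallelOrderByLimit(List<List<Integer>> partitions, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> part : partitions) {
            List<Integer> local = new ArrayList<>(part);
            Collections.sort(local);
            candidates.addAll(local.subList(0, Math.min(n, local.size())));
        }
        Collections.sort(candidates);  // small final merge, not a full sort
        return candidates.subList(0, Math.min(n, candidates.size()));
    }
}
```

The result is identical to sorting everything in one task because any row outside a partition's local top-n cannot appear in the global top-n.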
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Attachment: HIVE-16589.0994.patch > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, > HIVE-16589.099.patch, HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more of the classes with which we vectorize AVG, in preparation for > fully supporting AVG GroupBy. In particular, add the PARTIAL2 and FINAL > GroupBy modes that take the AVG struct as input, and the COMPLETE mode that > takes the original data and produces the full aggregation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Attachment: (was: HIVE-16589.0994.patch) > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, > HIVE-16589.099.patch, HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more of the classes with which we vectorize AVG, in preparation for > fully supporting AVG GroupBy. In particular, add the PARTIAL2 and FINAL > GroupBy modes that take the AVG struct as input, and the COMPLETE mode that > takes the original data and produces the full aggregation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13567: --- Status: Patch Available (was: Open) > Auto-gather column stats - phase 2 > -- > > Key: HIVE-13567 > URL: https://issues.apache.org/jira/browse/HIVE-13567 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, > HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, > HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, > HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, > HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, > HIVE-13567.15.patch, HIVE-13567.16.patch > > > in phase 2, we are going to set auto-gather column on as default. This needs > to update golden files. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13567: --- Attachment: (was: HIVE-13567.16.patch) > Auto-gather column stats - phase 2 > -- > > Key: HIVE-13567 > URL: https://issues.apache.org/jira/browse/HIVE-13567 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, > HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, > HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, > HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, > HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, > HIVE-13567.15.patch, HIVE-13567.16.patch > > > in phase 2, we are going to set auto-gather column on as default. This needs > to update golden files. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13567: --- Attachment: HIVE-13567.16.patch > Auto-gather column stats - phase 2 > -- > > Key: HIVE-13567 > URL: https://issues.apache.org/jira/browse/HIVE-13567 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, > HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, > HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, > HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, > HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, > HIVE-13567.15.patch, HIVE-13567.16.patch > > > in phase 2, we are going to set auto-gather column on as default. This needs > to update golden files. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13567: --- Status: Open (was: Patch Available) > Auto-gather column stats - phase 2 > -- > > Key: HIVE-13567 > URL: https://issues.apache.org/jira/browse/HIVE-13567 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, > HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, > HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, > HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, > HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, > HIVE-13567.15.patch, HIVE-13567.16.patch > > > in phase 2, we are going to set auto-gather column on as default. This needs > to update golden files. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-16893) move replication dump related work in semantic analysis phase to execution phase using a task
[ https://issues.apache.org/jira/browse/HIVE-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16893 started by anishek. -- > move replication dump related work in semantic analysis phase to execution > phase using a task > - > > Key: HIVE-16893 > URL: https://issues.apache.org/jira/browse/HIVE-16893 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > > Since we run into the possibility of creating a large number of tasks during > a replication bootstrap dump: > * we may not be able to hold all of them in memory for really large > databases, which might not hold true once we complete HIVE-16892 > * Also, a compile-time lock is taken so that only one query runs in this > phase; in the replication bootstrap scenario that query is a very > long-running task, so moving the work to the execution phase will limit the > lock period in the compile phase. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
[ https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056981#comment-16056981 ] Matt McCline commented on HIVE-16919: - First one: 4 back: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))), 11 back in MIN(cint) -1069736047 1st is MAX(cint) -2030 > Vectorization: vectorization_short_regress.q has query result differences > with non-vectorized run. Vectorized unary function broken? > - > > Key: HIVE-16919 > URL: https://issues.apache.org/jira/browse/HIVE-16919 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > > Jason spotted a difference in the query result for > vectorization_short_regress.q.out -- that is, when vectorization is turned > off and a base .q.out file is created, there are 2 differences. > They both seem to be related to negation. For example, in the first one > MAX(cint) and MIN(cint) appear earlier as columns and match non-vec and vec. > So, it doesn't appear that aggregation is failing. It seems that now that > the Reducer is vectorized, a bug is exposed. So, even though > MAX and MIN are the same, the expression with negation returns different > results. 
> 19th field of the query below: Vectorized 511 vs Non-Vectorized -58 > {noformat} > SELECT MAX(cint), >(MAX(cint) / -3728), >(MAX(cint) * -3728), >VAR_POP(cbigint), >(-((MAX(cint) * -3728))), >STDDEV_POP(csmallint), >(-563 % (MAX(cint) * -3728)), >(VAR_POP(cbigint) / STDDEV_POP(csmallint)), >(-(STDDEV_POP(csmallint))), >MAX(cdouble), >AVG(ctinyint), >(STDDEV_POP(csmallint) - 10.175), >MIN(cint), >((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)), >(-(MAX(cdouble))), >MIN(cdouble), >(MAX(cdouble) % -26.28), >STDDEV_SAMP(csmallint), >(-((MAX(cint) / -3728))), >((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))), >((MAX(cint) / -3728) - AVG(ctinyint)), >(-((MAX(cint) * -3728))), >VAR_SAMP(cint) > FROM alltypesorc > WHERE (((cbigint <= 197) > AND (cint < cbigint)) > OR ((cdouble >= -26.28) > AND (csmallint > cdouble)) > OR ((ctinyint > cfloat) > AND (cstring1 RLIKE '.*ss.*')) >OR ((cfloat > 79.553) >AND (cstring2 LIKE '10%'))) > {noformat} > Column expression is: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * > -3728))), > --- > This is a previously existing issue and now filed as HIVE-16919: > "Vectorization: vectorization_short_regress.q has query result differences > with non-vectorized run" > 10th field of the query below: Non-Vectorized -6432.15344526 vs. > -Vectorized -6432.0 > Column expression is (-(cdouble)) as c4, > Query result for vectorization_short_regress.q.out -- that is when > vectorization is turned off and a base .q.out file created. > --- > 10th field of the query below: Non-Vectorized -6432.15344526 vs. 
> Vectorized -6432.0 > Column expression is (-(cdouble)) as c4, > {noformat} > SELECT ctimestamp1, > cstring2, > cdouble, > cfloat, > cbigint, > csmallint, > (cbigint / 3569) as c1, > (-257 - csmallint) as c2, > (-6432 * cfloat) as c3, > (-(cdouble)) as c4, > (cdouble * 10.175) as c5, > ((-6432 * cfloat) / cfloat) as c6, > (-(cfloat)) as c7, > (cint % csmallint) as c8, > (-(cdouble)) as c9, > (cdouble * (-(cdouble))) as c10 > FROM alltypesorc > WHERE(((-1.389 >= cint) >AND ((csmallint < ctinyint) > AND (-6432 > csmallint))) > OR ((cdouble >= cfloat) > AND (cstring2 <= 'a')) > OR ((cstring1 LIKE 'ss%') > AND (10.175 > cbigint))) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
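One background fact relevant to nested negation and % expressions like the 19th column above: in Java, the sign of a % b follows the dividend a, so moving a unary minus inside or outside a % subexpression changes the result's sign, and the row-mode and vectorized evaluators must agree on exactly where the negation is applied. A small sketch of the column's shape, with a made-up stand-in input rather than the actual MAX(cint) aggregate from the test data:

```java
public class RemainderSign {
    // Same shape as ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
    // with maxCint as a stand-in for the MAX(cint) aggregate.
    static long expr(long maxCint) {
        long scaled = maxCint * -3728L;
        return (-scaled) % (-563L % scaled);
    }
}
```

This does not reproduce the reported 511 vs. -58 difference; it only illustrates the sign sensitivity that makes such expressions easy to get wrong in a separate vectorized implementation.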
[jira] [Updated] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for boostrap replication
[ https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16892: --- Status: Patch Available (was: In Progress) > Move creation of _files from ReplCopyTask to analysis phase for boostrap > replication > - > > Key: HIVE-16892 > URL: https://issues.apache.org/jira/browse/HIVE-16892 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > Attachments: HIVE-16892.1.patch > > > during replication bootstrap we create the _files via ReplCopyTask for > partitions and tables; this can be done inline as part of the analysis phase > rather than creating the ReplCopyTask. > This is done to prevent creation of a huge number of these tasks in memory > before handing them to the execution engine. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for boostrap replication
[ https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16892: --- Attachment: HIVE-16892.1.patch > Move creation of _files from ReplCopyTask to analysis phase for boostrap > replication > - > > Key: HIVE-16892 > URL: https://issues.apache.org/jira/browse/HIVE-16892 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > Attachments: HIVE-16892.1.patch > > > during replication bootstrap we create the _files via ReplCopyTask for > partitions and tables; this can be done inline as part of the analysis phase > rather than creating the ReplCopyTask. > This is done to prevent creation of a huge number of these tasks in memory > before handing them to the execution engine. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for bootstrap replication
[ https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056979#comment-16056979 ]

ASF GitHub Bot commented on HIVE-16892:
---------------------------------------
GitHub user anishek opened a pull request:

    https://github.com/apache/hive/pull/196

    HIVE-16892: Move creation of _files from ReplCopyTask to analysis phase for bootstrap replication

    The export semantic analyzer is still using the inputs/outputs, since those should be used by repl v1, and hence authorization there might be required outside of what is done in repl v2.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/anishek/hive HIVE-16892

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/196.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #196

commit e1a16e918e7809a7c6f6a1fd004e7f9798ff7c79
Author: Anishek Agarwal
Date:   2017-06-21T04:34:12Z

    HIVE-16892: Move creation of _files from ReplCopyTask to analysis phase for bootstrap replication

> Move creation of _files from ReplCopyTask to analysis phase for bootstrap replication
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-16892
>                 URL: https://issues.apache.org/jira/browse/HIVE-16892
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: anishek
>            Assignee: anishek
>             Fix For: 3.0.0
>
>
> During bootstrap replication we create the _files via ReplCopyTask for partitions and tables. This can be done inline as part of the analysis phase rather than creating the ReplCopyTask, to prevent creating a huge number of these tasks in memory before handing them to the execution engine.
[jira] [Commented] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056978#comment-16056978 ]

Matt McCline commented on HIVE-16589:
-------------------------------------
Ok, I've convinced myself that the Vectorized 511 vs Non-Vectorized -58 vectorization_short_regress.q issue is not aggregation-related but an old bug. It is now part of HIVE-16919.

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16589
>                 URL: https://issues.apache.org/jira/browse/HIVE-16589
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for Complex Types in Fast SerDe" was committed).
> Add more classes we vectorize AVG in preparation for fully supporting AVG GroupBy. In particular, the PARTIAL2 and FINAL GroupBy modes that take in the AVG struct as input. And add the COMPLETE mode that takes in the original data and produces the full aggregation for completeness, so to speak.
[jira] [Updated] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
[ https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16919: Description: Jason spotted a difference in the query result for vectorization_short_regress.q.out -- that is when vectorization is turned off and a base .q.out file created, there are 2 differences. They both seem to be related to negation. For example, in the first one MAX(cint) and MAX(cint) appear earlier as columns and match non-vec and vec. So, it doesn't appear that aggregation is failing. It seems like the issue is now that the Reducer is vectorizing, a bug is exposed. So, even though MAX and MIN are the same, the expression with negation returns different results. 19th field of the query below: Vectorized 511 vs Non-Vectorized -58 {noformat} SELECT MAX(cint), (MAX(cint) / -3728), (MAX(cint) * -3728), VAR_POP(cbigint), (-((MAX(cint) * -3728))), STDDEV_POP(csmallint), (-563 % (MAX(cint) * -3728)), (VAR_POP(cbigint) / STDDEV_POP(csmallint)), (-(STDDEV_POP(csmallint))), MAX(cdouble), AVG(ctinyint), (STDDEV_POP(csmallint) - 10.175), MIN(cint), ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)), (-(MAX(cdouble))), MIN(cdouble), (MAX(cdouble) % -26.28), STDDEV_SAMP(csmallint), (-((MAX(cint) / -3728))), ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))), ((MAX(cint) / -3728) - AVG(ctinyint)), (-((MAX(cint) * -3728))), VAR_SAMP(cint) FROM alltypesorc WHERE (((cbigint <= 197) AND (cint < cbigint)) OR ((cdouble >= -26.28) AND (csmallint > cdouble)) OR ((ctinyint > cfloat) AND (cstring1 RLIKE '.*ss.*')) OR ((cfloat > 79.553) AND (cstring2 LIKE '10%'))) {noformat} Column expression is: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))), --- This is a previously existing issue and now filed as HIVE-16919: "Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run" 10th field of the query below: Non-Vectorized -6432.15344526 vs. 
-Vectorized -6432.0 Column expression is (-(cdouble)) as c4, Query result for vectorization_short_regress.q.out -- that is when vectorization is turned off and a base .q.out file created. --- 10th field of the query below: Non-Vectorized -6432.15344526 vs. Vectorized -6432.0 Column expression is (-(cdouble)) as c4, {noformat} SELECT ctimestamp1, cstring2, cdouble, cfloat, cbigint, csmallint, (cbigint / 3569) as c1, (-257 - csmallint) as c2, (-6432 * cfloat) as c3, (-(cdouble)) as c4, (cdouble * 10.175) as c5, ((-6432 * cfloat) / cfloat) as c6, (-(cfloat)) as c7, (cint % csmallint) as c8, (-(cdouble)) as c9, (cdouble * (-(cdouble))) as c10 FROM alltypesorc WHERE(((-1.389 >= cint) AND ((csmallint < ctinyint) AND (-6432 > csmallint))) OR ((cdouble >= cfloat) AND (cstring2 <= 'a')) OR ((cstring1 LIKE 'ss%') AND (10.175 > cbigint))) {noformat} was: Query result for vectorization_short_regress.q.out -- that is when vectorization is turned off and a base .q.out file created. --- 10th field of the query below: Non-Vectorized -6432.15344526 vs. Vectorized -6432.0 Column expression is (-(cdouble)) as c4, {noformat} SELECT ctimestamp1, cstring2, cdouble, cfloat, cbigint, csmallint, (cbigint / 3569) as c1, (-257 - csmallint) as c2, (-6432 * cfloat) as c3, (-(cdouble)) as c4, (cdouble * 10.175) as c5, ((-6432 * cfloat) / cfloat) as c6, (-(cfloat)) as c7, (cint % csmallint) as c8, (-(cdouble)) as c9, (cdouble * (-(cdouble))) as c10 FROM alltypesorc WHERE(((-1.389 >= cint) AND ((csmallint < ctinyint) AND (-6432 > csmallint))) OR ((cdouble >= cfloat) AND (cstring2 <= 'a')) OR ((cstring1 LIKE 'ss%') AND (10.175 > cbigint))) {noformat} > Vectorization: vectorization_short_regress.q has query result differences > with non-vectorized run. Vectorized unary function broken? > - > > Key: HIVE-16919 > URL: https://issues.apache.org/jira/browse/HIVE-16919 > Project: Hive > Issue Type: Bug > Components: Hive >
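The report above gives the symptom but not the root cause. The sketch below only illustrates the arithmetic of the symptom: plain unary minus on a double keeps the fractional part, while the detour through a long shown here (a hypothetical stand-in for a wrong column-vector type, not a claim about Hive's actual vectorized code) drops it and yields the observed -6432.0.

```java
// Illustrative sketch only: it reproduces the symptom reported above,
// not necessarily Hive's actual defect.
public class NegationSketch {
    public static void main(String[] args) {
        double cdouble = 6432.15344526;
        double correct = -cdouble;                   // expected result: -6432.15344526
        double truncated = -(double) (long) cdouble; // observed bad value: -6432.0
        System.out.println(correct + " vs " + truncated);
    }
}
```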
[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056976#comment-16056976 ] Hive QA commented on HIVE-16927: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873767/HIVE-16927.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10841 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5703/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5703/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5703/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873767 - PreCommit-HIVE-Build > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16927.1.patch > > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
[ https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16919: Summary: Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken? (was: Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run.) > Vectorization: vectorization_short_regress.q has query result differences > with non-vectorized run. Vectorized unary function broken? > - > > Key: HIVE-16919 > URL: https://issues.apache.org/jira/browse/HIVE-16919 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > > Query result for vectorization_short_regress.q.out -- that is when > vectorization is turned off and a base .q.out file created. > --- > 10th field of the query below: Non-Vectorized -6432.15344526 vs. > Vectorized -6432.0 > Column expression is (-(cdouble)) as c4, > {noformat} > SELECT ctimestamp1, > cstring2, > cdouble, > cfloat, > cbigint, > csmallint, > (cbigint / 3569) as c1, > (-257 - csmallint) as c2, > (-6432 * cfloat) as c3, > (-(cdouble)) as c4, > (cdouble * 10.175) as c5, > ((-6432 * cfloat) / cfloat) as c6, > (-(cfloat)) as c7, > (cint % csmallint) as c8, > (-(cdouble)) as c9, > (cdouble * (-(cdouble))) as c10 > FROM alltypesorc > WHERE(((-1.389 >= cint) >AND ((csmallint < ctinyint) > AND (-6432 > csmallint))) > OR ((cdouble >= cfloat) > AND (cstring2 <= 'a')) > OR ((cstring1 LIKE 'ss%') > AND (10.175 > cbigint))) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-16900) optimization to give distcp a list of input files to copy to a destination target directory during repl load
[ https://issues.apache.org/jira/browse/HIVE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek resolved HIVE-16900.
----------------------------
    Resolution: Duplicate

> optimization to give distcp a list of input files to copy to a destination target directory during repl load
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16900
>                 URL: https://issues.apache.org/jira/browse/HIVE-16900
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: anishek
>            Assignee: anishek
>             Fix For: 3.0.0
>
>
> During repl copy we currently only allow one operation per file, whereas distcp supports a list of files. During bootstrap table/partition load it would be great to load all files listed in {noformat}_files{noformat} in a single distcp job to make it more efficient. This would require changes to the _shims_ sub-project in Hive to additionally expose APIs which take multiple source files.
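The batching described above can be sketched as a job-count difference; every name below is invented for illustration and is not Hive's actual shims API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the optimization described above: copy a whole
// _files listing with one distcp job instead of one job per file.
public class ReplCopySketch {
    static int jobsLaunched = 0;

    // current per-file style: each source file gets its own copy job
    static void copyPerFile(List<String> srcFiles, String dstDir) {
        for (String src : srcFiles) {
            jobsLaunched++; // one distcp job per file
        }
    }

    // proposed style: one job takes the full source list and one target dir
    static void copyBatched(List<String> srcFiles, String dstDir) {
        jobsLaunched++; // a single distcp job for all sources
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("/wh/t/f1", "/wh/t/f2", "/wh/t/f3");
        copyPerFile(files, "/repl/load/t");
        System.out.println("per-file jobs: " + jobsLaunched);
        jobsLaunched = 0;
        copyBatched(files, "/repl/load/t");
        System.out.println("batched jobs: " + jobsLaunched);
    }
}
```

The point is only the shape of the API change: the shim would need an entry point that accepts many source paths and a single target directory, so the whole listing maps to one job.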
[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator
[ https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056931#comment-16056931 ] Hive QA commented on HIVE-16761: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873764/HIVE-16761.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10841 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5702/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5702/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5702/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873764 - PreCommit-HIVE-Build > LLAP IO: SMB joins fail elevator > - > > Key: HIVE-16761 > URL: https://issues.apache.org/jira/browse/HIVE-16761 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-16761.01.patch, HIVE-16761.patch > > > {code} > Caused by: java.io.IOException: java.lang.ClassCastException: > org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to > org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ... 26 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to > org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149) > ... 28 more > {code} > {code} > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=500; > select year,quarter,count(*) from transactions_raw_orc_200 a join > customer_accounts_orc_200 b on a.account_id=b.account_id group by > year,quarter; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS
[ https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056928#comment-16056928 ] Rui Li commented on HIVE-16840: --- bq. you mean that if the limit number is too large... Yeah. But it's a little tricky to set a proper upper bound for it. How about we do something like this: if statistics is available, we can estimate the number of rows in the input of the RS. If the limit number is, say, >= 90% of the rows, we can skip the optimization. If statistics is unavailable, we run the optimization anyway. You can find how we estimate num of bytes in SetSparkReducerParallelism. Guess we can estimate num of rows similarly. > Investigate the performance of order by limit in HoS > > > Key: HIVE-16840 > URL: https://issues.apache.org/jira/browse/HIVE-16840 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16840.patch > > > We found that on 1TB data of TPC-DS, q17 of TPC-DS hanged. 
> {code} > select i_item_id >,i_item_desc >,s_state >,count(ss_quantity) as store_sales_quantitycount >,avg(ss_quantity) as store_sales_quantityave >,stddev_samp(ss_quantity) as store_sales_quantitystdev >,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov >,count(sr_return_quantity) as_store_returns_quantitycount >,avg(sr_return_quantity) as_store_returns_quantityave >,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev >,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as > store_returns_quantitycov >,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) > as catalog_sales_quantityave >,stddev_samp(cs_quantity)/avg(cs_quantity) as > catalog_sales_quantitystdev >,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov > from store_sales > ,store_returns > ,catalog_sales > ,date_dim d1 > ,date_dim d2 > ,date_dim d3 > ,store > ,item > where d1.d_quarter_name = '2000Q1' >and d1.d_date_sk = store_sales.ss_sold_date_sk >and item.i_item_sk = store_sales.ss_item_sk >and store.s_store_sk = store_sales.ss_store_sk >and store_sales.ss_customer_sk = store_returns.sr_customer_sk >and store_sales.ss_item_sk = store_returns.sr_item_sk >and store_sales.ss_ticket_number = store_returns.sr_ticket_number >and store_returns.sr_returned_date_sk = d2.d_date_sk >and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') >and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk >and store_returns.sr_item_sk = catalog_sales.cs_item_sk >and catalog_sales.cs_sold_date_sk = d3.d_date_sk >and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') > group by i_item_id > ,i_item_desc > ,s_state > order by i_item_id > ,i_item_desc > ,s_state > limit 100; > {code} > the reason why the script hanged is because we only use 1 task to implement > sort. 
> {code} > STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Reducer 10 <- Reducer 9 (SORT, 1) > Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 > (PARTITION-LEVEL SORT, 889) > Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 > (PARTITION-LEVEL SORT, 1009) > Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 > (PARTITION-LEVEL SORT, 683) > Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 > (PARTITION-LEVEL SORT, 751) > Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 > (PARTITION-LEVEL SORT, 826) > Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 > (PARTITION-LEVEL SORT, 909) > Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 > (PARTITION-LEVEL SORT, 1001) > Reducer 9 <- Reducer 8 (GROUP, 2) > {code} > The parallelism of Reducer 9 is 1. It is a orderby limit case so we use 1 > task to execute to ensure the correctness. But the performance is poor. > the reason why we use 1 task to implement order by limit is > [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
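Rui Li's suggestion above can be sketched as a small decision function; the name, signature, and the 90% threshold are illustrative only, not Hive's actual SetSparkReducerParallelism code:

```java
// Hypothetical sketch of the heuristic discussed above: keep the
// single-reducer order-by-limit optimization only when the limit actually
// discards a meaningful share of the estimated input rows.
public class OrderByLimitHeuristic {
    static boolean shouldApply(long limit, long estimatedRows, boolean statsAvailable) {
        if (!statsAvailable) {
            return true; // no statistics: run the optimization anyway, as today
        }
        // skip when the limit keeps ~90% or more of the estimated rows
        return limit < 0.9 * estimatedRows;
    }

    public static void main(String[] args) {
        System.out.println(shouldApply(100, 1_000_000, true));     // small limit: apply
        System.out.println(shouldApply(950_000, 1_000_000, true)); // near-full limit: skip
    }
}
```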
[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056908#comment-16056908 ]

Chao Sun commented on HIVE-11297:
---------------------------------
{quote}
So can you retest it in your env? If the operator tree is like what you mentioned, I think all the operator trees in spark_dynamic_partition_pruning.q.out will be different from what I generated in my env.
{quote}
Interesting... I'm not sure what caused the difference; maybe some configurations? I've tried several times in my env and the FIL is always followed by a SEL operator. Nevertheless, this is not an important issue. Will take a look at the RB.

> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch, HIVE-11297.7.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates partition info for more than one partition column, multiple operator trees are created, which all start from the same table scan op but have different spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to do the table scan multiple times.
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Attachment: HIVE-16589.0994.patch > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, > HIVE-16589.099.patch, HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more classes we vectorize AVG in preparation for fully supporting AVG > GroupBy. In particular, the PARTIAL2 and FINAL groupby modes that take in > the AVG struct as input. And, add the COMPLETE mode that takes in the > Original data and produces the Full Aggregation for completeness, so to speak. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Status: In Progress (was: Patch Available) > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, > HIVE-16589.099.patch, HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more classes we vectorize AVG in preparation for fully supporting AVG > GroupBy. In particular, the PARTIAL2 and FINAL groupby modes that take in > the AVG struct as input. And, add the COMPLETE mode that takes in the > Original data and produces the Full Aggregation for completeness, so to speak. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Status: Patch Available (was: In Progress) > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, > HIVE-16589.099.patch, HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more classes we vectorize AVG in preparation for fully supporting AVG > GroupBy. In particular, the PARTIAL2 and FINAL groupby modes that take in > the AVG struct as input. And, add the COMPLETE mode that takes in the > Original data and produces the Full Aggregation for completeness, so to speak. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used
[ https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056902#comment-16056902 ]

Sunitha Beeram commented on HIVE-16844:
---------------------------------------
[~sankarh] I am running into some issues fixing the unit tests and wondering if you have any input. I tried an approach similar to what you used to fix the failures in TestReplicationScenariosAcrossInstances: i.e., use the same configuration but a different source and destination db name. However, the serialize and deserialize methods encode/decode the db and table names. I was able to work around them somewhat for org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (by resetting the target db name via the HCatTable interface) and org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (by doing a string replace of the db name on the partition-spec string). But for org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema I have hit a block: I can't update the db name through the HCatAddPartitionDesc APIs or the HCatPartition APIs. I could add methods to either of these to update the db name (HCatTable allows that), but I am beginning to wonder if this is the right approach. Running multiple instances of the Metastore within the same JVM is probably error-prone, as there could be other static variables in classes with unintended sharing, similar to the issue with the tests that this change broke. Are we better off handling these via integration tests rather than unit tests? The other option might be to mock out the db completely. Let me know if you have further input. Thanks!
> Fix Connection leak in ObjectStore when new Conf object is used > --- > > Key: HIVE-16844 > URL: https://issues.apache.org/jira/browse/HIVE-16844 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Sunitha Beeram >Assignee: Sunitha Beeram > Fix For: 3.0.0 > > Attachments: HIVE-16844.1.patch > > > The code path in ObjectStore.java currently leaks BoneCP (or Hikari) > connection pools when a new configuration object is passed in. The code needs > to ensure that the persistence-factory is closed before it is nullified. > The relevant code is > [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290]. > Note that pmf is set to null, but the underlying connection pool is not > closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
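The fix direction in the description (close the factory before nulling it) can be sketched as below. The nested Factory interface is a stand-in for javax.jdo.PersistenceManagerFactory so the sketch stays self-contained; the surrounding structure is simplified, not Hive's actual ObjectStore:

```java
// Sketch of the leak fix described above: close the factory, which owns the
// BoneCP/Hikari connection pool, before dropping the reference to it.
public class ObjectStoreSketch {
    // stand-in for javax.jdo.PersistenceManagerFactory
    public interface Factory {
        boolean isClosed();
        void close(); // releases the underlying connection pool
    }

    private Factory pmf;

    public synchronized void setFactory(Factory newPmf) {
        if (pmf != null && !pmf.isClosed()) {
            pmf.close(); // without this, the old pool's connections leak
        }
        pmf = newPmf; // now safe to swap in the new factory (or null)
    }

    public static void main(String[] args) {
        ObjectStoreSketch store = new ObjectStoreSketch();
        final boolean[] closed = {false};
        store.setFactory(new Factory() {
            public boolean isClosed() { return closed[0]; }
            public void close() { closed[0] = true; }
        });
        store.setFactory(null); // old factory is closed before being discarded
        System.out.println("old pool closed: " + closed[0]);
    }
}
```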
[jira] [Commented] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056883#comment-16056883 ] Hive QA commented on HIVE-16589: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873760/HIVE-16589.0993.patch {color:green}SUCCESS:{color} +1 due to 29 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=70) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_adaptor_usage_mode] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_reduce] (batchId=155) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) 
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5701/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5701/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5701/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873760 - PreCommit-HIVE-Build > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.099.patch, > HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more classes we vectorize AVG in preparation for fully supporting AVG > GroupBy. In particular, the PARTIAL2 and FINAL groupby modes that take in > the AVG struct as input. 
And, add the COMPLETE mode that takes in the > Original data and produces the Full Aggregation for completeness, so to speak. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery
[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056874#comment-16056874 ] Ashutosh Chauhan commented on HIVE-6348: I am not sure why order by can't be removed in these cases. There is no contract that scripts and UDFs will see data in any particular order. So, it's perfectly alright to remove sorts in such cases. > Order by/Sort by in subquery > > > Key: HIVE-6348 > URL: https://issues.apache.org/jira/browse/HIVE-6348 > Project: Hive > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Rui Li >Priority: Minor > Labels: sub-query > Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch, HIVE-6348.3.patch > > > select * from (select * from foo order by c asc) bar order by c desc; > in hive sorts the data set twice. The optimizer should probably remove any > order by/sort by in the sub query unless you use 'limit '. Could even go so > far as barring it at the semantic level. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
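Ashutosh's point — the outer ORDER BY re-sorts the whole result, so the inner sort is unobservable — can be checked with a minimal sketch (plain Java, not Hive code; the class and method names are illustrative):

```java
import java.util.*;

// Demonstrates why an inner ORDER BY followed by an outer ORDER BY on the
// same result is redundant: the outer sort alone produces the same output.
public class RedundantSortDemo {
    // select * from (select * from foo order by c asc) bar order by c desc
    static List<Integer> innerThenOuter(List<Integer> rows) {
        List<Integer> tmp = new ArrayList<>(rows);
        tmp.sort(Comparator.naturalOrder());  // inner: ORDER BY c ASC
        tmp.sort(Comparator.reverseOrder());  // outer: ORDER BY c DESC
        return tmp;
    }

    // The optimized plan: only the outer sort survives.
    static List<Integer> outerOnly(List<Integer> rows) {
        List<Integer> tmp = new ArrayList<>(rows);
        tmp.sort(Comparator.reverseOrder());
        return tmp;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);
        if (!innerThenOuter(rows).equals(outerOnly(rows))) {
            throw new AssertionError("inner sort changed the result");
        }
        System.out.println(outerOnly(rows)); // [9, 6, 5, 4, 3, 2, 1, 1]
    }
}
```

As the issue notes, this reasoning only breaks down when the subquery has a limit, since then the inner sort determines *which* rows survive, not just their order.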
[jira] [Comment Edited] (HIVE-16840) Investigate the performance of order by limit in HoS
[ https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056858#comment-16056858 ] liyunzhang_intel edited comment on HIVE-16840 at 6/21/17 2:24 AM: -- [~lirui]: bq.If so, I wonder whether we should put a limit on the limited number. E.g. if the number is too large, we should skip this optimization. You mean that if the limit number is too large (e.g. select * from A order by ColB limit 99 when the total number of records in A is 100), there is no performance improvement and maybe a degradation, because now there is 1 extra reduce. bq.Besides, I don't think we need to add the sortLimit flag to RS. ReduceSinkDesc has a flag hasOrderBy indicating whether global order is needed. Thanks for the suggestion. was (Author: kellyzly): [~lirui]: bq.If so, I wonder whether we should put a limit on the limited number. E.g. if the number is too large, we should skip this optimization. You mean that if the limit number is too large (e.g. select * from A order by ColB limit 100 when the total number of records in A is 99), there is no performance improvement and maybe a degradation, because now there is 1 extra reduce. bq.Besides, I don't think we need to add the sortLimit flag to RS. ReduceSinkDesc has a flag hasOrderBy indicating whether global order is needed. Thanks for the suggestion. > Investigate the performance of order by limit in HoS > > > Key: HIVE-16840 > URL: https://issues.apache.org/jira/browse/HIVE-16840 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16840.patch > > > We found that on 1TB data of TPC-DS, q17 of TPC-DS hanged. 
> {code} > select i_item_id >,i_item_desc >,s_state >,count(ss_quantity) as store_sales_quantitycount >,avg(ss_quantity) as store_sales_quantityave >,stddev_samp(ss_quantity) as store_sales_quantitystdev >,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov >,count(sr_return_quantity) as_store_returns_quantitycount >,avg(sr_return_quantity) as_store_returns_quantityave >,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev >,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as > store_returns_quantitycov >,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) > as catalog_sales_quantityave >,stddev_samp(cs_quantity)/avg(cs_quantity) as > catalog_sales_quantitystdev >,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov > from store_sales > ,store_returns > ,catalog_sales > ,date_dim d1 > ,date_dim d2 > ,date_dim d3 > ,store > ,item > where d1.d_quarter_name = '2000Q1' >and d1.d_date_sk = store_sales.ss_sold_date_sk >and item.i_item_sk = store_sales.ss_item_sk >and store.s_store_sk = store_sales.ss_store_sk >and store_sales.ss_customer_sk = store_returns.sr_customer_sk >and store_sales.ss_item_sk = store_returns.sr_item_sk >and store_sales.ss_ticket_number = store_returns.sr_ticket_number >and store_returns.sr_returned_date_sk = d2.d_date_sk >and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') >and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk >and store_returns.sr_item_sk = catalog_sales.cs_item_sk >and catalog_sales.cs_sold_date_sk = d3.d_date_sk >and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') > group by i_item_id > ,i_item_desc > ,s_state > order by i_item_id > ,i_item_desc > ,s_state > limit 100; > {code} > the reason why the script hanged is because we only use 1 task to implement > sort. 
> {code} > STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Reducer 10 <- Reducer 9 (SORT, 1) > Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 > (PARTITION-LEVEL SORT, 889) > Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 > (PARTITION-LEVEL SORT, 1009) > Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 > (PARTITION-LEVEL SORT, 683) > Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 > (PARTITION-LEVEL SORT, 751) > Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 > (PARTITION-LEVEL SORT, 826) > Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 > (PARTITION-LEVEL SORT, 909) > Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 > (PARTITION-LEVEL SORT, 1001) > Reducer 9 <- Reducer 8 (GROUP, 2) > {code} > The parallelism of Reducer 9 is 1. It is a orderby limit case so we use 1 > task to execute to ensure the correctness. But the performance is poor. > the reason why we use 1 task to implement order by li
[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS
[ https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056858#comment-16056858 ] liyunzhang_intel commented on HIVE-16840: - [~lirui]: bq.If so, I wonder whether we should put a limit on the limited number. E.g. if the number is too large, we should skip this optimization. You mean that if the limit number is too large (e.g. select * from A order by ColB limit 100 when the total number of records in A is 99), there is no performance improvement and maybe a degradation, because now there is 1 extra reduce. bq.Besides, I don't think we need to add the sortLimit flag to RS. ReduceSinkDesc has a flag hasOrderBy indicating whether global order is needed. Thanks for the suggestion. > Investigate the performance of order by limit in HoS > > > Key: HIVE-16840 > URL: https://issues.apache.org/jira/browse/HIVE-16840 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16840.patch > > > We found that on 1TB data of TPC-DS, q17 of TPC-DS hanged. 
> {code} > select i_item_id >,i_item_desc >,s_state >,count(ss_quantity) as store_sales_quantitycount >,avg(ss_quantity) as store_sales_quantityave >,stddev_samp(ss_quantity) as store_sales_quantitystdev >,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov >,count(sr_return_quantity) as_store_returns_quantitycount >,avg(sr_return_quantity) as_store_returns_quantityave >,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev >,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as > store_returns_quantitycov >,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) > as catalog_sales_quantityave >,stddev_samp(cs_quantity)/avg(cs_quantity) as > catalog_sales_quantitystdev >,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov > from store_sales > ,store_returns > ,catalog_sales > ,date_dim d1 > ,date_dim d2 > ,date_dim d3 > ,store > ,item > where d1.d_quarter_name = '2000Q1' >and d1.d_date_sk = store_sales.ss_sold_date_sk >and item.i_item_sk = store_sales.ss_item_sk >and store.s_store_sk = store_sales.ss_store_sk >and store_sales.ss_customer_sk = store_returns.sr_customer_sk >and store_sales.ss_item_sk = store_returns.sr_item_sk >and store_sales.ss_ticket_number = store_returns.sr_ticket_number >and store_returns.sr_returned_date_sk = d2.d_date_sk >and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') >and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk >and store_returns.sr_item_sk = catalog_sales.cs_item_sk >and catalog_sales.cs_sold_date_sk = d3.d_date_sk >and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') > group by i_item_id > ,i_item_desc > ,s_state > order by i_item_id > ,i_item_desc > ,s_state > limit 100; > {code} > the reason why the script hanged is because we only use 1 task to implement > sort. 
> {code} > STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Reducer 10 <- Reducer 9 (SORT, 1) > Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 > (PARTITION-LEVEL SORT, 889) > Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 > (PARTITION-LEVEL SORT, 1009) > Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 > (PARTITION-LEVEL SORT, 683) > Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 > (PARTITION-LEVEL SORT, 751) > Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 > (PARTITION-LEVEL SORT, 826) > Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 > (PARTITION-LEVEL SORT, 909) > Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 > (PARTITION-LEVEL SORT, 1001) > Reducer 9 <- Reducer 8 (GROUP, 2) > {code} > The parallelism of Reducer 9 is 1. It is a orderby limit case so we use 1 > task to execute to ensure the correctness. But the performance is poor. > the reason why we use 1 task to implement order by limit is > [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS
[ https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056852#comment-16056852 ] Rui Li commented on HIVE-16840: --- To clarify, the idea is to introduce an extra MR shuffle and push the limit to it right? If so, I wonder whether we should put a limit on the limited number. E.g. if the number is too large, we should skip this optimization. Besides, I don't think we need to add the sortLimit flag to RS. ReduceSinkDesc has a flag hasOrderBy indicating whether global order is needed. We can set that to false for the new RS and GenSparkUtils#getEdgeProperty should give us the MR shuffle. > Investigate the performance of order by limit in HoS > > > Key: HIVE-16840 > URL: https://issues.apache.org/jira/browse/HIVE-16840 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16840.patch > > > We found that on 1TB data of TPC-DS, q17 of TPC-DS hanged. > {code} > select i_item_id >,i_item_desc >,s_state >,count(ss_quantity) as store_sales_quantitycount >,avg(ss_quantity) as store_sales_quantityave >,stddev_samp(ss_quantity) as store_sales_quantitystdev >,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov >,count(sr_return_quantity) as_store_returns_quantitycount >,avg(sr_return_quantity) as_store_returns_quantityave >,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev >,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as > store_returns_quantitycov >,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) > as catalog_sales_quantityave >,stddev_samp(cs_quantity)/avg(cs_quantity) as > catalog_sales_quantitystdev >,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov > from store_sales > ,store_returns > ,catalog_sales > ,date_dim d1 > ,date_dim d2 > ,date_dim d3 > ,store > ,item > where d1.d_quarter_name = '2000Q1' >and d1.d_date_sk = store_sales.ss_sold_date_sk >and 
item.i_item_sk = store_sales.ss_item_sk >and store.s_store_sk = store_sales.ss_store_sk >and store_sales.ss_customer_sk = store_returns.sr_customer_sk >and store_sales.ss_item_sk = store_returns.sr_item_sk >and store_sales.ss_ticket_number = store_returns.sr_ticket_number >and store_returns.sr_returned_date_sk = d2.d_date_sk >and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') >and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk >and store_returns.sr_item_sk = catalog_sales.cs_item_sk >and catalog_sales.cs_sold_date_sk = d3.d_date_sk >and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') > group by i_item_id > ,i_item_desc > ,s_state > order by i_item_id > ,i_item_desc > ,s_state > limit 100; > {code} > the reason why the script hanged is because we only use 1 task to implement > sort. > {code} > STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Reducer 10 <- Reducer 9 (SORT, 1) > Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 > (PARTITION-LEVEL SORT, 889) > Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 > (PARTITION-LEVEL SORT, 1009) > Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 > (PARTITION-LEVEL SORT, 683) > Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 > (PARTITION-LEVEL SORT, 751) > Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 > (PARTITION-LEVEL SORT, 826) > Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 > (PARTITION-LEVEL SORT, 909) > Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 > (PARTITION-LEVEL SORT, 1001) > Reducer 9 <- Reducer 8 (GROUP, 2) > {code} > The parallelism of Reducer 9 is 1. It is a orderby limit case so we use 1 > task to execute to ensure the correctness. But the performance is poor. 
> the reason why we use 1 task to implement order by limit is > [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
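The optimization discussed in this thread — insert an extra shuffle that pushes the limit down, so the single final sorting task only sees each partition's local top n rows instead of the full data set — can be sketched outside Hive (plain Java, illustrative names, not the HoS implementation):

```java
import java.util.*;
import java.util.stream.*;

// Two-stage ORDER BY ... LIMIT n: stage 1 runs with full parallelism and
// keeps only each partition's local top n; stage 2 is the single task that
// merges at most (numPartitions * n) rows, instead of the whole data set.
public class TopNLimitDemo {
    // Stage 1: each partition independently sorts and truncates to n.
    static List<Integer> localTopN(List<Integer> partition, int n) {
        return partition.stream().sorted().limit(n).collect(Collectors.toList());
    }

    // Stage 2: the single final reducer merges the per-partition top-n lists.
    static List<Integer> globalTopN(List<List<Integer>> partitions, int n) {
        return partitions.stream()
                .map(p -> localTopN(p, n))
                .flatMap(List::stream)
                .sorted().limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(9, 2, 7), Arrays.asList(1, 8, 3), Arrays.asList(5, 4, 6));
        // Same answer as sorting everything in one task:
        List<Integer> all = parts.stream().flatMap(List::stream)
                .sorted().limit(4).collect(Collectors.toList());
        if (!globalTopN(parts, 4).equals(all)) throw new AssertionError();
        System.out.println(globalTopN(parts, 4)); // [1, 2, 3, 4]
    }
}
```

This also makes Rui's caveat concrete: when n approaches the total row count, stage 1 discards almost nothing, so the extra shuffle only adds cost.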
[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-11297: Attachment: HIVE-11297.7.patch [~csun]: update HIVE-11297.7.patch according to the last round of review in review board. > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, > HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, > HIVE-11297.6.patch, HIVE-11297.7.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition columns, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees and so don't have to do > table scan multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056837#comment-16056837 ] liyunzhang_intel edited comment on HIVE-11297 at 6/21/17 2:09 AM: -- [~csun]: I applied HIVE-11297.6.patch on the latest master branch (8c5f55e), ran the query I posted above, and printed the operator tree in SplitOpTreeForDPP#process: {code} /** print the operator tree **/ ArrayList<TableScanOperator> tableScanList = new ArrayList<TableScanOperator>(); tableScanList.add((TableScanOperator) stack.get(0)); LOG.debug("operator tree:" + Operator.toString(tableScanList)); /** print the operator tree **/ Operator filterOp = pruningSinkOp; while (filterOp != null) { if (filterOp.getNumChild() > 1) { break; } else { filterOp = filterOp.getParentOperators().get(0); } } {code} the operator tree is: {code} TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12] TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20] TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23] {code} So can you retest it in your env? If the operator tree is like what you mentioned, I think all the operator trees in spark_dynamic_partition_pruning.q.out will be different from what I generated in my env. was (Author: kellyzly): [~csun]: I applied HIVE-11297.6.patch on the latest master branch (8c5f55e), ran the query I posted above, and printed the operator tree of filterOp in SplitOpTreeForDPP#process: {code} /** print the operator tree **/ ArrayList<TableScanOperator> tableScanList = new ArrayList<TableScanOperator>(); tableScanList.add((TableScanOperator) stack.get(0)); LOG.debug("operator tree:" + Operator.toString(tableScanList)); /** print the operator tree **/ Operator filterOp = pruningSinkOp; while (filterOp != null) { if (filterOp.getNumChild() > 1) { break; } else { filterOp = filterOp.getParentOperators().get(0); } } {code} the operator tree is: {code} TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12] TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20] TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23] {code} > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, > HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition columns, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees and so don't have to do > table scan multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056837#comment-16056837 ] liyunzhang_intel commented on HIVE-11297: - [~csun]: I applied HIVE-11297.6.patch on the latest master branch (8c5f55e), ran the query I posted above, and printed the operator tree of filterOp in SplitOpTreeForDPP#process: {code} /** print the operator tree **/ ArrayList<TableScanOperator> tableScanList = new ArrayList<TableScanOperator>(); tableScanList.add((TableScanOperator) stack.get(0)); LOG.debug("operator tree:" + Operator.toString(tableScanList)); /** print the operator tree **/ Operator filterOp = pruningSinkOp; while (filterOp != null) { if (filterOp.getNumChild() > 1) { break; } else { filterOp = filterOp.getParentOperators().get(0); } } {code} the operator tree is: {code} TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12] TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20] TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23] {code} > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, > HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition columns, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees and so don't have to do > table scan multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
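The loop quoted in the comment climbs from the pruning sink toward the table scan until it hits an operator with more than one child — the branch point the combined op trees share (here FIL[17]). A generic sketch of that walk (hypothetical Node class, not Hive's Operator):

```java
import java.util.*;

// Sketch of the parent walk in SplitOpTreeForDPP: starting at the pruning
// sink, climb parents until reaching an operator with more than one child.
public class BranchPointDemo {
    static class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node addChild(Node c) { c.parent = this; children.add(c); return c; }
    }

    // Returns the nearest ancestor with >1 children, or null if none exists.
    static Node findBranchPoint(Node from) {
        Node cur = from;
        while (cur != null && cur.children.size() <= 1) {
            cur = cur.parent;  // the quoted Hive loop does getParentOperators().get(0)
        }
        return cur;
    }

    public static void main(String[] args) {
        // TS -> FIL, where FIL branches into RS and SEL -> SPARKPRUNINGSINK
        Node ts = new Node("TS");
        Node fil = ts.addChild(new Node("FIL"));
        fil.addChild(new Node("RS"));
        Node sel = fil.addChild(new Node("SEL"));
        Node sink = sel.addChild(new Node("SPARKPRUNINGSINK"));
        System.out.println(findBranchPoint(sink).name); // FIL
    }
}
```

Unlike the quoted snippet, this version stops at a null parent instead of calling get(0) on an empty parent list, which would fail if the walk ever reached the root.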
[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056817#comment-16056817 ] Hive QA commented on HIVE-13567: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873759/HIVE-13567.16.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5700/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5700/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5700/ Messages: {noformat} This message was trimmed, see log for full details patching file ql/src/test/results/clientpositive/spark/stats_only_null.q.out patching file ql/src/test/results/clientpositive/spark/stats_partscan_1_23.q.out patching file ql/src/test/results/clientpositive/spark/statsfs.q.out patching file ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out patching file ql/src/test/results/clientpositive/spark/temp_table.q.out patching file ql/src/test/results/clientpositive/spark/union10.q.out patching file ql/src/test/results/clientpositive/spark/union12.q.out patching file ql/src/test/results/clientpositive/spark/union17.q.out patching file ql/src/test/results/clientpositive/spark/union18.q.out patching file ql/src/test/results/clientpositive/spark/union19.q.out patching file ql/src/test/results/clientpositive/spark/union22.q.out patching file ql/src/test/results/clientpositive/spark/union25.q.out patching file ql/src/test/results/clientpositive/spark/union28.q.out patching file ql/src/test/results/clientpositive/spark/union29.q.out patching file ql/src/test/results/clientpositive/spark/union30.q.out patching file ql/src/test/results/clientpositive/spark/union31.q.out patching file ql/src/test/results/clientpositive/spark/union33.q.out patching file ql/src/test/results/clientpositive/spark/union4.q.out patching file 
ql/src/test/results/clientpositive/spark/union6.q.out patching file ql/src/test/results/clientpositive/spark/union_lateralview.q.out patching file ql/src/test/results/clientpositive/spark/union_top_level.q.out patching file ql/src/test/results/clientpositive/spark/vector_char_4.q.out patching file ql/src/test/results/clientpositive/spark/vector_elt.q.out patching file ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out patching file ql/src/test/results/clientpositive/spark/vector_outer_join1.q.out patching file ql/src/test/results/clientpositive/spark/vector_outer_join2.q.out patching file ql/src/test/results/clientpositive/spark/vector_outer_join3.q.out patching file ql/src/test/results/clientpositive/spark/vector_outer_join4.q.out patching file ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out patching file ql/src/test/results/clientpositive/spark/vector_varchar_4.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_0.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_13.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_14.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_15.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_16.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_17.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_9.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_div0.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_pushdown.q.out patching file ql/src/test/results/clientpositive/spark/vectorization_short_regress.q.out patching file ql/src/test/results/clientpositive/spark/vectorized_case.q.out patching file ql/src/test/results/clientpositive/spark/vectorized_mapjoin.q.out patching file ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out patching file 
ql/src/test/results/clientpositive/spark/vectorized_nested_mapjoin.q.out patching file ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out patching file ql/src/test/results/clientpositive/spark/vectorized_shufflejoin.q.out patching file ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out patching file ql/src/test/results/clientpositive/special_character_in_tabnames_2.q.out patching file ql/src/test/results/clientpositive/stats0.q.out patching file ql/src/test/results/clientpositive/stats1.q.out patching file ql/src/test/results/clientpositive/stats10.q.out patching file ql/src/test/results/clientpositive/stats12.q.out patching file ql/src/test/results/clientpositive/stats13.q.out patching file ql/src/test/results/clientpositive/stats14.q.out patching file ql/src/test/results/clientpositive/stats15.q.out patching file ql/src/test/results/clientpositive/stats18.q.out patching file ql/src/test/results/clientpositive/stats2.q.out patching
[jira] [Commented] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables
[ https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056812#comment-16056812 ] Hive QA commented on HIVE-14988: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873753/HIVE-14988.03.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 10837 tests executed *Failed tests:* {noformat} TestOperationLoggingLayout - did not produce a TEST-*.xml file (likely timed out) (batchId=222) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_ddl1] (batchId=76) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_query5] (batchId=24) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=98) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=98) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[acid_overwrite] (batchId=89) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.lockConflictDbTable (batchId=281) org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testLockBlockedBy (batchId=281) 
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMetastoreTablesCleanup (batchId=281) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation.org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation (batchId=219) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5699/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5699/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5699/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12873753 - PreCommit-HIVE-Build > Support INSERT OVERWRITE into a partition on transactional tables > - > > Key: HIVE-14988 > URL: https://issues.apache.org/jira/browse/HIVE-14988 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, > HIVE-14988.03.patch > > > Insert overwrite operation on transactional table will currently raise an > error. > This can/should be supported -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16920) remove useless uri.getScheme() from EximUtil
[ https://issues.apache.org/jira/browse/HIVE-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056804#comment-16056804 ] Fei Hui commented on HIVE-16920: Failed tests are unrelated > remove useless uri.getScheme() from EximUtil > > > Key: HIVE-16920 > URL: https://issues.apache.org/jira/browse/HIVE-16920 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 3.0.0 >Reporter: Fei Hui >Assignee: Fei Hui > Attachments: HIVE-16920.patch > > > {code:title=EximUtil.java|borderStyle=solid} > static URI getValidatedURI(HiveConf conf, String dcPath) throws > SemanticException { > try { > boolean testMode = conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODE); > URI uri = new Path(dcPath).toUri(); > String scheme = uri.getScheme(); > String authority = uri.getAuthority(); > String path = uri.getPath(); > FileSystem fs = FileSystem.get(uri, conf); > LOG.info("Path before norm :" + path); > // generate absolute path relative to home directory > if (!path.startsWith("/")) { > if (testMode) { > path = (new Path(System.getProperty("test.tmp.dir"), > path)).toUri().getPath(); > } else { > path = > (new Path(new Path("/user/" + System.getProperty("user.name")), > path)).toUri() > .getPath(); > } > } > // Get scheme from FileSystem > scheme = fs.getScheme(); > ... > } > {code} > We found that {{String scheme = uri.getScheme();}} is useless, we can remove > it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
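The reason the statement is useless reduces to a dead store: {{scheme}} is unconditionally overwritten by {{fs.getScheme()}} before it is ever read. A minimal sketch of the pattern (hypothetical names, not the EximUtil code itself):

```java
// Illustrates the dead store the patch removes: the value read from the URI
// is never used, because the FileSystem's scheme overwrites it before any read.
public class DeadStoreDemo {
    static String resolveScheme(String uriScheme, String fsScheme) {
        String scheme = uriScheme; // dead store: never read before ...
        scheme = fsScheme;         // ... this overwrite (fs.getScheme() in EximUtil)
        return scheme;
    }

    public static void main(String[] args) {
        // Whatever the URI says, the FileSystem's scheme wins.
        System.out.println(resolveScheme("s3a", "hdfs")); // hdfs
    }
}
```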
[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-16926: -- Attachment: HIVE-16926.1.patch Initial patch, restructured the LlapTaskUmbilicalExternalClient code a bit. - Uses shared LLAP umbilical server rather than a new server per external client - Retries rejected submissions (WorkSubmitter helper class) - No more deferred cleanup (from HIVE-16652). One thing about this is that once clients are closed/unregistered, communicator.stop() is called and it's removed from the registered list of clients. So we might get a few warning messages about untracked taskAttemptIds coming in during heartbeat() .. if this is undesirable we might be able to leave them in the registeredClients list (but ignore heartbeats to them as they are tagged as closed), and remove them using the HeartbeatCheckTask once they get too old. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
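The alternative floated above, keeping closed clients in the registered list as tombstones so late heartbeats are ignored rather than logged as untracked, and letting the HeartbeatCheckTask evict them once they age out, can be sketched as follows. All names here ({{ClientRegistry}}, {{acceptHeartbeat}}, {{evictOlderThan}}) are illustrative, not actual Hive/LLAP APIs.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a client registry with closed-client tombstones.
class ClientRegistry {
    static final class Entry {
        volatile long closedAtMillis = -1; // -1 while the client is live
    }

    private final Map<String, Entry> clients = new ConcurrentHashMap<>();

    void register(String taskAttemptId) {
        clients.put(taskAttemptId, new Entry());
    }

    // Tag as closed instead of removing immediately, so a late heartbeat
    // still finds the entry and can be dropped silently.
    void close(String taskAttemptId) {
        Entry e = clients.get(taskAttemptId);
        if (e != null) e.closedAtMillis = System.currentTimeMillis();
    }

    // Heartbeats to closed-but-retained clients are ignored without a
    // "untracked taskAttemptId" style warning.
    boolean acceptHeartbeat(String taskAttemptId) {
        Entry e = clients.get(taskAttemptId);
        return e != null && e.closedAtMillis < 0;
    }

    // Would run periodically, e.g. from something like the HeartbeatCheckTask
    // mentioned above, to drop tombstones once they get too old.
    void evictOlderThan(long maxAgeMillis, long nowMillis) {
        for (Iterator<Map.Entry<String, Entry>> it = clients.entrySet().iterator(); it.hasNext(); ) {
            Entry e = it.next().getValue();
            if (e.closedAtMillis >= 0 && nowMillis - e.closedAtMillis > maxAgeMillis) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        ClientRegistry r = new ClientRegistry();
        r.register("attempt_1");
        r.close("attempt_1");
        System.out.println(r.acceptHeartbeat("attempt_1")); // false: dropped, not warned about
    }
}
```

The trade-off is a bounded window of retained entries per closed client versus noisy warnings during the heartbeat race.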
[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056802#comment-16056802 ] Siddharth Seth commented on HIVE-16927: --- [~prasanth_j] - I don't think we should permanently set this to 0. A bad instance will never stop on its own, and will keep trying to launch new containers. A better default would likely be numInstances, while making sure it is not too low (6 is the current default, for example) and is high enough to allow a node to be blacklisted. Option 1: numInstances * threshold to mark a node as disabled. Option 2: max(6, max(numInstances, threshold to mark a node as disabled)). Option 3: ? We should also file an enhancement request to Slider to get better control over this. > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16927.1.patch > > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
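Option 2 from the comment above can be written out directly; the floor of 6 tracks the default mentioned there, and the class/method names are purely illustrative, not Slider or Hive API.

```java
// Sketch of the "Option 2" container-failure threshold proposed above:
// max(6, max(numInstances, per-node blacklist threshold)).
class FailureThreshold {
    static int option2(int numInstances, int blacklistThreshold) {
        // Scales with cluster size so one failure per node cannot take the
        // app down, never drops below the old default of 6, and stays high
        // enough that a single bad node can be blacklisted first.
        return Math.max(6, Math.max(numInstances, blacklistThreshold));
    }

    public static void main(String[] args) {
        System.out.println(option2(10, 4)); // 10: scales with instance count
        System.out.println(option2(2, 4));  // 6: floored at the old default
    }
}
```

Option 1 (numInstances * per-node threshold) would be far more permissive; the open question ("Option 3: ?") is whether Slider itself should retry failed containers on different nodes.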
[jira] [Commented] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056785#comment-16056785 ] ZhangBing Lin commented on HIVE-16929: -- Submit a patch > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > > Add a configuration item "hive.aux.udf.package.name.list", which is a scan > corresponding to the $HIVE_HOME/auxlib/ directory jar package that contains > the corresponding configuration package name under the class registered as a > constant function. > Such as, > {code:java} > > hive.aux.udf.package.name.list > com.sample.udf,com.test.udf > > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
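The {code} block in the description above appears to have lost its XML tags in the mail archive (only the property name and value survive). Presumably the intended hive-site.xml entry looks like the standard form:

```xml
<property>
  <name>hive.aux.udf.package.name.list</name>
  <value>com.sample.udf,com.test.udf</value>
</property>
```

With this set, UDF classes in $HIVE_HOME/auxlib/ jars whose packages match a listed name would be registered at startup.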
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Description: Add a configuration item "hive.aux.udf.package.name.list", which is a scan corresponding to the $HIVE_HOME/auxlib/ directory jar package that contains the corresponding configuration package name under the class registered as a constant function. Such as, {code:java} hive.aux.udf.package.name.list com.sample.udf,com.test.udf {code} was:Add a configuration item "hive.aux.udf.package.name.list", which is a scan corresponding to the $HIVE_HOME/auxlib/ directory jar package that contains the corresponding configuration package name under the class registered as a constant function. > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > > Add a configuration item "hive.aux.udf.package.name.list", which is a scan > corresponding to the $HIVE_HOME/auxlib/ directory jar package that contains > the corresponding configuration package name under the class registered as a > constant function. > Such as, > {code:java} > > hive.aux.udf.package.name.list > com.sample.udf,com.test.udf > > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Description: Add a configuration item "hive.aux.udf.package.name.list", which is a scan corresponding to the $HIVE_HOME/auxlib/ directory jar package that contains the corresponding configuration package name under the class registered as a constant function. (was: Add a configuration item "hive.aux.udf.package.name.list", which is a scan corresponding to the $ HIVE_HOME/auxlib/ directory jar package that contains the corresponding configuration package name under the class registered as a constant function.) > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > > Add a configuration item "hive.aux.udf.package.name.list", which is a scan > corresponding to the $HIVE_HOME/auxlib/ directory jar package that contains > the corresponding configuration package name under the class registered as a > constant function. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Status: Patch Available (was: Open) > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > > Add a configuration item "hive.aux.udf.package.name.list", which is a scan > corresponding to the $ HIVE_HOME/auxlib/ directory jar package that contains > the corresponding configuration package name under the class registered as a > constant function. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Status: Open (was: Patch Available) > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > > Add a configuration item "hive.aux.udf.package.name.list", which is a scan > corresponding to the $ HIVE_HOME/auxlib/ directory jar package that contains > the corresponding configuration package name under the class registered as a > constant function. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Attachment: HIVE-16929.1.patch > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > > Add a configuration item "hive.aux.udf.package.name.list", which is a scan > corresponding to the $ HIVE_HOME/auxlib/ directory jar package that contains > the corresponding configuration package name under the class registered as a > constant function. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Attachment: (was: HIVE-16929.1.patch) > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > > Add a configuration item "hive.aux.udf.package.name.list", which is a scan > corresponding to the $ HIVE_HOME/auxlib/ directory jar package that contains > the corresponding configuration package name under the class registered as a > constant function. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Description: Add a configuration item "hive.aux.udf.package.name.list", which is a scan corresponding to the $ HIVE_HOME/auxlib/ directory jar package that contains the corresponding configuration package name under the class registered as a constant function. > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > > Add a configuration item "hive.aux.udf.package.name.list", which is a scan > corresponding to the $ HIVE_HOME/auxlib/ directory jar package that contains > the corresponding configuration package name under the class registered as a > constant function. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException
[ https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16233: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the reviews and additional testing! > llap: Query failed with AllocatorOutOfMemoryException > - > > Key: HIVE-16233 > URL: https://issues.apache.org/jira/browse/HIVE-16233 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, > HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, > HIVE-16233.06.patch, HIVE-16233.07.patch > > > {code} > TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : > attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at 
org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185) > ... 15 more > Caused by: java.io.IOException: java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > at > 
org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) > ... 17 more > Caused by: java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:425) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataRea
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Attachment: HIVE-16929.1.patch > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16929: - Status: Patch Available (was: Open) > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Attachments: HIVE-16929.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16929) User-defined UDF functions can be registered as invariant functions
[ https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin reassigned HIVE-16929: > User-defined UDF functions can be registered as invariant functions > --- > > Key: HIVE-16929 > URL: https://issues.apache.org/jira/browse/HIVE-16929 > Project: Hive > Issue Type: New Feature >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056769#comment-16056769 ] Prasanth Jayachandran commented on HIVE-16927: -- [~sseth] could you please take a look? small patch > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16927.1.patch > > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-16927: - Status: Patch Available (was: Open) > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16927.1.patch > > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-16927: - Attachment: HIVE-16927.1.patch For now, setting it to unlimited failures: some nodes can be healthy, and a few failures on bad nodes should not bring down healthy daemons that may actually be running queries. > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16927.1.patch > > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056766#comment-16056766 ] Prasanth Jayachandran commented on HIVE-16927: -- One way to fix this is to set higher threshold for failures. Higher threshold can easily be reached on bigger clusters. If the threshold is set to 20, then 2 failures on 10 nodes will bring down all daemons. Ideally, we want slider to retry failures on a different node. > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-16928) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-16928. -- Resolution: Duplicate Dup of HIVE-16927 > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16928 > URL: https://issues.apache.org/jira/browse/HIVE-16928 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-16927: > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16927 > URL: https://issues.apache.org/jira/browse/HIVE-16927 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16928) LLAP: Slider takes down all daemons when some daemons fail repeatedly
[ https://issues.apache.org/jira/browse/HIVE-16928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-16928: > LLAP: Slider takes down all daemons when some daemons fail repeatedly > - > > Key: HIVE-16928 > URL: https://issues.apache.org/jira/browse/HIVE-16928 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > When some containers fail repeatedly, slider thinks application is in > unstable state which brings down all llap daemons. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16761) LLAP IO: SMB joins fail elevator
[ https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16761: Attachment: HIVE-16761.01.patch Updated to fix the tests. [~hagleitn], I'm told you are the expert on this (SMB join with multiple MapWorks in the same Tez task). Can you please review? > LLAP IO: SMB joins fail elevator > - > > Key: HIVE-16761 > URL: https://issues.apache.org/jira/browse/HIVE-16761 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-16761.01.patch, HIVE-16761.patch > > > {code} > Caused by: java.io.IOException: java.lang.ClassCastException: > org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to > org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ... 26 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to > org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149) > ... 
28 more > {code} > {code} > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=500; > select year,quarter,count(*) from transactions_raw_orc_200 a join > customer_accounts_orc_200 b on a.account_id=b.account_id group by > year,quarter; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException
[ https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056751#comment-16056751 ] Hive QA commented on HIVE-16233: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873746/HIVE-16233.07.patch {color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10841 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5698/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5698/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5698/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873746 - PreCommit-HIVE-Build > llap: Query failed with AllocatorOutOfMemoryException > - > > Key: HIVE-16233 > URL: https://issues.apache.org/jira/browse/HIVE-16233 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, > HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, > HIVE-16233.06.patch, HIVE-16233.07.patch > > > {code} > TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : > attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: > org.apache.hadoop.hive.common.io.All
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Status: Patch Available (was: In Progress) > Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and > COMPLETE for AVG, VARIANCE > --- > > Key: HIVE-16589 > URL: https://issues.apache.org/jira/browse/HIVE-16589 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, > HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, > HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, > HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, > HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, > HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, > HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.099.patch, > HIVE-16589.09.patch > > > Allow Complex Types to be vectorized (since HIVE-16207: "Add support for > Complex Types in Fast SerDe" was committed). > Add more classes we vectorize AVG in preparation for fully supporting AVG > GroupBy. In particular, the PARTIAL2 and FINAL groupby modes that take in > the AVG struct as input. And, add the COMPLETE mode that takes in the > Original data and produces the Full Aggregation for completeness, so to speak. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Status: In Progress (was: Patch Available)
[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
[ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16589: Attachment: HIVE-16589.0993.patch
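For readers unfamiliar with the GroupBy modes named in HIVE-16589: PARTIAL1 and COMPLETE consume original rows, while PARTIAL2 and FINAL consume the intermediate AVG (sum, count) struct. A minimal sketch of that data flow follows; AvgModes, AvgBuffer, and the method names are hypothetical stand-ins, not Hive's actual vectorized aggregation classes.

```java
// Hedged sketch of the AVG GroupBy modes; all names here are hypothetical,
// not Hive's actual vectorized aggregate classes.
public class AvgModes {
    // The intermediate AVG struct: a (sum, count) pair.
    static final class AvgBuffer {
        double sum;
        long count;
    }

    // PARTIAL1 and COMPLETE modes consume original column values.
    static void aggregateOriginal(AvgBuffer buf, double value) {
        buf.sum += value;
        buf.count++;
    }

    // PARTIAL2 and FINAL modes consume upstream (sum, count) structs.
    static void mergePartial(AvgBuffer into, AvgBuffer from) {
        into.sum += from.sum;
        into.count += from.count;
    }

    // FINAL and COMPLETE modes emit the finished average.
    static double finish(AvgBuffer buf) {
        return buf.count == 0 ? Double.NaN : buf.sum / buf.count;
    }

    public static void main(String[] args) {
        // Two map-side partials (PARTIAL1) merged reduce-side (FINAL).
        AvgBuffer m1 = new AvgBuffer();
        aggregateOriginal(m1, 2.0);
        aggregateOriginal(m1, 4.0);
        AvgBuffer m2 = new AvgBuffer();
        aggregateOriginal(m2, 6.0);

        AvgBuffer reducer = new AvgBuffer();
        mergePartial(reducer, m1);
        mergePartial(reducer, m2);
        System.out.println(finish(reducer)); // prints 4.0
    }
}
```

The point of the two-stage split is that merging partials is associative, so map-side and reduce-side work compose to the same result as a single pass.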
[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13567: --- Attachment: HIVE-13567.16.patch > Auto-gather column stats - phase 2 > -- > > Key: HIVE-13567 > URL: https://issues.apache.org/jira/browse/HIVE-13567 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, > HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, > HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, > HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, > HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, > HIVE-13567.15.patch, HIVE-13567.16.patch > > > In phase 2, we are going to turn auto-gather column stats on by default. This requires > updating the golden files. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13567: --- Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13567: --- Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-16793) Scalar sub-query: Scalar safety checks for explicit group-bys
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056713#comment-16056713 ] Vineet Garg commented on HIVE-16793: I am investigating the test failures and will create it as soon as I have a fix for the tests. > Scalar sub-query: Scalar safety checks for explicit group-bys > - > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch > > > This query has an sq_count check, though it is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) > Reducer 6 <- Map 5 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_64] > Select Operator [SEL_63] (rows= width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_62] (rows= width=625) > predicate:(_col5 > _col10) > Map Join Operator [MAPJOIN_61] (rows=2 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] > <-Reducer 6 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_58] > Select Operator [SEL_57] (rows=1 width=4) > Output:["_col0"] > Group By Operator [GBY_56] (rows=1 width=89) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 5 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_55] > PartitionCols:_col0 > 
Group By Operator [GBY_54] (rows=86 width=89) > > Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' > Select Operator [SEL_53] (rows=1212121 width=109) > Output:["_col1"] > Filter Operator [FIL_52] (rows=1212121 width=109) > predicate:(p_type = '1') > TableScan [TS_17] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Map Join Operator [MAPJOIN_60] (rows=2 width=621) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > <-Reducer 4 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_51] > Select Operator [SEL_50] (rows=1 width=8) > Filter Operator [FIL_49] (rows=1 width=8) > predicate:(sq_count_check(_col0) <= 1) > Group By Operator [GBY_48] (rows=1 width=8) > Output:["_col0"],aggregations:["count(VALUE._col0)"] > <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap > PARTITION_ONLY_SHUFFLE [RS_47] > Group By Operator [GBY_46] (rows=1 width=8) > Output:["_col0"],aggregations:["count()"] > Select Operator [SEL_45] (rows=1 width=85) > Group By Operator [GBY_44] (rows=1 width=85) > Output:["_col0"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_43] > PartitionCols:_col0 > Group By Operator [GBY_42] (rows=83 > width=85) > Output:["_col0"],keys:'1' > Select Operator [SEL_41] (rows=1212121 > width=105) > Filter Operator [FIL_40] (rows=1212121 > width=105) > predicate:(p_type = '1') > TableScan [TS_2] (rows=2 > width=105) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"] >
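The sq_count_check operator visible in the plan above enforces scalar-subquery semantics at runtime: if the subquery yields more than one row, the query must fail. A rough sketch of that guard follows; sqCountCheck is a hypothetical stand-in for Hive's actual sq_count_check UDF implementation.

```java
// Sketch of the scalar-subquery row-count guard; sqCountCheck is a
// hypothetical stand-in for Hive's sq_count_check UDF.
public class ScalarCheck {
    static long sqCountCheck(long rowCount) {
        // A scalar subquery may return at most one row.
        if (rowCount > 1) {
            throw new IllegalStateException(
                "Scalar subquery returned more than one row: " + rowCount);
        }
        return rowCount;
    }

    public static void main(String[] args) {
        // With a constant group-by key like '1', the aggregate yields at
        // most one group, so the check always passes; that is why the JIRA
        // calls the check useless in this case.
        System.out.println(sqCountCheck(1)); // prints 1
    }
}
```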
[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2
[ https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056697#comment-16056697 ] Hive QA commented on HIVE-13567: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873743/HIVE-13567.15.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5697/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5697/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5697/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-06-21 00:00:29.192 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5697/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-06-21 00:00:29.195 + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive 8c5f55e..4d141c1 master -> origin/master + git reset --hard HEAD HEAD is now at 8c5f55e HIVE-16797: Enhance HiveFilterSetOpTransposeRule to remove union branches (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git clean -f -d Removing itests/src/test/resources/testconfiguration.properties.orig Removing ql/src/test/queries/clientpositive/explaindenpendencydiffengs.q Removing ql/src/test/results/clientpositive/explaindenpendencydiffengs.q.out Removing ql/src/test/results/clientpositive/spark/explaindenpendencydiffengs.q.out + git checkout master Already on 'master' Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/master HEAD is now at 4d141c1 HIVE-16731: Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize (Teddy Choi, reviwed by Matt McCline) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-06-21 00:00:35.017 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch fatal: git apply: bad git-diff - inconsistent old filename on line 1506 The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12873743 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16875) Query against view with partitioned child on HoS fails with privilege exception.
[ https://issues.apache.org/jira/browse/HIVE-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056694#comment-16056694 ] Hive QA commented on HIVE-16875: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873738/HIVE-16875.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=238) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=238) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=150) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=233) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=233) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=233) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] (batchId=125) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=217) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=217) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=217) org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges (batchId=220) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=178) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=178) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=178) {noformat} Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/5696/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5696/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5696/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873738 - PreCommit-HIVE-Build > Query against view with partitioned child on HoS fails with privilege > exception. > > > Key: HIVE-16875 > URL: https://issues.apache.org/jira/browse/HIVE-16875 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.0.0 >Reporter: Yongzhi Chen >Assignee: Yongzhi Chen > Attachments: HIVE-16875.1.patch, HIVE-16875.2.patch, > HIVE-16875.3.patch > > > Query against view with child table that has partitions fails with privilege > exception even with correct privileges. > Reproduce:
> {noformat}
> create table jsamp1 (a string) partitioned by (b int);
> insert into table jsamp1 partition (b=1) values ("hello");
> create view jview as select * from jsamp1;
> create role viewtester;
> grant all on table jview to role viewtester;
> grant role viewtester to group testers;
> Use MR, the select will succeed:
> set hive.execution.engine=mr;
> select count(*) from jview;
> while use spark:
> set hive.execution.engine=spark;
> select count(*) from jview;
> it fails with:
> Error: Error while compiling statement: FAILED: SemanticException No valid privileges
> User tester does not have privileges for QUERY
> The required privileges: Server=server1->Db=default->Table=j1part->action=select; (state=42000,code=4)
> {noformat}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere reassigned HIVE-16926: - > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16793) Scalar sub-query: Scalar safety checks for explicit group-bys
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056678#comment-16056678 ] Ashutosh Chauhan commented on HIVE-16793: - Can you create a RB for this ?
[jira] [Commented] (HIVE-16920) remove useless uri.getScheme() from EximUtil
[ https://issues.apache.org/jira/browse/HIVE-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056653#comment-16056653 ] Ferdinand Xu commented on HIVE-16920: - +1 LGTM > remove useless uri.getScheme() from EximUtil > > > Key: HIVE-16920 > URL: https://issues.apache.org/jira/browse/HIVE-16920 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 3.0.0 >Reporter: Fei Hui >Assignee: Fei Hui > Attachments: HIVE-16920.patch > > >
> {code:title=EximUtil.java|borderStyle=solid}
> static URI getValidatedURI(HiveConf conf, String dcPath) throws SemanticException {
>   try {
>     boolean testMode = conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODE);
>     URI uri = new Path(dcPath).toUri();
>     String scheme = uri.getScheme();
>     String authority = uri.getAuthority();
>     String path = uri.getPath();
>     FileSystem fs = FileSystem.get(uri, conf);
>     LOG.info("Path before norm :" + path);
>     // generate absolute path relative to home directory
>     if (!path.startsWith("/")) {
>       if (testMode) {
>         path = (new Path(System.getProperty("test.tmp.dir"), path)).toUri().getPath();
>       } else {
>         path = (new Path(new Path("/user/" + System.getProperty("user.name")), path)).toUri().getPath();
>       }
>     }
>     // Get scheme from FileSystem
>     scheme = fs.getScheme();
>     ...
> }
> {code}
> We found that {{String scheme = uri.getScheme();}} is useless; we can remove it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
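The reported dead store is real in the snippet above: the value read into {{scheme}} is overwritten by {{fs.getScheme()}} before any use. The surviving normalization logic, resolving a relative export path under the user's home directory, can be sketched standalone; PathNorm and normalize are hypothetical names, not the actual EximUtil code.

```java
// Standalone sketch of the relative-path normalization performed inside
// EximUtil.getValidatedURI; PathNorm/normalize are hypothetical names.
public class PathNorm {
    static String normalize(String path, String userName) {
        // Absolute paths pass through untouched.
        if (path.startsWith("/")) {
            return path;
        }
        // Relative paths are resolved under the user's home directory,
        // mirroring the non-test branch of the original code.
        return "/user/" + userName + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(normalize("exports/t1", "hive"));      // prints /user/hive/exports/t1
        System.out.println(normalize("/tmp/exports/t1", "hive")); // prints /tmp/exports/t1
    }
}
```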
[jira] [Updated] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize
[ https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16731: Resolution: Fixed Status: Resolved (was: Patch Available) Committed to master. > Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null > end" that involves a column name or expression THEN or ELSE vectorize > --- > > Key: HIVE-16731 > URL: https://issues.apache.org/jira/browse/HIVE-16731 > Project: Hive > Issue Type: Bug >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16731.1.patch, HIVE-16731.2.patch, > HIVE-16731.3.patch, HIVE-16731.4.patch > > > Currently, CASE WHEN statements like that become VectorUDFAdaptor expressions. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize
[ https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16731: Fix Version/s: 3.0.0
[jira] [Assigned] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize
[ https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline reassigned HIVE-16731: --- Assignee: Teddy Choi (was: Matt McCline)
[jira] [Commented] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize
[ https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056642#comment-16056642 ] Matt McCline commented on HIVE-16731: - Ok, looks good.
[jira] [Commented] (HIVE-16417) Introduce Service-client module
[ https://issues.apache.org/jira/browse/HIVE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056628#comment-16056628 ] Vaibhav Gumashta commented on HIVE-16417: - +1 > Introduce Service-client module > --- > > Key: HIVE-16417 > URL: https://issues.apache.org/jira/browse/HIVE-16417 > Project: Hive > Issue Type: Sub-task > Components: Metastore, Server Infrastructure >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich > Attachments: HIVE-16417.1.patch > > > Moving the relevant classes out from service, enables the jdbc driver to > relax its dependencies to only use {{service-client}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize
[ https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056625#comment-16056625 ] Hive QA commented on HIVE-16731: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12873734/HIVE-16731.4.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10822 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=100) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5695/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5695/console Test 
logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5695/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12873734 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables
[ https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056621#comment-16056621 ] Wei Zheng commented on HIVE-14988: -- Patch 03 follows the "new base" approach proposed by Eugene. For example, given this directory layout: {code} delta_1_1 delta_2_2 base_2 delta_3 {code} After an Insert Overwrite it should become: {code} delta_1_1 delta_2_2 base_2 delta_3 base_4 <= new base. All other dirs become obsolete. {code} > Support INSERT OVERWRITE into a partition on transactional tables > - > > Key: HIVE-14988 > URL: https://issues.apache.org/jira/browse/HIVE-14988 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, > HIVE-14988.03.patch > > > Insert overwrite operation on transactional table will currently raise an > error. > This can/should be supported
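The cleanup rule implied by the layout above can be sketched as follows (an illustrative Python sketch of the simplified directory-name convention used in the comment, not Hive's actual implementation): once base_N exists, any base or delta whose highest transaction id is at most N is superseded.

```python
def obsolete_dirs(dirs, new_base_txn):
    """Return the directories made obsolete by writing base_<new_base_txn>.

    Directory names follow the simplified convention used above:
    base_<txn> or delta_<txn>[_<txn>].  Any directory whose highest
    transaction id is <= new_base_txn is superseded by the new base.
    """
    obsolete = []
    for d in dirs:
        high_txn = int(d.split("_")[-1])  # highest txn id in the dir name
        if high_txn <= new_base_txn:
            obsolete.append(d)
    return obsolete

layout = ["delta_1_1", "delta_2_2", "base_2", "delta_3"]
# After the Insert Overwrite writes base_4, all earlier dirs are obsolete:
print(obsolete_dirs(layout, 4))  # ['delta_1_1', 'delta_2_2', 'base_2', 'delta_3']
```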
[jira] [Commented] (HIVE-16925) isSlowStart lost during refactoring
[ https://issues.apache.org/jira/browse/HIVE-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056620#comment-16056620 ] ASF GitHub Bot commented on HIVE-16925: --- GitHub user dosoft opened a pull request: https://github.com/apache/hive/pull/195 HIVE-16925: Add isSlowStart as parameter for the setAutoReduce method You can merge this pull request into a Git repository by running: $ git pull https://github.com/dosoft/hive HIVE-16925 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/195.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #195 commit 9276b330d17a2c21d4ec6e1bc31bb6871429ec0e Author: Oleg Danilov Date: 2017-06-20T22:50:14Z HIVE-16925: Add isSlowStart as parameter for the setAutoReduce method > isSlowStart lost during refactoring > --- > > Key: HIVE-16925 > URL: https://issues.apache.org/jira/browse/HIVE-16925 > Project: Hive > Issue Type: Bug >Reporter: Oleg Danilov >Priority: Minor > > TezEdgeProperty.setAutoReduce() should have isSlowStart as parameter -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-16772) Support TPCDS query11.q in PerfCliDriver
[ https://issues.apache.org/jira/browse/HIVE-16772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-16772. Resolution: Fixed > Support TPCDS query11.q in PerfCliDriver > > > Key: HIVE-16772 > URL: https://issues.apache.org/jira/browse/HIVE-16772 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > > {code} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 54:22 Invalid column > reference 'customer_preferred_cust_flag' > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11744) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11692) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables
[ https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-14988: - Status: Patch Available (was: Open) > Support INSERT OVERWRITE into a partition on transactional tables > - > > Key: HIVE-14988 > URL: https://issues.apache.org/jira/browse/HIVE-14988 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, > HIVE-14988.03.patch > > > Insert overwrite operation on transactional table will currently raise an > error. > This can/should be supported -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables
[ https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng reassigned HIVE-14988: Assignee: Wei Zheng (was: Eugene Koifman) > Support INSERT OVERWRITE into a partition on transactional tables > - > > Key: HIVE-14988 > URL: https://issues.apache.org/jira/browse/HIVE-14988 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, > HIVE-14988.03.patch > > > Insert overwrite operation on transactional table will currently raise an > error. > This can/should be supported -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables
[ https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-14988: - Attachment: HIVE-14988.03.patch > Support INSERT OVERWRITE into a partition on transactional tables > - > > Key: HIVE-14988 > URL: https://issues.apache.org/jira/browse/HIVE-14988 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, > HIVE-14988.03.patch > > > Insert overwrite operation on transactional table will currently raise an > error. > This can/should be supported -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16924) Support distinct in presence Gby
[ https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056595#comment-16056595 ] Ashutosh Chauhan commented on HIVE-16924: - In the general case this needs two Gbys: a first one to compute the group by and aggregates, and a second one to do the distinct, with the Gby keys being all columns in the select list. The queries in the example provided can actually be computed by a single Gby, but that's an optimization which can potentially be done in a follow-up. > Support distinct in presence Gby > - > > Key: HIVE-16924 > URL: https://issues.apache.org/jira/browse/HIVE-16924 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Reporter: Carter Shanklin > > create table e011_01 (c1 int, c2 smallint); > insert into e011_01 values (1, 1), (2, 2); > These queries should work: > select distinct c1, count(*) from e011_01 group by c1; > select distinct c1, avg(c2) from e011_01 group by c1; > Currently, you get : > FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the > same query. Error encountered near token 'c1'
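The two-Gby plan described above can be simulated in a few lines on the example table (an illustrative Python sketch, not Hive's planner): the first group-by computes count(*) per c1, and the second deduplicates over all select-list columns.

```python
from collections import defaultdict

# Rows of e011_01 from the example: (c1, c2)
rows = [(1, 1), (2, 2)]

# First Gby: group by c1 and compute count(*)
groups = defaultdict(int)
for c1, _c2 in rows:
    groups[c1] += 1
first_gby = [(c1, cnt) for c1, cnt in groups.items()]

# Second Gby: distinct over all select-list columns (c1, count)
second_gby = sorted(set(first_gby))
print(second_gby)  # [(1, 1), (2, 1)]
```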
[jira] [Commented] (HIVE-16924) Support distinct in presence Gby
[ https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056596#comment-16056596 ] Ashutosh Chauhan commented on HIVE-16924: - cc: [~rusanu] > Support distinct in presence Gby > - > > Key: HIVE-16924 > URL: https://issues.apache.org/jira/browse/HIVE-16924 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Reporter: Carter Shanklin > > create table e011_01 (c1 int, c2 smallint); > insert into e011_01 values (1, 1), (2, 2); > These queries should work: > select distinct c1, count(*) from e011_01 group by c1; > select distinct c1, avg(c2) from e011_01 group by c1; > Currently, you get : > FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the > same query. Error encountered near token 'c1' -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16840) Investigate the performance of order by limit in HoS
[ https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-16840: Attachment: HIVE-16840.patch > Investigate the performance of order by limit in HoS > > > Key: HIVE-16840 > URL: https://issues.apache.org/jira/browse/HIVE-16840 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16840.patch > > > We found that on 1TB data of TPC-DS, q17 of TPC-DS hanged. > {code} > select i_item_id >,i_item_desc >,s_state >,count(ss_quantity) as store_sales_quantitycount >,avg(ss_quantity) as store_sales_quantityave >,stddev_samp(ss_quantity) as store_sales_quantitystdev >,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov >,count(sr_return_quantity) as_store_returns_quantitycount >,avg(sr_return_quantity) as_store_returns_quantityave >,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev >,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as > store_returns_quantitycov >,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) > as catalog_sales_quantityave >,stddev_samp(cs_quantity)/avg(cs_quantity) as > catalog_sales_quantitystdev >,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov > from store_sales > ,store_returns > ,catalog_sales > ,date_dim d1 > ,date_dim d2 > ,date_dim d3 > ,store > ,item > where d1.d_quarter_name = '2000Q1' >and d1.d_date_sk = store_sales.ss_sold_date_sk >and item.i_item_sk = store_sales.ss_item_sk >and store.s_store_sk = store_sales.ss_store_sk >and store_sales.ss_customer_sk = store_returns.sr_customer_sk >and store_sales.ss_item_sk = store_returns.sr_item_sk >and store_sales.ss_ticket_number = store_returns.sr_ticket_number >and store_returns.sr_returned_date_sk = d2.d_date_sk >and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') >and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk >and store_returns.sr_item_sk = 
catalog_sales.cs_item_sk >and catalog_sales.cs_sold_date_sk = d3.d_date_sk >and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') > group by i_item_id > ,i_item_desc > ,s_state > order by i_item_id > ,i_item_desc > ,s_state > limit 100; > {code} > the reason why the script hanged is because we only use 1 task to implement > sort. > {code} > STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Reducer 10 <- Reducer 9 (SORT, 1) > Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 > (PARTITION-LEVEL SORT, 889) > Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 > (PARTITION-LEVEL SORT, 1009) > Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 > (PARTITION-LEVEL SORT, 683) > Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 > (PARTITION-LEVEL SORT, 751) > Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 > (PARTITION-LEVEL SORT, 826) > Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 > (PARTITION-LEVEL SORT, 909) > Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 > (PARTITION-LEVEL SORT, 1001) > Reducer 9 <- Reducer 8 (GROUP, 2) > {code} > The parallelism of Reducer 9 is 1. It is a orderby limit case so we use 1 > task to execute to ensure the correctness. But the performance is poor. > the reason why we use 1 task to implement order by limit is > [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS
[ https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056593#comment-16056593 ] liyunzhang_intel commented on HIVE-16840: - [~xuefuz],[~lirui],[~Ferd],[~csun]: attached is HIVE-16840.1.patch. Changes: 1. Change the physical plan in SetSparkReducerParallelism#process. If needSetSparkReducerParallelism returns false, the current sink may be an order by limit case. Add newSparkSortRS (actually its type is ReduceSink), newSel (its type is Sel) and newLimit (its type is Limit) before the sink. The original physical plan is {code} ...-RS-SEL-LIMIT{code} and the new physical plan is {code} ...-newSparkSortRS-newSel-newLimit-RS-LIMIT{code} Currently I added SetSparkReducerParallelism#getNumReducerForSparkSortRS, which returns 10; this sets the parallelism for newSparkSortRS. I will update that function in the next patch. 2. Add a property sortLimit in ReduceSinkOperator. If it is true, use partition sort instead of global sort in GenSparkUtils#getEdgeProperty. The patch is not fully tested yet (I only tested a simple qfile), but I think we can parallelize this; please review while I start full testing. {code} select key,value from src order by key limit 10; {code} > Investigate the performance of order by limit in HoS > > > Key: HIVE-16840 > URL: https://issues.apache.org/jira/browse/HIVE-16840 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16840.patch > > > We found that on 1TB data of TPC-DS, q17 of TPC-DS hanged. 
> {code} > select i_item_id >,i_item_desc >,s_state >,count(ss_quantity) as store_sales_quantitycount >,avg(ss_quantity) as store_sales_quantityave >,stddev_samp(ss_quantity) as store_sales_quantitystdev >,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov >,count(sr_return_quantity) as_store_returns_quantitycount >,avg(sr_return_quantity) as_store_returns_quantityave >,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev >,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as > store_returns_quantitycov >,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) > as catalog_sales_quantityave >,stddev_samp(cs_quantity)/avg(cs_quantity) as > catalog_sales_quantitystdev >,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov > from store_sales > ,store_returns > ,catalog_sales > ,date_dim d1 > ,date_dim d2 > ,date_dim d3 > ,store > ,item > where d1.d_quarter_name = '2000Q1' >and d1.d_date_sk = store_sales.ss_sold_date_sk >and item.i_item_sk = store_sales.ss_item_sk >and store.s_store_sk = store_sales.ss_store_sk >and store_sales.ss_customer_sk = store_returns.sr_customer_sk >and store_sales.ss_item_sk = store_returns.sr_item_sk >and store_sales.ss_ticket_number = store_returns.sr_ticket_number >and store_returns.sr_returned_date_sk = d2.d_date_sk >and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') >and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk >and store_returns.sr_item_sk = catalog_sales.cs_item_sk >and catalog_sales.cs_sold_date_sk = d3.d_date_sk >and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') > group by i_item_id > ,i_item_desc > ,s_state > order by i_item_id > ,i_item_desc > ,s_state > limit 100; > {code} > the reason why the script hanged is because we only use 1 task to implement > sort. 
> {code} > STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Reducer 10 <- Reducer 9 (SORT, 1) > Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 > (PARTITION-LEVEL SORT, 889) > Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 > (PARTITION-LEVEL SORT, 1009) > Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 > (PARTITION-LEVEL SORT, 683) > Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 > (PARTITION-LEVEL SORT, 751) > Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 > (PARTITION-LEVEL SORT, 826) > Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 > (PARTITION-LEVEL SORT, 909) > Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 > (PARTITION-LEVEL SORT, 1001) > Reducer 9 <- Reducer 8 (GROUP, 2) > {code} > The parallelism of Reducer 9 is 1. It is a orderby limit case so we use 1 > task to execute to ensure the correctness. But the performance is poor. > the reason why we use 1 task to implement order by limit is > [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSpark
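The partition-sort-plus-limit idea from the comment above can be sketched as a two-phase top-k (illustrative Python, not the actual Spark edge implementation): each reducer keeps only its local top `limit` rows in parallel, so the final single-task stage only has to merge a small amount of data.

```python
import heapq

def order_by_limit(partitions, limit):
    """Two-phase order-by-limit: local top-k per partition, then a
    final merge of the small per-partition results."""
    # Phase 1 (parallel in practice): each partition keeps its limit smallest rows.
    local_tops = [heapq.nsmallest(limit, part) for part in partitions]
    # Phase 2 (single task, but its input is at most num_partitions * limit rows):
    return heapq.nsmallest(limit, (row for top in local_tops for row in top))

parts = [[9, 1, 7], [4, 8, 2], [6, 3, 5]]
print(order_by_limit(parts, 4))  # [1, 2, 3, 4]
```

This is why the single sorting task no longer needs to see the full data set, only the already-limited output of each parallel partition sort.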
[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056577#comment-16056577 ] Chao Sun commented on HIVE-11297: - Sorry for the late response. Will put comments in the RB. Regarding the filterOp issue, it's a little strange, since I'm seeing something different on my side (with the latest master branch). For the query you posted above, I saw: {code} TS[3] -> FIL[18] -> SEL[5] -> SEL[19] -> GBY[20] -> SPARKPRUNINGSINK[21] TS[3] -> FIL[18] -> SEL[5] -> SEL[22] -> GBY[23] -> SPARKPRUNINGSINK[24] TS[3] -> FIL[18] -> SEL[5] -> RS[7] -> JOIN[8] -> ... {code} inside {{SplitOpTreeForDPP}}. > Combine op trees for partition info generating tasks [Spark branch] > --- > > Key: HIVE-11297 > URL: https://issues.apache.org/jira/browse/HIVE-11297 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Chao Sun >Assignee: liyunzhang_intel > Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, > HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch > > > Currently, for dynamic partition pruning in Spark, if a small table generates > partition info for more than one partition column, multiple operator trees > are created, which all start from the same table scan op, but have different > spark partition pruning sinks. > As an optimization, we can combine these op trees so we don't have to do > the table scan multiple times.
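The combination this issue asks for, several pruning-sink branches hanging off one table scan, can be sketched as merging operator paths with a shared prefix into a single tree (illustrative Python; real Hive operator trees carry far more state than a nested dict):

```python
def merge_paths(paths):
    """Merge operator paths sharing a common prefix into one nested dict,
    so the shared prefix (e.g. the table scan) is evaluated once."""
    tree = {}
    for path in paths:
        node = tree
        for op in path:
            node = node.setdefault(op, {})
    return tree

# The two pruning-sink branches from the plan dump above:
branches = [
    ["TS[3]", "FIL[18]", "SEL[5]", "SEL[19]", "GBY[20]", "SPARKPRUNINGSINK[21]"],
    ["TS[3]", "FIL[18]", "SEL[5]", "SEL[22]", "GBY[23]", "SPARKPRUNINGSINK[24]"],
]
tree = merge_paths(branches)
# Both branches now share a single TS -> FIL -> SEL prefix:
print(list(tree["TS[3]"]["FIL[18]"]["SEL[5]"].keys()))  # ['SEL[19]', 'SEL[22]']
```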
[jira] [Comment Edited] (HIVE-16923) Hive-on-Spark DPP Improvements
[ https://issues.apache.org/jira/browse/HIVE-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056570#comment-16056570 ] Sahil Takiar edited comment on HIVE-16923 at 6/20/17 10:19 PM: --- Will post a design doc soon. Two of the biggest limitations of the current DPP implementation are that it requires an additional Spark job and it requires writing some intermediate data to HDFS. We should evaluate the overhead of these limitations and whether it's possible to remove them. Ideally, DPP shouldn't hurt performance for any query. One way to ensure this is to build some type of cost-based model that predicts whether or not DPP will help perf. For example, a simple cost-based model could simply enable DPP for map-joins only. Since map-joins already require two Spark jobs and writing intermediate data to HDFS, there shouldn't be significant overhead to running DPP with a map-join. was (Author: stakiar): Will post a design doc soon. Two of the biggest limitations of the current DPP implementation are that it requires an additional Spark job and it requires writing some intermediate data to HDFS. Ideally, DPP shouldn't hurt performance for any query. One way to ensure this is to build some type of cost-based model that predicts whether or not DPP will help perf. For example, a simple cost-based model could simply enable DPP for map-joins only. Since map-joins already require two Spark jobs and writing intermediate data to HDFS, there shouldn't be significant overhead to running DPP with a map-join. > Hive-on-Spark DPP Improvements > -- > > Key: HIVE-16923 > URL: https://issues.apache.org/jira/browse/HIVE-16923 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > Improvements to Hive-on-Spark DPP so that it is production ready. > Hive-on-Spark DPP was implemented in HIVE-9152. However, it is disabled by > default. 
The goal of this JIRA is to improve the DPP implementation so that > it can be enabled by default. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
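The "map-joins only" cost model suggested in the comment above can be sketched as a one-line predicate (illustrative Python; the function and parameter names are hypothetical, not Hive configuration keys):

```python
def should_apply_dpp(join_type, dpp_enabled=True):
    """Simplest cost model from the comment above: only allow DPP when the
    join is a map-join, since a map-join already pays for an extra Spark
    job and an HDFS write of the small-table side."""
    return dpp_enabled and join_type == "map-join"

print(should_apply_dpp("map-join"))      # True
print(should_apply_dpp("shuffle-join"))  # False
```

A fuller model would also weigh partition statistics and the small-table size, but this captures the idea that DPP should only run where its fixed overhead is already being paid.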
[jira] [Commented] (HIVE-16923) Hive-on-Spark DPP Improvements
[ https://issues.apache.org/jira/browse/HIVE-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056570#comment-16056570 ] Sahil Takiar commented on HIVE-16923: - Will post a design doc soon. Two of the biggest limitations of the current DPP implementation are that it requires an additional Spark job and it requires writing some intermediate data to HDFS. Ideally, DPP shouldn't hurt performance for any query. One way to ensure this is to build some type of cost-based model that predicts whether or not DPP will help perf. For example, a simple cost-based model could simply enable DPP for map-joins only. Since map-joins already require two Spark jobs and writing intermediate data to HDFS, there shouldn't be significant overhead to running DPP with a map-join. > Hive-on-Spark DPP Improvements > -- > > Key: HIVE-16923 > URL: https://issues.apache.org/jira/browse/HIVE-16923 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > Improvements to Hive-on-Spark DPP so that it is production ready. > Hive-on-Spark DPP was implemented in HIVE-9152. However, it is disabled by > default. The goal of this JIRA is to improve the DPP implementation so that > it can be enabled by default.
[jira] [Assigned] (HIVE-16923) Hive-on-Spark DPP Improvements
[ https://issues.apache.org/jira/browse/HIVE-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned HIVE-16923: --- > Hive-on-Spark DPP Improvements > -- > > Key: HIVE-16923 > URL: https://issues.apache.org/jira/browse/HIVE-16923 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > Improvements to Hive-on-Spark DPP so that it is production ready. > Hive-on-Spark DPP was implemented in HIVE-9152. However, it is disabled by > default. The goal of this JIRA is to improve the DPP implementation so that > it can be enabled by default. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator
[ https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056561#comment-16056561 ] Sergey Shelukhin commented on HIVE-16761: - Test failures are related, due to some bogus error. > LLAP IO: SMB joins fail elevator > - > > Key: HIVE-16761 > URL: https://issues.apache.org/jira/browse/HIVE-16761 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-16761.patch > > > {code} > Caused by: java.io.IOException: java.lang.ClassCastException: > org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to > org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) > ... 26 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to > org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602) > at > org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149) > ... 28 more > {code} > {code} > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=500; > select year,quarter,count(*) from transactions_raw_orc_200 a join > customer_accounts_orc_200 b on a.account_id=b.account_id group by > year,quarter; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException
[ https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16233: Attachment: HIVE-16233.07.patch > llap: Query failed with AllocatorOutOfMemoryException > - > > Key: HIVE-16233 > URL: https://issues.apache.org/jira/browse/HIVE-16233 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, > HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, > HIVE-16233.06.patch, HIVE-16233.07.patch > > > {code} > TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : > attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185) > ... 15 more > Caused by: java.io.IOException: java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) > ... 17 more > Caused by: java.io.IOException: > org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: > Failed to allocate 262144; at 0 out of 1 > at > org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:425) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:413) > at > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:235) >
[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator
[ https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056552#comment-16056552 ]

Hive QA commented on HIVE-16761:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873725/HIVE-16761.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 10822 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_reader] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_text] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_complex_all] (batchId=57)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=101)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5694/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5694/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5694/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873725 - PreCommit-HIVE-Build

> LLAP IO: SMB joins fail elevator
> --------------------------------
>
>                 Key: HIVE-16761
>                 URL: https://issues.apache.org/jira/browse/HIVE-16761
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>     at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>     at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>     ... 26 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>     at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>     at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>     at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>     ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join customer_accounts_orc_200 b on a.account_id=b.account_id group by year,quarter;
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
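The stack trace points at a type mismatch in the LLAP IO row-conversion path: BatchToRowReader.nextString assumes a string column arrives as a BytesColumnVector, but the SMB-join reader hands it a LongColumnVector. The following toy sketch reproduces the failure mode; the class names mirror those in the trace, but these are simplified stand-ins, not the real org.apache.hadoop.hive.ql.exec.vector classes:

```java
// Simplified stand-ins for Hive's vectorized column classes (NOT the real API),
// illustrating why an unconditional downcast throws at runtime.
abstract class ColumnVector {}

class LongColumnVector extends ColumnVector {
    long[] vector = new long[1024]; // integer-typed storage
}

class BytesColumnVector extends ColumnVector {
    byte[][] vector = new byte[1024][]; // string/binary-typed storage
}

public class CastFailureDemo {
    // Mirrors the pattern in BatchToRowReader.nextString: the reader trusts
    // that a string column's vector is byte-backed and casts unconditionally.
    static String nextString(ColumnVector cv, int row) {
        BytesColumnVector bcv = (BytesColumnVector) cv; // throws ClassCastException if cv is long-backed
        return new String(bcv.vector[row]);
    }

    public static void main(String[] args) {
        // Simulate the bug: the elevator delivers the wrong vector type
        // for a column the row reader believes is a string.
        ColumnVector wrongType = new LongColumnVector();
        try {
            nextString(wrongType, 0);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the reported trace");
        }
    }
}
```

The fix direction, accordingly, is to make the reader and the LLAP IO elevator agree on the schema-driven vector type per column rather than relying on the downcast.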
[jira] [Updated] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException
[ https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-16233:
------------------------------------
    Attachment: (was: HIVE-16233.07.patch)

> llap: Query failed with AllocatorOutOfMemoryException
> -----------------------------------------------------
>
>                 Key: HIVE-16233
>                 URL: https://issues.apache.org/jira/browse/HIVE-16233
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Siddharth Seth
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, HIVE-16233.06.patch
>
>
> {code}
> TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: Failed to allocate 262144; at 0 out of 1
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: Failed to allocate 262144; at 0 out of 1
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>     ... 15 more
> Caused by: java.io.IOException: java.io.IOException: org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: Failed to allocate 262144; at 0 out of 1
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>     at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
>     at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>     at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
>     at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
>     ... 17 more
> Caused by: java.io.IOException: org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: Failed to allocate 262144; at 0 out of 1
>     at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:425)
>     at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:413)
>     at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:235)
>     at