[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause

2017-06-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16885:
---
Attachment: HIVE-16885.03.patch

> Non-equi Joins: Filter clauses should be pushed into the ON clause
> --
>
> Key: HIVE-16885
> URL: https://issues.apache.org/jira/browse/HIVE-16885
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16885.01.patch, HIVE-16885.02.patch, 
> HIVE-16885.03.patch, HIVE-16885.patch
>
>
> FIL_24 -> MAPJOIN_23
> {code}
> hive> explain  select * from part where p_size > (select max(p_size) from 
> part group by p_type);
> Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 3 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_26]
> Select Operator [SEL_25] (rows=110 width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_24] (rows=110 width=625)
> predicate:(_col5 > _col9)
> Map Join Operator [MAPJOIN_23] (rows=330 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
> <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_21]
> Select Operator [SEL_20] (rows=165 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_19] (rows=165 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_18]
>   PartitionCols:_col0
>   Group By Operator [GBY_17] (rows=14190 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
> Select Operator [SEL_16] (rows=2 width=109)
>   Output:["p_type","p_size"]
>   TableScan [TS_2] (rows=2 width=109)
> 
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Select Operator [SEL_22] (rows=2 width=621)
> 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
> TableScan [TS_0] (rows=2 width=621)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
> {code}
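For context, FIL_24 holding the non-equi predicate above the cross-product MAPJOIN_23 is exactly what this issue targets: pushing (_col5 > _col9) into the join condition lets the join evaluate it directly. A hand-written sketch of the logically equivalent rewritten query (illustrative only, not the patch; whether the non-equi ON predicate is accepted and pushed depends on the Hive version):
{code}
SELECT p.*
FROM part p
JOIN (SELECT max(p_size) AS max_size
      FROM part
      GROUP BY p_type) m
  ON p.p_size > m.max_size;  -- non-equi predicate in the ON clause, not a post-join filter
{code}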





[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Attachment: HIVE-16589.0994.patch

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes with which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy: in particular, the PARTIAL2 and FINAL GroupBy modes 
> that take the AVG struct as input. Also add the COMPLETE mode, which takes 
> in the original data and produces the full aggregation, for completeness so 
> to speak.
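As a rough SQL analogy of how those GroupBy modes compose (hypothetical table t with group key k and numeric column c; the real partial results are internal structs inside the vectorized GroupBy operator, not user-visible rows):
{code}
-- PARTIAL1/PARTIAL2 carry AVG as a (count, sum) pair; FINAL does the divide.
SELECT k, SUM(part_sum) / SUM(part_cnt) AS avg_final    -- FINAL-style merge
FROM (
  SELECT k, COUNT(c) AS part_cnt, SUM(c) AS part_sum    -- PARTIAL1-style output
  FROM t
  GROUP BY k
) partials
GROUP BY k;
-- COMPLETE mode is the one-shot equivalent: SELECT k, AVG(c) FROM t GROUP BY k;
{code}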





[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Attachment: (was: HIVE-16589.0994.patch)

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.099.patch, 
> HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes with which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy: in particular, the PARTIAL2 and FINAL GroupBy modes 
> that take the AVG struct as input. Also add the COMPLETE mode, which takes 
> in the original data and produces the full aggregation, for completeness so 
> to speak.





[jira] [Commented] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057052#comment-16057052
 ] 

Hive QA commented on HIVE-16929:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873771/HIVE-16929.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=220)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5705/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5705/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5705/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873771 - PreCommit-HIVE-Build

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item, "hive.aux.udf.package.name.list". The jars in the 
> $HIVE_HOME/auxlib/ directory are scanned, and classes found under the 
> configured package names are registered as invariant (built-in) functions.
> For example:
> {code:java}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
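Once registered this way, such a class would be callable like a built-in function, with no CREATE FUNCTION step. A hypothetical class com.sample.udf.MyUpper exposed under the name my_upper (the class-to-name mapping here is an assumption, not from the patch) would be used as:
{code}
SELECT my_upper(name) FROM employees;  -- hypothetical UDF resolved from the scanned package
{code}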





[jira] [Commented] (HIVE-16918) Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp

2017-06-20 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057031#comment-16057031
 ] 

anishek commented on HIVE-16918:


I think HIVE_IN_TEST is definitely the better indicator. However, I was 
thinking that any method on the pfile implementation should first check for 
"HIVE_IN_TEST", and everywhere else we can just do the pfile/file scheme check. 
This way we won't be using HIVE_IN_TEST in various classes, as it will be 
limited to only the ProxyFileSystem class, and we use pfile as a regular scheme 
everywhere else. What do you think?

> Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp
> --
>
> Key: HIVE-16918
> URL: https://issues.apache.org/jira/browse/HIVE-16918
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 3.0.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-16918.2.patch, HIVE-16918.patch
>
>
> With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp. 
> This, however, is incorrect for copying _metadata generated in a temporary 
> scratch directory to HDFS. We need to change that path so that it routes to 
> a regular CopyTask. The issue with using distcp there is that distcp 
> launches as a separate job, which may be scheduled on another machine that 
> does not have access to this file:// URI. Distcp should only ever be used 
> when copying from non-local filesystems.
> Also, in following up HIVE-16686, we missed adding "-pb" as a default for 
> invocations of distcp from Hive. Adding that in (see the sketch below). This 
> would not be necessary if HADOOP-8143 had made it in, but until it does, we 
> need it.
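For reference, -pb is distcp's "preserve block size" option: copied files keep the source block size, which matters for checksum comparisons on subsequent copies. An illustrative manual invocation (the paths are hypothetical):
{noformat}
hadoop distcp -pb hdfs://src-nn:8020/warehouse/db/t1 hdfs://dst-nn:8020/staging/db/t1
{noformat}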





[jira] [Commented] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057012#comment-16057012
 ] 

Hive QA commented on HIVE-16929:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873771/HIVE-16929.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5704/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5704/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5704/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873771 - PreCommit-HIVE-Build

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item, "hive.aux.udf.package.name.list". The jars in the 
> $HIVE_HOME/auxlib/ directory are scanned, and classes found under the 
> configured package names are registered as invariant (built-in) functions.
> For example:
> {code:java}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}





[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057007#comment-16057007
 ] 

Rui Li commented on HIVE-16840:
---

Besides, it would be better to add a separate optimizer rule for this 
optimization; SetSparkReducerParallelism is only intended to set parallelism 
for ReduceSinks (RSes).

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1TB data of TPC-DS, q17 of TPC-DS hanged.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as_store_returns_quantitycount
>,avg(sr_return_quantity) as_store_returns_quantityave
>,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The script hangs because we use only 1 task to implement the sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use 
> 1 task to ensure correctness, but the performance is poor.
> The reason why we use 1 task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]





[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Attachment: HIVE-16589.0994.patch

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes with which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy: in particular, the PARTIAL2 and FINAL GroupBy modes 
> that take the AVG struct as input. Also add the COMPLETE mode, which takes 
> in the original data and produces the full aggregation, for completeness so 
> to speak.





[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Attachment: (was: HIVE-16589.0994.patch)

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes with which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy: in particular, the PARTIAL2 and FINAL GroupBy modes 
> that take the AVG struct as input. Also add the COMPLETE mode, which takes 
> in the original data and produces the full aggregation, for completeness so 
> to speak.





[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Patch Available  (was: Open)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch
>
>
> In phase 2, we are going to turn auto-gathering of column stats on by 
> default (see the sketch below). This requires updating the golden files.
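Presumably the switch in question is hive.stats.column.autogather, introduced in phase 1; phase 2 would flip its default, which is what forces the golden-file churn:
{code}
set hive.stats.column.autogather=true;  -- proposed default in phase 2
{code}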





[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Attachment: (was: HIVE-13567.16.patch)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch
>
>
> In phase 2, we are going to turn auto-gathering of column stats on by 
> default. This requires updating the golden files.





[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Attachment: HIVE-13567.16.patch

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch
>
>
> In phase 2, we are going to turn auto-gathering of column stats on by 
> default. This requires updating the golden files.





[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Open  (was: Patch Available)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch
>
>
> In phase 2, we are going to turn auto-gathering of column stats on by 
> default. This requires updating the golden files.





[jira] [Work started] (HIVE-16893) move replication dump related work in semantic analysis phase to execution phase using a task

2017-06-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16893 started by anishek.
--
> move replication dump related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16893
> URL: https://issues.apache.org/jira/browse/HIVE-16893
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> Since we run into the possibility of creating a large number of tasks 
> during the replication bootstrap dump:
> * we may not be able to hold all of them in memory for really large 
> databases (which might no longer hold true once we complete HIVE-16892);
> * also, a compile-time lock is taken such that only one query runs in this 
> phase, and in the replication bootstrap scenario that is going to be a very 
> long-running task; moving it to the execution phase will limit how long the 
> lock is held during compilation.





[jira] [Commented] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?

2017-06-20 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056981#comment-16056981
 ] 

Matt McCline commented on HIVE-16919:
-

First one:

4th from the end: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728)))

11th from the end is MIN(cint): -1069736047

1st is MAX(cint): -2030


> Vectorization: vectorization_short_regress.q has query result differences 
> with non-vectorized run.  Vectorized unary function broken?
> -
>
> Key: HIVE-16919
> URL: https://issues.apache.org/jira/browse/HIVE-16919
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>
> Jason spotted a difference in the query result for 
> vectorization_short_regress.q.out -- that is, when vectorization is turned 
> off and a base .q.out file is created, there are 2 differences.
> They both seem to be related to negation. For example, in the first one, 
> MAX(cint) and MIN(cint) appear earlier as columns and match between the 
> non-vectorized and vectorized runs. So, it doesn't appear that aggregation 
> is failing. It seems that, now that the Reducer is vectorized, a bug is 
> exposed. So, even though MAX and MIN are the same, the expression with 
> negation returns different results.
> 19th field of the query below: Vectorized 511 vs Non-Vectorized -58
> {noformat}
> SELECT MAX(cint),
>(MAX(cint) / -3728),
>(MAX(cint) * -3728),
>VAR_POP(cbigint),
>(-((MAX(cint) * -3728))),
>STDDEV_POP(csmallint),
>(-563 % (MAX(cint) * -3728)),
>(VAR_POP(cbigint) / STDDEV_POP(csmallint)),
>(-(STDDEV_POP(csmallint))),
>MAX(cdouble),
>AVG(ctinyint),
>(STDDEV_POP(csmallint) - 10.175),
>MIN(cint),
>((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
>(-(MAX(cdouble))),
>MIN(cdouble),
>(MAX(cdouble) % -26.28),
>STDDEV_SAMP(csmallint),
>(-((MAX(cint) / -3728))),
>((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
>((MAX(cint) / -3728) - AVG(ctinyint)),
>(-((MAX(cint) * -3728))),
>VAR_SAMP(cint)
> FROM   alltypesorc
> WHERE  (((cbigint <= 197)
>  AND (cint < cbigint))
> OR ((cdouble >= -26.28)
> AND (csmallint > cdouble))
> OR ((ctinyint > cfloat)
> AND (cstring1 RLIKE '.*ss.*'))
>OR ((cfloat > 79.553)
>AND (cstring2 LIKE '10%')))
> {noformat}
> Column expression is:  ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * 
> -3728))),
> ---
> This is a previously existing issue, now filed as HIVE-16919: 
> "Vectorization: vectorization_short_regress.q has query result differences 
> with non-vectorized run"
> 10th field of the query below: Non-Vectorized -6432.15344526 vs. 
> Vectorized -6432.0
> Column expression is (-(cdouble)) as c4,
> Query result for vectorization_short_regress.q.out -- that is, when 
> vectorization is turned off and a base .q.out file is created.
> ---
> 10th field of the query below: Non-Vectorized -6432.15344526 vs. 
> Vectorized -6432.0
> Column expression is (-(cdouble)) as c4,
> {noformat}
> SELECT   ctimestamp1,
>  cstring2,
>  cdouble,
>  cfloat,
>  cbigint,
>  csmallint,
>  (cbigint / 3569) as c1,
>  (-257 - csmallint) as c2,
>  (-6432 * cfloat) as c3,
>  (-(cdouble)) as c4,
>  (cdouble * 10.175) as c5,
>  ((-6432 * cfloat) / cfloat) as c6,
>  (-(cfloat)) as c7,
>  (cint % csmallint) as c8,
>  (-(cdouble)) as c9,
>  (cdouble * (-(cdouble))) as c10
> FROM alltypesorc
> WHERE(((-1.389 >= cint)
>AND ((csmallint < ctinyint)
> AND (-6432 > csmallint)))
>   OR ((cdouble >= cfloat)
>   AND (cstring2 <= 'a'))
>  OR ((cstring1 LIKE 'ss%')
>  AND (10.175 > cbigint)))
> {noformat}
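A minimal way to zoom in on that second difference (a sketch; it assumes the affected row carries cdouble = 6432.15344526, the value implied by the reported c4):
{code}
-- Non-vectorized returns c4 = -6432.15344526; vectorized returns -6432.0,
-- i.e. the fractional part is lost somewhere in the vectorized unary-minus path.
SELECT cdouble, (-(cdouble)) AS c4
FROM alltypesorc
WHERE cdouble = 6432.15344526;
{code}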





[jira] [Updated] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for boostrap replication

2017-06-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16892:
---
Status: Patch Available  (was: In Progress)

> Move creation of _files from ReplCopyTask to analysis phase for boostrap 
> replication 
> -
>
> Key: HIVE-16892
> URL: https://issues.apache.org/jira/browse/HIVE-16892
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16892.1.patch
>
>
> During replication bootstrap we create the _files via ReplCopyTask for 
> partitions and tables. This can be done inline as part of the analysis phase 
> rather than creating the ReplCopyTask.
> This is done to prevent the creation of a huge number of these tasks in 
> memory before handing them to the execution engine.





[jira] [Updated] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for boostrap replication

2017-06-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16892:
---
Attachment: HIVE-16892.1.patch

> Move creation of _files from ReplCopyTask to analysis phase for boostrap 
> replication 
> -
>
> Key: HIVE-16892
> URL: https://issues.apache.org/jira/browse/HIVE-16892
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16892.1.patch
>
>
> During replication bootstrap we create the _files via ReplCopyTask for 
> partitions and tables. This can be done inline as part of the analysis phase 
> rather than creating the ReplCopyTask.
> This is done to prevent the creation of a huge number of these tasks in 
> memory before handing them to the execution engine.





[jira] [Commented] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for boostrap replication

2017-06-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056979#comment-16056979
 ] 

ASF GitHub Bot commented on HIVE-16892:
---

GitHub user anishek opened a pull request:

https://github.com/apache/hive/pull/196

HIVE-16892 : Move creation of _files from ReplCopyTask to analysis phase 
for boostrap replication

The export semantic analyzer still uses the inputs/outputs, since those 
should be used by repl v1, and hence authorization there might be required 
outside of what is done in repl v2.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anishek/hive HIVE-16892

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #196


commit e1a16e918e7809a7c6f6a1fd004e7f9798ff7c79
Author: Anishek Agarwal 
Date:   2017-06-21T04:34:12Z

HIVE-16892 : Move creation of _files from ReplCopyTask to analysis phase 
for boostrap replication




> Move creation of _files from ReplCopyTask to analysis phase for boostrap 
> replication 
> -
>
> Key: HIVE-16892
> URL: https://issues.apache.org/jira/browse/HIVE-16892
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> During replication bootstrap we create the _files via ReplCopyTask for 
> partitions and tables. This can be done inline as part of the analysis phase 
> rather than creating the ReplCopyTask.
> This is done to prevent the creation of a huge number of these tasks in 
> memory before handing them to the execution engine.





[jira] [Commented] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056978#comment-16056978
 ] 

Matt McCline commented on HIVE-16589:
-

Ok, I've convinced myself that the Vectorized 511 vs Non-Vectorized -58 
vectorization_short_regress.q issue is not aggregation-related but an old bug. 
It is now part of HIVE-16919.

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes with which we vectorize AVG, in preparation for fully 
> supporting AVG GroupBy: in particular, the PARTIAL2 and FINAL GroupBy modes 
> that take the AVG struct as input. Also add the COMPLETE mode, which takes 
> in the original data and produces the full aggregation, for completeness so 
> to speak.





[jira] [Updated] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16919:

Description: 
Jason spotted a difference in the query result for 
vectorization_short_regress.q.out -- that is, when vectorization is turned off 
and a base .q.out file is created, there are 2 differences.

They both seem to be related to negation. For example, in the first one, 
MAX(cint) and MIN(cint) appear earlier as columns and match between the 
non-vectorized and vectorized runs. So, it doesn't appear that aggregation is 
failing. It seems that, now that the Reducer is vectorized, a bug is exposed. 
So, even though MAX and MIN are the same, the expression with negation returns 
different results.

19th field of the query below: Vectorized 511 vs Non-Vectorized -58

{noformat}
SELECT MAX(cint),
   (MAX(cint) / -3728),
   (MAX(cint) * -3728),
   VAR_POP(cbigint),
   (-((MAX(cint) * -3728))),
   STDDEV_POP(csmallint),
   (-563 % (MAX(cint) * -3728)),
   (VAR_POP(cbigint) / STDDEV_POP(csmallint)),
   (-(STDDEV_POP(csmallint))),
   MAX(cdouble),
   AVG(ctinyint),
   (STDDEV_POP(csmallint) - 10.175),
   MIN(cint),
   ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
   (-(MAX(cdouble))),
   MIN(cdouble),
   (MAX(cdouble) % -26.28),
   STDDEV_SAMP(csmallint),
   (-((MAX(cint) / -3728))),
   ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
   ((MAX(cint) / -3728) - AVG(ctinyint)),
   (-((MAX(cint) * -3728))),
   VAR_SAMP(cint)
FROM   alltypesorc
WHERE  (((cbigint <= 197)
 AND (cint < cbigint))
OR ((cdouble >= -26.28)
AND (csmallint > cdouble))
OR ((ctinyint > cfloat)
AND (cstring1 RLIKE '.*ss.*'))
   OR ((cfloat > 79.553)
   AND (cstring2 LIKE '10%')))
{noformat}

Column expression is:  ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * 
-3728))),

---

This is a previously existing issue, now filed as HIVE-16919: 
"Vectorization: vectorization_short_regress.q has query result differences with 
non-vectorized run"
10th field of the query below: Non-Vectorized -6432.15344526 vs. 
Vectorized -6432.0

Column expression is (-(cdouble)) as c4,

Query result for vectorization_short_regress.q.out -- that is, when 
vectorization is turned off and a base .q.out file is created.

---

10th field of the query below: Non-Vectorized -6432.15344526 vs. Vectorized 
-6432.0

Column expression is (-(cdouble)) as c4,

{noformat}
SELECT   ctimestamp1,
 cstring2,
 cdouble,
 cfloat,
 cbigint,
 csmallint,
 (cbigint / 3569) as c1,
 (-257 - csmallint) as c2,
 (-6432 * cfloat) as c3,
 (-(cdouble)) as c4,
 (cdouble * 10.175) as c5,
 ((-6432 * cfloat) / cfloat) as c6,
 (-(cfloat)) as c7,
 (cint % csmallint) as c8,
 (-(cdouble)) as c9,
 (cdouble * (-(cdouble))) as c10
FROM alltypesorc
WHERE(((-1.389 >= cint)
   AND ((csmallint < ctinyint)
AND (-6432 > csmallint)))
  OR ((cdouble >= cfloat)
  AND (cstring2 <= 'a'))
 OR ((cstring1 LIKE 'ss%')
 AND (10.175 > cbigint)))
{noformat}

  was:
Query result for vectorization_short_regress.q.out -- that is when 
vectorization is turned off and a base .q.out file created.

---

10th field of the query below: Non-Vectorized -6432.15344526 vs. Vectorized 
-6432.0

Column expression is (-(cdouble)) as c4,

{noformat}
SELECT   ctimestamp1,
 cstring2,
 cdouble,
 cfloat,
 cbigint,
 csmallint,
 (cbigint / 3569) as c1,
 (-257 - csmallint) as c2,
 (-6432 * cfloat) as c3,
 (-(cdouble)) as c4,
 (cdouble * 10.175) as c5,
 ((-6432 * cfloat) / cfloat) as c6,
 (-(cfloat)) as c7,
 (cint % csmallint) as c8,
 (-(cdouble)) as c9,
 (cdouble * (-(cdouble))) as c10
FROM alltypesorc
WHERE(((-1.389 >= cint)
   AND ((csmallint < ctinyint)
AND (-6432 > csmallint)))
  OR ((cdouble >= cfloat)
  AND (cstring2 <= 'a'))
 OR ((cstring1 LIKE 'ss%')
 AND (10.175 > cbigint)))
{noformat}


> Vectorization: vectorization_short_regress.q has query result differences 
> with non-vectorized run.  Vectorized unary function broken?
> -
>
> Key: HIVE-16919
> URL: https://issues.apache.org/jira/browse/HIVE-16919
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>  

[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056976#comment-16056976
 ] 

Hive QA commented on HIVE-16927:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873767/HIVE-16927.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5703/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5703/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5703/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873767 - PreCommit-HIVE-Build

> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16927
> URL: https://issues.apache.org/jira/browse/HIVE-16927
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16927.1.patch
>
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.





[jira] [Updated] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16919:

Summary: Vectorization: vectorization_short_regress.q has query result 
differences with non-vectorized run.  Vectorized unary function broken?  (was: 
Vectorization: vectorization_short_regress.q has query result differences with 
non-vectorized run.)

> Vectorization: vectorization_short_regress.q has query result differences 
> with non-vectorized run.  Vectorized unary function broken?
> -
>
> Key: HIVE-16919
> URL: https://issues.apache.org/jira/browse/HIVE-16919
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>
> Query result for vectorization_short_regress.q.out -- that is, when 
> vectorization is turned off and a base .q.out file is created.
> ---
> 10th field of the query below: Non-Vectorized -6432.15344526 vs. 
> Vectorized -6432.0
> Column expression is (-(cdouble)) as c4,
> {noformat}
> SELECT   ctimestamp1,
>  cstring2,
>  cdouble,
>  cfloat,
>  cbigint,
>  csmallint,
>  (cbigint / 3569) as c1,
>  (-257 - csmallint) as c2,
>  (-6432 * cfloat) as c3,
>  (-(cdouble)) as c4,
>  (cdouble * 10.175) as c5,
>  ((-6432 * cfloat) / cfloat) as c6,
>  (-(cfloat)) as c7,
>  (cint % csmallint) as c8,
>  (-(cdouble)) as c9,
>  (cdouble * (-(cdouble))) as c10
> FROM alltypesorc
> WHERE(((-1.389 >= cint)
>AND ((csmallint < ctinyint)
> AND (-6432 > csmallint)))
>   OR ((cdouble >= cfloat)
>   AND (cstring2 <= 'a'))
>  OR ((cstring1 LIKE 'ss%')
>  AND (10.175 > cbigint)))
> {noformat}





[jira] [Resolved] (HIVE-16900) optimization to give distcp a list of input files to copy to a destination target directory during repl load

2017-06-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek resolved HIVE-16900.

Resolution: Duplicate

> optimization to give distcp a list of input files to copy to a destination 
> target directory during repl load
> 
>
> Key: HIVE-16900
> URL: https://issues.apache.org/jira/browse/HIVE-16900
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> During repl copy we currently only allow one operation per file, as against 
> the list of files supported by distcp. During bootstrap table/partition load 
> it would be great to load all files listed in {noformat}_files{noformat} in 
> a single distcp job to make it more efficient; this would require changes to 
> the _shims_ sub-project in Hive to additionally expose APIs which take 
> multiple source files (see the sketch below).
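The distcp CLI already accepts multiple sources against one target directory; the ask is to surface the same shape through the shims API. An illustrative invocation (the paths are hypothetical):
{noformat}
hadoop distcp hdfs://nn:8020/repl/dump/t1/f1 hdfs://nn:8020/repl/dump/t1/f2 hdfs://nn:8020/warehouse/db/t1/
{noformat}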





[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056931#comment-16056931
 ] 

Hive QA commented on HIVE-16761:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873764/HIVE-16761.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5702/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5702/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5702/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873764 - PreCommit-HIVE-Build

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}





[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056928#comment-16056928
 ] 

Rui Li commented on HIVE-16840:
---

bq. you mean that if the limit number is too large...
Yeah. But it's a little tricky to set a proper upper bound for it. How about we 
do something like this: if statistics are available, we estimate the number of 
rows in the input of the RS; if the limit is, say, >= 90% of those rows, we 
skip the optimization. If statistics are unavailable, we run the optimization 
anyway.
You can find how we estimate the number of bytes in SetSparkReducerParallelism; 
I guess we can estimate the number of rows similarly.

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1TB data of TPC-DS, q17 of TPC-DS hanged.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as_store_returns_quantitycount
>,avg(sr_return_quantity) as_store_returns_quantityave
>,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The script hangs because we use only 1 task to implement the sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use 
> 1 task to ensure correctness, but the performance is poor.
> The reason why we use 1 task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]





[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-20 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056908#comment-16056908
 ] 

Chao Sun commented on HIVE-11297:
-

{quote}
So can you retest it in your env? if the operator tree is like what you 
mentioned, i think all the operator tree in 
spark_dynamic_partition_pruning.q.out will be different as i generated in my 
env.
{quote}

Interesting... I'm not sure what caused the difference; maybe some 
configuration? I've tried several times in my env and the FIL is always 
followed by a SEL operator. Nevertheless, this is not an important issue. Will 
take a look at the RB.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, 
> HIVE-11297.6.patch, HIVE-11297.7.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to do the 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Attachment: HIVE-16589.0994.patch

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes so we can vectorize AVG, in preparation for fully supporting 
> AVG GroupBy. In particular, add the PARTIAL2 and FINAL GroupBy modes, which 
> take in the AVG struct as input. And add the COMPLETE mode, which takes in 
> the original data and produces the full aggregation, for completeness, so to speak.
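>
> A minimal, plain-Java sketch (not Hive's actual vectorized classes; the names 
> are illustrative) of what each mode consumes and produces for AVG's 
> (sum, count) intermediate struct:
> {code:java}
> public class AvgModes {
>   // PARTIAL1: raw values in -> partial (sum, count) struct out
>   static double[] partial1(double[] rows) {
>     double sum = 0;
>     for (double r : rows) {
>       sum += r;
>     }
>     return new double[] { sum, rows.length };
>   }
>
>   // PARTIAL2: partial (sum, count) structs in -> merged partial struct out
>   static double[] partial2(double[] a, double[] b) {
>     return new double[] { a[0] + b[0], a[1] + b[1] };
>   }
>
>   // FINAL: partial (sum, count) struct in -> the full aggregation out
>   static double finalAvg(double[] partial) {
>     return partial[0] / partial[1];
>   }
>
>   // COMPLETE: raw values in -> the full aggregation out, no intermediate step
>   static double complete(double[] rows) {
>     return finalAvg(partial1(rows));
>   }
> }
> {code}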



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Status: In Progress  (was: Patch Available)

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes so we can vectorize AVG, in preparation for fully supporting 
> AVG GroupBy. In particular, add the PARTIAL2 and FINAL GroupBy modes, which 
> take in the AVG struct as input. And add the COMPLETE mode, which takes in 
> the original data and produces the full aggregation, for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Status: Patch Available  (was: In Progress)

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.0994.patch, 
> HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes so we can vectorize AVG, in preparation for fully supporting 
> AVG GroupBy. In particular, add the PARTIAL2 and FINAL GroupBy modes, which 
> take in the AVG struct as input. And add the COMPLETE mode, which takes in 
> the original data and produces the full aggregation, for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used

2017-06-20 Thread Sunitha Beeram (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056902#comment-16056902
 ] 

Sunitha Beeram commented on HIVE-16844:
---

[~sankarh] I am running into some issues fixing the unit tests and wondering if 
you have some input. I tried an approach similar to what you did to fix 
the failures in TestReplicationScenariosAcrossInstances: i.e., use the same 
configuration, but a different source and destination db name. However, the 
serialize and deserialize methods encode/decode the db and table names. I was 
able to work around them somewhat for 
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (by 
resetting the target dbname via the HCatTable interface) 
and 
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (by doing a string replace of dbName on the partition-spec string).

But for 
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema,
 I have hit a block; I can't update the dbname through the HCatAddPartitionDesc 
APIs or the HCatPartition APIs. I could add methods to either of these to 
update the dbname (HCatTable allows that), but I am beginning to wonder if this 
is the right approach.

Running multiple instances of the Metastore within the same JVM is probably 
error prone, as there could be other static variables in classes that might 
have unintended sharing, similar to the issue with the tests that this change 
broke. Are we better off handling these tests via integration tests 
and not unit tests? The other option might be to mock out the db completely.

Let me know if you have further input on this. Thanks!

> Fix Connection leak in ObjectStore when new Conf object is used
> ---
>
> Key: HIVE-16844
> URL: https://issues.apache.org/jira/browse/HIVE-16844
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16844.1.patch
>
>
> The code path in ObjectStore.java currently leaks BoneCP (or Hikari) 
> connection pools when a new configuration object is passed in. The code needs 
> to ensure that the persistence-factory is closed before it is nullified.
> The relevant code is 
> [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290].
>  Note that pmf is set to null, but the underlying connection pool is not 
> closed.
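>
> A minimal sketch of the shape of the fix (assuming a JDO 
> PersistenceManagerFactory field named pmf, as in the linked code; the 
> surrounding class and method are hypothetical):
> {code:java}
> import java.util.Properties;
> import javax.jdo.JDOHelper;
> import javax.jdo.PersistenceManagerFactory;
>
> class PmfHolder {
>   private static PersistenceManagerFactory pmf;
>
>   static synchronized void reinitialize(Properties newProps) {
>     if (pmf != null) {
>       pmf.close(); // releases the underlying BoneCP/Hikari connection pool;
>                    // setting pmf to null alone would leak it
>     }
>     pmf = JDOHelper.getPersistenceManagerFactory(newProps);
>   }
> }
> {code}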



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056883#comment-16056883
 ] 

Hive QA commented on HIVE-16589:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873760/HIVE-16589.0993.patch

{color:green}SUCCESS:{color} +1 due to 29 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_distinct_gby] 
(batchId=70)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_adaptor_usage_mode]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_reduce]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5701/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5701/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5701/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873760 - PreCommit-HIVE-Build

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.099.patch, 
> HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes so we can vectorize AVG, in preparation for fully supporting 
> AVG GroupBy. In particular, add the PARTIAL2 and FINAL GroupBy modes, which 
> take in the AVG struct as input. And add the COMPLETE mode, which takes in 
> the original data and produces the full aggregation, for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056874#comment-16056874
 ] 

Ashutosh Chauhan commented on HIVE-6348:


I am not sure why order by can't be removed in these cases. There is no 
contract that scripts and UDFs will see data in any particular order. So, it's 
perfectly alright to remove sorts in such cases.

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch, HIVE-6348.3.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any 
> order by/sort by in the subquery unless you use 'limit '. Could even go so 
> far as barring it at the semantic level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-20 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056858#comment-16056858
 ] 

liyunzhang_intel edited comment on HIVE-16840 at 6/21/17 2:24 AM:
--

[~lirui]: 
 bq.If so, I wonder whether we should put a cap on the limit number. E.g. 
if the number is too large, we should skip this optimization.
You mean that if the limit number is too large (e.g. select * from A order by 
ColB limit 99 when the total number of records in A is 100), there is no 
performance improvement and maybe even degradation, because now there is 1 
extra reduce stage.
bq.Besides, I don't think we need to add the sortLimit flag to RS. 
ReduceSinkDesc has a flag hasOrderBy indicating whether global order is needed. 
Thanks for the suggestion.


was (Author: kellyzly):
[~lirui]: 
 bq.If so, I wonder whether we should put a cap on the limit number. E.g. 
if the number is too large, we should skip this optimization.
You mean that if the limit number is too large (e.g. select * from A order by 
ColB limit 100 when the total number of records in A is 99), there is no 
performance improvement and maybe even degradation, because now there is 1 
extra reduce stage.
bq.Besides, I don't think we need to add the sortLimit flag to RS. 
ReduceSinkDesc has a flag hasOrderBy indicating whether global order is needed. 
Thanks for the suggestion.

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1TB of TPC-DS data, q17 hung.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as store_returns_quantitycount
>,avg(sr_return_quantity) as store_returns_quantityave
>,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The script hung because we use only 1 task to implement the sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use 1 
> task to ensure correctness, but the performance is poor.
> The reason we use 1 task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]

[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-20 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056858#comment-16056858
 ] 

liyunzhang_intel commented on HIVE-16840:
-

[~lirui]: 
 bq.If so, I wonder whether we should put a cap on the limit number. E.g. 
if the number is too large, we should skip this optimization.
You mean that if the limit number is too large (e.g. select * from A order by 
ColB limit 100 when the total number of records in A is 99), there is no 
performance improvement and maybe even degradation, because now there is 1 
extra reduce stage.
bq.Besides, I don't think we need to add the sortLimit flag to RS. 
ReduceSinkDesc has a flag hasOrderBy indicating whether global order is needed. 
Thanks for the suggestion.

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1TB of TPC-DS data, q17 hung.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as store_returns_quantitycount
>,avg(sr_return_quantity) as store_returns_quantityave
>,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The script hung because we use only 1 task to implement the sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use 1 
> task to ensure correctness, but the performance is poor.
> The reason we use 1 task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056852#comment-16056852
 ] 

Rui Li commented on HIVE-16840:
---

To clarify, the idea is to introduce an extra MR shuffle and push the limit to 
it, right? If so, I wonder whether we should put a cap on the limit number. 
E.g. if the number is too large, we should skip this optimization.
Besides, I don't think we need to add the sortLimit flag to RS. ReduceSinkDesc 
has a flag hasOrderBy indicating whether global order is needed. We can set 
that to false for the new RS, and GenSparkUtils#getEdgeProperty should give us 
the MR shuffle.
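
For illustration, here is a minimal, Hive-agnostic Java sketch of the two-stage 
top-N idea (the class and method names are illustrative, not Hive code): each 
of the K parallel reducers keeps only its local top N, so the single global 
sorter merges at most K*N candidate rows instead of the whole data set. It also 
shows why a cap on N matters: if N is close to the data-set size, stage 1 
filters out almost nothing and the extra shuffle only adds cost.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TwoStageTopN {
  // Stage 1 (runs in each of the K parallel reducers): keep only the local
  // top-N rows, shown here for a descending global order.
  static List<Long> localTopN(List<Long> partitionRows, int n) {
    PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap of the N largest
    for (long row : partitionRows) {
      heap.offer(row);
      if (heap.size() > n) {
        heap.poll(); // evict the smallest candidate
      }
    }
    return new ArrayList<>(heap);
  }

  // Stage 2 (runs in the single final reducer): merge the K small candidate
  // lists and produce the globally sorted top-N.
  static List<Long> globalTopN(List<List<Long>> candidates, int n) {
    List<Long> merged = new ArrayList<>();
    for (List<Long> c : candidates) {
      merged.addAll(c);
    }
    merged.sort(Collections.reverseOrder());
    return merged.subList(0, Math.min(n, merged.size()));
  }
}
{code}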

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1TB of TPC-DS data, q17 hung.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as store_returns_quantitycount
>,avg(sr_return_quantity) as store_returns_quantityave
>,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The script hung because we use only 1 task to implement the sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use 1 
> task to ensure correctness, but the performance is poor.
> The reason we use 1 task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-20 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-11297:

Attachment: HIVE-11297.7.patch

[~csun]: Updated HIVE-11297.7.patch according to the last round of review on 
Review Board.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, 
> HIVE-11297.6.patch, HIVE-11297.7.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to do the 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-20 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056837#comment-16056837
 ] 

liyunzhang_intel edited comment on HIVE-11297 at 6/21/17 2:09 AM:
--

[~csun]: I applied HIVE-11297.6.patch on the latest master branch (8c5f55e) and 
ran the query I posted above; I printed the operator tree in

SplitOpTreeForDPP#process
{code}
.
/** print the operator tree **/
ArrayList<TableScanOperator> tableScanList = new ArrayList<>();
tableScanList.add((TableScanOperator) stack.get(0));
LOG.debug("operator tree:" + Operator.toString(tableScanList));
/** print the operator tree **/
Operator<?> filterOp = pruningSinkOp;
while (filterOp != null) {
  if (filterOp.getNumChild() > 1) {
    break;
  } else {
    filterOp = filterOp.getParentOperators().get(0);
  }
}


{code}

the operator tree is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}

So can you retest it in your env? If the operator tree is like what you 
mentioned, I think all the operator trees in 
spark_dynamic_partition_pruning.q.out would be different from what I generated 
in my env.



was (Author: kellyzly):
[~csun]: I applied HIVE-11297.6.patch on the latest master branch (8c5f55e) and 
ran the query I posted above; I printed the operator tree of filterOp in

SplitOpTreeForDPP#process
{code}
.
/** print the operator tree **/
ArrayList<TableScanOperator> tableScanList = new ArrayList<>();
tableScanList.add((TableScanOperator) stack.get(0));
LOG.debug("operator tree:" + Operator.toString(tableScanList));
/** print the operator tree **/
Operator<?> filterOp = pruningSinkOp;
while (filterOp != null) {
  if (filterOp.getNumChild() > 1) {
    break;
  } else {
    filterOp = filterOp.getParentOperators().get(0);
  }
}


{code}

the operator tree is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}


> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to do the 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-20 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056837#comment-16056837
 ] 

liyunzhang_intel commented on HIVE-11297:
-

[~csun]: I applied HIVE-11297.6.patch on the latest master branch (8c5f55e) and 
ran the query I posted above; I printed the operator tree of filterOp in

SplitOpTreeForDPP#process
{code}
.
/** print the operator tree **/
ArrayList<TableScanOperator> tableScanList = new ArrayList<>();
tableScanList.add((TableScanOperator) stack.get(0));
LOG.debug("operator tree:" + Operator.toString(tableScanList));
/** print the operator tree **/
Operator<?> filterOp = pruningSinkOp;
while (filterOp != null) {
  if (filterOp.getNumChild() > 1) {
    break;
  } else {
    filterOp = filterOp.getParentOperators().get(0);
  }
}


{code}

the operator tree is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}


> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to do the 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056817#comment-16056817
 ] 

Hive QA commented on HIVE-13567:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873759/HIVE-13567.16.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5700/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5700/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5700/

Messages:
{noformat}
 This message was trimmed, see log for full details 
patching file ql/src/test/results/clientpositive/spark/stats_only_null.q.out
patching file ql/src/test/results/clientpositive/spark/stats_partscan_1_23.q.out
patching file ql/src/test/results/clientpositive/spark/statsfs.q.out
patching file 
ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out
patching file ql/src/test/results/clientpositive/spark/temp_table.q.out
patching file ql/src/test/results/clientpositive/spark/union10.q.out
patching file ql/src/test/results/clientpositive/spark/union12.q.out
patching file ql/src/test/results/clientpositive/spark/union17.q.out
patching file ql/src/test/results/clientpositive/spark/union18.q.out
patching file ql/src/test/results/clientpositive/spark/union19.q.out
patching file ql/src/test/results/clientpositive/spark/union22.q.out
patching file ql/src/test/results/clientpositive/spark/union25.q.out
patching file ql/src/test/results/clientpositive/spark/union28.q.out
patching file ql/src/test/results/clientpositive/spark/union29.q.out
patching file ql/src/test/results/clientpositive/spark/union30.q.out
patching file ql/src/test/results/clientpositive/spark/union31.q.out
patching file ql/src/test/results/clientpositive/spark/union33.q.out
patching file ql/src/test/results/clientpositive/spark/union4.q.out
patching file ql/src/test/results/clientpositive/spark/union6.q.out
patching file ql/src/test/results/clientpositive/spark/union_lateralview.q.out
patching file ql/src/test/results/clientpositive/spark/union_top_level.q.out
patching file ql/src/test/results/clientpositive/spark/vector_char_4.q.out
patching file ql/src/test/results/clientpositive/spark/vector_elt.q.out
patching file 
ql/src/test/results/clientpositive/spark/vector_left_outer_join.q.out
patching file ql/src/test/results/clientpositive/spark/vector_outer_join1.q.out
patching file ql/src/test/results/clientpositive/spark/vector_outer_join2.q.out
patching file ql/src/test/results/clientpositive/spark/vector_outer_join3.q.out
patching file ql/src/test/results/clientpositive/spark/vector_outer_join4.q.out
patching file ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out
patching file ql/src/test/results/clientpositive/spark/vector_varchar_4.q.out
patching file ql/src/test/results/clientpositive/spark/vectorization_0.q.out
patching file ql/src/test/results/clientpositive/spark/vectorization_13.q.out
patching file ql/src/test/results/clientpositive/spark/vectorization_14.q.out
patching file ql/src/test/results/clientpositive/spark/vectorization_15.q.out
patching file ql/src/test/results/clientpositive/spark/vectorization_16.q.out
patching file ql/src/test/results/clientpositive/spark/vectorization_17.q.out
patching file ql/src/test/results/clientpositive/spark/vectorization_9.q.out
patching file ql/src/test/results/clientpositive/spark/vectorization_div0.q.out
patching file 
ql/src/test/results/clientpositive/spark/vectorization_pushdown.q.out
patching file 
ql/src/test/results/clientpositive/spark/vectorization_short_regress.q.out
patching file ql/src/test/results/clientpositive/spark/vectorized_case.q.out
patching file ql/src/test/results/clientpositive/spark/vectorized_mapjoin.q.out
patching file 
ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out
patching file 
ql/src/test/results/clientpositive/spark/vectorized_nested_mapjoin.q.out
patching file ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out
patching file 
ql/src/test/results/clientpositive/spark/vectorized_shufflejoin.q.out
patching file 
ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out
patching file 
ql/src/test/results/clientpositive/special_character_in_tabnames_2.q.out
patching file ql/src/test/results/clientpositive/stats0.q.out
patching file ql/src/test/results/clientpositive/stats1.q.out
patching file ql/src/test/results/clientpositive/stats10.q.out
patching file ql/src/test/results/clientpositive/stats12.q.out
patching file ql/src/test/results/clientpositive/stats13.q.out
patching file ql/src/test/results/clientpositive/stats14.q.out
patching file ql/src/test/results/clientpositive/stats15.q.out
patching file ql/src/test/results/clientpositive/stats18.q.out
patching file ql/src/test/results/clientpositive/stats2.q.out
patching

[jira] [Commented] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056812#comment-16056812
 ] 

Hive QA commented on HIVE-14988:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873753/HIVE-14988.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 10837 tests 
executed
*Failed tests:*
{noformat}
TestOperationLoggingLayout - did not produce a TEST-*.xml file (likely timed 
out) (batchId=222)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_ddl1] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_query5] 
(batchId=24)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=98)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=98)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[acid_overwrite] 
(batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.lockConflictDbTable 
(batchId=281)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testLockBlockedBy 
(batchId=281)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMetastoreTablesCleanup 
(batchId=281)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation.org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation
 (batchId=219)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5699/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5699/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5699/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873753 - PreCommit-HIVE-Build

> Support INSERT OVERWRITE into a partition on transactional tables
> -
>
> Key: HIVE-14988
> URL: https://issues.apache.org/jira/browse/HIVE-14988
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, 
> HIVE-14988.03.patch
>
>
> Insert overwrite operations on transactional tables will currently raise an 
> error.
> This can/should be supported.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16920) remove useless uri.getScheme() from EximUtil

2017-06-20 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056804#comment-16056804
 ] 

Fei Hui commented on HIVE-16920:


Failed tests are unrelated

> remove useless uri.getScheme() from EximUtil
> 
>
> Key: HIVE-16920
> URL: https://issues.apache.org/jira/browse/HIVE-16920
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16920.patch
>
>
> {code:title=EximUtil.java|borderStyle=solid}
> static URI getValidatedURI(HiveConf conf, String dcPath) throws 
> SemanticException {
> try {
>   boolean testMode = conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODE);
>   URI uri = new Path(dcPath).toUri();
>   String scheme = uri.getScheme();
>   String authority = uri.getAuthority();
>   String path = uri.getPath();
>   FileSystem fs = FileSystem.get(uri, conf);
>   LOG.info("Path before norm :" + path);
>   // generate absolute path relative to home directory
>   if (!path.startsWith("/")) {
> if (testMode) {
>   path = (new Path(System.getProperty("test.tmp.dir"), 
> path)).toUri().getPath();
> } else {
>   path =
>   (new Path(new Path("/user/" + System.getProperty("user.name")), 
> path)).toUri()
>   .getPath();
> }
>   }
>   // Get scheme from FileSystem
>   scheme = fs.getScheme();
>   ...
> }
> {code}
> We found that {{String scheme = uri.getScheme();}} is useless: {{scheme}} is 
> unconditionally overwritten by {{fs.getScheme()}} before it is ever read, so 
> we can remove it.
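>
> A minimal sketch of the cleaned-up shape (illustrative only; the elided 
> normalization logic is unchanged from the method above):
> {code:java}
> URI uri = new Path(dcPath).toUri();
> String authority = uri.getAuthority();
> String path = uri.getPath();
> FileSystem fs = FileSystem.get(uri, conf);
> // ... path normalization exactly as above ...
> // declare scheme only where the real value is obtained
> String scheme = fs.getScheme();
> {code}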



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Attachment: HIVE-16926.1.patch

Initial patch; restructured the LlapTaskUmbilicalExternalClient code a bit.
- Uses a shared LLAP umbilical server rather than a new server per external client
- Retries rejected submissions (WorkSubmitter helper class)
- No more deferred cleanup (from HIVE-16652). One thing about this is that once 
clients are closed/unregistered, communicator.stop() is called and the client 
is removed from the registered list of clients. So we might get a few warning 
messages about untracked taskAttemptIds coming in during heartbeat(). If this 
is undesirable, we might be able to leave them in the registeredClients list 
(but ignore heartbeats to them as they are tagged as closed), and remove them 
using the HeartbeatCheckTask once they get too old.
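
A rough sketch of the sharing pattern (hypothetical names, not the actual 
patch code): one lazily created, process-wide umbilical server that external 
clients register with and unregister from, instead of one server per client.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedUmbilicalServer {
  private static volatile SharedUmbilicalServer instance;
  private final Map<String, Object> registeredClients = new ConcurrentHashMap<>();

  // Double-checked locking: the server is started at most once per process.
  static SharedUmbilicalServer getInstance() {
    if (instance == null) {
      synchronized (SharedUmbilicalServer.class) {
        if (instance == null) {
          instance = new SharedUmbilicalServer();
        }
      }
    }
    return instance;
  }

  private SharedUmbilicalServer() {
    // start the umbilical RPC endpoint once here; omitted in this sketch
  }

  void register(String taskAttemptId, Object client) {
    registeredClients.put(taskAttemptId, client);
  }

  void unregister(String taskAttemptId) {
    // heartbeats for this id that arrive after this point are "untracked"
    registeredClients.remove(taskAttemptId);
  }
}
{code}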

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056802#comment-16056802
 ] 

Siddharth Seth commented on HIVE-16927:
---

[~prasanth_j] - I don't think we should make a permanent change of setting this 
to 0. A bad instance will never stop on its own, and will keep trying to launch 
new containers.
A better default would likely be based on numInstances, while making sure it is 
not too low (6 is the default, for example) and that the value is high enough 
to allow a node to be blacklisted.
Option 1: numInstances * threshold to mark a node as disabled.
Option 2: max(6, max(numInstances, threshold to mark a node as disabled))
Option 3: ?

An enhancement request to Slider to get better control over this would also be 
worth filing.
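
For illustration, Option 2's default could be computed as in this one-line 
sketch (the variable names are hypothetical):

{code:java}
// Option 2: never below 6, and at least numInstances and the node-disable threshold
int containerFailureThreshold =
    Math.max(6, Math.max(numInstances, nodeDisableThreshold));
{code}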

> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16927
> URL: https://issues.apache.org/jira/browse/HIVE-16927
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16927.1.patch
>
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056785#comment-16056785
 ] 

ZhangBing Lin commented on HIVE-16929:
--

Submitted a patch.

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list": jars under the 
> $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
> package names are registered as constant (built-in) functions.
> Such as,
> {code:java}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
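>
> For illustration, a minimal sketch of the scanning side (plain Java with 
> illustrative names; the actual registration would go through Hive's function 
> registry and is omitted here):
> {code:java}
> import java.io.File;
> import java.util.Arrays;
> import java.util.Enumeration;
> import java.util.List;
> import java.util.jar.JarEntry;
> import java.util.jar.JarFile;
>
> public class AuxUdfScanner {
>   public static void main(String[] args) throws Exception {
>     List<String> packages = Arrays.asList("com.sample.udf", "com.test.udf");
>     File auxlib = new File(System.getenv("HIVE_HOME"), "auxlib");
>     File[] jars = auxlib.listFiles((dir, name) -> name.endsWith(".jar"));
>     if (jars == null) {
>       return; // no auxlib directory
>     }
>     for (File jar : jars) {
>       try (JarFile jf = new JarFile(jar)) {
>         Enumeration<JarEntry> entries = jf.entries();
>         while (entries.hasMoreElements()) {
>           String entry = entries.nextElement().getName();
>           if (!entry.endsWith(".class")) {
>             continue;
>           }
>           String cls =
>               entry.substring(0, entry.length() - ".class".length()).replace('/', '.');
>           for (String p : packages) {
>             if (cls.startsWith(p + ".")) {
>               // here the class would be registered as a constant function
>               System.out.println("would register UDF class: " + cls);
>             }
>           }
>         }
>       }
>     }
>   }
> }
> {code}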



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Description: 
Add a configuration item "hive.aux.udf.package.name.list": jars under the 
$HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
package names are registered as constant (built-in) functions.
Such as,

{code:java}
<property>
  <name>hive.aux.udf.package.name.list</name>
  <value>com.sample.udf,com.test.udf</value>
</property>
{code}


  was:Add a configuration item "hive.aux.udf.package.name.list": jars under 
the $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
package names are registered as constant (built-in) functions.


> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list": jars under the 
> $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
> package names are registered as constant (built-in) functions.
> Such as,
> {code:java}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Description: Add a configuration item "hive.aux.udf.package.name.list": jars 
under the $HIVE_HOME/auxlib/ directory are scanned, and classes under the 
configured package names are registered as constant (built-in) functions.  
(was: Add a configuration item "hive.aux.udf.package.name.list": jars under 
the $ HIVE_HOME/auxlib/ directory are scanned, and classes under the 
configured package names are registered as constant (built-in) functions.)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list": jars under the 
> $HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
> package names are registered as constant (built-in) functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Status: Patch Available  (was: Open)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list": jars under the 
> $ HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
> package names are registered as constant (built-in) functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Status: Open  (was: Patch Available)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list": jars under the 
> $ HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
> package names are registered as constant (built-in) functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Attachment: HIVE-16929.1.patch

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list": jars under the 
> $ HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
> package names are registered as constant (built-in) functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Attachment: (was: HIVE-16929.1.patch)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>
> Add a configuration item "hive.aux.udf.package.name.list": jars under the 
> $ HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
> package names are registered as constant (built-in) functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Description: Add a configuration item "hive.aux.udf.package.name.list": jars 
under the $ HIVE_HOME/auxlib/ directory are scanned, and classes under the 
configured package names are registered as constant (built-in) functions.

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>
> Add a configuration item "hive.aux.udf.package.name.list": jars under the 
> $ HIVE_HOME/auxlib/ directory are scanned, and classes under the configured 
> package names are registered as constant (built-in) functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException

2017-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16233:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the reviews and additional testing!

> llap: Query failed with AllocatorOutOfMemoryException
> -
>
> Key: HIVE-16233
> URL: https://issues.apache.org/jira/browse/HIVE-16233
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, 
> HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, 
> HIVE-16233.06.patch, HIVE-16233.07.patch
>
>
> {code}
> TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
> ... 15 more
> Caused by: java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
> ... 17 more
> Caused by: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:425)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataRea

[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Attachment: HIVE-16929.1.patch

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16929:
-
Status: Patch Available  (was: Open)

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-20 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin reassigned HIVE-16929:



> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056769#comment-16056769
 ] 

Prasanth Jayachandran commented on HIVE-16927:
--

[~sseth] could you please take a look? small patch

> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16927
> URL: https://issues.apache.org/jira/browse/HIVE-16927
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16927.1.patch
>
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16927:
-
Status: Patch Available  (was: Open)

> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16927
> URL: https://issues.apache.org/jira/browse/HIVE-16927
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16927.1.patch
>
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16927:
-
Attachment: HIVE-16927.1.patch

For now, setting the threshold to infinite failures: some nodes can be good, 
and a few failures elsewhere should not bring down good nodes that could 
actually be running queries.

> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16927
> URL: https://issues.apache.org/jira/browse/HIVE-16927
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16927.1.patch
>
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056766#comment-16056766
 ] 

Prasanth Jayachandran commented on HIVE-16927:
--

One way to fix this is to set a higher threshold for failures. However, even a 
higher threshold can easily be reached on bigger clusters: if the threshold is 
set to 20, then 2 failures on each of 10 nodes will bring down all daemons. 
Ideally, we want Slider to retry failed containers on a different node.
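
To make the arithmetic concrete, a back-of-envelope check, assuming Slider 
counts container failures cluster-wide against a single threshold:

{code}
// Back-of-envelope check for the numbers above; assumes Slider aggregates
// container failures cluster-wide against a single threshold.
public class FailureThresholdCheck {
  public static void main(String[] args) {
    int nodes = 10;
    int failuresPerNode = 2;
    int threshold = 20;
    int totalFailures = nodes * failuresPerNode;   // 20
    // true: the application is flagged unstable and every daemon goes down,
    // including the healthy ones.
    System.out.println(totalFailures >= threshold);
  }
}
{code}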

> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16927
> URL: https://issues.apache.org/jira/browse/HIVE-16927
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-16928) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-16928.
--
Resolution: Duplicate

Dup of HIVE-16927

> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16928
> URL: https://issues.apache.org/jira/browse/HIVE-16928
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16927) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-16927:



> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16927
> URL: https://issues.apache.org/jira/browse/HIVE-16927
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16928) LLAP: Slider takes down all daemons when some daemons fail repeatedly

2017-06-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-16928:



> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> -
>
> Key: HIVE-16928
> URL: https://issues.apache.org/jira/browse/HIVE-16928
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> When some containers fail repeatedly, Slider thinks the application is in an 
> unstable state, which brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16761:

Attachment: HIVE-16761.01.patch

Updated to fix the tests. [~hagleitn] I'm being told you are the expert on this 
(SMB join with multiple mapworks in the same Tez task). Can you please review?

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056751#comment-16056751
 ] 

Hive QA commented on HIVE-16233:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873746/HIVE-16233.07.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5698/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5698/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5698/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873746 - PreCommit-HIVE-Build

> llap: Query failed with AllocatorOutOfMemoryException
> -
>
> Key: HIVE-16233
> URL: https://issues.apache.org/jira/browse/HIVE-16233
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, 
> HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, 
> HIVE-16233.06.patch, HIVE-16233.07.patch
>
>
> {code}
> TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.All

[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Status: Patch Available  (was: In Progress)

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.099.patch, 
> HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes that vectorize AVG in preparation for fully supporting AVG 
> GroupBy.  In particular, the PARTIAL2 and FINAL GroupBy modes that take in 
> the AVG struct as input.  And add the COMPLETE mode that takes in the 
> original data and produces the full aggregation for completeness, so to speak.
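
For readers less familiar with these modes (they correspond to Hive's 
GenericUDAFEvaluator.Mode values), a tiny numeric walk-through of what each 
mode consumes and produces for AVG; the (sum, count) struct used as the 
partial is the usual AVG partial and is an assumption here, not lifted from 
the patch:

{code}
// Tiny numeric walk-through of the GroupBy modes named above. The (sum, count)
// struct used as the AVG partial is the usual layout and an assumption here.
public class AvgModesDemo {
  public static void main(String[] args) {
    // PARTIAL1: original rows -> partial (sum, count), one per map task
    double[] partialA = {1 + 2 + 3, 3};   // rows 1, 2, 3
    double[] partialB = {4 + 5, 2};       // rows 4, 5
    // PARTIAL2: partials -> merged partial (one of the newly vectorized modes)
    double[] merged = {partialA[0] + partialB[0], partialA[1] + partialB[1]};
    // FINAL: merged partial -> full aggregation
    System.out.println("FINAL avg = " + merged[0] / merged[1]);     // 3.0
    // COMPLETE: original rows -> full aggregation in a single step
    double[] rows = {1, 2, 3, 4, 5};
    double sum = 0;
    for (double r : rows) {
      sum += r;
    }
    System.out.println("COMPLETE avg = " + sum / rows.length);      // 3.0
  }
}
{code}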



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Status: In Progress  (was: Patch Available)

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.099.patch, 
> HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes that vectorize AVG in preparation for fully supporting AVG 
> GroupBy.  In particular, the PARTIAL2 and FINAL GroupBy modes that take in 
> the AVG struct as input.  And add the COMPLETE mode that takes in the 
> original data and produces the full aggregation for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16589:

Attachment: HIVE-16589.0993.patch

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and 
> COMPLETE  for AVG, VARIANCE
> ---
>
> Key: HIVE-16589
> URL: https://issues.apache.org/jira/browse/HIVE-16589
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, 
> HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, 
> HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, 
> HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, 
> HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, 
> HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, 
> HIVE-16589.0992.patch, HIVE-16589.0993.patch, HIVE-16589.099.patch, 
> HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for 
> Complex Types in Fast SerDe" was committed).
> Add more classes that vectorize AVG in preparation for fully supporting AVG 
> GroupBy.  In particular, the PARTIAL2 and FINAL GroupBy modes that take in 
> the AVG struct as input.  And add the COMPLETE mode that takes in the 
> original data and produces the full aggregation for completeness, so to speak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Attachment: HIVE-13567.16.patch

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch
>
>
> In phase 2, we are going to turn auto-gather of column stats on by default. 
> This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Patch Available  (was: Open)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch
>
>
> In phase 2, we are going to turn auto-gather of column stats on by default. 
> This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Open  (was: Patch Available)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch
>
>
> In phase 2, we are going to turn auto-gather of column stats on by default. 
> This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16793) Scalar sub-query: Scalar safety checks for explicit group-bys

2017-06-20 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056713#comment-16056713
 ] 

Vineet Garg commented on HIVE-16793:


I am investigating the test failures. Will create the RB as soon as I have a 
fix for the tests.

> Scalar sub-query: Scalar safety checks for explicit group-bys
> -
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch
>
>
> This query has an sq_count check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
> 

[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056697#comment-16056697
 ] 

Hive QA commented on HIVE-13567:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873743/HIVE-13567.15.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5697/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5697/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5697/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-21 00:00:29.192
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5697/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-21 00:00:29.195
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   8c5f55e..4d141c1  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 8c5f55e HIVE-16797: Enhance HiveFilterSetOpTransposeRule to 
remove union branches (Pengcheng Xiong, reviewed by Ashutosh Chauhan)
+ git clean -f -d
Removing itests/src/test/resources/testconfiguration.properties.orig
Removing ql/src/test/queries/clientpositive/explaindenpendencydiffengs.q
Removing ql/src/test/results/clientpositive/explaindenpendencydiffengs.q.out
Removing 
ql/src/test/results/clientpositive/spark/explaindenpendencydiffengs.q.out
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 4d141c1 HIVE-16731: Vectorization: Make "CASE WHEN 
(day_name='Sunday') THEN column1 ELSE null end" that involves a column name or 
expression THEN or ELSE vectorize (Teddy Choi, reviwed by Matt McCline)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-21 00:00:35.017
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
fatal: git apply: bad git-diff - inconsistent old filename on line 1506
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873743 - PreCommit-HIVE-Build

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch
>
>
> In phase 2, we are going to turn auto-gather of column stats on by default. 
> This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16875) Query against view with partitioned child on HoS fails with privilege exception.

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056694#comment-16056694
 ] 

Hive QA commented on HIVE-16875:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873738/HIVE-16875.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10838 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
 (batchId=220)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5696/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5696/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5696/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873738 - PreCommit-HIVE-Build

> Query against view with partitioned child on HoS fails with privilege 
> exception.
> 
>
> Key: HIVE-16875
> URL: https://issues.apache.org/jira/browse/HIVE-16875
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-16875.1.patch, HIVE-16875.2.patch, 
> HIVE-16875.3.patch
>
>
> A query against a view whose child table has partitions fails with a 
> privilege exception even with correct privileges.
> Reproduce:
> {noformat}
> create table jsamp1 (a string) partitioned by (b int);
> insert into table jsamp1 partition (b=1) values ("hello");
> create view jview as select * from jsamp1;
> create role viewtester;
> grant all on table jview to role viewtester;
> grant role viewtester to group testers;
> Use MR, the select will succeed:
> set hive.execution.engine=mr;
> select count(*) from jview;
> while use spark:
> set hive.execution.engine=spark;
> select count(*) from jview;
> it fails with:
> Error: Error while compiling statement: FAILED: SemanticException No valid 
> privileges
>  User tester does not have privileges for QUERY
>  The required privileges: 
> Server=server1->Db=default->Table=j1part->action=select; 
> (state=42000,code=4)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16926:
-


> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.
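
A generic sketch of the sharing pattern implied here (illustrative only, not 
LLAP code; the real change would lazily create and share the actual umbilical 
RPC server rather than a placeholder object):

{code}
// Illustrative pattern only, not LLAP code: create the umbilical server once,
// lazily, and hand the same instance to every fragment request.
public class SharedUmbilical {
  private static volatile Object server;   // stands in for the real RPC server

  static Object get() {
    if (server == null) {
      synchronized (SharedUmbilical.class) {
        if (server == null) {
          server = new Object();           // start the real server here, once
        }
      }
    }
    return server;
  }

  public static void main(String[] args) {
    // Every request sees the same server instance.
    System.out.println(get() == get());    // true
  }
}
{code}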



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16793) Scalar sub-query: Scalar safety checks for explicit group-bys

2017-06-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056678#comment-16056678
 ] 

Ashutosh Chauhan commented on HIVE-16793:
-

Can you create a RB for this?

> Scalar sub-query: Scalar safety checks for explicit group-bys
> -
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch
>
>
> This query has an sq_count check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>   <-Select Operator [SEL_59] (rows=200

[jira] [Commented] (HIVE-16920) remove useless uri.getScheme() from EximUtil

2017-06-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056653#comment-16056653
 ] 

Ferdinand Xu commented on HIVE-16920:
-

+1 LGTM

> remove useless uri.getScheme() from EximUtil
> 
>
> Key: HIVE-16920
> URL: https://issues.apache.org/jira/browse/HIVE-16920
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16920.patch
>
>
> {code:title=EximUtil.java|borderStyle=solid}
> static URI getValidatedURI(HiveConf conf, String dcPath) throws 
> SemanticException {
> try {
>   boolean testMode = conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODE);
>   URI uri = new Path(dcPath).toUri();
>   String scheme = uri.getScheme();
>   String authority = uri.getAuthority();
>   String path = uri.getPath();
>   FileSystem fs = FileSystem.get(uri, conf);
>   LOG.info("Path before norm :" + path);
>   // generate absolute path relative to home directory
>   if (!path.startsWith("/")) {
> if (testMode) {
>   path = (new Path(System.getProperty("test.tmp.dir"), 
> path)).toUri().getPath();
> } else {
>   path =
>   (new Path(new Path("/user/" + System.getProperty("user.name")), 
> path)).toUri()
>   .getPath();
> }
>   }
>   // Get scheme from FileSystem
>   scheme = fs.getScheme();
>   ...
> }
> {code}
> We found that {{String scheme = uri.getScheme();}} is useless, since the 
> value is overwritten by {{scheme = fs.getScheme();}} before it is read, so 
> we can remove it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16731:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master.

> Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null 
> end" that involves a column name or expression THEN or ELSE vectorize
> ---
>
> Key: HIVE-16731
> URL: https://issues.apache.org/jira/browse/HIVE-16731
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16731.1.patch, HIVE-16731.2.patch, 
> HIVE-16731.3.patch, HIVE-16731.4.patch
>
>
> Currently, CASE WHEN statements like that become VectorUDFAdaptor expressions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16731:

Fix Version/s: 3.0.0

> Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null 
> end" that involves a column name or expression THEN or ELSE vectorize
> ---
>
> Key: HIVE-16731
> URL: https://issues.apache.org/jira/browse/HIVE-16731
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16731.1.patch, HIVE-16731.2.patch, 
> HIVE-16731.3.patch, HIVE-16731.4.patch
>
>
> Currently, CASE WHEN statements like that become VectorUDFAdaptor expressions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize

2017-06-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-16731:
---

Assignee: Teddy Choi  (was: Matt McCline)

> Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null 
> end" that involves a column name or expression THEN or ELSE vectorize
> ---
>
> Key: HIVE-16731
> URL: https://issues.apache.org/jira/browse/HIVE-16731
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16731.1.patch, HIVE-16731.2.patch, 
> HIVE-16731.3.patch, HIVE-16731.4.patch
>
>
> Currently, CASE WHEN statements like that become VectorUDFAdaptor expressions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize

2017-06-20 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056642#comment-16056642
 ] 

Matt McCline commented on HIVE-16731:
-

Ok, looks good.

> Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null 
> end" that involves a column name or expression THEN or ELSE vectorize
> ---
>
> Key: HIVE-16731
> URL: https://issues.apache.org/jira/browse/HIVE-16731
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16731.1.patch, HIVE-16731.2.patch, 
> HIVE-16731.3.patch, HIVE-16731.4.patch
>
>
> Currently, CASE WHEN statements like that become VectorUDFAdaptor expressions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16417) Introduce Service-client module

2017-06-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056628#comment-16056628
 ] 

Vaibhav Gumashta commented on HIVE-16417:
-

+1

> Introduce Service-client module
> ---
>
> Key: HIVE-16417
> URL: https://issues.apache.org/jira/browse/HIVE-16417
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore, Server Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-16417.1.patch
>
>
> Moving the relevant classes out of service enables the jdbc driver to 
> relax its dependencies to use only {{service-client}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16731) Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null end" that involves a column name or expression THEN or ELSE vectorize

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056625#comment-16056625
 ] 

Hive QA commented on HIVE-16731:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873734/HIVE-16731.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10822 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=100)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5695/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5695/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5695/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873734 - PreCommit-HIVE-Build

> Vectorization: Make "CASE WHEN (day_name='Sunday') THEN column1 ELSE null 
> end" that involves a column name or expression THEN or ELSE vectorize
> ---
>
> Key: HIVE-16731
> URL: https://issues.apache.org/jira/browse/HIVE-16731
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16731.1.patch, HIVE-16731.2.patch, 
> HIVE-16731.3.patch, HIVE-16731.4.patch
>
>
> Currently, CASE WHEN statements like that become VectorUDFAdaptor expressions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables

2017-06-20 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056621#comment-16056621
 ] 

Wei Zheng commented on HIVE-14988:
--

Patch 03 follows the "new base" approach proposed by Eugene.

For example, suppose we have this directory layout:
{code}
delta_1_1
delta_2_2
base_2
delta_3
{code}
After an Insert Overwrite, it should become like this:
{code}
delta_1_1
delta_2_2
base_2
delta_3
base_4 <= new base. All other dirs become obsolete.
{code}
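
A small illustration of the obsolescence rule above, assuming base_N covers 
every write id up to N and that delta_3 here stands for a delta whose highest 
write id is 3; this is an illustration, not code from the patch:

{code}
// Illustration of the "new base" rule: a directory is obsolete for readers
// once a base covers its highest write id.
import java.util.Arrays;
import java.util.List;

public class NewBaseExample {
  public static void main(String[] args) {
    List<String> dirs =
        Arrays.asList("delta_1_1", "delta_2_2", "base_2", "delta_3", "base_4");
    // Highest base: base_4, written by the Insert Overwrite.
    long bestBase = dirs.stream()
        .filter(d -> d.startsWith("base_"))
        .mapToLong(d -> Long.parseLong(d.substring("base_".length())))
        .max().orElse(-1);
    for (String d : dirs) {
      // Highest write id covered by the directory (the last _N in the name).
      long maxWriteId = Long.parseLong(d.substring(d.lastIndexOf('_') + 1));
      boolean obsolete = maxWriteId < bestBase;
      System.out.println(d + (obsolete ? "  (obsolete)" : "  (live)"));
    }
  }
}
{code}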

> Support INSERT OVERWRITE into a partition on transactional tables
> -
>
> Key: HIVE-14988
> URL: https://issues.apache.org/jira/browse/HIVE-14988
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, 
> HIVE-14988.03.patch
>
>
> An insert overwrite operation on a transactional table will currently raise 
> an error.
> This can/should be supported.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16925) isSlowStart lost during refactoring

2017-06-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056620#comment-16056620
 ] 

ASF GitHub Bot commented on HIVE-16925:
---

GitHub user dosoft opened a pull request:

https://github.com/apache/hive/pull/195

HIVE-16925: Add isSlowStart as parameter for the setAutoReduce method



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dosoft/hive HIVE-16925

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 9276b330d17a2c21d4ec6e1bc31bb6871429ec0e
Author: Oleg Danilov 
Date:   2017-06-20T22:50:14Z

HIVE-16925: Add isSlowStart as parameter for the setAutoReduce method




> isSlowStart lost during refactoring
> ---
>
> Key: HIVE-16925
> URL: https://issues.apache.org/jira/browse/HIVE-16925
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Priority: Minor
>
> TezEdgeProperty.setAutoReduce() should have isSlowStart as a parameter



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-16772) Support TPCDS query11.q in PerfCliDriver

2017-06-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-16772.

Resolution: Fixed

> Support TPCDS query11.q in PerfCliDriver
> 
>
> Key: HIVE-16772
> URL: https://issues.apache.org/jira/browse/HIVE-16772
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 54:22 Invalid column 
> reference 'customer_preferred_cust_flag'
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11744)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11692)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables

2017-06-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14988:
-
Status: Patch Available  (was: Open)

> Support INSERT OVERWRITE into a partition on transactional tables
> -
>
> Key: HIVE-14988
> URL: https://issues.apache.org/jira/browse/HIVE-14988
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, 
> HIVE-14988.03.patch
>
>
> Insert overwrite operation on a transactional table currently raises an 
> error.
> This can/should be supported.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables

2017-06-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng reassigned HIVE-14988:


Assignee: Wei Zheng  (was: Eugene Koifman)

> Support INSERT OVERWRITE into a partition on transactional tables
> -
>
> Key: HIVE-14988
> URL: https://issues.apache.org/jira/browse/HIVE-14988
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, 
> HIVE-14988.03.patch
>
>
> Insert overwrite operation on a transactional table currently raises an 
> error.
> This can/should be supported.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables

2017-06-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14988:
-
Attachment: HIVE-14988.03.patch

> Support INSERT OVERWRITE into a partition on transactional tables
> -
>
> Key: HIVE-14988
> URL: https://issues.apache.org/jira/browse/HIVE-14988
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, 
> HIVE-14988.03.patch
>
>
> Insert overwrite operation on a transactional table currently raises an 
> error.
> This can/should be supported.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16924) Support distinct in presence Gby

2017-06-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056595#comment-16056595
 ] 

Ashutosh Chauhan commented on HIVE-16924:
-

In the general case this needs two GBys: the first to compute the group by and 
aggregates, and the second to do the distinct, with the GBy keys being all 
columns of the select list. The queries in the example provided can actually 
be computed with a single GBy, but that's an optimization which can 
potentially be done in a follow-up.
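
As an illustration, a minimal sketch of that two-GBy rewrite for the first 
query in the description (using the e011_01 table defined there); the outer 
GBy keys are all columns of the select list:
{code}
select c1, cnt
from (select c1, count(*) as cnt from e011_01 group by c1) t
group by c1, cnt;
{code}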

> Support distinct in presence Gby 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> These queries should work:
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> Currently, you get:
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16924) Support distinct in presence Gby

2017-06-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056596#comment-16056596
 ] 

Ashutosh Chauhan commented on HIVE-16924:
-

cc: [~rusanu]

> Support distinct in presence Gby 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> These queries should work:
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> Currently, you get:
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-20 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16840:

Attachment: HIVE-16840.patch

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1 TB of TPC-DS data, q17 hung.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as store_returns_quantitycount
>,avg(sr_return_quantity) as store_returns_quantityave
>,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity) as catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The script hangs because we use only one task to implement the sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use 
> one task to ensure correctness, but the performance is poor.
> The reason why we use one task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-20 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056593#comment-16056593
 ] 

liyunzhang_intel commented on HIVE-16840:
-

[~xuefuz], [~lirui], [~Ferd], [~csun]: attached is HIVE-16840.1.patch.
Changes:
1. Change the physical plan in SetSparkReducerParallelism#process. If 
needSetSparkReducerParallelism returns false, the current sink may be an 
order-by-limit case. Add newSparkSortRS (actually a ReduceSink), newSel 
(actually a Select), and newLimit (actually a Limit) before the sink.
The original physical plan is
{code} ...-RS-SEL-LIMIT{code}
and the new physical plan is
{code} ...-newSparkSortRS-newSel-newLimit-RS-LIMIT{code}
For now I added SetSparkReducerParallelism#getNumReducerForSparkSortRS, which 
returns 10; this sets the parallelism for newSparkSortRS. I will update that 
function in the next patch.
2. Add a property sortLimit to ReduceSinkOperator. If it is true, use partition 
sort rather than global sort in GenSparkUtils#getEdgeProperty.

The patch is not fully tested yet; I have only tried a simple qfile, but I 
think we can parallelize this. Please review while I run the full tests.
{code}
select key,value from src order by key limit 10;
{code}

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1 TB of TPC-DS data, q17 hung.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as store_returns_quantitycount
>,avg(sr_return_quantity) as store_returns_quantityave
>,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity) as catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The script hangs because we use only one task to implement the sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use 
> one task to ensure correctness, but the performance is poor.
> The reason why we use one task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]

[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-20 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056577#comment-16056577
 ] 

Chao Sun commented on HIVE-11297:
-

Sorry for the late response. Will put comments in the RB.
Regarding the filterOp issue, it's a little strange since I'm seeing something 
different on my side (with the latest master branch).
For the query you posted above, I saw:
{code}
TS[3] -> FIL[18] -> SEL[5] -> SEL[19] -> GBY[20] -> SPARKPRUNINGSINK[21]
TS[3] -> FIL[18] -> SEL[5] -> SEL[22] -> GBY[23] -> SPARKPRUNINGSINK[24]
TS[3] -> FIL[18] -> SEL[5] -> RS[7] -> JOIN[8] -> ...
{code}
inside {{SplitOpTreeForDPP}}.



> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to 
> scan the table multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16923) Hive-on-Spark DPP Improvements

2017-06-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056570#comment-16056570
 ] 

Sahil Takiar edited comment on HIVE-16923 at 6/20/17 10:19 PM:
---

Will post a design doc soon.

Two of the biggest limitations of the current DPP implementation are that it 
requires an additional Spark job and it requires writing some intermediate data 
to HDFS. We should evaluate the overhead of these limitations and whether it's 
possible to remove them.

Ideally, DPP shouldn't hurt performance for any query. One way to ensure this 
is to build some type of cost-based model that predicts whether or not DPP 
will help performance. For example, a simple cost-based model could enable 
DPP for map-joins only. Since map-joins already require two Spark jobs and 
writing intermediate data to HDFS, there shouldn't be significant overhead to 
running DPP with a map-join.


was (Author: stakiar):
Will post a design doc soon.

Two of the biggest limitations of the current DPP implementation are that it 
requires an additional Spark job and it requires writing some intermediate data 
to HDFS.

Ideally, DPP shouldn't hurt performance for any query. One way to ensure this 
is to build some type of cost-based model that predicts whether or not DPP 
will help performance. For example, a simple cost-based model could enable 
DPP for map-joins only. Since map-joins already require two Spark jobs and 
writing intermediate data to HDFS, there shouldn't be significant overhead to 
running DPP with a map-join.

> Hive-on-Spark DPP Improvements
> --
>
> Key: HIVE-16923
> URL: https://issues.apache.org/jira/browse/HIVE-16923
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> Improvements to Hive-on-Spark DPP so that it is production ready.
> Hive-on-Spark DPP was implemented in HIVE-9152. However, it is disabled by 
> default. The goal of this JIRA is to improve the DPP implementation so that 
> it can be enabled by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16923) Hive-on-Spark DPP Improvements

2017-06-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056570#comment-16056570
 ] 

Sahil Takiar commented on HIVE-16923:
-

Will post a design doc soon.

Two of the biggest limitations of the current DPP implementation are that it 
requires an additional Spark job and it requires writing some intermediate data 
to HDFS.

Ideally, DPP shouldn't hurt performance for any query. One way to ensure this 
is to build some type of cost-based model that predicts whether or not DPP 
will help performance. For example, a simple cost-based model could enable 
DPP for map-joins only. Since map-joins already require two Spark jobs and 
writing intermediate data to HDFS, there shouldn't be significant overhead to 
running DPP with a map-join.
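
For context, a minimal sketch of the kind of query DPP targets, with 
hypothetical tables (fact partitioned by ds, dim small enough for a map-join): 
the scan of fact can skip partitions whose ds values are filtered out of dim.
{code}
select count(*)
from fact
join dim on fact.ds = dim.ds
where dim.region = 'EMEA';
{code}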

> Hive-on-Spark DPP Improvements
> --
>
> Key: HIVE-16923
> URL: https://issues.apache.org/jira/browse/HIVE-16923
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> Improvements to Hive-on-Spark DPP so that it is production ready.
> Hive-on-Spark DPP was implemented in HIVE-9152. However, it is disabled by 
> default. The goal of this JIRA is to improve the DPP implementation so that 
> it can be enabled by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16923) Hive-on-Spark DPP Improvements

2017-06-20 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16923:
---


> Hive-on-Spark DPP Improvements
> --
>
> Key: HIVE-16923
> URL: https://issues.apache.org/jira/browse/HIVE-16923
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> Improvements to Hive-on-Spark DPP so that it is production ready.
> Hive-on-Spark DPP was implemented in HIVE-9152. However, it is disabled by 
> default. The goal of this JIRA is to improve the DPP implementation so that 
> it can be enabled by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056561#comment-16056561
 ] 

Sergey Shelukhin commented on HIVE-16761:
-

Test failures are related; they are due to some bogus error.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException

2017-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16233:

Attachment: HIVE-16233.07.patch

> llap: Query failed with AllocatorOutOfMemoryException
> -
>
> Key: HIVE-16233
> URL: https://issues.apache.org/jira/browse/HIVE-16233
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, 
> HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, 
> HIVE-16233.06.patch, HIVE-16233.07.patch
>
>
> {code}
> TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
> ... 15 more
> Caused by: java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
> ... 17 more
> Caused by: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:425)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:413)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:235)
> 

[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056552#comment-16056552
 ] 

Hive QA commented on HIVE-16761:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873725/HIVE-16761.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 10822 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_reader] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_text] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_complex_all] 
(batchId=57)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=101)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5694/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5694/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5694/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873725 - PreCommit-HIVE-Build

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException

2017-06-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16233:

Attachment: (was: HIVE-16233.07.patch)

> llap: Query failed with AllocatorOutOfMemoryException
> -
>
> Key: HIVE-16233
> URL: https://issues.apache.org/jira/browse/HIVE-16233
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, 
> HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, 
> HIVE-16233.06.patch
>
>
> {code}
> TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
> ... 15 more
> Caused by: java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
> ... 17 more
> Caused by: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:425)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:413)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:235)
> at
