[jira] [Commented] (HIVE-15847) In Progress update refreshes seem slow

2017-02-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874167#comment-15874167
 ] 

Thejas M Nair commented on HIVE-15847:
--

The changes to the output formatting look good.
However, I don't yet understand how they affect the refresh rate.

cc [~prasanth_j] 

> In Progress update refreshes seem slow
> --
>
> Key: HIVE-15847
> URL: https://issues.apache.org/jira/browse/HIVE-15847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-15847.1.patch
>
>
> After HIVE-15473, the refresh rate of the in-place progress bar seems to be 
> slow on the Hive CLI. 
> As pointed out by [~prasanth_j]:
> {quote}
> The refresh rate is slow. The following videos show it:
> before patch: https://asciinema.org/a/2fgcncxg5gjavcpxt6lfb8jg9
> after patch: https://asciinema.org/a/2tht5jf6l9b2dc3ylt5gtztqg
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15874) Invalid position alias in Group By when CBO failed

2017-02-19 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874166#comment-15874166
 ] 

Pengcheng Xiong commented on HIVE-15874:


LGTM +1

> Invalid position alias in Group By when CBO failed 
> ---
>
> Key: HIVE-15874
> URL: https://issues.apache.org/jira/browse/HIVE-15874
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Walter Wu
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15874.1.patch
>
>
> for example:
> create table alias_test_01(a INT, b STRING) ;
> create table alias_test_02(a INT, b STRING) ;
> create table alias_test_03(a INT, b STRING) ;
> set hive.groupby.position.alias = true;
> set hive.cbo.enable=true;
> explain 
> select * from 
> alias_test_01 alias01 
> left join 
> (
> select 2017 as a, b from alias_test_02 group by 1, 2
> ) alias02 
> on alias01.a = alias02.a 
> left join 
> alias_test_03 alias03
> on alias01.a = alias03.a;
> error info:
> FAILED: SemanticException [Error 10220]: Invalid position alias in Group By
> Position alias: 2017 does not exist
> The Select List is indexed from 1 to 2
> When CBO optimization fails and reAnalyzeAST is true, position aliases are 
> processed twice:
> 1.   The first pass converts 'group by 1, 2' to 'group by 2017, b'.
> 2.   The second pass reads 'group by 2017, b' and fails because position 2017 does not exist.





[jira] [Comment Edited] (HIVE-14274) When columns are added to structs in a Hive table, HCatLoader breaks.

2017-02-19 Thread ren xing (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874129#comment-15874129
 ] 

ren xing edited comment on HIVE-14274 at 2/20/17 7:57 AM:
--

Does anyone have any ideas? We seem to have hit the same problem when using Pig to read 
a Hive Parquet table after adding a sub-field to a struct-type field.
{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6018: 
Error converting read value to tuple
at 
org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:57)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IndexOutOfBoundsException: Index: 13, Size: 13
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:468)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:451)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:410)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:468)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:386)
at 
org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
{noformat}
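
One plausible reading of the trace above, offered as an assumption rather than a confirmed diagnosis: `Index: 13, Size: 13` is what `ArrayList.get` reports when a reader walks the field list of the evolved schema (at least 14 fields) over a stored struct that still holds 13 values. The sketch below reproduces that pattern and shows one defensive alternative that pads missing trailing sub-fields with null. `StructPaddingSketch`, `naive`, and `padded` are hypothetical names, and this is not the attached patch or the actual PigHCatUtil code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not HCatalog code): reading old struct data with a newer schema. */
final class StructPaddingSketch {

    /** Walks the schema's field count; fails when the stored struct has fewer values. */
    static List<Object> naive(List<Object> stored, int schemaFieldCount) {
        List<Object> tuple = new ArrayList<>();
        for (int i = 0; i < schemaFieldCount; i++) {
            // Throws IndexOutOfBoundsException once i reaches stored.size().
            tuple.add(stored.get(i));
        }
        return tuple;
    }

    /** Treats sub-fields that were added after the data was written as null. */
    static List<Object> padded(List<Object> stored, int schemaFieldCount) {
        List<Object> tuple = new ArrayList<>();
        for (int i = 0; i < schemaFieldCount; i++) {
            tuple.add(i < stored.size() ? stored.get(i) : null);
        }
        return tuple;
    }
}
```

With a 13-value struct and a 14-field schema, `naive` fails on the last field, while `padded` returns 14 values with a trailing null. Whether null is the right default for an added sub-field is a design decision this sketch does not settle.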




> When columns are added to structs in a Hive table, HCatLoader breaks.
> -
>
> Key: HIVE-14274
> URL: https://issues.apache.org/jira/browse/HIVE-14274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: 

[jira] [Assigned] (HIVE-15874) Invalid position alias in Group By when CBO failed

2017-02-19 Thread Walter Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Wu reassigned HIVE-15874:


Assignee: Pengcheng Xiong

> Invalid position alias in Group By when CBO failed 
> ---
>
> Key: HIVE-15874
> URL: https://issues.apache.org/jira/browse/HIVE-15874
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Walter Wu
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15874.1.patch
>
>
> for example:
> create table alias_test_01(a INT, b STRING) ;
> create table alias_test_02(a INT, b STRING) ;
> create table alias_test_03(a INT, b STRING) ;
> set hive.groupby.position.alias = true;
> set hive.cbo.enable=true;
> explain 
> select * from 
> alias_test_01 alias01 
> left join 
> (
> select 2017 as a, b from alias_test_02 group by 1, 2
> ) alias02 
> on alias01.a = alias02.a 
> left join 
> alias_test_03 alias03
> on alias01.a = alias03.a;
> error info:
> FAILED: SemanticException [Error 10220]: Invalid position alias in Group By
> Position alias: 2017 does not exist
> The Select List is indexed from 1 to 2
> the first process Position Alias result:
> when CBO optimize failed and reAnalyzeAST is true, position alias will be 
> processed twice.
> 1.   'group by 1, 2' convert to 'group by 2017, b'
> 2.   'group by 2017, b'  2017 column does not exist





[jira] [Commented] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

2017-02-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874144#comment-15874144
 ] 

Gopal V commented on HIVE-15987:


-1 for the Hive 2.x branch storage-api implementation. We can consider this for the Hive 3.0 branch, 
since it breaks external interfaces to ORC and third-party vectorized UDFs.

> Replace ColumnVector.isNull boolean[] impl. with BitSet
> ---
>
> Key: HIVE-15987
> URL: https://issues.apache.org/jira/browse/HIVE-15987
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>  Labels: incompatibleChange
>
> Most data operations in Hive involve null checks. The current 
> implementation of ColumnVector.isNull uses a boolean array, which takes 8 bits 
> per boolean. BitSet is a more compact representation: it uses 1 bit per 
> boolean, backed by a long array. Logical operations on longs are also 
> much faster than the same operations on bytes, because they need fewer 
> instructions per byte of mask. So null operations should become 8x faster or more.
> However, there are also several cases where this change will be slower. 
> For example, simple reads will require more instructions per row. So the change should 
> include benchmark tests to show its performance impact.
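
To make the trade-off in the description concrete, here is a minimal sketch comparing the two null-mask layouts using only `java.util.BitSet` from the JDK. It is an illustration, not Hive code: `NullMaskSketch` and its methods are hypothetical names, and the sketch makes no claim about the actual speedup, which the description itself says needs benchmarking.

```java
import java.util.BitSet;

/** Hypothetical sketch (not Hive code): two layouts for a per-row null mask. */
final class NullMaskSketch {

    /** boolean[] layout: combine two masks one row at a time (assumes equal lengths). */
    static boolean[] orNulls(boolean[] left, boolean[] right) {
        boolean[] out = new boolean[left.length];
        for (int i = 0; i < left.length; i++) {
            out[i] = left[i] || right[i];
        }
        return out;
    }

    /** BitSet layout: BitSet.or combines the backing long words, 64 rows per word. */
    static BitSet orNulls(BitSet left, BitSet right) {
        BitSet out = (BitSet) left.clone();
        out.or(right);
        return out;
    }

    /** Converts the boolean[] layout to the BitSet layout. */
    static BitSet toBitSet(boolean[] mask) {
        BitSet bits = new BitSet(mask.length);
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits.set(i);
            }
        }
        return bits;
    }

    /** Single-row read: a word lookup plus shift and mask, instead of one array load. */
    static boolean isNull(BitSet mask, int row) {
        return mask.get(row);
    }
}
```

Bulk operations such as merging the null masks of two input columns are where the word-at-a-time layout helps. The single-row `isNull` read is the case the description flags as potentially slower.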





[jira] [Updated] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

2017-02-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15987:
---
Labels: incompatibleChange  (was: )

> Replace ColumnVector.isNull boolean[] impl. with BitSet
> ---
>
> Key: HIVE-15987
> URL: https://issues.apache.org/jira/browse/HIVE-15987
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>  Labels: incompatibleChange
>
> Most of data operations in Hive uses null operations. The current 
> implementation of ColumnVector.isNull uses a boolean array, which uses 8 bits 
> per 1 boolean. BitSet is a more compact representation, as it uses 1 bit per 
> 1 boolean with a backing long array. Also logical operations between longs 
> are much faster than ones with bytes as it uses less instructions per byte. 
> So it will bring 8x or more performance for null operations.
> However, there also are several cases that will make this improvement slow. 
> Such as simple reads will require more instructions per row. So it should 
> include benchmark tests to show its performance impact.





[jira] [Comment Edited] (HIVE-14274) When columns are added to structs in a Hive table, HCatLoader breaks.

2017-02-19 Thread ren xing (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874129#comment-15874129
 ] 

ren xing edited comment on HIVE-14274 at 2/20/17 7:02 AM:
--

Does anyone have any ideas? We seem to have hit the same problem when using Pig to read 
a Hive Parquet table after adding a sub-field to a struct-type field.

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6018: 
Error converting read value to tuple
at 
org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:57)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IndexOutOfBoundsException: Index: 13, Size: 13
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:468)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:451)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:410)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:468)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:386)
at 
org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)





> When columns are added to structs in a Hive table, HCatLoader breaks.
> -
>
> Key: HIVE-14274
> URL: https://issues.apache.org/jira/browse/HIVE-14274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14274.1.patch
>

[jira] [Commented] (HIVE-14274) When columns are added to structs in a Hive table, HCatLoader breaks.

2017-02-19 Thread ren xing (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874129#comment-15874129
 ] 

ren xing commented on HIVE-14274:
-

Does anyone have any ideas? We seem to have hit the same problem when using Pig to read 
a Hive Parquet table after adding a sub-field to a struct-type field.

Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6018: 
Error converting read value to tuple
at 
org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:57)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IndexOutOfBoundsException: Index: 13, Size: 13
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:468)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:451)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:410)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:468)
at 
org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:386)
at 
org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)


> When columns are added to structs in a Hive table, HCatLoader breaks.
> -
>
> Key: HIVE-14274
> URL: https://issues.apache.org/jira/browse/HIVE-14274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-14274.1.patch
>
>
> Consider this sequence of table/partition creation and schema evolution:
> {code:sql}
> -- Create table.
> CREATE EXTERNAL TABLE `simple_text` (
> foo STRING,
> bar STRUCT
> )
> PARTITIONED BY ( dt STRING )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ':'
> STORED AS TEXTFILE ;
> -- Add partition.
> ALTER TABLE simple_text ADD PARTITION ( dt='0' );
> -- Alter the struct-column to add a new sub-field.
> ALTER TABLE simple_text CHANGE COLUMN bar bar STRUCT zoo:STRING>;
> {code}
> The {{dt='0'}} partition's schema indicates 2 fields in {{bar}}. The data can 
> be read using Hive, but not through HCatLoader. The error looks as follows:
> {noformat}
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: data_raw: 
> Store(hdfs://dilithiumblue-nn1.blue.ygrid.yahoo.com:8020/tmp/temp-643668868/tmp-1639945319:org.apache.pig.impl.io.TFileStorage)
>  - scope-1 Operator Key: scope-1): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
>   at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 

[jira] [Updated] (HIVE-15874) Invalid position alias in Group By when CBO failed

2017-02-19 Thread Walter Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Wu updated HIVE-15874:
-
Attachment: HIVE-15874.1.patch

> Invalid position alias in Group By when CBO failed 
> ---
>
> Key: HIVE-15874
> URL: https://issues.apache.org/jira/browse/HIVE-15874
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Walter Wu
> Attachments: HIVE-15874.1.patch
>
>
> for example:
> create table alias_test_01(a INT, b STRING) ;
> create table alias_test_02(a INT, b STRING) ;
> create table alias_test_03(a INT, b STRING) ;
> set hive.groupby.position.alias = true;
> set hive.cbo.enable=true;
> explain 
> select * from 
> alias_test_01 alias01 
> left join 
> (
> select 2017 as a, b from alias_test_02 group by 1, 2
> ) alias02 
> on alias01.a = alias02.a 
> left join 
> alias_test_03 alias03
> on alias01.a = alias03.a;
> error info:
> FAILED: SemanticException [Error 10220]: Invalid position alias in Group By
> Position alias: 2017 does not exist
> The Select List is indexed from 1 to 2
> the first process Position Alias result:
> when CBO optimize failed and reAnalyzeAST is true, position alias will be 
> processed twice.
> 1.   'group by 1, 2' convert to 'group by 2017, b'
> 2.   'group by 2017, b'  2017 column does not exist





[jira] [Issue Comment Deleted] (HIVE-15874) Invalid position alias in Group By when CBO failed

2017-02-19 Thread Walter Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Wu updated HIVE-15874:
-
Comment: was deleted

(was: HIVE-15874.1.patch)

> Invalid position alias in Group By when CBO failed 
> ---
>
> Key: HIVE-15874
> URL: https://issues.apache.org/jira/browse/HIVE-15874
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Walter Wu
>
> for example:
> create table alias_test_01(a INT, b STRING) ;
> create table alias_test_02(a INT, b STRING) ;
> create table alias_test_03(a INT, b STRING) ;
> set hive.groupby.position.alias = true;
> set hive.cbo.enable=true;
> explain 
> select * from 
> alias_test_01 alias01 
> left join 
> (
> select 2017 as a, b from alias_test_02 group by 1, 2
> ) alias02 
> on alias01.a = alias02.a 
> left join 
> alias_test_03 alias03
> on alias01.a = alias03.a;
> error info:
> FAILED: SemanticException [Error 10220]: Invalid position alias in Group By
> Position alias: 2017 does not exist
> The Select List is indexed from 1 to 2
> the first process Position Alias result:
> when CBO optimize failed and reAnalyzeAST is true, position alias will be 
> processed twice.
> 1.   'group by 1, 2' convert to 'group by 2017, b'
> 2.   'group by 2017, b'  2017 column does not exist





[jira] [Updated] (HIVE-15874) Invalid position alias in Group By when CBO failed

2017-02-19 Thread Walter Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Wu updated HIVE-15874:
-
Status: Patch Available  (was: Open)

HIVE-15874.1.patch

> Invalid position alias in Group By when CBO failed 
> ---
>
> Key: HIVE-15874
> URL: https://issues.apache.org/jira/browse/HIVE-15874
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1, 1.2.1
>Reporter: Walter Wu
>
> for example:
> create table alias_test_01(a INT, b STRING) ;
> create table alias_test_02(a INT, b STRING) ;
> create table alias_test_03(a INT, b STRING) ;
> set hive.groupby.position.alias = true;
> set hive.cbo.enable=true;
> explain 
> select * from 
> alias_test_01 alias01 
> left join 
> (
> select 2017 as a, b from alias_test_02 group by 1, 2
> ) alias02 
> on alias01.a = alias02.a 
> left join 
> alias_test_03 alias03
> on alias01.a = alias03.a;
> error info:
> FAILED: SemanticException [Error 10220]: Invalid position alias in Group By
> Position alias: 2017 does not exist
> The Select List is indexed from 1 to 2
> the first process Position Alias result:
> when CBO optimize failed and reAnalyzeAST is true, position alias will be 
> processed twice.
> 1.   'group by 1, 2' convert to 'group by 2017, b'
> 2.   'group by 2017, b'  2017 column does not exist





[jira] [Updated] (HIVE-15874) Invalid position alias in Group By when CBO failed

2017-02-19 Thread Walter Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Wu updated HIVE-15874:
-
Status: Open  (was: Patch Available)

> Invalid position alias in Group By when CBO failed 
> ---
>
> Key: HIVE-15874
> URL: https://issues.apache.org/jira/browse/HIVE-15874
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1, 1.2.1
>Reporter: Walter Wu
>
> for example:
> create table alias_test_01(a INT, b STRING) ;
> create table alias_test_02(a INT, b STRING) ;
> create table alias_test_03(a INT, b STRING) ;
> set hive.groupby.position.alias = true;
> set hive.cbo.enable=true;
> explain 
> select * from 
> alias_test_01 alias01 
> left join 
> (
> select 2017 as a, b from alias_test_02 group by 1, 2
> ) alias02 
> on alias01.a = alias02.a 
> left join 
> alias_test_03 alias03
> on alias01.a = alias03.a;
> error info:
> FAILED: SemanticException [Error 10220]: Invalid position alias in Group By
> Position alias: 2017 does not exist
> The Select List is indexed from 1 to 2
> the first process Position Alias result:
> when CBO optimize failed and reAnalyzeAST is true, position alias will be 
> processed twice.
> 1.   'group by 1, 2' convert to 'group by 2017, b'
> 2.   'group by 2017, b'  2017 column does not exist





[jira] [Updated] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

2017-02-19 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-15987:
--
Component/s: Vectorization

> Replace ColumnVector.isNull boolean[] impl. with BitSet
> ---
>
> Key: HIVE-15987
> URL: https://issues.apache.org/jira/browse/HIVE-15987
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>
> Most of data operations in Hive uses null operations. The current 
> implementation of ColumnVector.isNull uses a boolean array, which uses 8 bits 
> per 1 boolean. BitSet is a more compact representation, as it uses 1 bit per 
> 1 boolean with a backing long array. Also logical operations between longs 
> are much faster than ones with bytes as it uses less instructions per byte. 
> So it will bring 8x or more performance for null operations.
> However, there also are several cases that will make this improvement slow. 
> Such as simple reads will require more instructions per row. So it should 
> include benchmark tests to show its performance impact.





[jira] [Updated] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

2017-02-19 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-15987:
--
Description: 
Most of data operations in Hive uses null operations. The current 
implementation of ColumnVector.isNull uses a boolean array, which uses 8 bits 
per 1 boolean. BitSet is a more compact representation, as it uses 1 bit per 1 
boolean with a backing long array. Also logical operations between longs are 
much faster than ones with bytes as it uses less instructions per byte. So it 
will bring 8x or more performance for null operations.

However, there also are several cases that will make this improvement slow. 
Such as simple reads will require more instructions per row. So it should 
include benchmark tests to show its performance impact.

  was:Most of data operations in Hive uses null operations. The current 
implementation of ColumnVector.isNull uses a boolean array, which uses 8 bits 
per 1 boolean. BitSet is a more compact representation, as it uses 1 bit per 1 
boolean with a backing long array. Also logical operations between longs are 
much faster than ones with bytes as it uses less instructions per byte. So it 
will bring 8x or more performance for null operations.


> Replace ColumnVector.isNull boolean[] impl. with BitSet
> ---
>
> Key: HIVE-15987
> URL: https://issues.apache.org/jira/browse/HIVE-15987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>
> Most of data operations in Hive uses null operations. The current 
> implementation of ColumnVector.isNull uses a boolean array, which uses 8 bits 
> per 1 boolean. BitSet is a more compact representation, as it uses 1 bit per 
> 1 boolean with a backing long array. Also logical operations between longs 
> are much faster than ones with bytes as it uses less instructions per byte. 
> So it will bring 8x or more performance for null operations.
> However, there also are several cases that will make this improvement slow. 
> Such as simple reads will require more instructions per row. So it should 
> include benchmark tests to show its performance impact.





[jira] [Updated] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

2017-02-19 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-15987:
--
Description: Most of data operations in Hive uses null operations. The 
current implementation of ColumnVector.isNull uses a boolean array, which uses 
8 bits per 1 boolean. BitSet is a more compact representation, as it uses 1 bit 
per 1 boolean with a backing long array. Also logical operations between longs 
are much faster than ones with bytes as it uses less instructions per byte. So 
it will bring 8x or more performance for null operations.

> Replace ColumnVector.isNull boolean[] impl. with BitSet
> ---
>
> Key: HIVE-15987
> URL: https://issues.apache.org/jira/browse/HIVE-15987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>
> Most data operations in Hive involve null checks. The current 
> implementation of ColumnVector.isNull uses a boolean array, which uses 8 bits 
> per boolean. BitSet is a more compact representation, as it uses 1 bit per 
> boolean, backed by a long array. Logical operations between longs are also 
> much faster than ones between bytes, as they use fewer instructions per byte. 
> So it is expected to bring 8x or more performance for null operations.





[jira] [Commented] (HIVE-15874) Invalid position alias in Group By when CBO failed

2017-02-19 Thread Walter Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874116#comment-15874116
 ] 

Walter Wu commented on HIVE-15874:
--

This was reproduced with the MR engine. 
Before CBO optimization, processPositionAlias is called once. 'select 2017 as 
a, b from alias_test_02 group by 1, 2' is transformed into 'select 2017 
as a, b from alias_test_02 group by 2017, b'.
When CBO optimization fails and reAnalyzeAST is set to true, 
processPositionAlias is called a second time. '2017' is then treated as a 
position alias, and the error 'Invalid position alias in Group By' occurs.
Clearly, position aliases only need to be processed once. So the most 
intuitive and effective fix is to move processPositionAlias out of 
genResolvedParseTree.
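
The double-processing problem can be reproduced outside Hive with a small sketch. PositionAliasSketch and resolvePositionAliases are hypothetical names that only model the behaviour described above, using plain strings instead of AST nodes; this is not the code in SemanticAnalyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionAliasSketch {
    // Replace each integer literal in GROUP BY with the select expression
    // at that 1-based position; leave every other key unchanged.
    public static List<String> resolvePositionAliases(List<String> selectExprs,
                                                      List<String> groupBy) {
        List<String> resolved = new ArrayList<>();
        for (String key : groupBy) {
            if (key.matches("\\d+")) {
                int pos = Integer.parseInt(key);
                if (pos < 1 || pos > selectExprs.size()) {
                    throw new IllegalArgumentException(
                        "Invalid position alias in Group By: " + pos
                        + " (select list is indexed from 1 to "
                        + selectExprs.size() + ")");
                }
                resolved.add(selectExprs.get(pos - 1));
            } else {
                resolved.add(key);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // select 2017 as a, b from alias_test_02 group by 1, 2
        List<String> select = Arrays.asList("2017", "b");

        // First pass: positions 1 and 2 become the select expressions.
        List<String> once = resolvePositionAliases(select, Arrays.asList("1", "2"));
        System.out.println("after first pass: " + once);

        // Second pass on the already rewritten keys: the literal 2017 is
        // read as a position and is out of range.
        try {
            resolvePositionAliases(select, once);
        } catch (IllegalArgumentException e) {
            System.out.println("second pass failed: " + e.getMessage());
        }
    }
}
```

The rewrite is not idempotent whenever a select expression is itself an integer literal, which is why running it exactly once is enough to avoid the error.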

> Invalid position alias in Group By when CBO failed 
> ---
>
> Key: HIVE-15874
> URL: https://issues.apache.org/jira/browse/HIVE-15874
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Walter Wu
>
> for example:
> create table alias_test_01(a INT, b STRING) ;
> create table alias_test_02(a INT, b STRING) ;
> create table alias_test_03(a INT, b STRING) ;
> set hive.groupby.position.alias = true;
> set hive.cbo.enable=true;
> explain 
> select * from 
> alias_test_01 alias01 
> left join 
> (
> select 2017 as a, b from alias_test_02 group by 1, 2
> ) alias02 
> on alias01.a = alias02.a 
> left join 
> alias_test_03 alias03
> on alias01.a = alias03.a;
> error info:
> FAILED: SemanticException [Error 10220]: Invalid position alias in Group By
> Position alias: 2017 does not exist
> The Select List is indexed from 1 to 2
> the first process Position Alias result:
> when CBO optimization fails and reAnalyzeAST is true, the position alias is 
> processed twice.
> 1.   'group by 1, 2' is converted to 'group by 2017, b'
> 2.   'group by 2017, b' fails because position 2017 does not exist





[jira] [Assigned] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

2017-02-19 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi reassigned HIVE-15987:
-


> Replace ColumnVector.isNull boolean[] impl. with BitSet
> ---
>
> Key: HIVE-15987
> URL: https://issues.apache.org/jira/browse/HIVE-15987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>






[jira] [Commented] (HIVE-15972) Runtime filtering not vectorizing for decimal/timestamp/char/varchar

2017-02-19 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874063#comment-15874063
 ] 

Matt McCline commented on HIVE-15972:
-

+1 LGTM

> Runtime filtering not vectorizing for decimal/timestamp/char/varchar
> 
>
> Key: HIVE-15972
> URL: https://issues.apache.org/jira/browse/HIVE-15972
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15972.1.patch
>
>
> Looks like the versions of vectorized BetweenDynamicValue that use Java 
> objects need to be initialized with non-null values.





[jira] [Resolved] (HIVE-15973) Make interval_arithmetic.q test robust

2017-02-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-15973.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

Pushed to master.

> Make interval_arithmetic.q test robust
> --
>
> Key: HIVE-15973
> URL: https://issues.apache.org/jira/browse/HIVE-15973
> Project: Hive
>  Issue Type: Task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-15973.patch
>
>
> Relies on current_date() which isn't repeatable.





[jira] [Updated] (HIVE-15891) Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast

2017-02-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15891:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~ekoifman] for the review!

> Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast
> ---
>
> Key: HIVE-15891
> URL: https://issues.apache.org/jira/browse/HIVE-15891
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15891.1.patch, HIVE-15891.2.patch, 
> HIVE-15891.3.patch, HIVE-15891.4.patch
>
>
> Currently the ACID UpdateDeleteSemanticAnalyzer directly manipulates the AST, 
> which differs from the general approach of modifying the token stream 
> and will therefore cause an AST mismatch if any rewrite happens after 
> UpdateDeleteSemanticAnalyzer.
> The long-term solution is to rewrite the AST handling logic in 
> UpdateDeleteSemanticAnalyzer to make it consistent with the general approach.
> For now, this ticket detects the error-prone cases and fails early. 





[jira] [Updated] (HIVE-15973) Make interval_arithmetic.q test robust

2017-02-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15973:

Attachment: HIVE-15973.patch

> Make interval_arithmetic.q test robust
> --
>
> Key: HIVE-15973
> URL: https://issues.apache.org/jira/browse/HIVE-15973
> Project: Hive
>  Issue Type: Task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15973.patch
>
>
> Relies on current_date() which isn't repeatable.





[jira] [Assigned] (HIVE-15973) Make interval_arithmetic.q test robust

2017-02-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-15973:
---


> Make interval_arithmetic.q test robust
> --
>
> Key: HIVE-15973
> URL: https://issues.apache.org/jira/browse/HIVE-15973
> Project: Hive
>  Issue Type: Task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>
> Relies on current_date() which isn't repeatable.





[jira] [Commented] (HIVE-15891) Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast

2017-02-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873928#comment-15873928
 ] 

Hive QA commented on HIVE-15891:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853503/HIVE-15891.4.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10249 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_arithmetic] (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=211)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3653/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3653/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3653/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853503 - PreCommit-HIVE-Build

> Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast
> ---
>
> Key: HIVE-15891
> URL: https://issues.apache.org/jira/browse/HIVE-15891
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15891.1.patch, HIVE-15891.2.patch, 
> HIVE-15891.3.patch, HIVE-15891.4.patch
>
>
> Currently the ACID UpdateDeleteSemanticAnalyzer directly manipulates the AST, 
> which differs from the general approach of modifying the token stream 
> and will therefore cause an AST mismatch if any rewrite happens after 
> UpdateDeleteSemanticAnalyzer.
> The long-term solution is to rewrite the AST handling logic in 
> UpdateDeleteSemanticAnalyzer to make it consistent with the general approach.
> For now, this ticket detects the error-prone cases and fails early. 





[jira] [Updated] (HIVE-15941) Fix o.a.h.hive.ql.exec.tez.TezTask compilation issue with tez master

2017-02-19 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15941:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~sseth]. Committed to master.

> Fix o.a.h.hive.ql.exec.tez.TezTask compilation issue with tez master
> 
>
> Key: HIVE-15941
> URL: https://issues.apache.org/jira/browse/HIVE-15941
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15941.1.patch
>
>






[jira] [Updated] (HIVE-15891) Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast

2017-02-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15891:
-
Attachment: HIVE-15891.4.patch

> Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast
> ---
>
> Key: HIVE-15891
> URL: https://issues.apache.org/jira/browse/HIVE-15891
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15891.1.patch, HIVE-15891.2.patch, 
> HIVE-15891.3.patch, HIVE-15891.4.patch
>
>
> Currently the ACID UpdateDeleteSemanticAnalyzer directly manipulates the AST, 
> which differs from the general approach of modifying the token stream 
> and will therefore cause an AST mismatch if any rewrite happens after 
> UpdateDeleteSemanticAnalyzer.
> The long-term solution is to rewrite the AST handling logic in 
> UpdateDeleteSemanticAnalyzer to make it consistent with the general approach.
> For now, this ticket detects the error-prone cases and fails early. 





[jira] [Updated] (HIVE-15891) Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast

2017-02-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15891:
-
Status: Patch Available  (was: Open)

> Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast
> ---
>
> Key: HIVE-15891
> URL: https://issues.apache.org/jira/browse/HIVE-15891
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15891.1.patch, HIVE-15891.2.patch, 
> HIVE-15891.3.patch, HIVE-15891.4.patch
>
>
> Currently the ACID UpdateDeleteSemanticAnalyzer directly manipulates the AST, 
> which differs from the general approach of modifying the token stream 
> and will therefore cause an AST mismatch if any rewrite happens after 
> UpdateDeleteSemanticAnalyzer.
> The long-term solution is to rewrite the AST handling logic in 
> UpdateDeleteSemanticAnalyzer to make it consistent with the general approach.
> For now, this ticket detects the error-prone cases and fails early. 





[jira] [Updated] (HIVE-15891) Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast

2017-02-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15891:
-
Status: Open  (was: Patch Available)

> Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast
> ---
>
> Key: HIVE-15891
> URL: https://issues.apache.org/jira/browse/HIVE-15891
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15891.1.patch, HIVE-15891.2.patch, 
> HIVE-15891.3.patch
>
>
> Currently the ACID UpdateDeleteSemanticAnalyzer directly manipulates the AST, 
> which differs from the general approach of modifying the token stream 
> and will therefore cause an AST mismatch if any rewrite happens after 
> UpdateDeleteSemanticAnalyzer.
> The long-term solution is to rewrite the AST handling logic in 
> UpdateDeleteSemanticAnalyzer to make it consistent with the general approach.
> For now, this ticket detects the error-prone cases and fails early. 





[jira] [Updated] (HIVE-15904) select query throwing Null Pointer Exception from org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

2017-02-19 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15904:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master

> select query throwing Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> --
>
> Key: HIVE-15904
> URL: https://issues.apache.org/jira/browse/HIVE-15904
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jason Dere
> Fix For: 2.2.0
>
> Attachments: HIVE-15904.1.patch, HIVE-15904.2.patch, 
> HIVE-15904.3.patch, HIVE-15904.4.patch, HIVE-15904.5.patch, 
> HIVE-15904.6.patch, table_18.q, table_1.q
>
>
> The following query fails with a NullPointerException from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan.
> The create table statements for table_1 and table_18 are attached.
> Query:
> SELECT
> COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY 
> (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + 
> t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
> (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
> FLOOR(t1.double_col_16) AS float_col,
> COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + 
> t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
> FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
> FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
> ((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
> ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
> COALESCE(-92, -994) AS int_col
> FROM table_1 tt1
> INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
> WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));
> Error Stack:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
>  
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:193)
>  
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
>  
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324) 
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:507)
>  
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:495)
>  
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:308)
>  
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:599)
>  
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:402)
>  
> at 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:226)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  
> at 
> 

[jira] [Commented] (HIVE-15904) select query throwing Null Pointer Exception from org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

2017-02-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873879#comment-15873879
 ] 

Hive QA commented on HIVE-15904:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853498/HIVE-15904.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10246 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_arithmetic] (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=211)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3652/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3652/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3652/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853498 - PreCommit-HIVE-Build

> select query throwing Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> --
>
> Key: HIVE-15904
> URL: https://issues.apache.org/jira/browse/HIVE-15904
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jason Dere
> Attachments: HIVE-15904.1.patch, HIVE-15904.2.patch, 
> HIVE-15904.3.patch, HIVE-15904.4.patch, HIVE-15904.5.patch, 
> HIVE-15904.6.patch, table_18.q, table_1.q
>
>
> The following query fails with a NullPointerException from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan.
> The create table statements for table_1 and table_18 are attached.
> Query:
> SELECT
> COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY 
> (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + 
> t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
> (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
> FLOOR(t1.double_col_16) AS float_col,
> COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + 
> t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
> FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
> FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
> ((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
> ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
> COALESCE(-92, -994) AS int_col
> FROM table_1 tt1
> INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
> WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));
> Error Stack:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
>  
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:193)
>  
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
>  
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324) 
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:507)
>  
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:495)
>  
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:308)
>  
> at 
> 

[jira] [Updated] (HIVE-15904) select query throwing Null Pointer Exception from org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

2017-02-19 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15904:
--
Attachment: HIVE-15904.6.patch

Misspelled "dynamic_semijoin_reduction_2.q" in testconfiguration.properties

> select query throwing Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> --
>
> Key: HIVE-15904
> URL: https://issues.apache.org/jira/browse/HIVE-15904
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jason Dere
> Attachments: HIVE-15904.1.patch, HIVE-15904.2.patch, 
> HIVE-15904.3.patch, HIVE-15904.4.patch, HIVE-15904.5.patch, 
> HIVE-15904.6.patch, table_18.q, table_1.q
>
>
> The following query fails with a NullPointerException from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan.
> The create table statements for table_1 and table_18 are attached.
> Query:
> SELECT
> COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY 
> (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + 
> t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
> (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
> FLOOR(t1.double_col_16) AS float_col,
> COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + 
> t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
> FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
> FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
> ((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
> ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
> COALESCE(-92, -994) AS int_col
> FROM table_1 tt1
> INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
> WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));
> Error Stack:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
>  
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:193)
>  
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
>  
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324) 
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:507)
>  
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:495)
>  
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:308)
>  
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:599)
>  
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:402)
>  
> at 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:226)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  
> at 
> 

[jira] [Commented] (HIVE-15904) select query throwing Null Pointer Exception from org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

2017-02-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873857#comment-15873857
 ] 

Hive QA commented on HIVE-15904:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853487/HIVE-15904.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10246 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_semijoin_reduction_2] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_arithmetic] (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3651/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3651/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3651/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853487 - PreCommit-HIVE-Build

> select query throwing Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> --
>
> Key: HIVE-15904
> URL: https://issues.apache.org/jira/browse/HIVE-15904
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jason Dere
> Attachments: HIVE-15904.1.patch, HIVE-15904.2.patch, 
> HIVE-15904.3.patch, HIVE-15904.4.patch, HIVE-15904.5.patch, table_18.q, 
> table_1.q
>
>
> The following query fails with a NullPointerException from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan.
> Create table statements for table_1 and table_18 are attached.
> Query:
> SELECT
> COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY 
> (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + 
> t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
> (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
> FLOOR(t1.double_col_16) AS float_col,
> COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + 
> t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
> FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
> FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
> ((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
> ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
> COALESCE(-92, -994) AS int_col
> FROM table_1 tt1
> INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
> WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));
> Error Stack:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: NullPointerException null
> at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
> at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:193)
> at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
> at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:507)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:495)
> at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:308)
> at 
> 

[jira] [Commented] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873848#comment-15873848
 ] 

Xuefu Zhang commented on HIVE-15796:


Thanks for working on this, Chao! There is a TestSparkCliDriver test failure 
above, and I'm wondering whether it's related. Secondly, it looks like we added one 
more pass when setting reducer parallelism. Could you explain that a bit? (I remember 
we didn't have that in a previous patch that I reviewed.) Is it the two-pass approach 
to avoid plan changes that you noted previously?

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Affects Versions: 2.2.0
> Reporter: Chao Sun
> Assignee: Chao Sun
> Attachments: HIVE-15796.1.patch, HIVE-15796.2.patch, 
> HIVE-15796.3.patch, HIVE-15796.4.patch, HIVE-15796.5.patch, 
> HIVE-15796.6.patch, HIVE-15796.wip.1.patch, HIVE-15796.wip.2.patch, 
> HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, operator stats are often inaccurate, 
> especially if column stats are not available. This sometimes generates 
> extremely poor reducer parallelism and causes HoS queries to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to what MR does. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.
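
The two rules in the description above can be sketched as follows. This is only an illustration, not the code in the attached patches: the `Work` class, the `BYTES_PER_REDUCER` and `MAX_REDUCERS` constants, and the idea of dividing the TableScan data size by a bytes-per-reducer setting (as the MR path is assumed to do) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical settings, chosen only for this sketch.
BYTES_PER_REDUCER = 256 * 1000 * 1000
MAX_REDUCERS = 1009


@dataclass
class Work:
    """A node in a simplified work graph; kind is "map" or "reduce"."""
    name: str
    kind: str
    table_scan_bytes: int = 0  # only meaningful for map works
    parents: List["Work"] = field(default_factory=list)


def parallelism(work: Work) -> int:
    if work.kind == "map":
        # Rule 1: use the stats attached to the TableScan operator.
        wanted = -(-work.table_scan_bytes // BYTES_PER_REDUCER)  # ceiling division
        return max(1, min(MAX_REDUCERS, wanted))
    # Rule 2: a reduce work takes the maximum parallelism of its parents.
    return max((parallelism(p) for p in work.parents), default=1)


big = Work("Map 1", "map", table_scan_bytes=1_000_000_000)
small = Work("Map 2", "map", table_scan_bytes=100_000_000)
reducer = Work("Reducer 3", "reduce", parents=[big, small])
print(parallelism(reducer))
```

Because the estimate depends only on TableScan sizes, it does not shrink when operator stats are badly underestimated, which is the failure mode the description is concerned with.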





[jira] [Updated] (HIVE-15904) select query throwing Null Pointer Exception from org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

2017-02-19 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15904:
--
Attachment: HIVE-15904.5.patch

Simplifying the test (slightly), adding to testconfiguration.properties.

> select query throwing Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> --
>
> Key: HIVE-15904
> URL: https://issues.apache.org/jira/browse/HIVE-15904
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Aswathy Chellammal Sreekumar
> Assignee: Jason Dere
> Attachments: HIVE-15904.1.patch, HIVE-15904.2.patch, 
> HIVE-15904.3.patch, HIVE-15904.4.patch, HIVE-15904.5.patch, table_18.q, 
> table_1.q
>
>
> The following query fails with a NullPointerException from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan.
> Create table statements for table_1 and table_18 are attached.
> Query:
> SELECT
> COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY 
> (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + 
> t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
> (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
> FLOOR(t1.double_col_16) AS float_col,
> COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + 
> t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
> FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
> FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
> ((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
> ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
> COALESCE(-92, -994) AS int_col
> FROM table_1 tt1
> INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
> WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));
> Error Stack:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: NullPointerException null
> at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
> at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:193)
> at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
> at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:507)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:495)
> at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:308)
> at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
> at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
> at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:599)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:402)
> at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:226)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> 

[jira] [Commented] (HIVE-15972) Runtime filtering not vectorizing for decimal/timestamp/char/varchar

2017-02-19 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873761#comment-15873761
 ] 

Jason Dere commented on HIVE-15972:
---

Failures do not appear to be related to the patch.
[~mmccline] can you review?

> Runtime filtering not vectorizing for decimal/timestamp/char/varchar
> 
>
> Key: HIVE-15972
> URL: https://issues.apache.org/jira/browse/HIVE-15972
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-15972.1.patch
>
>
> It looks like the versions of vectorized BetweenDynamicValue that use Java objects 
> need to be initialized with non-null values.





[jira] [Commented] (HIVE-15910) Improvements in Hive Unit Test by using In-memory Derby DB

2017-02-19 Thread Sankar Hariappan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873736#comment-15873736
 ] 

Sankar Hariappan commented on HIVE-15910:
-

[~wzheng]
I haven't yet checked the performance impact of creating and deleting 
the temp dir for each test case. 
However, we can track this in a separate JIRA ticket and create the temp dir 
only once for the whole of TestWorker.

> Improvements in Hive Unit Test by using In-memory Derby DB
> --
>
> Key: HIVE-15910
> URL: https://issues.apache.org/jira/browse/HIVE-15910
> Project: Hive
> Issue Type: Test
> Components: Tests
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Attachments: HIVE-15910.01.patch, HIVE-15910.05.patch, 
> HIVE-15910.06.patch, HIVE-15910.2.patch, HIVE-15910.3.patch, 
> HIVE-15910.4.patch
>
>
> Hive UT currently uses Derby DB with storage on disk, which has some 
> practical problems.
> 1. The run time of Hive unit tests is high because they need to operate on the disk 
> quite often.
> 2. It can cause conflicts if multiple test cases operate on the same table 
> name (for example, a table being created already exists).
> To solve these problems, we shall use the in-memory storage option of Derby DB, 
> which can even be persisted if the test case demands that.
> https://db.apache.org/derby/docs/10.8/devguide/cdevdvlpinmemdb.html
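
As a rough illustration of the idea above, the metastore connection settings for an in-memory Derby database might be built as shown below. The `jdbc:derby:memory:` subprotocol and the `create=true` / `drop=true` attributes come from the Derby guide linked above, and the two `javax.jdo.option.*` keys are the standard metastore connection properties. The helper names and the per-test unique database name are assumptions made for this sketch and are not taken from the attached patches.

```python
import uuid


def in_memory_derby_conf(test_name: str) -> dict:
    """Build hypothetical metastore overrides for one test case."""
    # A unique database name per test case, so two tests creating the
    # same table name never share a database.
    db_name = f"{test_name}_{uuid.uuid4().hex}"
    return {
        "javax.jdo.option.ConnectionURL": f"jdbc:derby:memory:{db_name};create=true",
        "javax.jdo.option.ConnectionDriverName": "org.apache.derby.jdbc.EmbeddedDriver",
    }


def drop_url(conf: dict) -> str:
    """URL that asks Derby to drop the in-memory database after the test."""
    base = conf["javax.jdo.option.ConnectionURL"].split(";")[0]
    return f"{base};drop=true"


conf = in_memory_derby_conf("TestWorker")
print(conf["javax.jdo.option.ConnectionURL"])
print(drop_url(conf))
```

Dropping the database at the end of each test releases the memory it holds; without the drop, an in-memory database stays alive until the JVM exits.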





[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-19 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873605#comment-15873605
 ] 

Lefty Leverenz commented on HIVE-15489:
---

Doc note:  This adds *hive.spark.use.file.size.for.mapjoin* to HiveConf.java, 
so it needs to be documented in the wiki.

* [Configuration Properties -- Spark | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Spark]

Added a TODOC2.2 label.

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
> Issue Type: Improvement
> Components: Spark, Statistics
> Affects Versions: 2.2.0
> Reporter: Chao Sun
> Assignee: Chao Sun
> Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.6.patch, HIVE-15489.7.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branches. This could be 
> pretty conservative but more reliable.
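
A minimal sketch of the trade-off described above, assuming a map join is allowed only when the small-side branches fit under a size threshold. The `Branch` class, the `can_map_join` function, and the threshold value are hypothetical and do not reflect the actual HiveConf wiring. The boolean flag stands in for an option such as *hive.spark.use.file.size.for.mapjoin*.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    """One small-side input of a join (hypothetical model)."""
    name: str
    operator_stats_bytes: int    # estimate propagated through the operators
    table_scan_bytes: List[int]  # raw sizes at the TableScan (TS) roots


def branch_size(branch: Branch, use_table_scan_stats: bool) -> int:
    if use_table_scan_stats:
        # Conservative: ignores any reduction from filters or projections.
        return sum(branch.table_scan_bytes)
    return branch.operator_stats_bytes


def can_map_join(branches: List[Branch], threshold_bytes: int,
                 use_table_scan_stats: bool) -> bool:
    total = sum(branch_size(b, use_table_scan_stats) for b in branches)
    return total <= threshold_bytes


# Operator stats claim 5 MB, but the underlying table is 500 MB.
dim = Branch("dim", operator_stats_bytes=5_000_000, table_scan_bytes=[500_000_000])
print(can_map_join([dim], 10_000_000, use_table_scan_stats=False))
print(can_map_join([dim], 10_000_000, use_table_scan_stats=True))
```

With the flag off, an underestimated branch is accepted as a map join input. With it on, the same branch is rejected, which is the "conservative but more reliable" behaviour the description refers to.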





[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-19 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15489:
--
Labels: TODOC2.2  (was: )

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.6.patch, HIVE-15489.7.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branches. This could be 
> pretty conservative but more reliable.


