Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez

2015-05-12 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34059/#review83359
---



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java
https://reviews.apache.org/r/34059/#comment134334

boolean fields in Java default to false



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java
https://reviews.apache.org/r/34059/#comment134335

Object references in Java default to null
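
A quick illustration of the two default-value comments above (a minimal sketch, not code from the patch): Java instance fields get well-defined default values, so explicit "= false" or "= null" initializers are redundant.

{code}
// Minimal sketch: default values of Java instance fields.
public class DefaultsDemo {
  private boolean done;      // boolean fields default to false
  private Object nextValue;  // object references default to null
  private int count;         // numeric fields default to 0

  public static void main(String[] args) {
    DefaultsDemo d = new DefaultsDemo();
    System.out.println(d.done + " " + d.nextValue + " " + d.count); // prints: false null 0
  }
}
{code}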



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java
https://reviews.apache.org/r/34059/#comment134336

It is not strictly necessary, but I do not see a reason why the visibility of this 
method should be reduced. Should it be public like all the others?


- Alexander Pivovarov


On May 11, 2015, 9:48 p.m., Jason Dere wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34059/
 ---
 
 (Updated May 11, 2015, 9:48 p.m.)
 
 
 Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
 
 
 Bugs: HIVE-10673
 https://issues.apache.org/jira/browse/HIVE-10673
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the 
 reducer are unsorted.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30 
   itests/src/test/resources/testconfiguration.properties eeb46cc 
   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 
 545d7c6 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 
 cdabe3a 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
 15c747e 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
  a9082eb 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
 d42b643 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 
 4d84f0f 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
 f7e1dbc 
   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae 
   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7 
   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 
   ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 
   ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION 
   ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION 
   ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/34059/diff/
 
 
 Testing
 ---
 
 q-file tests added
 
 
 Thanks,
 
 Jason Dere
 




Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez

2015-05-12 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34059/#review83362
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
https://reviews.apache.org/r/34059/#comment134342

Usually a static Log should be private: each class's static methods 
should use their own static Log, to avoid confusion about which class a message came from.
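
A sketch of that convention (class names are illustrative, and the commons-logging API is assumed here, not taken from the patch): keep each static Log private and let every class declare its own.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class BaseProcessor {
  // private: subclasses cannot reuse this logger by accident
  private static final Log LOG = LogFactory.getLog(BaseProcessor.class);

  static void baseWork() {
    LOG.info("logged under BaseProcessor");
  }
}

class DerivedProcessor extends BaseProcessor {
  // the subclass declares its own static Log instead of inheriting the parent's
  private static final Log LOG = LogFactory.getLog(DerivedProcessor.class);

  static void derivedWork() {
    LOG.info("logged under DerivedProcessor");
  }
}
{code}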



ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
https://reviews.apache.org/r/34059/#comment134340

Can you use Map.Entry to avoid the unnecessary lookup 3 lines below?
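
For example (a generic sketch, not the patch code), iterating over entrySet() gives the value together with the key, so no second map lookup is needed:

{code}
import java.util.HashMap;
import java.util.Map;

public class EntryLookupDemo {
  public static void main(String[] args) {
    Map<String, Integer> positions = new HashMap<String, Integer>();
    positions.put("key", 0);
    positions.put("value", 1);

    // Instead of: for (String k : positions.keySet()) { Integer v = positions.get(k); ... }
    for (Map.Entry<String, Integer> e : positions.entrySet()) {
      Integer v = e.getValue();   // no extra get(k) lookup
      System.out.println(e.getKey() + " -> " + v);
    }
  }
}
{code}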



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java
https://reviews.apache.org/r/34059/#comment134343

ReduceSinkOperator uses the default Object.hashCode() and equals() methods.
The HashSet algorithm relies on hashCode()/equals().
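
In other words (a generic sketch, the class below is a hypothetical stand-in): a type that does not override hashCode()/equals() gets identity semantics inside a HashSet, so only the exact same instance is found.

{code}
import java.util.HashSet;
import java.util.Set;

public class IdentitySetDemo {
  // Stand-in for an operator class that relies on the default Object methods.
  static class Op {
    final String name;
    Op(String name) { this.name = name; }
  }

  public static void main(String[] args) {
    Set<Op> ops = new HashSet<Op>();
    Op a = new Op("RS_1");
    ops.add(a);

    System.out.println(ops.contains(a));              // true: same instance
    System.out.println(ops.contains(new Op("RS_1"))); // false: equal content, different instance
  }
}
{code}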


- Alexander Pivovarov





Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez

2015-05-12 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34059/#review83367
---



ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java
https://reviews.apache.org/r/34059/#comment134344

trailing space



ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java
https://reviews.apache.org/r/34059/#comment134347

Why call getEntry(key) two times in a row? containsKey() and get() both call getEntry() internally.

Just call get(rs) once, check that the result is not null, and remove the second get(rs).
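
A generic sketch of the suggested pattern (names are illustrative), assuming the map never stores null values:

{code}
import java.util.HashMap;
import java.util.Map;

public class SingleLookupDemo {
  public static void main(String[] args) {
    Map<String, String> workMap = new HashMap<String, String>();
    workMap.put("rs", "reduceWork");

    // Instead of: if (workMap.containsKey("rs")) { String w = workMap.get("rs"); ... }
    String w = workMap.get("rs");   // single lookup
    if (w != null) {                // only valid if null values are never stored
      System.out.println("found: " + w);
    }
  }
}
{code}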


- Alexander Pivovarov





[jira] [Created] (HIVE-10682) LLAP: Make use of the task runner which allows killing tasks

2015-05-12 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10682:
-

 Summary: LLAP: Make use of the task runner which allows killing 
tasks
 Key: HIVE-10682
 URL: https://issues.apache.org/jira/browse/HIVE-10682
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap


TEZ-2434 adds a runner which allows tasks to be killed. Jira to integrate with 
that without the actual kill functionality. That will follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10683) LLAP: Add a mechanism for daemons to inform the AM about killed tasks

2015-05-12 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10683:
-

 Summary: LLAP: Add a mechanism for daemons to inform the AM about 
killed tasks
 Key: HIVE-10683
 URL: https://issues.apache.org/jira/browse/HIVE-10683
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez

2015-05-12 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34059/#review83371
---



ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java
https://reviews.apache.org/r/34059/#comment134348

trailing space



ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java
https://reviews.apache.org/r/34059/#comment134349

Java will initialize it to 0 by default anyway.



ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
https://reviews.apache.org/r/34059/#comment134350

Remove this line and add the String type to the declaration 3 lines below. Do not 
confuse the GC.
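
A generic before/after sketch of that suggestion (variable names are hypothetical, not from the patch): declare the variable where it is first assigned instead of pre-declaring it with null.

{code}
public class ScopeDemo {
  public static void main(String[] args) {
    String[] parts = "tbl.col".split("\\.");

    // Before:
    // String colName = null;
    // ... a few lines later ...
    // colName = parts[parts.length - 1];

    // After: declaration and assignment in one place, narrowest possible scope.
    String colName = parts[parts.length - 1];
    System.out.println(colName);
  }
}
{code}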



ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java
https://reviews.apache.org/r/34059/#comment134351

it will be false by default


- Alexander Pivovarov





[jira] [Created] (HIVE-10687) AvroDeserializer fails to deserialize evolved union fields

2015-05-12 Thread Swarnim Kulkarni (JIRA)
Swarnim Kulkarni created HIVE-10687:
---

 Summary: AvroDeserializer fails to deserialize evolved union fields
 Key: HIVE-10687
 URL: https://issues.apache.org/jira/browse/HIVE-10687
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni


Consider the union field:

union {int, string}

and now this field evolves to

union {null, int, string}.

Running them through the Avro schema compatibility check [1], they are actually 
compatible, which means the latter could be used to deserialize data 
written with the former. However, the Avro deserializer fails to do that, mainly 
because of the way it reads the tags from the reader schema and then reads the 
corresponding data from the writer schema. [2]

[1] http://pastebin.cerner.corp/31078
[2] 
https://github.com/cloudera/hive/blob/cdh5.4.0-release/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java#L354
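
The compatibility claim can be reproduced with Avro's own SchemaCompatibility utility (a minimal sketch under that assumption, not the check from [1]):

{code}
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class UnionEvolutionCheck {
  public static void main(String[] args) {
    // Writer schema: the original union; reader schema: the evolved union with null added.
    Schema writer = new Schema.Parser().parse("[\"int\",\"string\"]");
    Schema reader = new Schema.Parser().parse("[\"null\",\"int\",\"string\"]");

    SchemaCompatibility.SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);

    // Expected to report COMPATIBLE: the evolved union can read data written with
    // the original union, which is what the deserializer should honor as well.
    System.out.println(result.getType());
  }
}
{code}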



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10688) constant folding is broken for case-when udf

2015-05-12 Thread Jagruti Varia (JIRA)
Jagruti Varia created HIVE-10688:


 Summary: constant folding is broken for case-when udf
 Key: HIVE-10688
 URL: https://issues.apache.org/jira/browse/HIVE-10688
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Jagruti Varia
Assignee: Ashutosh Chauhan
 Fix For: 1.2.0


In some cases, the case-when UDF throws an IndexOutOfBoundsException, as shown below:
{noformat}
FAILED: IndexOutOfBoundsException Index: 2, Size: 2
java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.shortcutFunction(ConstantPropagateProcFactory.java:428)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:238)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.access$000(ConstantPropagateProcFactory.java:98)
at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory$ConstantPropagateFilterProc.process(ConstantPropagateProcFactory.java:679)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)

[jira] [Created] (HIVE-10690) ArrayIndexOutOfBounds exception in MetaStoreDirectSql.aggrColStatsForPartitions()

2015-05-12 Thread Jason Dere (JIRA)
Jason Dere created HIVE-10690:
-

 Summary: ArrayIndexOutOfBounds exception in 
MetaStoreDirectSql.aggrColStatsForPartitions()
 Key: HIVE-10690
 URL: https://issues.apache.org/jira/browse/HIVE-10690
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere


Noticed a bunch of these stack traces in hive.log while running some unit tests:

{noformat}
2015-05-11 21:18:59,371 WARN  [main]: metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2420)) - Direct SQL failed
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.aggrColStatsForPartitions(MetaStoreDirectSql.java:1132)
at 
org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:6162)
at 
org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:6158)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2385)
at 
org.apache.hadoop.hive.metastore.ObjectStore.get_aggr_stats_for(ObjectStore.java:6158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
at com.sun.proxy.$Proxy84.get_aggr_stats_for(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_aggr_stats_for(HiveMetaStore.java:5662)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy86.get_aggr_stats_for(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAggrColStatsFor(HiveMetaStoreClient.java:2064)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy87.getAggrColStatsFor(Unknown Source)
at 
org.apache.hadoop.hive.ql.metadata.Hive.getAggrColStatsFor(Hive.java:3110)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:245)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.updateColStats(RelOptHiveTable.java:329)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getColStat(RelOptHiveTable.java:399)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getColStat(RelOptHiveTable.java:392)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan.getColStat(HiveTableScan.java:150)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:77)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:64)
at sun.reflect.GeneratedMethodAccessor296.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$1$1.invoke(ReflectiveRelMetadataProvider.java:182)
at com.sun.proxy.$Proxy108.getDistinctRowCount(Unknown Source)
at sun.reflect.GeneratedMethodAccessor234.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
at com.sun.proxy.$Proxy108.getDistinctRowCount(Unknown Source)
at sun.reflect.GeneratedMethodAccessor234.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 

[jira] [Created] (HIVE-10691) Fix incorrect text in README about not supporting inserts and updates

2015-05-12 Thread Alan Gates (JIRA)
Alan Gates created HIVE-10691:
-

 Summary: Fix incorrect text in README about not supporting inserts 
and updates
 Key: HIVE-10691
 URL: https://issues.apache.org/jira/browse/HIVE-10691
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.0.0, 0.14.0, 1.2.0, 1.1.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor


The README says 
{quote}
Hive is not designed for online transaction processing and does not support row 
level insert/updates.
{quote}
This is not true. As of Hive 0.14 it does support row-level inserts and updates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10689) HS2 metadata api calls should be authorized via HiveAuthorizer

2015-05-12 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-10689:


 Summary: HS2 metadata api calls should be authorized via 
HiveAuthorizer
 Key: HIVE-10689
 URL: https://issues.apache.org/jira/browse/HIVE-10689
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair


java.sql.DatabaseMetaData APIs in the JDBC API result in calls to HS2 metadata 
APIs, and their execution goes through separate Hive Operation implementations that 
don't use the Hive Driver class. Invocation of these APIs should also be 
authorized using the HiveAuthorizer API.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10692) DAGs get stuck at start with no tasks executing

2015-05-12 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10692:
---

 Summary: DAGs get stuck at start with no tasks executing
 Key: HIVE-10692
 URL: https://issues.apache.org/jira/browse/HIVE-10692
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth


Internal app ID application_1429683757595_0914, LLAP 
application_1429683757595_0913. If someone without access wants to investigate, 
I'll get the logs.
The 2nd DAG failed to start executing:
http://cn043-10.l42scl.hortonworks.com:8042/node/containerlogs/container_1429683757595_0914_01_01/sershe/syslog_dag_1429683757595_0914_2/?start=-65536

After many S_TA_LAUNCH_REQUEST events, the following is logged, and after that 
there is no more logging aside from refreshes until I killed the DAG. The LLAP 
daemons were idle meanwhile.
{noformat}
2015-05-12 13:52:08,997 INFO [TaskSchedulerEventHandlerThread] 
rm.TaskSchedulerEventHandler: Processing the event EventType: 
S_TA_LAUNCH_REQUEST
2015-05-12 13:52:18,507 INFO [LlapSchedulerNodeEnabler] 
impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 556007888
2015-05-12 13:52:25,315 INFO [HistoryEventHandlingThread] 
ats.ATSHistoryLoggingService: Event queue stats, 
eventsProcessedSinceLastUpdate=407, eventQueueSize=614
2015-05-12 13:52:28,507 INFO [LlapSchedulerNodeEnabler] 
impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 556007888
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10693) LLAP: DAG got stuck after reducer fetch failed

2015-05-12 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10693:
---

 Summary: LLAP: DAG got stuck after reducer fetch failed
 Key: HIVE-10693
 URL: https://issues.apache.org/jira/browse/HIVE-10693
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth


Internal app ID application_1429683757595_0912, LLAP 
application_1429683757595_0911. If someone without access wants to investigate, 
I'll get the logs.
I've run into this only once. Feel free to close as not repro; I'll reopen if I 
see it again :) I want to make sure some debug info is preserved just in case.
Running Q1 - Map 1 with 1000 tasks (in this particular case), followed by Reducer 
2 and Reducer 3, 1 task each; IIRC Reducer 3 is uber.
Fetch failed with what I'd assume is some random disturbance in the force:
{noformat}
2015-05-12 13:37:31,056 [fetcher [Map_1] #17()] WARN 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped:
 Failed to verify reply after connecting to 
cn047-10.l42scl.hortonworks.com:15551 with 1 inputs pending
java.net.SocketTimeoutException: Read timed out
   at java.net.SocketInputStream.$$YJP$$socketRead0(Native Method)
   at java.net.SocketInputStream.socketRead0(SocketInputStream.java)
   at java.net.SocketInputStream.read(SocketInputStream.java:150)
   at java.net.SocketInputStream.read(SocketInputStream.java:121)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:703)
   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:787)
   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
   at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
   at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
   at 
org.apache.tez.runtime.library.common.shuffle.HttpConnection.getInputStream(HttpConnection.java:256)
   at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:339)
   at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:257)
   at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:167)
   at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:182)
{noformat}

AM registered this as Map 1 task failure
{noformat}
2015-05-12 13:37:31,156 INFO [Dispatcher thread: Central] impl.TaskAttemptImpl: 
attempt_1429683757595_0912_1_00_000998_0 blamed for read error from 
attempt_1429683757595_0912_1_01_00_0 at inputIndex 998
...
2015-05-12 13:37:31,174 INFO [Dispatcher thread: Central] impl.TaskImpl: 
Scheduling new attempt for task: task_1429683757595_0912_1_00_000998, 
currentFailedAttempts: 1, maxFailedAttempts: 4
{noformat}

Eventually Map 1 completed
{noformat}
2015-05-12 13:38:25,247 INFO [Dispatcher thread: Central] 
history.HistoryEventHandler: 
[HISTORY][DAG:dag_1429683757595_0912_1][Event:VERTEX_FINISHED]: vertexName=Map 
1, vertexId=vertex_1429683757595_0912_1_00, initRequestedTime=1431462752913, 
initedTime=1431462754818, startRequestedTime=1431462754819, 
startedTime=1431462754819, finishTime=1431463105101, timeTaken=350282, 
status=SUCCEEDED, diagnostics=, counters=Counters: 29, 
org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=59, 
RACK_LOCAL_TASKS=941, File System Counters, FILE_BYTES_READ=2160704, 
FILE_BYTES_WRITTEN=20377550, FILE_READ_OPS=0, FILE_LARGE_READ_OPS=0, 
FILE_WRITE_OPS=0, HDFS_BYTES_READ=9798097828287, HDFS_BYTES_WRITTEN=0, 
HDFS_READ_OPS=406131, HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=0, 
org.apache.tez.common.counters.TaskCounter, SPILLED_RECORDS=4000, 
GC_TIME_MILLIS=73309, CPU_MILLISECONDS=0, PHYSICAL_MEMORY_BYTES=-1000, 
VIRTUAL_MEMORY_BYTES=-1000, COMMITTED_HEAP_BYTES=25769803776000, 
INPUT_RECORDS_PROCESSED=5861038, OUTPUT_RECORDS=4000, OUTPUT_BYTES=376000, 
OUTPUT_BYTES_WITH_OVERHEAD=0, OUTPUT_BYTES_PHYSICAL=0, 
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, 
ADDITIONAL_SPILL_COUNT=0, HIVE, DESERIALIZE_ERRORS=0, 
RECORDS_IN_Map_1=589709, RECORDS_OUT_INTERMEDIATE_Map_1=4000, 
vertexStats=firstTaskStartTime=1431462757804, firstTasksToStart=[ 
task_1429683757595_0912_1_00_00 ], lastTaskFinishTime=1431463105085, 
lastTasksToFinish=[ task_1429683757595_0912_1_00_000999 ], 
minTaskDuration=1743, maxTaskDuration=236653, 
avgTaskDuration=6377.3342, numSuccessfulTasks=1000, 
shortestDurationTasks=[ 

Re: [VOTE] Apache Hive 1.2.0 release candidate 3

2015-05-12 Thread Alan Gates
+1.  Checked LICENSE, NOTICE, README, and RELEASE_NOTES, signatures, 
looked for .class or .jar files, did a quick build.


Alan.


Sushanth Sowmyan <khorg...@gmail.com>
May 11, 2015 at 19:11
Hi Folks,

We've cleared all the blockers listed for 1.2.0 release, either
committing them, or deferring out to an eventual 1.2.1 stabilization
release. (Any deferrals were a result of discussion between myself and
the committer responsible for the issue.) More details are available
here : 
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status


Apache Hive 1.2.0 Release Candidate 3 is available here:

https://people.apache.org/~khorgath/releases/1.2.0_RC3/artifacts/

My public key used for signing is as available from the hive
committers key list : http://www.apache.org/dist/hive/KEYS

Maven artifacts are available here:

https://repository.apache.org/content/repositories/orgapachehive-1034

Source tag for RC3 is up on the apache git repo as tag
release-1.2.0-rc3 (Browseable view over at
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=826695b20a1fb9e813bbfa19093e533caf3b0c15
)

Since this has minimal changes from the previous RC, I would
secondarily propose that voting conclude in 72 hours from the RC2
announcement today morning.

Hive PMC Members: Please test and vote.


[jira] [Created] (HIVE-10697) ObjecInspectorConvertors#UnionConvertor does a faulty conversion

2015-05-12 Thread Swarnim Kulkarni (JIRA)
Swarnim Kulkarni created HIVE-10697:
---

 Summary: ObjecInspectorConvertors#UnionConvertor does a faulty 
conversion
 Key: HIVE-10697
 URL: https://issues.apache.org/jira/browse/HIVE-10697
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni


Currently the UnionConvertor in the ObjectInspectorConverters class has an 
issue in its convert method: it attempts to convert the object inspector 
itself instead of converting the field. [1] This should be changed to convert 
the field itself.

[1] 
https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 34143: Fix stats annotation

2015-05-12 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34143/
---

Review request for hive, Ashutosh Chauhan and John Pullokkaran.


Repository: hive-git


Description
---

This is an umbrella patch for a bunch of issues:
HIVE-8769 Physical optimizer: Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)
HIVE-9392 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
HIVE-10107 Union All: Vertex missing stats resulting in OOM and inefficient plans


Diffs
-

  hbase-handler/src/test/results/positive/external_table_ppd.q.out 6d48edb 
  hbase-handler/src/test/results/positive/hbase_custom_key2.q.out c9b5a84 
  hbase-handler/src/test/results/positive/hbase_custom_key3.q.out 76848e0 
  hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out 6174bfb 
  hbase-handler/src/test/results/positive/hbase_pushdown.q.out 8a979bf 
  hbase-handler/src/test/results/positive/hbase_queries.q.out 7863f69 
  hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3aae7d0 
  hbase-handler/src/test/results/positive/ppd_key_ranges.q.out 5936735 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 
0de7488 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
 44269f0 
  ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 0a83440 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java c420190 
  ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java f66279f 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 508d880 
  ql/src/test/results/clientpositive/annotate_stats_filter.q.out e8cd06d 
  ql/src/test/results/clientpositive/annotate_stats_limit.q.out 5f8b6f8 
  ql/src/test/results/clientpositive/annotate_stats_select.q.out 753ab4e 
  ql/src/test/results/clientpositive/auto_join30.q.out b068493 
  ql/src/test/results/clientpositive/auto_join31.q.out 1e19dd0 
  ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8 
  ql/src/test/results/clientpositive/auto_join_stats.q.out 9100762 
  ql/src/test/results/clientpositive/auto_join_stats2.q.out ed09875 
  ql/src/test/results/clientpositive/auto_join_without_localtask.q.out ce4ad8a 
  ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd 
  ql/src/test/results/clientpositive/auto_sortmerge_join_14.q.out 43504d8 
  ql/src/test/results/clientpositive/auto_sortmerge_join_15.q.out afd5518 
  ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419 
  ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa 
  ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04 
  ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out f64ecf0 
  ql/src/test/results/clientpositive/auto_sortmerge_join_6.q.out f039dda 
  ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548 
  ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f 
  ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out 65aa3ef 
  ql/src/test/results/clientpositive/binarysortable_1.q.out c4ba7e0 
  ql/src/test/results/clientpositive/bucket_map_join_1.q.out d778203 
  ql/src/test/results/clientpositive/bucket_map_join_2.q.out aef77aa 
  ql/src/test/results/clientpositive/bucketmapjoin1.q.out 72f2a07 
  ql/src/test/results/clientpositive/bucketsortoptimize_insert_2.q.out eec099c 
  ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out 1a644a9 
  ql/src/test/results/clientpositive/bucketsortoptimize_insert_5.q.out e4f90e4 
  ql/src/test/results/clientpositive/bucketsortoptimize_insert_6.q.out 307c83b 
  ql/src/test/results/clientpositive/column_access_stats.q.out a779564 
  ql/src/test/results/clientpositive/complex_alias.q.out 133ce91 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out 0eb1596 
  ql/src/test/results/clientpositive/correlationoptimizer10.q.out 3c3564d 
  ql/src/test/results/clientpositive/correlationoptimizer11.q.out bd86942 
  ql/src/test/results/clientpositive/correlationoptimizer15.q.out b57203e 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out 43d209f 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out 5389647 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out b350816 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out 6ba3462 
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out be518dc 
  ql/src/test/results/clientpositive/cross_product_check_2.q.out 500f912 
  ql/src/test/results/clientpositive/explain_logical.q.out 9b86ce8 
  ql/src/test/results/clientpositive/explain_rearrange.q.out c4a015e 
  ql/src/test/results/clientpositive/filter_numeric.q.out b6b8339 
  ql/src/test/results/clientpositive/fold_case.q.out de6c43e 

Re: Review Request 33992: HIVE-10657 Remove copyBytes operation from MD5 UDF

2015-05-12 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33992/
---

(Updated May 12, 2015, 9:57 p.m.)


Review request for hive and Jason Dere.


Bugs: HIVE-10657
https://issues.apache.org/jira/browse/HIVE-10657


Repository: hive-git


Description
---

HIVE-10657 Remove copyBytes operation from MD5 UDF
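
As a rough sketch of the idea (hypothetical code, not the actual UDF): the MD5 digest can be computed over the Text backing buffer directly, without copyBytes() or toString().

{code}
import java.security.MessageDigest;
import org.apache.hadoop.io.Text;

public class Md5OverText {
  public static void main(String[] args) throws Exception {
    Text input = new Text("ABC");

    MessageDigest md = MessageDigest.getInstance("MD5");
    // getBytes() exposes the backing array; only the first getLength() bytes are valid.
    md.update(input.getBytes(), 0, input.getLength());
    byte[] hash = md.digest();

    StringBuilder hex = new StringBuilder();
    for (byte b : hash) {
      hex.append(String.format("%02x", b));
    }
    System.out.println(hex);   // MD5 of "ABC" in hex
  }
}
{code}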


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMd5.java 
62c16c23375eec96def5553404945dd963459850 

Diff: https://reviews.apache.org/r/33992/diff/


Testing
---


Thanks,

Alexander Pivovarov



[jira] [Created] (HIVE-10695) Hive Query Produces Wrong Result: PPD

2015-05-12 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-10695:
-

 Summary: Hive Query Produces Wrong Result: PPD
 Key: HIVE-10695
 URL: https://issues.apache.org/jira/browse/HIVE-10695
 Project: Hive
  Issue Type: Bug
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
Priority: Critical
 Fix For: 0.14.1


The following query produces a wrong result:
select * from t1 s left outer join  (select key, value from t1) f on 
s.key=f.key and s.value=f.value   left outer join  (select key, value from t1) 
c on s.key=c.key where f.key is null;

This is because PPD gets confused between qualified and non-qualified column names.
In many places in the code, the column info doesn't include the table alias, which 
leads to the PPD problem.

This is fixed in trunk as part of HIVE-9327 
https://issues.apache.org/jira/browse/HIVE-9327



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Hive 1.2.0 release candidate 3

2015-05-12 Thread Gunther Hagleitner
+1


Checked signatures and release notes. Built from scratch and ran a few queries 
both with binaries and bins built from source. Looks good to me.


Thanks,

Gunther.



From: Alan Gates alanfga...@gmail.com
Sent: Tuesday, May 12, 2015 1:16 PM
To: dev@hive.apache.org
Subject: Re: [VOTE] Apache Hive 1.2.0 release candidate 3

+1.  Checked LICENSE, NOTICE, README, and RELEASE_NOTES, signatures, looked for 
.class or .jar files, did a quick build.

Alan.

Sushanth Sowmyan <khorg...@gmail.com>
May 11, 2015 at 19:11
Hi Folks,

We've cleared all the blockers listed for 1.2.0 release, either
committing them, or deferring out to an eventual 1.2.1 stabilization
release. (Any deferrals were a result of discussion between myself and
the committer responsible for the issue.) More details are available
here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status

Apache Hive 1.2.0 Release Candidate 3 is available here:

https://people.apache.org/~khorgath/releases/1.2.0_RC3/artifacts/

My public key used for signing is as available from the hive
committers key list : http://www.apache.org/dist/hive/KEYS

Maven artifacts are available here:

https://repository.apache.org/content/repositories/orgapachehive-1034

Source tag for RC3 is up on the apache git repo as tag
release-1.2.0-rc3 (Browseable view over at
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=826695b20a1fb9e813bbfa19093e533caf3b0c15
)

Since this has minimal changes from the previous RC, I would
secondarily propose that voting conclude in 72 hours from the RC2
announcement today morning.

Hive PMC Members: Please test and vote.


Re: Review Request 33937: HIVE-10641 create CRC32 UDF

2015-05-12 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33937/
---

(Updated May 12, 2015, 10:18 p.m.)


Review request for hive and Jason Dere.


Changes
---

patch#2: use Text.getBytes() instead of toString()
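
A minimal sketch of the effect of that change (not the actual UDF code): the CRC can be computed over Text's backing buffer via getBytes()/getLength(), with no String conversion or byte copy.

{code}
import java.util.zip.CRC32;
import org.apache.hadoop.io.Text;

public class Crc32OverText {
  public static void main(String[] args) {
    Text input = new Text("ABC");

    CRC32 crc = new CRC32();
    // getBytes() returns the backing array; only the first getLength() bytes are valid.
    crc.update(input.getBytes(), 0, input.getLength());

    System.out.println(crc.getValue());
  }
}
{code}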


Bugs: HIVE-10641
https://issues.apache.org/jira/browse/HIVE-10641


Repository: hive-git


Description
---

HIVE-10641 create CRC32 UDF


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
02a604ff0a4ed92dfd94b199e8b539f636b66f77 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCrc32.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFCrc32.java PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_crc32.q PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 
a422760400c62d026324dd667e4a632bfbe01b82 
  ql/src/test/results/clientpositive/udf_crc32.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/33937/diff/


Testing
---


Thanks,

Alexander Pivovarov



[jira] [Created] (HIVE-10696) TestAddResource tests are non-portable

2015-05-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-10696:


 Summary: TestAddResource tests are non-portable
 Key: HIVE-10696
 URL: https://issues.apache.org/jira/browse/HIVE-10696
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


We need to make sure these tests work in windows as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33927: HIVE-10639 create SHA1 UDF

2015-05-12 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33927/
---

(Updated May 12, 2015, 10:11 p.m.)


Review request for hive and Jason Dere.


Changes
---

patch#3 use Text.getBytes() instead of toString()


Bugs: HIVE-10639
https://issues.apache.org/jira/browse/HIVE-10639


Repository: hive-git


Description
---

HIVE-10639 create SHA1 UDF


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
02a604ff0a4ed92dfd94b199e8b539f636b66f77 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSha1.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFSha1.java PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_sha1.q PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 
a422760400c62d026324dd667e4a632bfbe01b82 
  ql/src/test/results/clientpositive/udf_sha1.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/33927/diff/


Testing
---


Thanks,

Alexander Pivovarov



[jira] [Created] (HIVE-10694) LLAP: Add counters for time lost per query due to preemption

2015-05-12 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10694:
-

 Summary: LLAP: Add counters for time lost per query due to 
preemption
 Key: HIVE-10694
 URL: https://issues.apache.org/jira/browse/HIVE-10694
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
 Fix For: llap






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] New Hive Committers - Cheng Xu, Dong Chen, and Hari Sankar Sivarama Subramaniyan

2015-05-12 Thread Lefty Leverenz
Congratulations!!!

-- Lefty

On Mon, May 11, 2015 at 8:01 PM, Hari Subramaniyan 
hsubramani...@hortonworks.com wrote:

 Thank you everyone and Congrats to Cheng and Dong!
 It's a great feeling and I will do my best to contribute to the Hive
 community.

 Cheers,
 Hari
 
 From: Xu, Cheng A cheng.a...@intel.com
 Sent: Monday, May 11, 2015 6:47 PM
 To: Thejas Nair; dev
 Cc: Chen, Dong1; Hari Subramaniyan
 Subject: RE: [ANNOUNCE] New Hive Committers - Cheng Xu, Dong Chen, and
 Hari Sankar Sivarama Subramaniyan

 Thanks guys. I will continue to strive for more contributions to HIVE
 project :)

 -Original Message-
 From: Thejas Nair [mailto:thejas.n...@gmail.com]
 Sent: Tuesday, May 12, 2015 7:52 AM
 To: dev
 Cc: Chen, Dong1; Xu, Cheng A; Hari Subramaniyan
 Subject: Re: [ANNOUNCE] New Hive Committers - Cheng Xu, Dong Chen, and
 Hari Sankar Sivarama Subramaniyan

 Congrats! Looking forward to more contributions (including code reviews)!


 On Mon, May 11, 2015 at 4:38 PM, Vikram Dixit K vikram.di...@gmail.com
 wrote:
  Congrats guys!
 
  On Mon, May 11, 2015 at 2:34 PM, Sushanth Sowmyan khorg...@gmail.com
 wrote:
  Congratulations, and thank you for your contributions! :)
 
  On Mon, May 11, 2015 at 2:17 PM, Sergio Pena sergio.p...@cloudera.com
 wrote:
  Congratulations Guys !!! :)
 
  On Mon, May 11, 2015 at 3:54 PM, Carl Steinbach c...@apache.org
 wrote:
 
  The Apache Hive PMC has voted to make Cheng Xu, Dong Chen, and Hari
  Sankar Sivarama Subramaniyan committers on the Apache Hive Project.
 
  Please join me in congratulating Cheng, Dong, and Hari!
 
  Thanks.
 
  - Carl
 
 
 
 
  --
  Nothing better than when appreciated for hard work.
  -Mark



Re: Review Request 33968: HIVE-10644 create SHA2 UDF

2015-05-12 Thread Alexander Pivovarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33968/
---

(Updated May 13, 2015, 5:48 a.m.)


Review request for hive and Jason Dere.


Changes
---

patch #2: use Text.getBytes() instead of toString()


Bugs: HIVE-10644
https://issues.apache.org/jira/browse/HIVE-10644


Repository: hive-git


Description
---

HIVE-10644 create SHA2 UDF


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
02a604ff0a4ed92dfd94b199e8b539f636b66f77 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 
b043bdc882af7c0b83787526a5a55c9dc29c6681 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSha2.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSha2.java 
PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_sha2.q PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 
a422760400c62d026324dd667e4a632bfbe01b82 
  ql/src/test/results/clientpositive/udf_sha2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/33968/diff/


Testing
---


Thanks,

Alexander Pivovarov



[jira] [Created] (HIVE-10684) Fix the UT failures for HIVE7553 after HIVE-10674 removed the binary jar files

2015-05-12 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10684:
---

 Summary: Fix the UT failures for HIVE7553 after HIVE-10674 removed 
the binary jar files
 Key: HIVE-10684
 URL: https://issues.apache.org/jira/browse/HIVE-10684
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10685) Alter table concatenate operator will cause duplicate data

2015-05-12 Thread guoliming (JIRA)
guoliming created HIVE-10685:


 Summary: Alter table concatenate operator will cause duplicate data
 Key: HIVE-10685
 URL: https://issues.apache.org/jira/browse/HIVE-10685
 Project: Hive
  Issue Type: Bug
Reporter: guoliming


The orders table has 15 rows and is stored as ORC.

hive> select count(*) from orders;
OK
15
Time taken: 37.692 seconds, Fetched: 1 row(s)

The table contains 14 files; the size of each file is about 2.1 ~ 3.2 GB.

After executing the command "ALTER TABLE orders CONCATENATE", the table now has 
1530115000 rows.

My hive version is 1.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10686) java.lang.IndexOutOfBoundsException for query with rank() over(partition ...) on CBO

2015-05-12 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-10686:
--

 Summary: java.lang.IndexOutOfBoundsException for query with rank() 
over(partition ...) on CBO
 Key: HIVE-10686
 URL: https://issues.apache.org/jira/browse/HIVE-10686
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


CBO throws an IndexOutOfBoundsException for TPC-DS Q70.

Query 
{code}

explain
select
sum(ss_net_profit) as total_sum
   ,s_state
   ,s_county
   ,grouping__id as lochierarchy
   , rank() over(partition by grouping__id, case when grouping__id == 2 then 
s_state end order by sum(ss_net_profit)) as rank_within_parent
from
store_sales ss join date_dim d1 on d1.d_date_sk = ss.ss_sold_date_sk
join store s on s.s_store_sk  = ss.ss_store_sk
 where
d1.d_month_seq between 1193 and 1193+11
 and s.s_state in
 ( select s_state
   from  (select s_state as s_state, sum(ss_net_profit),
 rank() over ( partition by s_state order by 
sum(ss_net_profit) desc) as ranking
  from   store_sales, store, date_dim
  where  d_month_seq between 1193 and 1193+11
and date_dim.d_date_sk = store_sales.ss_sold_date_sk
and store.s_store_sk  = store_sales.ss_store_sk
  group by s_state
 ) tmp1
   where ranking = 5
 )
 group by s_state,s_county with rollup
order by
   lochierarchy desc
  ,case when lochierarchy = 0 then s_state end
  ,rank_within_parent
 limit 100
{code}

Original plan (correct)
{code}
 HiveSort(fetch=[100])
  HiveSort(sort0=[$3], sort1=[$5], sort2=[$4], dir0=[DESC], dir1=[ASC], 
dir2=[ASC])
HiveProject(total_sum=[$4], s_state=[$0], s_county=[$1], lochierarchy=[$5], 
rank_within_parent=[rank() OVER (PARTITION BY $5, when(==($5, 2), $0) ORDER BY 
$4 ROWS BETWEEN 2147483647 FOLLOWING AND 2147483647 PRECEDING)], (tok_function 
when (= (tok_table_or_col lochierarchy) 0) (tok_table_or_col 
s_state))=[when(=($5, 0), $0)])
  HiveAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], 
indicator=[true], agg#0=[sum($2)], GROUPING__ID=[GROUPING__ID()])
HiveProject($f0=[$7], $f1=[$6], $f2=[$1])
  HiveJoin(condition=[=($5, $2)], joinType=[inner], algorithm=[none], 
cost=[{1177.2086187101072 rows, 0.0 cpu, 0.0 io}])
HiveJoin(condition=[=($3, $0)], joinType=[inner], algorithm=[none], 
cost=[{2880430.428726483 rows, 0.0 cpu, 0.0 io}])
  HiveProject(ss_sold_date_sk=[$0], ss_net_profit=[$21], 
ss_store_sk=[$22])
HiveTableScan(table=[[tpcds.store_sales]])
  HiveProject(d_date_sk=[$0], d_month_seq=[$3])
HiveFilter(condition=[between(false, $3, 1193, +(1193, 11))])
  HiveTableScan(table=[[tpcds.date_dim]])
HiveProject(s_store_sk=[$0], s_county=[$1], s_state=[$2])
  SemiJoin(condition=[=($2, $3)], joinType=[inner])
HiveProject(s_store_sk=[$0], s_county=[$23], s_state=[$24])
  HiveTableScan(table=[[tpcds.store]])
HiveProject(s_state=[$0])
  HiveFilter(condition=[=($1, 5)])
HiveProject((tok_table_or_col s_state)=[$0], 
rank_window_0=[rank() OVER (PARTITION BY $0 ORDER BY $1 DESC ROWS BETWEEN 
2147483647 FOLLOWING AND 2147483647 PRECEDING)])
  HiveAggregate(group=[{0}], agg#0=[sum($1)])
HiveProject($f0=[$6], $f1=[$1])
  HiveJoin(condition=[=($5, $2)], joinType=[inner], 
algorithm=[none], cost=[{1177.2086187101072 rows, 0.0 cpu, 0.0 io}])
HiveJoin(condition=[=($3, $0)], joinType=[inner], 
algorithm=[none], cost=[{2880430.428726483 rows, 0.0 cpu, 0.0 io}])
  HiveProject(ss_sold_date_sk=[$0], 
ss_net_profit=[$21], ss_store_sk=[$22])
HiveTableScan(table=[[tpcds.store_sales]])
  HiveProject(d_date_sk=[$0], d_month_seq=[$3])
HiveFilter(condition=[between(false, $3, 1193, 
+(1193, 11))])
  HiveTableScan(table=[[tpcds.date_dim]])
HiveProject(s_store_sk=[$0], s_state=[$24])
  HiveTableScan(table=[[tpcds.store]])
{code}

Plan after fixTopOBSchema (incorrect)

{code}
 HiveSort(fetch=[100])
  HiveSort(sort0=[$3], sort1=[$5], sort2=[$4], dir0=[DESC], dir1=[ASC], 
dir2=[ASC])
HiveProject(total_sum=[$4], s_state=[$0], s_county=[$1], lochierarchy=[$5], 
rank_within_parent=[rank() OVER (PARTITION BY $5, when(==($5, 2), $0) ORDER BY 
$4 ROWS BETWEEN 2147483647 FOLLOWING AND 2147483647 PRECEDING)])
  HiveAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], 
indicator=[true], 

Re: [DISCUSS] Supporting Hadoop-1 and experimental features

2015-05-12 Thread Thejas Nair
+1
This is great for development of new features in Hive and for making them
available to users. It also helps users who are slow to move to a new
version of Hadoop: they can still get bug fixes and features compatible
with Hadoop 1 in new Hive 1.x releases.

It will also be easier for users to remember which Hadoop version works
with which version of Hive (Hive 1.x needs Hadoop 1+, Hive 2.x needs Hadoop 2+).


On Mon, May 11, 2015 at 10:01 PM, Prasanth Jayachandran
pjayachand...@hortonworks.com wrote:
 +1 for the proposal. A new branch definitely helps us move forward quickly 
 with new features and deprecate the old stuff (20S shims and MapReduce).

 Thanks
 Prasanth




 On Mon, May 11, 2015 at 7:20 PM -0700, Vikram Dixit K 
 vikram.di...@gmail.com wrote:

 The proposal sounds good. Supporting and maintaining
 Hadoop-1 is hard, and conflicting API changes in Hadoop 2.x keep us
 from using new and better APIs because they break compilation.

 +1

 Thanks
 Vikram.

 On Mon, May 11, 2015 at 7:17 PM, Sergey Shelukhin
 ser...@hortonworks.com wrote:
 That sounds like a good idea.
 Some features could be back ported to branch-1 if viable, but at least new
 stuff would not be burdened by Hadoop 1/MR code paths.
 Probably also a good place to enable vectorization and other perf features
 by default while we make alpha releases.

 +1

 On 15/5/11, 15:38, Alan Gates alanfga...@gmail.com wrote:

There is a lot of forward-looking work going on in various branches of
Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
would be good to have a way to release this code to users so that they
can experiment with it.  Releasing it will also provide feedback to
developers.

At the same time there are discussions on whether to keep supporting
Hadoop-1.  The burden of supporting older, less used functionality such
as Hadoop-1 is becoming ever harder as many new features are added.

I propose that the best way to deal with this would be to make a
branch-1.  We could continue to make new feature releases off of this
branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
This provides stability and continuity for users and developers.

We could then merge these new features branches (LLAP, HBase metastore,
CLI drop) into the trunk, as well as turn on by default newer features
such as the vectorization and ACID.  We could also drop older, less used
features such as support for Hadoop-1 and MapReduce.  It will be a while
before we are ready to make stable, production ready releases of this
code.  But we could start making alpha quality releases soon.  We would
call these releases 2.x, to stress the non-backward compatible changes
such as dropping Hadoop-1.  This will give users a chance to play with
the new code and developers a chance to get feedback.

Thoughts?




 --
 Nothing better than when appreciated for hard work.
 -Mark