Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83359 ---

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java
https://reviews.apache.org/r/34059/#comment134334
    booleans in Java are false by default

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java
https://reviews.apache.org/r/34059/#comment134335
    Objects are null by default in Java

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java
https://reviews.apache.org/r/34059/#comment134336
    It is not necessary, but I do not see a reason why the visibility of this method should be reduced. Should it be public like all the others?

- Alexander Pivovarov

On May 11, 2015, 9:48 p.m., Jason Dere wrote:

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ ---

(Updated May 11, 2015, 9:48 p.m.)

Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.

Bugs: HIVE-10673
    https://issues.apache.org/jira/browse/HIVE-10673

Repository: hive-git

Description
-----------
Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
Diffs
-----
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java eff4d30
  itests/src/test/resources/testconfiguration.properties eeb46cc
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java b1352f3
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 15c747e
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java a9082eb
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42b643
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java f7e1dbc
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 241e9d7
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100
  ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION
  ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION
  ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION
  ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION
  ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/34059/diff/

Testing
-------
q-file tests added

Thanks,
Jason Dere
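Two of the review comments above note that Java zero-initializes fields, so explicit "= false" / "= null" initializers are redundant. A minimal self-contained sketch of that point (the class and field names are hypothetical, not taken from the patch):

```java
// Hypothetical class showing JVM default field values: instance fields are
// zero-initialized, so "boolean done = false;" or "Object key = null;"
// initializers add nothing.
public class DefaultsDemo {
    boolean done;      // false by default
    Object currentKey; // null by default

    public static void main(String[] args) {
        DefaultsDemo d = new DefaultsDemo();
        // Prints: done=false, currentKey=null
        System.out.println("done=" + d.done + ", currentKey=" + d.currentKey);
    }
}
```

Note this applies only to fields; local variables have no default value and must be assigned before use.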
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83362 ---

ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
https://reviews.apache.org/r/34059/#comment134342
    A static Log should usually be private, because superclass static methods should use their own static Log to avoid confusion.

ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
https://reviews.apache.org/r/34059/#comment134340
    Can you use Map.Entry to avoid the unnecessary lookup 3 lines below?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java
https://reviews.apache.org/r/34059/#comment134343
    ReduceSinkOperator uses the Object.hashCode() and equals() methods. The HashSet algorithm relies on hashCode/equals.

- Alexander Pivovarov

On May 11, 2015, 9:48 p.m., Jason Dere wrote:
> Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
> Bugs: HIVE-10673 (https://issues.apache.org/jira/browse/HIVE-10673)
> Repository: hive-git
> Description: Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
> Diffs, diff link, and testing details are identical to the original review request quoted above.
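The Map.Entry comment above is about saving one hash lookup per loop iteration: iterating entrySet() yields the key and value together, whereas iterating keySet() and calling get(key) inside the loop hashes each key a second time. A small sketch with made-up map contents (not the actual MapJoinProcessor code):

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {

    // Sums the values by iterating entrySet(); each value comes straight
    // from the entry, so no second get(key) lookup is performed.
    static int sumOfSampleValues() {
        Map<String, Integer> positions = new HashMap<>();
        positions.put("key1", 0);
        positions.put("key2", 1);

        int sum = 0;
        for (Map.Entry<String, Integer> e : positions.entrySet()) {
            sum += e.getValue(); // no positions.get(e.getKey()) needed
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSampleValues()); // prints 1
    }
}
```

The same pattern also avoids surprises when keys have expensive hashCode()/equals() implementations, which is related to the HashSet comment above.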
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83367 ---

ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java
https://reviews.apache.org/r/34059/#comment134344
    trailing space

ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java
https://reviews.apache.org/r/34059/#comment134347
    Why call getEntry(key) twice in a row? containsKey() and get() both call getEntry internally. Just call get(rs) once, check that the result is not null, and remove the second get(rs).

- Alexander Pivovarov

On May 11, 2015, 9:48 p.m., Jason Dere wrote:
> Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
> Bugs: HIVE-10673 (https://issues.apache.org/jira/browse/HIVE-10673)
> Repository: hive-git
> Description: Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
> Diffs, diff link, and testing details are identical to the original review request quoted above.
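The getEntry comment above refers to the common containsKey()-then-get() pattern, which probes the hash table twice; a single get() plus a null check does the same work once (valid whenever null values are never stored in the map). A sketch with hypothetical keys and values, not the actual GenTezWork code:

```java
import java.util.HashMap;
import java.util.Map;

public class SingleLookupDemo {

    // One probe: get() followed by a null check replaces
    // containsKey() + get(), which would hash the key twice.
    static String lookupSample(String key) {
        Map<String, String> workMap = new HashMap<>();
        workMap.put("RS_1", "ReduceWork_1"); // made-up entry

        String work = workMap.get(key); // single hash lookup
        if (work != null) {
            return "found: " + work;
        }
        return "not found";
    }

    public static void main(String[] args) {
        System.out.println(lookupSample("RS_1"));    // prints found: ReduceWork_1
        System.out.println(lookupSample("missing")); // prints not found
    }
}
```

If nulls can legitimately be stored as values, containsKey() remains necessary to distinguish "absent" from "mapped to null".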
[jira] [Created] (HIVE-10682) LLAP: Make use of the task runner which allows killing tasks
Siddharth Seth created HIVE-10682:

Summary: LLAP: Make use of the task runner which allows killing tasks
Key: HIVE-10682
URL: https://issues.apache.org/jira/browse/HIVE-10682
Project: Hive
Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Fix For: llap

TEZ-2434 adds a runner which allows tasks to be killed. This JIRA is to integrate with that runner, without the actual kill functionality; that will follow.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10683) LLAP: Add a mechanism for daemons to inform the AM about killed tasks
Siddharth Seth created HIVE-10683:

Summary: LLAP: Add a mechanism for daemons to inform the AM about killed tasks
Key: HIVE-10683
URL: https://issues.apache.org/jira/browse/HIVE-10683
Project: Hive
Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Fix For: llap

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/#review83371 ---

ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java
https://reviews.apache.org/r/34059/#comment134348
    trailing space

ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java
https://reviews.apache.org/r/34059/#comment134349
    Java will set it to 0 in the constructor anyway.

ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
https://reviews.apache.org/r/34059/#comment134350
    Remove this line and add the String type declaration 3 lines below. Do not confuse the GC.

ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java
https://reviews.apache.org/r/34059/#comment134351
    it will be false by default

- Alexander Pivovarov

On May 11, 2015, 9:48 p.m., Jason Dere wrote:
> Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.
> Bugs: HIVE-10673 (https://issues.apache.org/jira/browse/HIVE-10673)
> Repository: hive-git
> Description: Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.
> Diffs, diff link, and testing details are identical to the original review request quoted above.
[jira] [Created] (HIVE-10687) AvroDeserializer fails to deserialize evolved union fields
Swarnim Kulkarni created HIVE-10687:

Summary: AvroDeserializer fails to deserialize evolved union fields
Key: HIVE-10687
URL: https://issues.apache.org/jira/browse/HIVE-10687
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni

Consider the union field union {int, string}, and suppose it evolves to union {null, int, string}. Run through the Avro schema compatibility check [1], the two are actually compatible, which means the latter can be used to deserialize data written with the former. However, the Avro deserializer fails to do that, mainly because of the way it reads the tags from the reader schema and then reads the corresponding data from the writer schema. [2]

[1] http://pastebin.cerner.corp/31078
[2] https://github.com/cloudera/hive/blob/cdh5.4.0-release/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java#L354

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10688) constant folding is broken for case-when udf
Jagruti Varia created HIVE-10688:

Summary: constant folding is broken for case-when udf
Key: HIVE-10688
URL: https://issues.apache.org/jira/browse/HIVE-10688
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Jagruti Varia
Assignee: Ashutosh Chauhan
Fix For: 1.2.0

In some cases, the case-when UDF throws an IndexOutOfBoundsException, as shown below:
{noformat}
FAILED: IndexOutOfBoundsException Index: 2, Size: 2
java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.shortcutFunction(ConstantPropagateProcFactory.java:428)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:238)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:227)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.access$000(ConstantPropagateProcFactory.java:98)
at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory$ConstantPropagateFilterProc.process(ConstantPropagateProcFactory.java:679)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
{noformat}
[jira] [Created] (HIVE-10690) ArrayIndexOutOfBounds exception in MetaStoreDirectSql.aggrColStatsForPartitions()
Jason Dere created HIVE-10690:

Summary: ArrayIndexOutOfBounds exception in MetaStoreDirectSql.aggrColStatsForPartitions()
Key: HIVE-10690
URL: https://issues.apache.org/jira/browse/HIVE-10690
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Jason Dere

Noticed a bunch of these stack traces in hive.log while running some unit tests:
{noformat}
2015-05-11 21:18:59,371 WARN [main]: metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(2420)) - Direct SQL failed
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.aggrColStatsForPartitions(MetaStoreDirectSql.java:1132)
at org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:6162)
at org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:6158)
at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2385)
at org.apache.hadoop.hive.metastore.ObjectStore.get_aggr_stats_for(ObjectStore.java:6158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
at com.sun.proxy.$Proxy84.get_aggr_stats_for(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_aggr_stats_for(HiveMetaStore.java:5662)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy86.get_aggr_stats_for(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAggrColStatsFor(HiveMetaStoreClient.java:2064)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy87.getAggrColStatsFor(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getAggrColStatsFor(Hive.java:3110)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:245)
at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.updateColStats(RelOptHiveTable.java:329)
at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getColStat(RelOptHiveTable.java:399)
at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getColStat(RelOptHiveTable.java:392)
at org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan.getColStat(HiveTableScan.java:150)
at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:77)
at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:64)
at sun.reflect.GeneratedMethodAccessor296.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$1$1.invoke(ReflectiveRelMetadataProvider.java:182)
at com.sun.proxy.$Proxy108.getDistinctRowCount(Unknown Source)
at sun.reflect.GeneratedMethodAccessor234.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
at com.sun.proxy.$Proxy108.getDistinctRowCount(Unknown Source)
at sun.reflect.GeneratedMethodAccessor234.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at
{noformat}
[jira] [Created] (HIVE-10691) Fix incorrect text in README about not support inserts and updates
Alan Gates created HIVE-10691:

Summary: Fix incorrect text in README about not support inserts and updates
Key: HIVE-10691
URL: https://issues.apache.org/jira/browse/HIVE-10691
Project: Hive
Issue Type: Bug
Components: Documentation
Affects Versions: 1.0.0, 0.14.0, 1.2.0, 1.1.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor

The README says:
{quote}
Hive is not designed for online transaction processing and does not support row level insert/updates.
{quote}
This is not true. As of Hive 0.14 it does support row-level inserts and updates.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10689) HS2 metadata api calls should be authorized via HiveAuthorizer
Thejas M Nair created HIVE-10689:

Summary: HS2 metadata api calls should be authorized via HiveAuthorizer
Key: HIVE-10689
URL: https://issues.apache.org/jira/browse/HIVE-10689
Project: Hive
Issue Type: Bug
Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair

The java.sql.DatabaseMetaData APIs in the JDBC API result in calls to the HS2 metadata APIs, which are executed via separate Hive Operation implementations that do not use the Hive Driver class. Invocation of these APIs should also be authorized using the HiveAuthorizer API.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10692) DAGs get stuck at start with no tasks executing
Sergey Shelukhin created HIVE-10692:

Summary: DAGs get stuck at start with no tasks executing
Key: HIVE-10692
URL: https://issues.apache.org/jira/browse/HIVE-10692
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth

Internal app ID application_1429683757595_0914, LLAP application_1429683757595_0913. If someone without access wants to investigate, I'll get the logs.

The 2nd DAG failed to start executing: http://cn043-10.l42scl.hortonworks.com:8042/node/containerlogs/container_1429683757595_0914_01_01/sershe/syslog_dag_1429683757595_0914_2/?start=-65536

After many S_TA_LAUNCH_REQUESTs, the following is logged, and after that there is no more logging aside from refreshes until I killed the DAG. LLAP daemons were idling meanwhile.
{noformat}
2015-05-12 13:52:08,997 INFO [TaskSchedulerEventHandlerThread] rm.TaskSchedulerEventHandler: Processing the event EventType: S_TA_LAUNCH_REQUEST
2015-05-12 13:52:18,507 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 556007888
2015-05-12 13:52:25,315 INFO [HistoryEventHandlingThread] ats.ATSHistoryLoggingService: Event queue stats, eventsProcessedSinceLastUpdate=407, eventQueueSize=614
2015-05-12 13:52:28,507 INFO [LlapSchedulerNodeEnabler] impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 556007888
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10693) LLAP: DAG got stuck after reducer fetch failed
Sergey Shelukhin created HIVE-10693:

Summary: LLAP: DAG got stuck after reducer fetch failed
Key: HIVE-10693
URL: https://issues.apache.org/jira/browse/HIVE-10693
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth

Internal app ID application_1429683757595_0912, LLAP application_1429683757595_0911. If someone without access wants to investigate, I'll get the logs. I've run into this only once. Feel free to close as not repro; I'll reopen if I see it again :) I want to make sure some debug info is preserved just in case.

Running Q1 - Map 1 with 1000 tasks (in this particular case), followed by Reducer 2 and Reducer 3, 1 task each; IIRC 3 is uber.

Fetch failed with, I'd assume, some random disturbance in the force:
{noformat}
2015-05-12 13:37:31,056 [fetcher [Map_1] #17()] WARN org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped: Failed to verify reply after connecting to cn047-10.l42scl.hortonworks.com:15551 with 1 inputs pending
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.$$YJP$$socketRead0(Native Method)
at java.net.SocketInputStream.socketRead0(SocketInputStream.java)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:703)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:787)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
at org.apache.tez.runtime.library.common.shuffle.HttpConnection.getInputStream(HttpConnection.java:256)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.setupConnection(FetcherOrderedGrouped.java:339)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:257)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:167)
at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:182)
{noformat}
The AM registered this as a Map 1 task failure:
{noformat}
2015-05-12 13:37:31,156 INFO [Dispatcher thread: Central] impl.TaskAttemptImpl: attempt_1429683757595_0912_1_00_000998_0 blamed for read error from attempt_1429683757595_0912_1_01_00_0 at inputIndex 998
...
2015-05-12 13:37:31,174 INFO [Dispatcher thread: Central] impl.TaskImpl: Scheduling new attempt for task: task_1429683757595_0912_1_00_000998, currentFailedAttempts: 1, maxFailedAttempts: 4
{noformat}
Eventually Map 1 completed:
{noformat}
2015-05-12 13:38:25,247 INFO [Dispatcher thread: Central] history.HistoryEventHandler: [HISTORY][DAG:dag_1429683757595_0912_1][Event:VERTEX_FINISHED]: vertexName=Map 1, vertexId=vertex_1429683757595_0912_1_00, initRequestedTime=1431462752913, initedTime=1431462754818, startRequestedTime=1431462754819, startedTime=1431462754819, finishTime=1431463105101, timeTaken=350282, status=SUCCEEDED, diagnostics=, counters=Counters: 29, org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=59, RACK_LOCAL_TASKS=941, File System Counters, FILE_BYTES_READ=2160704, FILE_BYTES_WRITTEN=20377550, FILE_READ_OPS=0, FILE_LARGE_READ_OPS=0, FILE_WRITE_OPS=0, HDFS_BYTES_READ=9798097828287, HDFS_BYTES_WRITTEN=0, HDFS_READ_OPS=406131, HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter, SPILLED_RECORDS=4000, GC_TIME_MILLIS=73309, CPU_MILLISECONDS=0, PHYSICAL_MEMORY_BYTES=-1000, VIRTUAL_MEMORY_BYTES=-1000, COMMITTED_HEAP_BYTES=25769803776000, INPUT_RECORDS_PROCESSED=5861038, OUTPUT_RECORDS=4000, OUTPUT_BYTES=376000, OUTPUT_BYTES_WITH_OVERHEAD=0, OUTPUT_BYTES_PHYSICAL=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, HIVE, DESERIALIZE_ERRORS=0, RECORDS_IN_Map_1=589709, RECORDS_OUT_INTERMEDIATE_Map_1=4000, vertexStats=firstTaskStartTime=1431462757804, firstTasksToStart=[ task_1429683757595_0912_1_00_00 ], lastTaskFinishTime=1431463105085, lastTasksToFinish=[ task_1429683757595_0912_1_00_000999 ], minTaskDuration=1743, maxTaskDuration=236653, avgTaskDuration=6377.3342, numSuccessfulTasks=1000, shortestDurationTasks=[
{noformat}
Re: [VOTE] Apache Hive 1.2.0 release candidate 3
+1. Checked LICENSE, NOTICE, README, and RELEASE_NOTES, signatures, looked for .class or .jar files, did a quick build. Alan. Sushanth Sowmyan mailto:khorg...@gmail.com May 11, 2015 at 19:11 Hi Folks, We've cleared all the blockers listed for 1.2.0 release, either committing them, or deferring out to an eventual 1.2.1 stabilization release. (Any deferrals were a result of discussion between myself and the committer responsible for the issue.) More details are available here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.0 Release Candidate 2 is available here: https://people.apache.org/~khorgath/releases/1.2.0_RC3/artifacts/ My public key used for signing is as available from the hive committers key list : http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1034 Source tag for RC3 is up on the apache git repo as tag release-1.2.0-rc3 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=826695b20a1fb9e813bbfa19093e533caf3b0c15 ) Since this has minimal changes from the previous RC, I would secondarily propose that voting conclude in 72 hours from the RC2 announcement today morning. Hive PMC Members: Please test and vote.
[jira] [Created] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion
Swarnim Kulkarni created HIVE-10697: --- Summary: ObjectInspectorConvertors#UnionConvertor does a faulty conversion Key: HIVE-10697 URL: https://issues.apache.org/jira/browse/HIVE-10697 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Currently the UnionConvertor in the ObjectInspectorConverters class has an issue in its convert method: it attempts to convert the ObjectInspector itself instead of converting the field. [1] This should be changed to convert the field itself. [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
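The fix described above (convert the field value, not the ObjectInspector) can be illustrated with a deliberately simplified sketch. The `Union` class and `Function`-based field converters below are hypothetical stand-ins for Hive's serde classes, chosen so the example runs without Hive on the classpath; they are not the actual `ObjectInspectorConverters` code.

```java
import java.util.List;
import java.util.function.Function;

// Simplified model of a union value: a tag selecting the active branch,
// plus the value of that branch. (Stand-in for Hive's union objects.)
class UnionConvertDemo {
    static final class Union {
        final byte tag;
        final Object field;
        Union(byte tag, Object field) { this.tag = tag; this.field = field; }
    }

    // The faulty pattern applied conversion to the schema object (the
    // ObjectInspector); the corrected pattern looks up the converter for
    // the union's tag and applies it to the field value itself.
    static Union convert(Union input, List<Function<Object, Object>> fieldConverters) {
        Object convertedField = fieldConverters.get(input.tag).apply(input.field);
        return new Union(input.tag, convertedField);
    }

    public static void main(String[] args) {
        // Per-tag converters: tag 0 widens int -> long, tag 1 upper-cases.
        List<Function<Object, Object>> converters = List.of(
                o -> ((Integer) o).longValue(),
                o -> ((String) o).toUpperCase());
        Union u = convert(new Union((byte) 1, "hive"), converters);
        System.out.println(u.tag + ":" + u.field);  // prints "1:HIVE"
    }
}
```

The key point is only that the converter indexed by the union's tag runs against the field value; the surrounding types are illustrative.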
Review Request 34143: Fix stats annotation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34143/ --- Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- This is a umbrella patch for a bunch of issues: HIVE-8769 Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected) HIVE-9392 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName HIVE-10107 Union All : Vertex missing stats resulting in OOM and in-efficient plans Diffs - hbase-handler/src/test/results/positive/external_table_ppd.q.out 6d48edb hbase-handler/src/test/results/positive/hbase_custom_key2.q.out c9b5a84 hbase-handler/src/test/results/positive/hbase_custom_key3.q.out 76848e0 hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out 6174bfb hbase-handler/src/test/results/positive/hbase_pushdown.q.out 8a979bf hbase-handler/src/test/results/positive/hbase_queries.q.out 7863f69 hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3aae7d0 hbase-handler/src/test/results/positive/ppd_key_ranges.q.out 5936735 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 0de7488 ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java 44269f0 ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 0a83440 ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java c420190 ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java f66279f ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 508d880 ql/src/test/results/clientpositive/annotate_stats_filter.q.out e8cd06d ql/src/test/results/clientpositive/annotate_stats_limit.q.out 5f8b6f8 ql/src/test/results/clientpositive/annotate_stats_select.q.out 753ab4e ql/src/test/results/clientpositive/auto_join30.q.out b068493 ql/src/test/results/clientpositive/auto_join31.q.out 1e19dd0 
ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8 ql/src/test/results/clientpositive/auto_join_stats.q.out 9100762 ql/src/test/results/clientpositive/auto_join_stats2.q.out ed09875 ql/src/test/results/clientpositive/auto_join_without_localtask.q.out ce4ad8a ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd ql/src/test/results/clientpositive/auto_sortmerge_join_14.q.out 43504d8 ql/src/test/results/clientpositive/auto_sortmerge_join_15.q.out afd5518 ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419 ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04 ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out f64ecf0 ql/src/test/results/clientpositive/auto_sortmerge_join_6.q.out f039dda ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548 ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out 65aa3ef ql/src/test/results/clientpositive/binarysortable_1.q.out c4ba7e0 ql/src/test/results/clientpositive/bucket_map_join_1.q.out d778203 ql/src/test/results/clientpositive/bucket_map_join_2.q.out aef77aa ql/src/test/results/clientpositive/bucketmapjoin1.q.out 72f2a07 ql/src/test/results/clientpositive/bucketsortoptimize_insert_2.q.out eec099c ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out 1a644a9 ql/src/test/results/clientpositive/bucketsortoptimize_insert_5.q.out e4f90e4 ql/src/test/results/clientpositive/bucketsortoptimize_insert_6.q.out 307c83b ql/src/test/results/clientpositive/column_access_stats.q.out a779564 ql/src/test/results/clientpositive/complex_alias.q.out 133ce91 ql/src/test/results/clientpositive/correlationoptimizer1.q.out 0eb1596 ql/src/test/results/clientpositive/correlationoptimizer10.q.out 3c3564d ql/src/test/results/clientpositive/correlationoptimizer11.q.out bd86942 
ql/src/test/results/clientpositive/correlationoptimizer15.q.out b57203e ql/src/test/results/clientpositive/correlationoptimizer2.q.out 43d209f ql/src/test/results/clientpositive/correlationoptimizer3.q.out 5389647 ql/src/test/results/clientpositive/correlationoptimizer4.q.out b350816 ql/src/test/results/clientpositive/correlationoptimizer5.q.out 6ba3462 ql/src/test/results/clientpositive/correlationoptimizer6.q.out be518dc ql/src/test/results/clientpositive/cross_product_check_2.q.out 500f912 ql/src/test/results/clientpositive/explain_logical.q.out 9b86ce8 ql/src/test/results/clientpositive/explain_rearrange.q.out c4a015e ql/src/test/results/clientpositive/filter_numeric.q.out b6b8339 ql/src/test/results/clientpositive/fold_case.q.out de6c43e
Re: Review Request 33992: HIVE-10657 Remove copyBytes operation from MD5 UDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33992/ --- (Updated May 12, 2015, 9:57 p.m.) Review request for hive and Jason Dere. Bugs: HIVE-10657 https://issues.apache.org/jira/browse/HIVE-10657 Repository: hive-git Description --- HIVE-10657 Remove copyBytes operation from MD5 UDF Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMd5.java 62c16c23375eec96def5553404945dd963459850 Diff: https://reviews.apache.org/r/33992/diff/ Testing --- Thanks, Alexander Pivovarov
[jira] [Created] (HIVE-10695) Hive Query Produces Wrong Result: PPD
Laljo John Pullokkaran created HIVE-10695: - Summary: Hive Query Produces Wrong Result: PPD Key: HIVE-10695 URL: https://issues.apache.org/jira/browse/HIVE-10695 Project: Hive Issue Type: Bug Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Priority: Critical Fix For: 0.14.1 The following query produces a wrong result: select * from t1 s left outer join (select key, value from t1) f on s.key=f.key and s.value=f.value left outer join (select key, value from t1) c on s.key=c.key where f.key is null; This happens because PPD gets confused between qualified and non-qualified column names. In many places in the code, the column info does not include the table alias, which leads to the PPD problem. This is fixed in trunk as part of HIVE-9327 https://issues.apache.org/jira/browse/HIVE-9327 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache Hive 1.2.0 release candidate 3
+1 Checked signatures and release notes. Built from scratch and ran a few queries both with binaries and bins built from source. Looks good to me. Thanks, Gunther. From: Alan Gates alanfga...@gmail.com Sent: Tuesday, May 12, 2015 1:16 PM To: dev@hive.apache.org Subject: Re: [VOTE] Apache Hive 1.2.0 release candidate 3 +1. Checked LICENSE, NOTICE, README, and RELEASE_NOTES, signatures, looked for .class or .jar files, did a quick build. Alan. Sushanth Sowmyan mailto:khorg...@gmail.com May 11, 2015 at 19:11 Hi Folks, We've cleared all the blockers listed for 1.2.0 release, either committing them, or deferring out to an eventual 1.2.1 stabilization release. (Any deferrals were a result of discussion between myself and the committer responsible for the issue.) More details are available here : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status Apache Hive 1.2.0 Release Candidate 2 is available here: https://people.apache.org/~khorgath/releases/1.2.0_RC3/artifacts/ My public key used for signing is as available from the hive committers key list : http://www.apache.org/dist/hive/KEYS Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1034 Source tag for RC3 is up on the apache git repo as tag release-1.2.0-rc3 (Browseable view over at https://git-wip-us.apache.org/repos/asf?p=hive.git;a=tag;h=826695b20a1fb9e813bbfa19093e533caf3b0c15 ) Since this has minimal changes from the previous RC, I would secondarily propose that voting conclude in 72 hours from the RC2 announcement today morning. Hive PMC Members: Please test and vote.
Re: Review Request 33937: HIVE-10641 create CRC32 UDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33937/ --- (Updated May 12, 2015, 10:18 p.m.) Review request for hive and Jason Dere. Changes --- patch#2: use Text.getBytes() instead of toString() Bugs: HIVE-10641 https://issues.apache.org/jira/browse/HIVE-10641 Repository: hive-git Description --- HIVE-10641 create CRC32 UDF Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 02a604ff0a4ed92dfd94b199e8b539f636b66f77 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCrc32.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFCrc32.java PRE-CREATION ql/src/test/queries/clientpositive/udf_crc32.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out a422760400c62d026324dd667e4a632bfbe01b82 ql/src/test/results/clientpositive/udf_crc32.q.out PRE-CREATION Diff: https://reviews.apache.org/r/33937/diff/ Testing --- Thanks, Alexander Pivovarov
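The change noted for this patch ("use Text.getBytes() instead of toString()") avoids materializing a String, but it comes with a caveat: Hadoop's `Text.getBytes()` returns the internal backing buffer, which may be longer than the valid UTF-8 payload, so only the first `getLength()` bytes may be consumed. The sketch below demonstrates that pattern with `java.util.zip.CRC32`, simulating the Text buffer with a plain padded byte array so the example runs without Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32Demo {
    // Compute the CRC of only the valid prefix of a possibly over-long
    // backing buffer, mirroring crc.update(text.getBytes(), 0, text.getLength()).
    static long crcOfValidPrefix(byte[] backingBuffer, int validLength) {
        CRC32 crc = new CRC32();
        crc.update(backingBuffer, 0, validLength);  // ignore stale tail bytes
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] exact = "ABC".getBytes(StandardCharsets.UTF_8);
        // A "Text-like" buffer with leftover bytes past the valid length:
        byte[] padded = {'A', 'B', 'C', 'x', 'x'};
        System.out.println(crcOfValidPrefix(exact, exact.length)
                == crcOfValidPrefix(padded, 3));  // prints "true"
    }
}
```

Hashing the full array returned by `getBytes()` without the length bound would silently include stale bytes and change the checksum, which is why the offset/length overload matters here.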
[jira] [Created] (HIVE-10696) TestAddResource tests are non-portable
Hari Sankar Sivarama Subramaniyan created HIVE-10696: Summary: TestAddResource tests are non-portable Key: HIVE-10696 URL: https://issues.apache.org/jira/browse/HIVE-10696 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan We need to make sure these tests work on Windows as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 33927: HIVE-10639 create SHA1 UDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33927/ --- (Updated May 12, 2015, 10:11 p.m.) Review request for hive and Jason Dere. Changes --- patch#3 use Text.getBytes() instead of toString() Bugs: HIVE-10639 https://issues.apache.org/jira/browse/HIVE-10639 Repository: hive-git Description --- HIVE-10639 create SHA1 UDF Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 02a604ff0a4ed92dfd94b199e8b539f636b66f77 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSha1.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFSha1.java PRE-CREATION ql/src/test/queries/clientpositive/udf_sha1.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out a422760400c62d026324dd667e4a632bfbe01b82 ql/src/test/results/clientpositive/udf_sha1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/33927/diff/ Testing --- Thanks, Alexander Pivovarov
[jira] [Created] (HIVE-10694) LLAP: Add counters for time lost per query due to preemption
Siddharth Seth created HIVE-10694: - Summary: LLAP: Add counters for time lost per query due to preemption Key: HIVE-10694 URL: https://issues.apache.org/jira/browse/HIVE-10694 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [ANNOUNCE] New Hive Committers - Cheng Xu, Dong Chen, and Hari Sankar Sivarama Subramaniyan
Congratulations!!! -- Lefty On Mon, May 11, 2015 at 8:01 PM, Hari Subramaniyan hsubramani...@hortonworks.com wrote: Thank you everyone and Congrats to Cheng and Dong! It's a great feeling and I will do my best to contribute to the Hive community. Cheers, Hari From: Xu, Cheng A cheng.a...@intel.com Sent: Monday, May 11, 2015 6:47 PM To: Thejas Nair; dev Cc: Chen, Dong1; Hari Subramaniyan Subject: RE: [ANNOUNCE] New Hive Committers - Cheng Xu, Dong Chen, and Hari Sankar Sivarama Subramaniyan Thanks guys. I will continue to strive for more contributions to HIVE project :) -Original Message- From: Thejas Nair [mailto:thejas.n...@gmail.com] Sent: Tuesday, May 12, 2015 7:52 AM To: dev Cc: Chen, Dong1; Xu, Cheng A; Hari Subramaniyan Subject: Re: [ANNOUNCE] New Hive Committers - Cheng Xu, Dong Chen, and Hari Sankar Sivarama Subramaniyan Congrats! Looking forward to more contributions (including code reviews)! On Mon, May 11, 2015 at 4:38 PM, Vikram Dixit K vikram.di...@gmail.com wrote: Congrats guys! On Mon, May 11, 2015 at 2:34 PM, Sushanth Sowmyan khorg...@gmail.com wrote: Congratulations, and thank you for your contributions! :) On Mon, May 11, 2015 at 2:17 PM, Sergio Pena sergio.p...@cloudera.com wrote: Congratulations Guys !!! :) On Mon, May 11, 2015 at 3:54 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Cheng Xu, Dong Chen, and Hari Sankar Sivarama Subramaniyan committers on the Apache Hive Project. Please join me in congratulating Cheng, Dong, and Hari! Thanks. - Carl -- Nothing better than when appreciated for hard work. -Mark
Re: Review Request 33968: HIVE-10644 create SHA2 UDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33968/ --- (Updated May 13, 2015, 5:48 a.m.) Review request for hive and Jason Dere. Changes --- patch #2: use Text.getBytes() instead of toString() Bugs: HIVE-10644 https://issues.apache.org/jira/browse/HIVE-10644 Repository: hive-git Description --- HIVE-10644 create SHA2 UDF Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 02a604ff0a4ed92dfd94b199e8b539f636b66f77 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java b043bdc882af7c0b83787526a5a55c9dc29c6681 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSha2.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSha2.java PRE-CREATION ql/src/test/queries/clientpositive/udf_sha2.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out a422760400c62d026324dd667e4a632bfbe01b82 ql/src/test/results/clientpositive/udf_sha2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/33968/diff/ Testing --- Thanks, Alexander Pivovarov
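A SHA2 UDF like the one in this review typically selects the JDK digest by bit length. The sketch below is a hypothetical illustration of that idea, assuming semantics like MySQL's SHA2 function (valid lengths 224/256/384/512, with 0 treated as 256); it is not Hive's actual GenericUDFSha2 code, and the 0-means-256 rule is an assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha2Demo {
    // Map a requested bit length to a JDK MessageDigest algorithm name
    // ("SHA-224", "SHA-256", "SHA-384", "SHA-512") and return the hex digest.
    static String sha2Hex(byte[] message, int bitLength) {
        int len = (bitLength == 0) ? 256 : bitLength;  // 0 -> 256 (assumed, MySQL-style)
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-" + len);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(message)) {
                hex.append(String.format("%02x", b));  // unsigned two-digit hex per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unsupported bit length: " + bitLength, e);
        }
    }

    public static void main(String[] args) {
        // SHA-256 digests are 32 bytes, i.e. 64 hex characters.
        System.out.println(sha2Hex("ABC".getBytes(), 256).length());  // prints "64"
    }
}
```

All four algorithm names used here are standard JDK `MessageDigest` algorithms, so no extra providers are needed.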
[jira] [Created] (HIVE-10684) Fix the UT failures for HIVE7553 after HIVE-10674 removed the binary jar files
Ferdinand Xu created HIVE-10684: --- Summary: Fix the UT failures for HIVE7553 after HIVE-10674 removed the binary jar files Key: HIVE-10684 URL: https://issues.apache.org/jira/browse/HIVE-10684 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10685) Alter table concatenate operator will cause duplicate data
guoliming created HIVE-10685: Summary: Alter table concatenate operator will cause duplicate data Key: HIVE-10685 URL: https://issues.apache.org/jira/browse/HIVE-10685 Project: Hive Issue Type: Bug Reporter: guoliming The Orders table has 15 rows and is stored as ORC. hive> select count(*) from orders; OK 15 Time taken: 37.692 seconds, Fetched: 1 row(s) The table contains 14 files; the size of each file is about 2.1 ~ 3.2 GB. After executing the command ALTER TABLE orders CONCATENATE; the table now has 1530115000 rows. My Hive version is 1.1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10686) java.lang.IndexOutOfBoundsException for query with rank() over(partition ...) on CBO
Jesus Camacho Rodriguez created HIVE-10686: -- Summary: java.lang.IndexOutOfBoundsException for query with rank() over(partition ...) on CBO Key: HIVE-10686 URL: https://issues.apache.org/jira/browse/HIVE-10686 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez CBO throws Index out of bound exception for TPC-DS Q70. Query {code} explain select sum(ss_net_profit) as total_sum ,s_state ,s_county ,grouping__id as lochierarchy , rank() over(partition by grouping__id, case when grouping__id == 2 then s_state end order by sum(ss_net_profit)) as rank_within_parent from store_sales ss join date_dim d1 on d1.d_date_sk = ss.ss_sold_date_sk join store s on s.s_store_sk = ss.ss_store_sk where d1.d_month_seq between 1193 and 1193+11 and s.s_state in ( select s_state from (select s_state as s_state, sum(ss_net_profit), rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking from store_sales, store, date_dim where d_month_seq between 1193 and 1193+11 and date_dim.d_date_sk = store_sales.ss_sold_date_sk and store.s_store_sk = store_sales.ss_store_sk group by s_state ) tmp1 where ranking = 5 ) group by s_state,s_county with rollup order by lochierarchy desc ,case when lochierarchy = 0 then s_state end ,rank_within_parent limit 100 {code} Original plan (correct) {code} HiveSort(fetch=[100]) HiveSort(sort0=[$3], sort1=[$5], sort2=[$4], dir0=[DESC], dir1=[ASC], dir2=[ASC]) HiveProject(total_sum=[$4], s_state=[$0], s_county=[$1], lochierarchy=[$5], rank_within_parent=[rank() OVER (PARTITION BY $5, when(==($5, 2), $0) ORDER BY $4 ROWS BETWEEN 2147483647 FOLLOWING AND 2147483647 PRECEDING)], (tok_function when (= (tok_table_or_col lochierarchy) 0) (tok_table_or_col s_state))=[when(=($5, 0), $0)]) HiveAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], indicator=[true], agg#0=[sum($2)], GROUPING__ID=[GROUPING__ID()]) HiveProject($f0=[$7], $f1=[$6], $f2=[$1]) HiveJoin(condition=[=($5, $2)], joinType=[inner], 
algorithm=[none], cost=[{1177.2086187101072 rows, 0.0 cpu, 0.0 io}]) HiveJoin(condition=[=($3, $0)], joinType=[inner], algorithm=[none], cost=[{2880430.428726483 rows, 0.0 cpu, 0.0 io}]) HiveProject(ss_sold_date_sk=[$0], ss_net_profit=[$21], ss_store_sk=[$22]) HiveTableScan(table=[[tpcds.store_sales]]) HiveProject(d_date_sk=[$0], d_month_seq=[$3]) HiveFilter(condition=[between(false, $3, 1193, +(1193, 11))]) HiveTableScan(table=[[tpcds.date_dim]]) HiveProject(s_store_sk=[$0], s_county=[$1], s_state=[$2]) SemiJoin(condition=[=($2, $3)], joinType=[inner]) HiveProject(s_store_sk=[$0], s_county=[$23], s_state=[$24]) HiveTableScan(table=[[tpcds.store]]) HiveProject(s_state=[$0]) HiveFilter(condition=[=($1, 5)]) HiveProject((tok_table_or_col s_state)=[$0], rank_window_0=[rank() OVER (PARTITION BY $0 ORDER BY $1 DESC ROWS BETWEEN 2147483647 FOLLOWING AND 2147483647 PRECEDING)]) HiveAggregate(group=[{0}], agg#0=[sum($1)]) HiveProject($f0=[$6], $f1=[$1]) HiveJoin(condition=[=($5, $2)], joinType=[inner], algorithm=[none], cost=[{1177.2086187101072 rows, 0.0 cpu, 0.0 io}]) HiveJoin(condition=[=($3, $0)], joinType=[inner], algorithm=[none], cost=[{2880430.428726483 rows, 0.0 cpu, 0.0 io}]) HiveProject(ss_sold_date_sk=[$0], ss_net_profit=[$21], ss_store_sk=[$22]) HiveTableScan(table=[[tpcds.store_sales]]) HiveProject(d_date_sk=[$0], d_month_seq=[$3]) HiveFilter(condition=[between(false, $3, 1193, +(1193, 11))]) HiveTableScan(table=[[tpcds.date_dim]]) HiveProject(s_store_sk=[$0], s_state=[$24]) HiveTableScan(table=[[tpcds.store]]) {code} Plan after fixTopOBSchema (incorrect) {code} HiveSort(fetch=[100]) HiveSort(sort0=[$3], sort1=[$5], sort2=[$4], dir0=[DESC], dir1=[ASC], dir2=[ASC]) HiveProject(total_sum=[$4], s_state=[$0], s_county=[$1], lochierarchy=[$5], rank_within_parent=[rank() OVER (PARTITION BY $5, when(==($5, 2), $0) ORDER BY $4 ROWS BETWEEN 2147483647 FOLLOWING AND 2147483647 PRECEDING)]) HiveAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], indicator=[true],
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
+1 This is great for development of new features in Hive and making them available to users. This also helps users who are slow to move to a new version of Hadoop; they can still get bug fixes and features compatible with Hadoop 1 in new Hive 1.x releases. It will also be easier for users to remember what Hadoop version works with what version of Hive (Hive 1.x needs Hadoop 1+, Hive 2.x needs Hadoop 2+). On Mon, May 11, 2015 at 10:01 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: +1 for the proposal. New branch definitely helps us moving forward quickly with new features and deprecating the old stuff (20S shims and mapreduce). Thanks Prasanth On Mon, May 11, 2015 at 7:20 PM -0700, Vikram Dixit K vikram.di...@gmail.com wrote: The proposal sounds good. Supporting and maintaining hadoop-1 is hard, and conflicts in API changes in 2.x of Hadoop keep us from using new and better APIs as it breaks compilation. +1 Thanks Vikram. On Mon, May 11, 2015 at 7:17 PM, Sergey Shelukhin ser...@hortonworks.com wrote: That sounds like a good idea. Some features could be backported to branch-1 if viable, but at least new stuff would not be burdened by Hadoop 1/MR code paths. Probably also a good place to enable vectorization and other perf features by default while we make alpha releases. +1 On 15/5/11, 15:38, Alan Gates alanfga...@gmail.com wrote: There is a lot of forward-looking work going on in various branches of Hive: LLAP, the HBase metastore, and the work to drop the CLI. It would be good to have a way to release this code to users so that they can experiment with it. Releasing it will also provide feedback to developers. At the same time there are discussions on whether to keep supporting Hadoop-1. The burden of supporting older, less used functionality such as Hadoop-1 is becoming ever harder as many new features are added. I propose that the best way to deal with this would be to make a branch-1.
We could continue to make new feature releases off of this branch (1.3, 1.4, etc.). This branch would not drop old functionality. This provides stability and continuity for users and developers. We could then merge these new features branches (LLAP, HBase metastore, CLI drop) into the trunk, as well as turn on by default newer features such as the vectorization and ACID. We could also drop older, less used features such as support for Hadoop-1 and MapReduce. It will be a while before we are ready to make stable, production ready releases of this code. But we could start making alpha quality releases soon. We would call these releases 2.x, to stress the non-backward compatible changes such as dropping Hadoop-1. This will give users a chance to play with the new code and developers a chance to get feedback. Thoughts? -- Nothing better than when appreciated for hard work. -Mark