[jira] [Commented] (HIVE-3582) NPE in union processing followed by lateral view followed by 2 group bys
[ https://issues.apache.org/jira/browse/HIVE-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558600#comment-13558600 ]

Navis commented on HIVE-3582:
-----------------------------

currUnionOp could probably be removed, and I considered doing that, but I couldn't be sure (the genMR* code is confusing to me).

NPE in union processing followed by lateral view followed by 2 group bys

Key: HIVE-3582
URL: https://issues.apache.org/jira/browse/HIVE-3582
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Navis
Attachments: HIVE-3582.D6051.1.patch, HIVE-3582.D6051.2.patch

EXPLAIN
SELECT e.key, e.arr_ele, count(1) FROM (
  SELECT d.key as key, d.arr_ele as arr_ele, d.value as value, count(1) as cnt FROM (
    SELECT c.arr_ele as arr_ele, a.key as key, a.value as value FROM (
      SELECT key, value, array(1,2,3) as arr FROM src
      UNION ALL
      SELECT key, value, array(1,2,3) as arr FROM srcpart WHERE ds = '2008-04-08' and hr='12'
    ) a LATERAL VIEW EXPLODE(arr) c AS arr_ele
  ) d group by d.key, d.arr_ele, d.value
) e group by e.key, e.arr_ele;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3582) NPE in union processing followed by lateral view followed by 2 group bys
[ https://issues.apache.org/jira/browse/HIVE-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-3582:
-----------------------------------

Resolution: Fixed
Fix Version/s: 0.11.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks, Navis. If you figure out a way to get rid of currUnionOp, which would simplify the dependencies between these function calls, feel free to open a jira and attach a patch there.

NPE in union processing followed by lateral view followed by 2 group bys

Key: HIVE-3582
URL: https://issues.apache.org/jira/browse/HIVE-3582
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Navis
Fix For: 0.11.0
Attachments: HIVE-3582.D6051.1.patch, HIVE-3582.D6051.2.patch

EXPLAIN
SELECT e.key, e.arr_ele, count(1) FROM (
  SELECT d.key as key, d.arr_ele as arr_ele, d.value as value, count(1) as cnt FROM (
    SELECT c.arr_ele as arr_ele, a.key as key, a.value as value FROM (
      SELECT key, value, array(1,2,3) as arr FROM src
      UNION ALL
      SELECT key, value, array(1,2,3) as arr FROM srcpart WHERE ds = '2008-04-08' and hr='12'
    ) a LATERAL VIEW EXPLODE(arr) c AS arr_ele
  ) d group by d.key, d.arr_ele, d.value
) e group by e.key, e.arr_ele;
[jira] [Commented] (HIVE-3921) recursive_dir.q fails on 0.23
[ https://issues.apache.org/jira/browse/HIVE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558615#comment-13558615 ]

Ashutosh Chauhan commented on HIVE-3921:
----------------------------------------

[~sushanth] Did you verify that this passes on all 3 versions: 0.20.2, 1.x and 2.x?

recursive_dir.q fails on 0.23

Key: HIVE-3921
URL: https://issues.apache.org/jira/browse/HIVE-3921
Project: Hive
Issue Type: Bug
Components: Tests
Environment: Hadoop 0.23 (2.0.2-alpha)
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor
Labels: 0.23, tests
Attachments: HIVE-3921.D8055.1.patch

This test fails in 0.23:
- It insists that hive.mapred.supports.subdirectories must be true for mapred.input.dir.recursive to be used. Currently, HiveConf sets that as false.
- HIVE-3643 mentions the param and says that once HIVE-3276 is in, we should switch the param, and this jira has been committed.
- Testing with just setting that parameter in the .q file yields a mismatch with the golden file, but one that looks like it should just update the .out file:

[junit] diff -a /Users/sush/dev/hive.git/build/ql/test/logs/clientpositive/recursive_dir.q.out /Users/sush/dev/hive.git/ql/src/test/results/clientpositive/recursive_dir.q.out
[junit] 59d58
[junit] PREHOOK: Input: default@fact_daily
[junit] 64d62
[junit] POSTHOOK: Input: default@fact_daily
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3403: - Attachment: hive.3403.16.patch user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.1.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3628) Provide a way to use counters in Hive through UDF
[ https://issues.apache.org/jira/browse/HIVE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3628:
------------------------------

Attachment: HIVE-3628.D8007.5.patch

navis updated the revision "HIVE-3628 [jira] Provide a way to use counters in Hive through UDF".
Reviewers: JIRA

Forgot to amend. Sorry.

REVISION DETAIL
  https://reviews.facebook.net/D8007

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecMapper.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapredContext.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.java
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/DummyContextUDF.java
  ql/src/test/queries/clientpositive/udf_context_aware.q
  ql/src/test/results/clientpositive/udf_context_aware.q.out

To: JIRA, navis
Cc: njain

Provide a way to use counters in Hive through UDF

Key: HIVE-3628
URL: https://issues.apache.org/jira/browse/HIVE-3628
Project: Hive
Issue Type: Improvement
Components: UDF
Reporter: Viji
Assignee: Navis
Priority: Minor
Attachments: HIVE-3628.D8007.1.patch, HIVE-3628.D8007.2.patch, HIVE-3628.D8007.3.patch, HIVE-3628.D8007.4.patch, HIVE-3628.D8007.5.patch

Currently it is not possible to generate counters through UDF. We should support this. Pig currently allows this.
[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3403: - Attachment: hive.3403.17.patch user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.1.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3464) Merging join tree may reorder joins which could be invalid
[ https://issues.apache.org/jira/browse/HIVE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558666#comment-13558666 ]

Phabricator commented on HIVE-3464:
-----------------------------------

navis has commented on the revision "HIVE-3464 [jira] Merging join tree may reorder joins which could be invalid".

INLINE COMMENTS
  ql/src/test/queries/clientpositive/mergejoins_mixed.q:19

  1. (a-b-c-d)
     A(a.key=b.key) + B(b.key=c.key) + C(a.key=d.key) makes a single join ABC(a.key=b.key=c.key=d.key).

  2. ((a-b-d)-c) or (((a-b)-c)-d)
     A(a.key=b.key) + B(b.value=c.key) + C(a.key=d.key)

     Before the patch, Hive tries merging in C-B, C-A, B-A order (outer to inner), and only C-A is merged, making two joins: AC(a.key=b.key=d.key) + B(b.value=c.key). This causes join C to be executed before B, which is illegal if the join type of C differs from that of B.

  The patch consists of two parts:
  1. Reverse the merging order (inner to outer). This makes the condition below a little easier to check.
  2. Check whether the join ordering can be switched (that is, whether the joins have the same join type).

REVISION DETAIL
  https://reviews.facebook.net/D5409

To: JIRA, navis
Cc: njain

Merging join tree may reorder joins which could be invalid

Key: HIVE-3464
URL: https://issues.apache.org/jira/browse/HIVE-3464
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Attachments: HIVE-3464.D5409.2.patch, HIVE-3464.D5409.3.patch

Currently, Hive merges the join tree from right to left regardless of join types, which may introduce join reordering. For example,

select * from a join a b on a.key=b.key join a c on b.key=c.key join a d on a.key=d.key;

Hive tries to merge the join tree in a-d=b-d, a-d=a-b, b-c=a-b order, and a-d=a-b and b-c=a-b will be merged. The final join tree is a-(bdc). With this, the ab-d join will be executed prior to ab-c. But if the join types of -c and -d are different, this is not valid.
[jira] [Updated] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3833:
-----------------------------

Attachment: hive.3833.16.path

object inspectors should be initialized based on partition metadata

Key: HIVE-3833
URL: https://issues.apache.org/jira/browse/HIVE-3833
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.3833.10.patch, hive.3833.11.patch, hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, hive.3833.16.path, hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch

Currently, different partitions can be picked up for the same input split based on the serdes etc. Also, we don't allow changing the schema for LazyColumnarBinarySerDe. Instead, different partitions should be part of the same split only if the partition schemas match exactly. The operator tree object inspectors should be based on the partition schema. That would give greater flexibility and also help with using the binary serde with rcfile.
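The grouping rule proposed in HIVE-3833 (partitions share a split only when their schemas match exactly) can be illustrated with a minimal Python sketch. This is not Hive's split-generation code: the function name `group_partitions_by_schema`, the partition names and the column lists are all made up for illustration.

```python
from collections import defaultdict


def group_partitions_by_schema(partitions):
    """Group partition names by their exact (column, type) schema.

    Hypothetical helper: partitions is a list of (name, schema) pairs,
    where schema is a list of (column_name, type_name) tuples.
    """
    groups = defaultdict(list)
    for name, schema in partitions:
        # Only an exact match on every column name and type shares a group.
        groups[tuple(schema)].append(name)
    return list(groups.values())


# Made-up partitions: the third one changed the type of "key".
partitions = [
    ("ds=2013-01-01", [("key", "string"), ("value", "string")]),
    ("ds=2013-01-02", [("key", "string"), ("value", "string")]),
    ("ds=2013-01-03", [("key", "int"), ("value", "string")]),
]
split_groups = group_partitions_by_schema(partitions)
print(split_groups)
```

Keying the groups on the whole schema tuple, rather than on the serde alone, is what keeps a partition with a changed column type out of a split that would otherwise be read with the wrong object inspector.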
[jira] [Commented] (HIVE-3921) recursive_dir.q fails on 0.23
[ https://issues.apache.org/jira/browse/HIVE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558668#comment-13558668 ]

Sushanth Sowmyan commented on HIVE-3921:
----------------------------------------

For this test, no, I did not, since the test already had an explicit INCLUDE_HADOOP_MAJOR_VERSIONS(0.23) set (I did test across versions for the others).

recursive_dir.q fails on 0.23

Key: HIVE-3921
URL: https://issues.apache.org/jira/browse/HIVE-3921
Project: Hive
Issue Type: Bug
Components: Tests
Environment: Hadoop 0.23 (2.0.2-alpha)
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor
Labels: 0.23, tests
Attachments: HIVE-3921.D8055.1.patch

This test fails in 0.23:
- It insists that hive.mapred.supports.subdirectories must be true for mapred.input.dir.recursive to be used. Currently, HiveConf sets that as false.
- HIVE-3643 mentions the param and says that once HIVE-3276 is in, we should switch the param, and this jira has been committed.
- Testing with just setting that parameter in the .q file yields a mismatch with the golden file, but one that looks like it should just update the .out file:

[junit] diff -a /Users/sush/dev/hive.git/build/ql/test/logs/clientpositive/recursive_dir.q.out /Users/sush/dev/hive.git/ql/src/test/results/clientpositive/recursive_dir.q.out
[junit] 59d58
[junit] PREHOOK: Input: default@fact_daily
[junit] 64d62
[junit] POSTHOOK: Input: default@fact_daily
[jira] [Commented] (HIVE-1083) allow sub-directories for an external table/partition
[ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558674#comment-13558674 ] Harsh J commented on HIVE-1083: --- Given that MAPREDUCE-1501 is in MR2 today, and Hive can make use of it, should we close this out now? allow sub-directories for an external table/partition - Key: HIVE-1083 URL: https://issues.apache.org/jira/browse/HIVE-1083 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: Namit Jain Assignee: Zheng Shao Labels: inputformat Sometimes users want to define an external table/partition based on all files (recursively) inside a directory. Currently most of the Hadoop InputFormat classes do not support that. We should extract all files recursively in the directory, and add them to the input path of the job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
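The behavior requested in HIVE-1083 (collect every file under a table or partition directory, recursively, and add it to the job's input paths) can be sketched as follows. This is a local-filesystem illustration in Python, not Hive or Hadoop code: `list_files_recursively` is a hypothetical helper, and the directory layout is invented to stand in for an external table location.

```python
import os
import tempfile


def list_files_recursively(root):
    """Return every regular file under root, in sorted order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


# Build a small nested directory tree (made-up layout) and list it.
with tempfile.TemporaryDirectory() as table_dir:
    os.makedirs(os.path.join(table_dir, "2013", "01"))
    relative_files = (
        "part-0",
        os.path.join("2013", "part-1"),
        os.path.join("2013", "01", "part-2"),
    )
    for rel in relative_files:
        with open(os.path.join(table_dir, rel), "w") as handle:
            handle.write("row\n")
    input_paths = [
        os.path.relpath(path, table_dir)
        for path in list_files_recursively(table_dir)
    ]

print(input_paths)
```

A non-recursive listing of the same directory would see only `part-0`, which is the limitation the issue describes for most InputFormat classes.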
[jira] [Commented] (HIVE-2332) If all of the parameters of distinct functions are exists in group by columns, query fails in runtime
[ https://issues.apache.org/jira/browse/HIVE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558709#comment-13558709 ]

Hudson commented on HIVE-2332:
------------------------------

Integrated in Hive-trunk-h0.21 #1928 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1928/])
HIVE-3920 Change test for HIVE-2332 (Ashutosh Chauhan and Navis via namit) (Revision 1436199)

Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436199
Files :
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out

If all of the parameters of distinct functions are exists in group by columns, query fails in runtime

Key: HIVE-2332
URL: https://issues.apache.org/jira/browse/HIVE-2332
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.9.0
Reporter: Navis
Assignee: Navis
Priority: Blocker
Fix For: 0.11.0
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2332.D663.1.patch, HIVE-2332.1.patch.txt, HIVE-2332.2.patch.txt

select sum(key_int1), sum(distinct key_int1) from t1 group by key_int1;

fails with message..
{code}
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
{code}
hadoop says..
{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:95)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:86)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:252)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:188)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:197)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:85)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:532)
{code}
I think the problem is that there are fewer key expressions than key columns; the number of key expressions should be equal or greater. Would it be solved by adding some key expressions? I'll try.
[jira] [Commented] (HIVE-3920) Change test for HIVE-2332
[ https://issues.apache.org/jira/browse/HIVE-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558710#comment-13558710 ]

Hudson commented on HIVE-3920:
------------------------------

Integrated in Hive-trunk-h0.21 #1928 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1928/])
HIVE-3920 Change test for HIVE-2332 (Ashutosh Chauhan and Navis via namit) (Revision 1436199)

Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436199
Files :
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out

Change test for HIVE-2332

Key: HIVE-3920
URL: https://issues.apache.org/jira/browse/HIVE-3920
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: Namit Jain
Assignee: Ashutosh Chauhan
Fix For: 0.11.0
Attachments: HIVE-3920.D8067.1.patch, HIVE-3920.patch

The test groupby_distinct_samekey.q is run on t1, which is an empty table. It would be useful to add some data to the table to verify the fix.
Hive-trunk-h0.21 - Build # 1928 - Fixed
Changes for Build #1926 [hashutosh] HIVE-2332 : If all of the parameters of distinct functions are exists in group by columns, query fails in runtime (Navis via Ashutosh Chauhan) Changes for Build #1927 Changes for Build #1928 [namit] HIVE-3920 Change test for HIVE-2332 (Ashutosh Chauhan and Navis via namit) All tests passed The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1928) Status: Fixed Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1928/ to view the results.
[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
[ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558826#comment-13558826 ] Namit Jain commented on HIVE-3403: -- The support for sub-queries has also been added in this. user should not specify mapjoin to perform sort-merge bucketed join --- Key: HIVE-3403 URL: https://issues.apache.org/jira/browse/HIVE-3403 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.1.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch Currently, in order to perform a sort merge bucketed join, the user needs to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the mapjoin hint. The user should not specify any hints. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3833: - Attachment: hive.3833.17.patch object inspectors should be initialized based on partition metadata --- Key: HIVE-3833 URL: https://issues.apache.org/jira/browse/HIVE-3833 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3833.10.patch, hive.3833.11.patch, hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, hive.3833.16.path, hive.3833.17.patch, hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch Currently, different partitions can be picked up for the same input split based on the serdes' etc. And, we dont allow to change the schema for LazyColumnarBinarySerDe. Instead of that, different partitions should be part of the same split, only if the partition schemas exactly match. The operator tree object inspectors should be based on the partition schema. That would give greater flexibility and also help using binary serde with rcfile -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3628) Provide a way to use counters in Hive through UDF
[ https://issues.apache.org/jira/browse/HIVE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558840#comment-13558840 ] Namit Jain commented on HIVE-3628: -- [~navis], is it ready for review ? Provide a way to use counters in Hive through UDF - Key: HIVE-3628 URL: https://issues.apache.org/jira/browse/HIVE-3628 Project: Hive Issue Type: Improvement Components: UDF Reporter: Viji Assignee: Navis Priority: Minor Attachments: HIVE-3628.D8007.1.patch, HIVE-3628.D8007.2.patch, HIVE-3628.D8007.3.patch, HIVE-3628.D8007.4.patch, HIVE-3628.D8007.5.patch Currently it is not possible to generate counters through UDF. We should support this. Pig currently allows this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3877) Implement equi-depth histograms as a UDAF
[ https://issues.apache.org/jira/browse/HIVE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558950#comment-13558950 ] Rahul Jain commented on HIVE-3877: -- Hi Shreepadma, We're very interested in such a functionality for analytics use cases on standard hive... Has this been implemented at all or still at concept stage ? If you've already made progress here, I'd be happy to collaborate to bring this to completion. Implement equi-depth histograms as a UDAF - Key: HIVE-3877 URL: https://issues.apache.org/jira/browse/HIVE-3877 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Implement a space and time efficient algorithm to bin numeric column data such that all bins approximately contain the same number of elements. Implement such an algorithm as a generic UDAF. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
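For readers unfamiliar with the term used in HIVE-3877, an equi-depth histogram chooses bin boundaries so that each bin holds about the same number of values. The following is a naive Python sketch of that idea only. It sorts the whole input in memory, so it is not the space- and time-efficient algorithm the issue asks for, and it is not a Hive UDAF; the function name `equi_depth_bins` and the sample data are made up.

```python
def equi_depth_bins(values, num_bins):
    """Split values into num_bins bins holding roughly equal counts.

    Returns a list of (low, high, count) tuples, one per non-empty bin.
    """
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    data = sorted(values)
    n = len(data)
    bins = []
    for i in range(num_bins):
        # Integer arithmetic spreads any remainder across the bins.
        start = i * n // num_bins
        end = (i + 1) * n // num_bins
        if start < end:
            chunk = data[start:end]
            bins.append((chunk[0], chunk[-1], len(chunk)))
    return bins


histogram = equi_depth_bins([1, 1, 2, 3, 5, 8, 13, 21, 34, 55], 3)
print(histogram)
```

Note how the bin widths differ widely while the counts stay within one of each other; an equi-width histogram over the same skewed data would put most values in the first bin.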
[jira] [Commented] (HIVE-2332) If all of the parameters of distinct functions are exists in group by columns, query fails in runtime
[ https://issues.apache.org/jira/browse/HIVE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558989#comment-13558989 ]

Hudson commented on HIVE-2332:
------------------------------

Integrated in Hive-trunk-hadoop2 #79 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/79/])
HIVE-3920 Change test for HIVE-2332 (Ashutosh Chauhan and Navis via namit) (Revision 1436199)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436199
Files :
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out

If all of the parameters of distinct functions are exists in group by columns, query fails in runtime

Key: HIVE-2332
URL: https://issues.apache.org/jira/browse/HIVE-2332
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.9.0
Reporter: Navis
Assignee: Navis
Priority: Blocker
Fix For: 0.11.0
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2332.D663.1.patch, HIVE-2332.1.patch.txt, HIVE-2332.2.patch.txt

select sum(key_int1), sum(distinct key_int1) from t1 group by key_int1;

fails with message..
{code}
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
{code}
hadoop says..
{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:95)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:86)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:252)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:188)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:197)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:85)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:532)
{code}
I think the problem is that there are fewer key expressions than key columns; the number of key expressions should be equal or greater. Would it be solved by adding some key expressions? I'll try.
[jira] [Commented] (HIVE-3920) Change test for HIVE-2332
[ https://issues.apache.org/jira/browse/HIVE-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558990#comment-13558990 ]

Hudson commented on HIVE-3920:
------------------------------

Integrated in Hive-trunk-hadoop2 #79 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/79/])
HIVE-3920 Change test for HIVE-2332 (Ashutosh Chauhan and Navis via namit) (Revision 1436199)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436199
Files :
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out

Change test for HIVE-2332

Key: HIVE-3920
URL: https://issues.apache.org/jira/browse/HIVE-3920
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: Namit Jain
Assignee: Ashutosh Chauhan
Fix For: 0.11.0
Attachments: HIVE-3920.D8067.1.patch, HIVE-3920.patch

The test groupby_distinct_samekey.q is run on t1, which is an empty table. It would be useful to add some data to the table to verify the fix.
Re: [DISCUSS] HCatalog becoming a subproject of Hive
Hi Alan,

Overall this looks good to me. I have a couple of small suggestions:

* Replace occurrences of "Hive's subversion repository" with "Hive's source code repository".
* In the Actions table, the sentence "This also covers the creation of new sub-projects within the project" should be changed to "This also covers the creation of new sub-projects and sub-modules within the project".

Thanks.

Carl

On Fri, Jan 18, 2013 at 4:42 PM, Alan Gates ga...@hortonworks.com wrote:

I've created a wiki page for my proposed changes at https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers

Text to be removed is struck through. Text to be added is in italics. Any recommended changes before we vote?

Alan.

On Jan 17, 2013, at 2:08 PM, Carl Steinbach wrote:

Sounds like a good plan to me. Since Ashutosh is a member of both the Hive and HCatalog PMCs it probably makes more sense for him to call the vote, but I'm willing to do it too.

On Wed, Jan 16, 2013 at 8:24 AM, Alan Gates ga...@hortonworks.com wrote:

If you think that's the best path forward that's fine. I don't think I can call a vote, since I'm not part of the Hive PMC. But I'm happy to draft a resolution for you and then let you call the vote. Should I do that?

Alan.

On Jan 11, 2013, at 4:34 PM, Carl Steinbach wrote:

Hi Alan,

I agree that submitting this for a vote is the best option. If anyone has additional proposed modifications please make them. Otherwise I propose that the Hive PMC vote on this proposal.

In order for the Hive PMC to be able to vote on these changes they need to be expressed in terms of one or more of the actions listed at the end of the Hive project bylaws: https://cwiki.apache.org/confluence/display/Hive/Bylaws

So I think we first need to amend the bylaws in order to define the rights and privileges of a submodule committer, and then separately vote the HCatalog committers in as Hive submodule committers. Does this make sense?

Thanks.

Carl
[jira] [Created] (HIVE-3923) join_filters_overlap.q fails on 0.23
Sushanth Sowmyan created HIVE-3923: -- Summary: join_filters_overlap.q fails on 0.23 Key: HIVE-3923 URL: https://issues.apache.org/jira/browse/HIVE-3923 Project: Hive Issue Type: Bug Components: Tests Environment: Hadoop 0.23 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor As with some of the other broken tests on 0.23, this is broken because the order of results generated by the query on 0.23 is different from the order in the golden output file. However, there appears to be nothing wrong with the query itself. This can be fixed by adding an order-by clause and regenerating the golden file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
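The failure mode described in HIVE-3923 can be illustrated outside Hive. The following is a minimal Python sketch, not Hive test-harness code: the row values and the helper name `matches_golden` are made up. It shows why an exact comparison against a golden file breaks when row order changes between Hadoop versions, and why imposing a sort order (the effect of adding an order-by clause and regenerating the golden file) makes the comparison stable.

```python
# Hypothetical rows; the real test output is not reproduced here.
golden = [(100, 40, 100, 60), (100, 50, 100, 50), (100, 60, 100, 40)]

# The same rows emitted in a different order, as another Hadoop version
# might do when the query carries no ordering guarantee.
actual = [golden[2], golden[0], golden[1]]


def matches_golden(rows, expected):
    """Exact, order-sensitive comparison, like a textual diff of .q.out files."""
    return list(rows) == list(expected)


# Without an ordering guarantee the comparison fails even though
# the result set is the same.
unordered_ok = matches_golden(actual, golden)

# Sorting both sides mimics adding ORDER BY and regenerating the golden file.
ordered_ok = matches_golden(sorted(actual), sorted(golden))

print(unordered_ok, ordered_ok)
```

The alternative of comparing results as unordered sets would also pass, but it would need a change to the comparison itself rather than to a single test query, which is presumably why the issue proposes the order-by fix.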
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559097#comment-13559097 ]

Alan Gates commented on HIVE-896:

Harish,

Thanks for your replies. I want to think some more about your explanation in 2 above, but I think I at least understand your rationale now.

One other question. I tried playing around with this but kept getting an error, and I'm not sure what I'm doing wrong. I have a table that I created with the following statement:

{code}
create table studenttab10k (name string, age int, gpa float);
{code}

When I run

{code}
select avg(gpa) over (cluster by age) from studenttab10k;
{code}

I get

{code}
FAILED: SemanticException 1:43 No partition specification associated with start of PTF chain . Error encountered near token 'age'
{code}

I looked through the syntax file and I think I'm doing the right thing, but obviously I'm not.

Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
Key: HIVE-896
URL: https://issues.apache.org/jira/browse/HIVE-896
Project: Hive
Issue Type: New Feature
Components: OLAP, UDF
Reporter: Amr Awadallah
Priority: Minor
Attachments: HIVE-896.1.patch.txt

Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at:
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
-- amr
[jira] [Updated] (HIVE-3923) join_filters_overlap.q fails on 0.23
[ https://issues.apache.org/jira/browse/HIVE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3923: -- Attachment: HIVE-3923.D8079.1.patch khorgath requested code review of HIVE-3923 [jira] join_filters_overlap.q fails on 0.23. Reviewers: JIRA Adding order-by to tests to fix 0.23 test breakage As with some of the other broken tests on 0.23, this is broken because the order of results generated by the query on 0.23 is different from the order in the golden output file. However, there appears to be nothing wrong with the query itself. This can be fixed by adding an order-by clause and regenerating the golden file. TEST PLAN Patch attached is a test fix REVISION DETAIL https://reviews.facebook.net/D8079 AFFECTED FILES ql/src/test/queries/clientpositive/join_filters_overlap.q ql/src/test/results/clientpositive/join_filters_overlap.q.out MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/19485/ To: JIRA, khorgath join_filters_overlap.q fails on 0.23 Key: HIVE-3923 URL: https://issues.apache.org/jira/browse/HIVE-3923 Project: Hive Issue Type: Bug Components: Tests Environment: Hadoop 0.23 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Attachments: HIVE-3923.D8079.1.patch As with some of the other broken tests on 0.23, this is broken because the order of results generated by the query on 0.23 is different from the order in the golden output file. However, there appears to be nothing wrong with the query itself. This can be fixed by adding an order-by clause and regenerating the golden file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3923) join_filters_overlap.q fails on 0.23
[ https://issues.apache.org/jira/browse/HIVE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-3923: --- Status: Patch Available (was: Open) Phabricator link : https://reviews.facebook.net/D8079 join_filters_overlap.q fails on 0.23 Key: HIVE-3923 URL: https://issues.apache.org/jira/browse/HIVE-3923 Project: Hive Issue Type: Bug Components: Tests Environment: Hadoop 0.23 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Attachments: HIVE-3923.D8079.1.patch As with some of the other broken tests on 0.23, this is broken because the order of results generated by the query on 0.23 is different from the order in the golden output file. However, there appears to be nothing wrong with the query itself. This can be fixed by adding an order-by clause and regenerating the golden file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3924) join_nullsafe.q fails on 0.23
Sushanth Sowmyan created HIVE-3924: -- Summary: join_nullsafe.q fails on 0.23 Key: HIVE-3924 URL: https://issues.apache.org/jira/browse/HIVE-3924 Project: Hive Issue Type: Bug Components: Tests Environment: Hadoop 0.23 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor As with some of the other broken tests on 0.23, this is broken because the order of results generated by the query on 0.23 is different from the order in the golden output file. However, there appears to be nothing wrong with the query itself. This can be fixed by adding an order-by clause and regenerating the golden file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [DISCUSS] HCatalog becoming a subproject of Hive
Changes made. Alan. On Jan 21, 2013, at 12:39 PM, Carl Steinbach wrote: Hi Alan, Overall this looks good to me. I have a couple small suggestions: * Replace occurrences of Hive's subversion repository with Hive's source code repository. * In the Actions table the sentence This also covers the creation of new sub-projects within the project should be changed to This also covers the creation of new sub-projects and sub-modules within the project. Thanks. Carl On Fri, Jan 18, 2013 at 4:42 PM, Alan Gates ga...@hortonworks.com wrote: I've created a wiki page for my proposed changes at https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers Text to be removed is struck through. Text to be added is in italics. Any recommended changes before we vote? Alan. On Jan 17, 2013, at 2:08 PM, Carl Steinbach wrote: Sounds like a good plan to me. Since Ashutosh is a member of both the Hive and HCatalog PMCs it probably makes more sense for him to call the vote, but I'm willing to do it too. On Wed, Jan 16, 2013 at 8:24 AM, Alan Gates ga...@hortonworks.com wrote: If you think that's the best path forward that's fine. I can't call a vote I don't think, since I'm not part of the Hive PMC. But I'm happy to draft a resolution for you and then let you call the vote. Should I do that? Alan. On Jan 11, 2013, at 4:34 PM, Carl Steinbach wrote: Hi Alan, I agree that submitting this for a vote is the best option. If anyone has additional proposed modifications please make them. Otherwise I propose that the Hive PMC vote on this proposal. 
In order for the Hive PMC to be able to vote on these changes, they need to be expressed in terms of one or more of the actions listed at the end of the Hive project bylaws: https://cwiki.apache.org/confluence/display/Hive/Bylaws So I think we first need to amend the bylaws in order to define the rights and privileges of a submodule committer, and then separately vote the HCatalog committers in as Hive submodule committers. Does this make sense? Thanks. Carl
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559154#comment-13559154 ]

Harish Butani commented on HIVE-896:

Alan,

Thanks for spending the time. Yes, your example is going to fail. There was a bug in the patch we posted; it was fixed in commit 0eff864d765c91e0bece497e0f007c6cd2cec72f in our repo on Jan 9th. I can send you a patch privately or post the updated patch here. Sorry about this.

Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
Key: HIVE-896
URL: https://issues.apache.org/jira/browse/HIVE-896
Project: Hive
Issue Type: New Feature
Components: OLAP, UDF
Reporter: Amr Awadallah
Priority: Minor
Attachments: HIVE-896.1.patch.txt

Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at:
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
-- amr
[jira] [Updated] (HIVE-3924) join_nullsafe.q fails on 0.23
[ https://issues.apache.org/jira/browse/HIVE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-3924: --- Status: Patch Available (was: Open) Phabricator link : https://reviews.facebook.net/D8085 join_nullsafe.q fails on 0.23 - Key: HIVE-3924 URL: https://issues.apache.org/jira/browse/HIVE-3924 Project: Hive Issue Type: Bug Components: Tests Environment: Hadoop 0.23 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor As with some of the other broken tests on 0.23, this is broken because the order of results generated by the query on 0.23 is different from the order in the golden output file. However, there appears to be nothing wrong with the query itself. This can be fixed by adding an order-by clause and regenerating the golden file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
[ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3326: -- Attachment: HIVE-3326.D8091.1.patch navis requested code review of HIVE-3326 [jira] plan for multiple mapjoin followed by a normal join is wrong. Reviewers: JIRA DPAL-1968 plan for multiple mapjoin followed by a normal join is wrong example queries: create table yudi(c1 int, c2 int, c3 int, c4 int); create table wangmu(c1 int, c2 int, c3 int, c4 int); select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; in explain mode, I got this: hive explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; OK STAGE DEPENDENCIES: Stage-8 is a root stage Stage-2 depends on stages: Stage-8 Stage-7 depends on stages: Stage-2 Stage-3 depends on stages: Stage-7 Stage-1 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-8 Map Reduce Local Work Alias - Map Local Tables: b Not Important Stage: Stage-2 Map Reduce Alias - Map Operator Tree: a Not Important Local Work: Map Reduce Local Work Stage: Stage-7 Map Reduce Local Work Alias - Map Local Tables: c Not Important Stage: Stage-3 Map Reduce Alias - Map Operator Tree: file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r48gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002 Not Important Local Work: Map Reduce Local Work Stage: Stage-1 Map Reduce Alias - Map Operator Tree: d TableScan file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r48gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002 Select Operator Reduce Operator Tree: Not Important You see, mapper of Stage-1 should read from Stage-3, maybe '.../-mr-10003', not Stage-2(result in '.../-mr-10002'). 
To resolve this bug, I looked at this code (GenMapRedUtils.java, around line 431):

GenMapRedUtils.java

if (oldMapJoin == null) {
  // No earlier map-join in this chain: use the current map-join's context.
  if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
      || local || (oldTask != null) && (parTasks != null)) {
    taskTmpDir = mjCtx.getTaskTmpDir();
    tt_desc = mjCtx.getTTDesc();
    rootOp = mjCtx.getRootMapJoinOp();
  }
} else {
  // An earlier map-join exists: reuse its context (the branch the reported query takes).
  GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
  assert oldMjCtx != null;
  taskTmpDir = oldMjCtx.getTaskTmpDir();
  tt_desc = oldMjCtx.getTTDesc();
  rootOp = oldMjCtx.getRootMapJoinOp();
}

My query goes into the 'else' block and gets the wrong taskTmpDir. I hacked the code so that the query goes into the 'if' block, and it works.

TEST PLAN
EMPTY

REVISION DETAIL
https://reviews.facebook.net/D8091

AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
ql/src/test/queries/clientpositive/mapjoin_mapjoin_join.q
ql/src/test/results/clientpositive/mapjoin_mapjoin_join.q.out

MANAGE HERALD DIFFERENTIAL RULES
https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/19497/ To: JIRA, navis plan for multiple mapjoin followed by a normal join is wrong Key: HIVE-3326 URL: https://issues.apache.org/jira/browse/HIVE-3326 Project: Hive Issue Type: Bug Components: SQL Environment: OS X 10.8; java 1.6.0_33 Reporter: Zhang Xinyu Assignee: Navis Attachments: HIVE-3326.D8091.1.patch, patch.diff example queries: {code} create table yudi(c1 int, c2 int, c3 int, c4 int); create table wangmu(c1 int, c2 int, c3 int, c4 int); select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; {code} in explain mode, I got this: {code} hive explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; OK STAGE DEPENDENCIES: Stage-8 is a root stage Stage-2 depends on stages: Stage-8 Stage-7 depends on stages: Stage-2 Stage-3 depends on stages: Stage-7 Stage-1 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-8 Map Reduce Local Work Alias - Map Local Tables: b Not Important Stage: Stage-2 Map Reduce Alias - Map Operator Tree: a Not Important Local Work: Map Reduce Local Work Stage: Stage-7 Map Reduce Local Work Alias - Map Local Tables: c Not Important Stage: Stage-3 Map Reduce Alias - Map Operator
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559257#comment-13559257 ] Alan Gates commented on HIVE-896: - I'd definitely like to get a new version of the patch. I'm happy to pull from github. I looked at the repo referenced above ( https://github.com/hbutani/SQLWindowing ) but it didn't have any recent updates. Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. --- Key: HIVE-896 URL: https://issues.apache.org/jira/browse/HIVE-896 Project: Hive Issue Type: New Feature Components: OLAP, UDF Reporter: Amr Awadallah Priority: Minor Attachments: HIVE-896.1.patch.txt Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at: http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032 -- amr -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559263#comment-13559263 ]

Harish Butani commented on HIVE-896:

It's https://github.com/hbutani/hive (ptf branch). The SQLWindowing repo has the work we did on top of Hive.

Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
Key: HIVE-896
URL: https://issues.apache.org/jira/browse/HIVE-896
Project: Hive
Issue Type: New Feature
Components: OLAP, UDF
Reporter: Amr Awadallah
Priority: Minor
Attachments: HIVE-896.1.patch.txt

Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics. More details at:
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
-- amr
[jira] [Commented] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
[ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559378#comment-13559378 ] Phabricator commented on HIVE-3326: --- njain has commented on the revision HIVE-3326 [jira] plan for multiple mapjoin followed by a normal join is wrong. Navis, I am not sure, we should support this. https://issues.apache.org/jira/browse/HIVE-3784 is the right way to go. We are adding way more complexity than is needed to solve this problem. Let me refresh HIVE-3784 and try to address Ashutosh's concerns. REVISION DETAIL https://reviews.facebook.net/D8091 To: JIRA, navis Cc: njain plan for multiple mapjoin followed by a normal join is wrong Key: HIVE-3326 URL: https://issues.apache.org/jira/browse/HIVE-3326 Project: Hive Issue Type: Bug Components: SQL Environment: OS X 10.8; java 1.6.0_33 Reporter: Zhang Xinyu Assignee: Navis Attachments: HIVE-3326.D8091.1.patch, patch.diff example queries: {code} create table yudi(c1 int, c2 int, c3 int, c4 int); create table wangmu(c1 int, c2 int, c3 int, c4 int); select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; {code} in explain mode, I got this: {code} hive explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; OK STAGE DEPENDENCIES: Stage-8 is a root stage Stage-2 depends on stages: Stage-8 Stage-7 depends on stages: Stage-2 Stage-3 depends on stages: Stage-7 Stage-1 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-8 Map Reduce Local Work Alias - Map Local Tables: b Not Important Stage: Stage-2 Map Reduce Alias - Map Operator Tree: a Not Important Local Work: Map Reduce Local Work Stage: Stage-7 Map Reduce Local Work Alias - Map Local Tables: c Not Important Stage: Stage-3 Map Reduce Alias - Map Operator Tree: 
file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r48gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002 Not Important Local Work: Map Reduce Local Work Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: d TableScan file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r48gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002 Select Operator Reduce Operator Tree: Not Important
{code}

You see, the mapper of Stage-1 should read from Stage-3 (maybe '.../-mr-10003'), not from Stage-2 (whose result is in '.../-mr-10002').

To resolve this bug, I looked at this code (GenMapRedUtils.java, around line 431):

{code:title=GenMapRedUtils.java}
if (oldMapJoin == null) {
  // No earlier map-join in this chain: use the current map-join's context.
  if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
      || local || (oldTask != null) && (parTasks != null)) {
    taskTmpDir = mjCtx.getTaskTmpDir();
    tt_desc = mjCtx.getTTDesc();
    rootOp = mjCtx.getRootMapJoinOp();
  }
} else {
  // An earlier map-join exists: reuse its context (the branch the reported query takes).
  GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
  assert oldMjCtx != null;
  taskTmpDir = oldMjCtx.getTaskTmpDir();
  tt_desc = oldMjCtx.getTTDesc();
  rootOp = oldMjCtx.getRootMapJoinOp();
}
{code}

My query goes into the 'else' block and gets the wrong taskTmpDir. I hacked the code so that the query goes into the 'if' block, and it works.
[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong
[ https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3326: - Status: Open (was: Patch Available) comments on phabricator plan for multiple mapjoin followed by a normal join is wrong Key: HIVE-3326 URL: https://issues.apache.org/jira/browse/HIVE-3326 Project: Hive Issue Type: Bug Components: SQL Environment: OS X 10.8; java 1.6.0_33 Reporter: Zhang Xinyu Assignee: Navis Attachments: HIVE-3326.D8091.1.patch, patch.diff example queries: {code} create table yudi(c1 int, c2 int, c3 int, c4 int); create table wangmu(c1 int, c2 int, c3 int, c4 int); select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; {code} in explain mode, I got this: {code} hive explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; OK STAGE DEPENDENCIES: Stage-8 is a root stage Stage-2 depends on stages: Stage-8 Stage-7 depends on stages: Stage-2 Stage-3 depends on stages: Stage-7 Stage-1 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-8 Map Reduce Local Work Alias - Map Local Tables: b Not Important Stage: Stage-2 Map Reduce Alias - Map Operator Tree: a Not Important Local Work: Map Reduce Local Work Stage: Stage-7 Map Reduce Local Work Alias - Map Local Tables: c Not Important Stage: Stage-3 Map Reduce Alias - Map Operator Tree: file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r48gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002 Not Important Local Work: Map Reduce Local Work Stage: Stage-1 Map Reduce Alias - Map Operator Tree: d TableScan file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r48gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002 Select Operator Reduce Operator Tree: Not Important {code} You see, mapper of Stage-1 should read from Stage-3, maybe '.../-mr-10003', not Stage-2(result in '.../-mr-10002'). 
To resolve this bug, I looked at this code (GenMapRedUtils.java, around line 431):

{code:title=GenMapRedUtils.java}
if (oldMapJoin == null) {
  // No earlier map-join in this chain: use the current map-join's context.
  if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
      || local || (oldTask != null) && (parTasks != null)) {
    taskTmpDir = mjCtx.getTaskTmpDir();
    tt_desc = mjCtx.getTTDesc();
    rootOp = mjCtx.getRootMapJoinOp();
  }
} else {
  // An earlier map-join exists: reuse its context (the branch the reported query takes).
  GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
  assert oldMjCtx != null;
  taskTmpDir = oldMjCtx.getTaskTmpDir();
  tt_desc = oldMjCtx.getTTDesc();
  rootOp = oldMjCtx.getRootMapJoinOp();
}
{code}

My query goes into the 'else' block and gets the wrong taskTmpDir. I hacked the code so that the query goes into the 'if' block, and it works.
[jira] [Updated] (HIVE-3784) de-emphasize mapjoin hint
[ https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3784:

Attachment: hive.3784.6.patch

de-emphasize mapjoin hint
Key: HIVE-3784
URL: https://issues.apache.org/jira/browse/HIVE-3784
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, hive.3784.4.patch, hive.3784.5.patch, hive.3784.6.patch

hive.auto.convert.join has been around for a long time, and is pretty stable. When mapjoin hint was created, the above parameter did not exist. The only reason for the user to specify a mapjoin currently is if they want it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin. Eventually, that should also go away, but that may take some time to stabilize.

There are many rules in SemanticAnalyzer to handle the following trees:

ReduceSink -> MapJoin
Union -> MapJoin
MapJoin -> MapJoin

This should not be supported anymore. In any of the above scenarios, the user can get the mapjoin behavior by setting hive.auto.convert.join to true and not specifying the hint. This will simplify the code a lot. What does everyone think?
[jira] [Commented] (HIVE-2839) Filters on outer join with mapjoin hint is not applied correctly
[ https://issues.apache.org/jira/browse/HIVE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559423#comment-13559423 ]

Navis commented on HIVE-2839:

I think all of the patches dealing with the MAPJOIN hint should wait until HIVE-3784 is committed.

Filters on outer join with mapjoin hint is not applied correctly
Key: HIVE-2839
URL: https://issues.apache.org/jira/browse/HIVE-2839
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.9.0
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.2.patch

While testing HIVE-2820, I found that some queries with a mapjoin hint throw exceptions.

{code}
SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key AND true limit 10;
FAILED: Hive Internal Error: java.lang.ClassCastException(org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:363) at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.generateMapJoinOperator(MapJoinProcessor.java:483) at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.transform(MapJoinProcessor.java:689) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:87) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7519) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:891) at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {code} and {code} SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key AND b.key * 10 '1000' limit 10; java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1321) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1325) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1325) at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:495) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) ... 8 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint
[ https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559426#comment-13559426 ]

Namit Jain commented on HIVE-3784:

I was thinking of adding a size parameter. If n-1 tables are below that size (for an n-way join), the joinTask should be converted to a mapJoin task (map-only) instead of a conditional task. We would need a further optimization step to merge 2 map-only tasks into a single map-only task.

[~navis], what do you think? Can you think of a better idea?

de-emphasize mapjoin hint
Key: HIVE-3784
URL: https://issues.apache.org/jira/browse/HIVE-3784
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, hive.3784.4.patch, hive.3784.5.patch, hive.3784.6.patch

hive.auto.convert.join has been around for a long time, and is pretty stable. When mapjoin hint was created, the above parameter did not exist. The only reason for the user to specify a mapjoin currently is if they want it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin. Eventually, that should also go away, but that may take some time to stabilize.

There are many rules in SemanticAnalyzer to handle the following trees:

ReduceSink -> MapJoin
Union -> MapJoin
MapJoin -> MapJoin

This should not be supported anymore. In any of the above scenarios, the user can get the mapjoin behavior by setting hive.auto.convert.join to true and not specifying the hint. This will simplify the code a lot. What does everyone think?
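The size-parameter idea in the comment above can be sketched as a simple check. This is only an illustration in Java, not code from any of the attached patches: the class `MapJoinSizeCheck`, the method `canUseMapOnlyJoin`, and the threshold value are hypothetical, and the sketch assumes that "below that size" is applied to each table individually rather than to their combined size.

```java
public class MapJoinSizeCheck {
    // Hypothetical helper: true when at least n-1 of the n join inputs are
    // smaller than the threshold, so that at most one input has to be streamed
    // while the others are loaded as in-memory tables.
    static boolean canUseMapOnlyJoin(long[] tableSizesInBytes, long smallTableThreshold) {
        int smallTables = 0;
        for (long size : tableSizesInBytes) {
            if (size < smallTableThreshold) {
                smallTables++;
            }
        }
        // A join needs at least two inputs; n-1 of them must be small.
        return tableSizesInBytes.length >= 2
            && smallTables >= tableSizesInBytes.length - 1;
    }

    public static void main(String[] args) {
        long threshold = 25_000_000L; // illustrative value only

        // One big table and two small ones: qualifies for a map-only join.
        System.out.println(canUseMapOnlyJoin(
            new long[] {10_000_000_000L, 1_000_000L, 2_000_000L}, threshold));

        // Two big tables: would stay a conditional task under this rule.
        System.out.println(canUseMapOnlyJoin(
            new long[] {10_000_000_000L, 9_000_000_000L, 2_000_000L}, threshold));
    }
}
```

Whether the limit should apply per table or to the sum of the small tables is a design choice the comment leaves open; a sum-based check would bound the total memory used for the in-memory tables more directly.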
[jira] [Created] (HIVE-3925) dependencies of fetch task are not shown by explain
Namit Jain created HIVE-3925:

Summary: dependencies of fetch task are not shown by explain
Key: HIVE-3925
URL: https://issues.apache.org/jira/browse/HIVE-3925
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain

A simple query like:

hive> explain select * from src order by key;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME src))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_ORDERBY (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL key)

STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage

Stage: Stage-0
Fetch Operator
limit: -1

Stage-0 is not a root stage and depends on Stage-1.