[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637860#comment-13637860 ]

Phabricator commented on HIVE-2340:
-----------------------------------

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

INLINE COMMENTS
  ql/src/test/queries/clientpositive/reduce_deduplicate.q:26 Can you add more comments here? Who sets trustScript? As a user, how can I do that? What is supposed to happen?
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:124 Can you add more comments here? Is minReducer only set for Order By?

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain

> optimize orderby followed by a groupby
> --------------------------------------
>
>                 Key: HIVE-2340
>                 URL: https://issues.apache.org/jira/browse/HIVE-2340
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>             Fix For: 0.11.0
>
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch,
> HIVE-2340.13.patch, HIVE-2340.14.patch,
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch,
> HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch,
> HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch,
> HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch,
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637847#comment-13637847 ]

Phabricator commented on HIVE-2340:
-----------------------------------

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

INLINE COMMENTS
  ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q:5 Can you add some comments in this test? For example, before each case: "RS + GBY + RS is optimized to remove the last RS", etc.

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636073#comment-13636073 ]

Phabricator commented on HIVE-2340:
-----------------------------------

navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:122 When a ScriptOperator exists between RSs, it might be possible to dedup only if the script does not change the schema, the order of rows, or the values of the RS-related columns. It seems to have been added for that case by He Yongqiang, the initial developer of this optimizer.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:103 Added comments.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:359 OK.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:181 Done.

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain
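The condition navis describes for a ScriptOperator sitting between two ReduceSink operators can be modeled as a small predicate. This is an illustrative Python sketch, not the Java code in ReduceSinkDeDuplication.java: the function name, the operator labels, and the `trust_script` parameter are assumptions made for the example. It assumes that the optimizer cannot inspect a user script, so it only merges across one when the user has declared the script safe (schema, row order, and RS key values unchanged).

```python
def may_merge_reduce_sinks(operators_between, trust_script):
    """Decide whether two ReduceSinks may be merged (hypothetical model).

    operators_between: labels of the operators found between the two RSs,
                       e.g. ["SEL", "SCR"]; "SCR" stands for a ScriptOperator.
    trust_script:      the user's promise that scripts keep schema, row order
                       and RS key values intact.
    """
    has_script = any(op == "SCR" for op in operators_between)
    # Without a script there is nothing to trust; with one, rely on the flag.
    return trust_script or not has_script


if __name__ == "__main__":
    print(may_merge_reduce_sinks(["SEL"], trust_script=False))
    print(may_merge_reduce_sinks(["SEL", "SCR"], trust_script=False))
    print(may_merge_reduce_sinks(["SEL", "SCR"], trust_script=True))
```

The flag is a promise made by the user rather than something the optimizer verifies, which is why the reviewer asks who sets it and how.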
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635314#comment-13635314 ]

Phabricator commented on HIVE-2340:
-----------------------------------

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:181 nit: spelling of "Abstract"
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:103 The order in which the rules are specified matters, since in case of an exact match for costs, the last rule is invoked.
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:122 What are the semantics of trustScript?
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:359 Can you add more comments?

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain
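njain's remark about rule order can be illustrated with a toy dispatcher. This is a minimal Python sketch of the behaviour the comment describes (lowest cost wins, and on an exact tie the rule listed last is invoked); it is not Hive's dispatcher, and `pick_rule`, the rule names, and the cost functions are hypothetical.

```python
def pick_rule(rules, stack):
    """Return the name of the rule to fire for the given operator stack.

    rules: ordered list of (name, cost_fn) pairs; cost_fn(stack) returns a
           non-negative cost when the rule matches and -1 when it does not.
    """
    best_name, best_cost = None, None
    for name, cost_fn in rules:
        cost = cost_fn(stack)
        if cost < 0:
            continue  # rule does not match this stack
        # "<=" rather than "<": on an exact tie the later rule replaces
        # the earlier one, so registration order decides the winner.
        if best_cost is None or cost <= best_cost:
            best_name, best_cost = name, cost
    return best_name


if __name__ == "__main__":
    tie = [("R1", lambda s: 2), ("R2", lambda s: 2)]
    print(pick_rule(tie, ["RS", "GBY", "RS"]))
    print(pick_rule(list(reversed(tie)), ["RS", "GBY", "RS"]))
```

Under this tie-breaking scheme, two rules that can match the same operator pattern at the same cost must be registered with the preferred one last.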
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626937#comment-13626937 ]

Hudson commented on HIVE-2340:
------------------------------

Integrated in Hive-trunk-h0.21 #2053 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2053/])
    HIVE-2340 : optimize orderby followed by a groupby (Navis via Ashutosh Chauhan) (Revision 1465721)

     Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465721
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/auto_join26.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby2_map_skew.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_cube1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_rollup1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/join1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join3.q.xml
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626434#comment-13626434 ]

Hudson commented on HIVE-2340:
------------------------------

Integrated in Hive-trunk-hadoop2 #147 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/147/])
    HIVE-2340 : optimize orderby followed by a groupby (Navis via Ashutosh Chauhan) (Revision 1465721)

     Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465721
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/auto_join26.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby2_map_skew.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_cube1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_rollup1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/join1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join3.q.xml
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624938#comment-13624938 ]

Ashutosh Chauhan commented on HIVE-2340:
----------------------------------------

Also, HIVE-4302 is resolved now, so the workaround put in by Gunther should no longer be required.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624794#comment-13624794 ]

Navis commented on HIVE-2340:
-----------------------------

When auto.convert.join is true (the default for now), RSDedup cannot try to merge the following RS, which effectively disables that part of the code. I think physical planning should be postponed to the completion time of each task. I'll update the test results.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624314#comment-13624314 ]

Ashutosh Chauhan commented on HIVE-2340:
----------------------------------------

Following tests failed on latest patch:
* TestCliDriver.auto_join26.q
* TestCliDriver.index_bitmap3.q
* TestCliDriver.index_bitmap_auto.q
* TestCliDriver.infer_bucket_sort.q
* TestCliDriver.ppd_gby_join.q
* TestCliDriver.semijoin.q
* TestCliDriver.union24.q
* TestCliDriver.cluster.q
* TestMinimrCliDriver.infer_bucket_sort_reducers_power_two.q
* TestParse.join1
* TestParse.join2
* TestParse.join3
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624131#comment-13624131 ]

Harish Butani commented on HIVE-2340:
-------------------------------------

The problem with PTF is in plan generation. In SemanticAnalyzer::genReduceSinkPlanForWindowing, line 10778, we set the RowSchema of the ReduceSinkOp that precedes a PTF to be the same object as that of the Op that precedes it. In the example that was failing, the preceding Op was another PTFOp.
The fix is to set both the RowSchema and the RowResolver without pointing to the 'input' Op's structures (same issue in genPTFPlanForComponentQuery). Will fix this in a separate Jira.

Will add back 'schema.getSignature().remove(colInfo);' to ColumnPrunerProcFactory.

Sorry about this.
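The aliasing problem Harish Butani describes (a ReduceSink whose schema is the same object as its input operator's schema) can be reproduced in miniature. The following Python sketch is a model only: `Op`, `prune_columns`, and `demo` are hypothetical names, and a plain list stands in for Hive's RowSchema. It shows why an in-place column pruner that edits the RS schema also changes the preceding operator when the two share one object, and why giving the RS its own copy avoids that.

```python
class Op:
    """Hypothetical operator holding a schema (a list of column names)."""

    def __init__(self, name, schema):
        self.name = name
        self.schema = schema


def prune_columns(op, needed):
    # In-place pruning: the list object itself is edited, so every operator
    # that points at the same list sees the change.
    op.schema[:] = [col for col in op.schema if col in needed]


def demo(share_schema):
    ptf = Op("PTF", ["a", "b", "c"])
    # share_schema=True mimics the bug: RS points at its input's schema.
    # share_schema=False mimics the fix: RS owns a copy.
    rs = Op("RS", ptf.schema if share_schema else list(ptf.schema))
    prune_columns(rs, {"a"})
    return ptf.schema, rs.schema


if __name__ == "__main__":
    print(demo(share_schema=True))   # both schemas were pruned
    print(demo(share_schema=False))  # only the RS schema was pruned
```

With the shared object, pruning the RS silently removes columns `b` and `c` from the upstream operator as well, which is the kind of side effect the comment attributes to the failing PTF tests.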
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623262#comment-13623262 ]

Ashutosh Chauhan commented on HIVE-2340:
----------------------------------------

Cool. Running test on latest patch. Will commit if tests pass.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623113#comment-13623113 ]

Navis commented on HIVE-2340:
-----------------------------

Got it. I've updated patch (added small comment on this). Thanks.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622946#comment-13622946 ]

Ashutosh Chauhan commented on HIVE-2340:
----------------------------------------

In PTF handling we do keep a copy of the schema in the Desc object to be used later at runtime. This could be improved; Harish and I are exploring that. But I think Gunther's fix gets us around that, so for this patch I would recommend going ahead with Gunther's fix, since this has been hanging for more than 6 months and there is no point in delaying it further. So, Navis, if you are on board, I would like to get the .14 patch into trunk.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622927#comment-13622927 ]

Navis commented on HIVE-2340:
-----------------------------

CP for RS correctly modifies the schema of RS, and that makes the CP tests for PTF fail. But, as suggested by Hagleitner, after keeping the schema of RS intact even after CP, those tests succeed. I believe there is some assumption about the schema in CP for PTF.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622473#comment-13622473 ]

Ashutosh Chauhan commented on HIVE-2340:
----------------------------------------

[~navis] I am not sure which test case you have in mind, but I ran the following test and it worked for me.
{code}
hive> create table t1 (a1 int, b1 string);
hive> create table t2 (a1 int, b1 string);
hive> from (select sum(i) over (), s from over10k) tt insert overwrite table t1 select * insert overwrite table t2 select * ;
hive> select * from t1 limit 3;
hive> select * from t2 limit 3;
{code}
I got the expected results. If you run explain on that query you will see it has a PTFOperator. You can check windowing_udaf.q to see the schema and data for the over10k table. Let me know which query you have in mind for which CP for PTF may be broken.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621953#comment-13621953 ] Navis commented on HIVE-2340: I think CP for PTF is broken, especially for single-sourced multi-insert queries.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621819#comment-13621819 ] Navis commented on HIVE-2340: [~hagleitn] Thanks. I'll check that.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621677#comment-13621677 ] Ashutosh Chauhan commented on HIVE-2340: Would love to see this make it into 0.11. [~navis] / [~hagleitn] Let me know if it's ready to go in.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621675#comment-13621675 ] Gunther Hagleitner commented on HIVE-2340: [~navis]: The patch doesn't apply cleanly anymore. The culprit is HIVE-4186, but the fix is no longer actually needed with your changes in ReduceSinkDeDuplication. After rebasing it, I found that it fails "ptf.q", a new test case for windowing. The problem seems to be that something in the PTF code is hanging on to the schema/signature of the ReduceSink. Before your change we cloned it; now you're modifying it. When I added code to clone the signature, it passed all windowing tests. Can you take a look?
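The clone-versus-modify problem described in this comment can be illustrated with a minimal, generic sketch. Everything here is hypothetical and for illustration only: `Schema`, `mergeInPlace`, and `mergeWithClone` are invented names, not Hive classes or the code in the patch. It only shows why a component that holds a reference to a shared signature sees unexpected changes when the signature is mutated rather than copied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an operator's schema/signature; not Hive's own class.
final class Schema {
    final List<String> columns;

    Schema(List<String> columns) {
        this.columns = columns;
    }
}

final class SignatureCloneSketch {
    // Mutates the shared column list: every holder of a reference sees the change.
    static Schema mergeInPlace(Schema rs, String extraColumn) {
        rs.columns.add(extraColumn);
        return rs;
    }

    // Copies the column list first, so earlier holders keep the original view.
    static Schema mergeWithClone(Schema rs, String extraColumn) {
        List<String> copy = new ArrayList<>(rs.columns);
        copy.add(extraColumn);
        return new Schema(copy);
    }

    public static void main(String[] args) {
        Schema shared = new Schema(new ArrayList<>(Arrays.asList("KEY._col0", "VALUE._col0")));
        List<String> heldElsewhere = shared.columns; // another component keeps a reference

        mergeWithClone(shared, "VALUE._col1");
        System.out.println("after clone:    " + heldElsewhere);

        mergeInPlace(shared, "VALUE._col1");
        System.out.println("after in-place: " + heldElsewhere);
    }
}
```

In the sketch, only the in-place variant changes what the other holder observes, which matches the symptom reported for the windowing tests.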
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600482#comment-13600482 ] Phabricator commented on HIVE-2340: hagleitn has accepted the revision "HIVE-2340 [jira] optimize orderby followed by a groupby". REVISION DETAIL https://reviews.facebook.net/D1209 BRANCH DPAL-592 ARCANIST PROJECT hive To: JIRA, hagleitn, navis Cc: hagleitn, njain
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600413#comment-13600413 ] Gunther Hagleitner commented on HIVE-2340: Ran unit tests as well. All passed. Still going through the review.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592046#comment-13592046 ] Navis commented on HIVE-2340: [~hagleitn]: It seems to be affected by HIVE-948, which was recently committed. Thanks for your help. I'll check that.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591997#comment-13591997 ] Gunther Hagleitner commented on HIVE-2340: HIVE-2340.14.patch that is.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591995#comment-13591995 ] Gunther Hagleitner commented on HIVE-2340: [~navis]: I've tried the latest patch (.13) and it doesn't apply cleanly anymore. I rebased it, but after doing so I got an NPE from auto_join19.q. Something must have changed in the meantime. I was however able to get the previous version of the patch to apply and work with just a minor change. Tests pass for me on that one. I'll upload, in case you want to take a look. Feel free to ignore.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576343#comment-13576343 ] Navis commented on HIVE-2340: [~hagleitn]: Yes, it was HIVE-2339 and I'm not sure that it's fixed. Also, the recently added GBY-related operations (multi-GBY, cube, etc.) do not create mappings for keys from the start, so that should be fixed as well.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576337#comment-13576337 ] Gunther Hagleitner commented on HIVE-2340: [~navis]: Are you suggesting to just revert the changes in ColumnPrunerProcFactory? I've tried that, but when I do that reduce_deduplicate_extended.q is failing. I've started to debug into that problem - it seems backtracking fails at that point because it can't find the source of "KEY._col0" etc. Seems there should be entries for that in the colExprMap of the RS, but that's not there. Are you getting different results?
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576309#comment-13576309 ] Navis commented on HIVE-2340: [~hagleitn]: Thanks for your help. The tests failed because of the modifications to ColumnPrunerProcFactory, which were based on a very old version of it. After removing them, all tests passed except infer_bucket_sort.q, which is affected by this optimization. Disabling the optimization made that one pass, too.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574224#comment-13574224 ] Gunther Hagleitner commented on HIVE-2340: [~navis]: Disabling the join merge when auto.convert is on should work. I ran unit tests for patch11 and for me these TestCliDriver tests are failing: auto_join13.q,auto_join26.q,infer_bucket_sort.q,join13.q,partition_wise_fileformat14.q
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572273#comment-13572273 ] Phabricator commented on HIVE-2340: navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138 ok.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787 I wish I could, but CommonJoinResolver is a physical optimizer, which means there is no RS-RS operator tree that could be merged at that stage. I'm thinking of disabling this optimization if the user has configured hive.auto.convert.join=true or hive.auto.convert.join.noconditionaltask=true.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251 I'll add more explanation to hive-default.xml.template.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99 For rules with the same cost, DefaultRuleDispatcher selects the last one, something like this:
{code}
if ((cost >= 0) && (cost <= minCost)) {
  minCost = cost;
  rule = r;
}
{code}
So R2 will be selected.
conf/hive-default.xml.template:1034 It's explained in the comment at https://issues.apache.org/jira/browse/HIVE-2340?focusedCommentId=13568361&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13568361 This optimization merges two RSs by moving the key/parts/num-reducers of the child RS to the parent RS. That means that if the num-reducers of the child RS is fixed (order by or forced bucketing) and small, the merge can result in a very slow, single MR job. To prevent this, the configuration sets a minimum threshold for applying this optimization. It's not good enough, but I cannot think of a better idea.
REVISION DETAIL https://reviews.facebook.net/D1209 To: JIRA, navis Cc: hagleitn, njain
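To make the tie-breaking point from the inline comment at ReduceSinkDeDuplication.java:99 concrete, here is a minimal sketch built around the quoted condition. It is not the actual DefaultRuleDispatcher code: the class name, the `select` method, and the use of a map from rule name to cost are assumptions made for illustration. Only the `(cost >= 0) && (cost <= minCost)` test is taken from the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RuleTieBreakSketch {
    // Returns the name of the selected rule, or null when no rule matches
    // (a negative cost is treated as "rule does not apply").
    static String select(Map<String, Integer> costsInRegistrationOrder) {
        int minCost = Integer.MAX_VALUE;
        String rule = null;
        for (Map.Entry<String, Integer> e : costsInRegistrationOrder.entrySet()) {
            int cost = e.getValue();
            // "<=" rather than "<": on equal cost, the later rule replaces the earlier one.
            if ((cost >= 0) && (cost <= minCost)) {
                minCost = cost;
                rule = e.getKey();
            }
        }
        return rule;
    }

    public static void main(String[] args) {
        Map<String, Integer> costs = new LinkedHashMap<>(); // keeps insertion order
        costs.put("R1", 2);
        costs.put("R2", 2);
        System.out.println(select(costs)); // prints "R2"
    }
}
```

With a strict `<` comparison the first matching rule would win instead, so the outcome depends on both the comparison operator and the order in which rules are visited.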
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572126#comment-13572126 ] Phabricator commented on HIVE-2340: --- hagleitn has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby". INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138 HashSet? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251 I think the "number of reducers" story deserves more comments (similar to what you've explained on the jira) ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787 I think if you just run this optimization *after* CommonJoinResolver everything should be fine. It will either already have converted joins to mapjoins and this optimization won't apply or you still have a regular join and you can merge it without worrying about missing out on a mapjoin conversion. You could still have the "sorted" flag to express intent, but there isn't any optimization that will pull the rug out under you at the moment. Am I missing something? 
REVISION DETAIL https://reviews.facebook.net/D1209 To: JIRA, navis Cc: hagleitn, njain
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571177#comment-13571177 ] Phabricator commented on HIVE-2340: njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby". Do you think it might be a good idea to get HIVE-3972 in first? INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99 Isn't it true that R1 and R2 will have the same cost for RS -> GBY --> anything --> RS? If yes, how do you know which rule will be fired? REVISION DETAIL https://reviews.facebook.net/D1209 To: JIRA, navis Cc: hagleitn, njain
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571168#comment-13571168 ] Phabricator commented on HIVE-2340: njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby". A general question: how does it work with hive.optimize.reducededuplication? INLINE COMMENTS conf/hive-default.xml.template:1034 Sorry for joining late: can you explain this more clearly? REVISION DETAIL https://reviews.facebook.net/D1209 To: JIRA, navis Cc: hagleitn, njain
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571130#comment-13571130 ] Gunther Hagleitner commented on HIVE-2340: Ah, that's good info. Makes sense now. The patch is useful as is, but is the only way to actually optimize the groupby/orderby case to do the ratio thing as a conditional task? And if so would that be this or a follow up jira?
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570813#comment-13570813 ] Navis commented on HIVE-2340: @Gunther Hagleitner: I also considered the ratio approach, but the number of reducers is calculated from the input size just before the job is submitted to Hadoop, so it cannot be known in the optimizer layer. Except for the special cases with order by and bucketing, the number of reducers for both RSs is -1. So generally speaking, it's safe.
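The behaviour described in this comment, combined with the minimum-threshold configuration discussed elsewhere in the thread, can be sketched roughly as follows. This is an assumption-laden illustration, not the code in ReduceSinkDeDuplication: the class and method names are hypothetical, and the threshold value used in `main` is only an example. The one detail taken from the thread is that a reducer count of -1 means the count has not been fixed at plan time.

```java
final class MinReducerSketch {
    // Hypothetical guard: allow the merge when the child RS has no fixed reducer
    // count (negative, e.g. -1), or when its fixed count reaches the threshold.
    static boolean mayMerge(int childNumReducers, int minReducer) {
        if (childNumReducers < 0) {
            return true; // decided later from input size, so nothing is forced on the merged stage
        }
        return childNumReducers >= minReducer;
    }

    public static void main(String[] args) {
        int minReducer = 4; // illustrative threshold only
        System.out.println(mayMerge(-1, minReducer)); // ordinary RS: merge allowed
        System.out.println(mayMerge(1, minReducer));  // e.g. order by with one reducer: skipped
        System.out.println(mayMerge(8, minReducer));  // e.g. bucketing with 8 buckets: merge allowed
    }
}
```

The guard only ever blocks the cases where a small, fixed reducer count would be pushed onto the merged stage, which is the slow single-job scenario the threshold is meant to avoid.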
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570645#comment-13570645 ] Gunther Hagleitner commented on HIVE-2340: -- [~navis]: I think in general the logic should be to copy numReducers from parent to child, not the other way around. If Hive makes a decent estimate of reducers for the parent, that's probably the number you want to carry into the combined reduce stage, because it means each reducer is doing the desired amount of work. Buckets and order by are the only special cases I can think of where the number needs to be fixed. For those special cases, without knowing the cardinalities of the join/group by/tables, it's indeed difficult to guess whether the optimization should be on or off. However, what do you think of using a max ratio of parent reducers to child reducers instead of a fixed minimum number of reducers for the child? With a default of 4, maybe. I.e., if the parent has fewer than 4 times as many reducers as the child, collapse (assuming another job will be more expensive than running with the lower number of reducers); otherwise leave it alone. The optimization is only good if the input sizes of the child and parent reducers are similar, and expressing this as a ratio of the number of reducers is probably the closest we can get right now. This would enable the optimization for a larger body of queries (small tables, single input split, empty group by expr, etc.).
> optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, > HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, > HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
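Editor's sketch of the ratio check proposed in the comment above. This is only an illustration, not code from the patch: the function name `should_merge`, the `UNKNOWN` constant, and the conservative handling of an unknown parent count are assumptions. The default ratio of 4 and the convention that a reducer count of -1 means "not fixed at optimization time" come from the thread.

```python
UNKNOWN = -1  # per the thread: reducer count is -1 unless fixed (order by, bucketing)


def should_merge(parent_reducers: int, child_reducers: int, max_ratio: float = 4.0) -> bool:
    """Decide whether to collapse a parent and a child reduce stage (hypothetical helper)."""
    # Child count not fixed: nothing forces the merged stage onto few reducers.
    if child_reducers == UNKNOWN:
        return True
    # Child fixed but parent unknown: the ratio cannot be computed in the
    # optimizer, so this sketch stays conservative (an assumption).
    if parent_reducers == UNKNOWN:
        return False
    # Collapse only if the parent has fewer than max_ratio times the child's reducers.
    return parent_reducers < max_ratio * child_reducers
```

With these assumptions, a parent with 12 reducers and a child fixed at 4 would be merged, while a parent with 100 reducers feeding a single-reducer order by would be left alone. The second branch is exactly the difficulty raised in the reply: the parent count is usually not known when the optimizer runs.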
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568466#comment-13568466 ] Phabricator commented on HIVE-2340: --- navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby".
INLINE COMMENTS
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:521 Yes, it is. There should be a better configuration if possible.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787 Yes, it should be 'fixed' as sorted state. I'll rename it. I think there should be a configuration preventing a MAPJOINable JOIN from being merged with the next RS, but it seemed hard to know whether a join is mapjoinable. Any ideas?
ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:136 Yes, typos. I'll remove this from the patch.
ql/src/test/queries/clientpositive/reduce_deduplicate.q:7 ok.
REVISION DETAIL https://reviews.facebook.net/D1209
To: JIRA, navis Cc: hagleitn
> optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, > HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, > HIVE-2340.D1209.9.patch, testclidriver.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568361#comment-13568361 ] Navis commented on HIVE-2340: - @Gunther Hagleitner, This optimization copies the configuration of childRS to parentRS, which can lead to performance problems if childRS runs on a single reducer or a small number of reducers (bucketing, etc.), especially when parentRS is for a JOIN or GBY. Thresholding it by the "min.reducer" conf did not seem good enough. Do you have a better idea? > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, > HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, > HIVE-2340.D1209.9.patch, testclidriver.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
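Editor's sketch of the "min.reducer" threshold discussed above, for readers who want the rule in concrete form. It is a simplified reading of the thread, not the patch itself: the helper name `merge_allowed` is hypothetical, and the only facts taken from the discussion are that the setting defaults to 4, that it guards against merging into a child RS whose reducer count is fixed and small, and that -1 stands for a count that is not fixed.

```python
def merge_allowed(child_reducers: int, min_reducer: int = 4) -> bool:
    """Hypothetical guard: may the child RS configuration be copied to the parent RS?"""
    # A negative count means the number of reducers is decided later,
    # at job submission, so the threshold does not apply.
    if child_reducers < 0:
        return True
    # A fixed, small count (for example the single reducer of an order by)
    # would force the merged stage onto too few reducers.
    return child_reducers >= min_reducer
```

Under this reading, `merge_allowed(1)` is false with the default threshold, which is why the reviewer notes that the default effectively disables the order-by-followed-by-group-by case; lowering the threshold to 1 re-enables it.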
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566902#comment-13566902 ] Gunther Hagleitner commented on HIVE-2340: -- Partial review on phabricator. Biggest question is around "hive.optimize.reducededuplication.min.reducer". That basically disables the "orderby followed by groupby" optimization which was the original motivation for the jira. Navis, can you explain this some more? Might be another ticket, but would it be possible to optimize group by/sort by as well with this? > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, > HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, > HIVE-2340.D1209.9.patch, testclidriver.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566898#comment-13566898 ] Phabricator commented on HIVE-2340: --- hagleitn has commented on the revision "HIVE-2340 [jira] optimize orderby followed by a groupby". Partial review
INLINE COMMENTS
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:521 Not sure why this is needed or why this defaults to 4. From the comment below it seems this is just to avoid the single-reducer order-by case for performance reasons, is that correct?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787 Is this required or extra protection? The comment at the top of the file says mapjoin optimization happens before this (and probably should for performance reasons). Also, if I understand it correctly, "joinAndSort" might be a better name than "fixed". You're basically saying that if an optimization wants to change the join after this, it needs to make sure the ordering of the keys is preserved, right?
ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:136 Seems orthogonal to this patch.
ql/src/test/queries/clientpositive/reduce_deduplicate.q:7 There are not a lot of tests for min.reducer=1. No order by case, for instance. Maybe reduce_deduplicate_extended.q should run with both the default and min.reducer=1.
REVISION DETAIL https://reviews.facebook.net/D1209 To: JIRA, navis Cc: hagleitn > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, > HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, > HIVE-2340.D1209.9.patch, testclidriver.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566810#comment-13566810 ] Gunther Hagleitner commented on HIVE-2340: -- FYI: Ran all unit tests on patch .9. Failing tests are: groupby_distinct_samekey.q,join31.q,reduce_deduplicate_extended.q (TestCliDriver). Failures look like outdated golden files (explain output changed). Uploaded testclidriver.txt for reference. > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, > HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, > HIVE-2340.D1209.9.patch > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560424#comment-13560424 ] Navis commented on HIVE-2340: - These are not removed but just merged with other rules. For example, RS-*-RS/GBY is currently handled by RS-*-RS, which now diverges by the type of the child of the child RS. It's simpler and seemed safer, because this optimizer removes or replaces the child RS and the following GBY if possible. But this means the child RS can be removed/replaced before it is accessed by the graph walker. I ran about twenty tests before submitting. I wish I had time to run all pending tests, but I've been a little busy recently, sorry. > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, > HIVE-2340.D1209.6.patch > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559914#comment-13559914 ] Ashutosh Chauhan commented on HIVE-2340: Thanks, Navis, for updating the patch. I haven't looked at the patch in detail, but some comments:
* In the latest patch, you have removed the rules {{RS%.*RS%GBY%}} and {{RS%GBY%.*RS%GBY%}} and have modified the rule {{JOIN%.*RS%GBY%}} to {{JOIN%.*%RS%}}. Can you shed some light on the thinking behind picking these rules? Were those rules not stable, or do you think they are not useful?
* auto_join_26.q.out is incorrect; it's generating wrong results. It looks like aggregation is not happening correctly.
* I haven't run the full suite, but queries groupby_grouping_sets5.q and smb_mapjoin_14.q are failing after applying this patch.
> optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, > HIVE-2340.D1209.6.patch > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
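Editor's sketch of how rule patterns such as those quoted above can be read. It assumes, as a simplification, that each operator on the walked path contributes its short name followed by `%` and that a rule fires when its pattern is found in the resulting string; the helper names `operator_path` and `rule_matches` are hypothetical, and the real matching in Hive's rule dispatcher may differ in detail.

```python
import re


def operator_path(names):
    """Join operator short names into a path string, e.g. ['RS', 'GBY'] -> 'RS%GBY%'."""
    return "".join(name + "%" for name in names)


def rule_matches(pattern, names):
    """Return True if the rule pattern occurs in the operator path (illustrative only)."""
    return re.search(pattern, operator_path(names)) is not None


# The patterns mentioned in the review comment.
OLD_RULE = r"JOIN%.*RS%GBY%"
NEW_RULE = r"JOIN%.*%RS%"
```

Under these assumptions, the old rule matches a path such as JOIN, RS, GBY, while the new rule matches JOIN, SEL, RS regardless of what follows the RS. Note that, read literally as a regular expression, the new pattern needs at least one operator between the JOIN and the RS.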
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556660#comment-13556660 ] Yin Huai commented on HIVE-2340: Yes, I agree. > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556651#comment-13556651 ] Ashutosh Chauhan commented on HIVE-2340: Thanks, Yin, for explaining. Your ASCII art helped in understanding the differences : ) I better understand the reason for the fake new operator now. I think in the cases you have pointed out, where there are trees of this kind, the reduce deduplication approach won't help, since it looks at a linear chain of RSs and eliminates one where it can. You would need a fake operator in such cases because you don't want to modify the GBY or JOIN operators, which makes sense. I see the merits of YSmart better now. On the other hand, the patch on this jira is still useful and complementary to YSmart, since it collapses linear RSs instead of adding fake ones. In addition to collapsing those operators, it will also make life easier for YSmart, because then YSmart will be dealing with simpler plans whose reduce sinks are already deduplicated. We need to make sure the reducededup rule fires before YSmart for both optimizations to play nicely. So, I think we should make progress on both these patches. [~navis] Would you like to refresh this patch? > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA.
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556620#comment-13556620 ] Yin Huai commented on HIVE-2340: The current implementation of the YSmart patch covers scenarios in which a join or aggregation operator shares the same partition keys with all of its parents (join or aggregation operators). For example, a single MR job will be generated if all operators in the following plan share the same partition keys.
{code}
JOIN
    \
     JOIN
    /    \
GBY       \
           JOIN
          /
GBY------/
{code}
Also, it requires that the bottom join or aggregation operators that will be processed in the same MR job take input tables instead of intermediate tables. In the future, it should be extended to cover scenarios that involve intermediate tables, scenarios in which correlated operators share common partition keys (not exactly the same keys), and scenarios in which a join or aggregation operator shares common keys with only some of its parents.
> optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
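Editor's sketch of the condition described in the comment above: every join or aggregation operator must share exactly the same partition keys with all of its parents for the plan to run as a single MR job. The `PlanOp` class, the function name, and the small example plan are hypothetical and only illustrate the stated condition; the additional requirement that the bottom operators read input tables rather than intermediate tables is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PlanOp:
    """A join or aggregation operator in a toy plan (hypothetical model)."""
    kind: str                          # "JOIN" or "GBY"
    partition_keys: Tuple[str, ...]
    parents: List["PlanOp"] = field(default_factory=list)


def can_share_one_job(op: PlanOp) -> bool:
    """True if op and, recursively, all of its parents use identical partition keys."""
    return all(
        parent.partition_keys == op.partition_keys and can_share_one_job(parent)
        for parent in op.parents
    )


# A small plan in which every operator is partitioned on the same key "k".
gby_a = PlanOp("GBY", ("k",))
gby_b = PlanOp("GBY", ("k",))
join_1 = PlanOp("JOIN", ("k",))
join_2 = PlanOp("JOIN", ("k",), parents=[join_1, gby_a])
join_3 = PlanOp("JOIN", ("k",), parents=[join_2, gby_b])
```

Here `can_share_one_job(join_3)` holds; changing the keys of any one operator breaks the condition, which corresponds to the "common but not identical keys" case the comment lists as future work.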
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556608#comment-13556608 ] Yin Huai commented on HIVE-2340: Let me explain why I introduced the fake RS operator instead of just removing the original RS. When I was developing the patch for 2206, I found that the aggregation operator (GBY) and the join operator (JOIN) use different logic for processing the rows forwarded to them. Although they both buffer rows, a GBY determines whether it needs to forward results to its children in processOp, while a JOIN relies on endGroup to know when it should forward results. When we have plans like GBY-GBY or JOIN-GBY, that difference in processing logic is fine. However, when we have a plan like
{code}
GBY            GBY
   \              \
    JOIN    or     JOIN
   /              /
GBY           JOIN
{code}
we need operators between the child JOIN and the parent GBYs and JOINs to make sure the JOIN processes rows correctly. This is also why CorrelationLocalSimulativeReduceSinkOperator determines when to start the group of its children in processOp and leaves startGroup and endGroup empty. Also, by replacing RSs with those fake RSs, I do not need to touch the GBYs and JOINs that will be merged into the same Reduce phase. Since the input of the first operator on the Reduce side is in the format [key, value, tag], I use those fake RSs to generate rows in the same format. But this part of the work was implemented almost 2 years ago. Definitely let me know if anything has changed and this fake RS is no longer needed.
> optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556513#comment-13556513 ] Ashutosh Chauhan commented on HIVE-2340: Yeah, correct, JOIN-GBY and GBY-GBY are taken care of in YSmart also. It's the group-by followed by order-by case which is also of interest to me, which this already covers. Besides the scenarios covered by these two patches, I am also comparing the approaches taken in the two. I have only briefly looked at this patch, but the fundamental difference I can make out between this approach and the YSmart approach is that here the RS is deduplicated, that is, completely removed from the operator pipeline wherever it can be (i.e., when the keys of the subsequent RS are a superset of the earlier one's), thus fusing multiple MR jobs. YSmart, on the other hand, replaces the second RS with a new operator it introduces (LocalSimulatedReduceSink?), which fakes the RS but doesn't let the plan split into 2 MR jobs, thus generating one MR job. I haven't thought this through completely, but on an initial pass it seems the approach of this patch is better than YSmart's because:
* Here you don't need a new operator.
* Here you are simplifying the plan by eliminating operators, as opposed to YSmart, which replaces the operator and thereby increases the complexity of the plan (by adding a new type of operator).
* In that new operator, YSmart currently serializes and deserializes the data, thereby unnecessarily introducing a performance penalty. Granted, this could be improved, but the problem doesn't exist in the patch proposed on this jira to begin with.
Though there are certainly other scenarios which YSmart can cover (Yin, can you list those?) and which this patch does not, for the scenarios that are common this approach seems to be better. There might be other differences in the approach; please feel free to raise those.
> optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555803#comment-13555803 ] Yin Huai commented on HIVE-2340: [~ashutoshc] Can you let me know the cases of JOIN-GROUPBY and GBY-GBY which 2206 does not optimize? Also, does RS refer to queries with "cluster by", "sort by" or "distribute by"? Thanks. > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1373#comment-1373 ] Ashutosh Chauhan commented on HIVE-2340: [~navis] I played with HIVE-2206 over the last couple of days, and I think your work is complementary and useful because there are several cases which this patch already optimizes and which HIVE-2206 currently doesn't, e.g., order-by followed by groupby, RS followed by GBY, JOIN-GROUPBY, RS-RS, GBY-GBY. This covers a lot of ground, so I suggest we move forward with this. > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541214#comment-13541214 ] Navis commented on HIVE-2340: - @Ashutosh I abandoned work on this because HIVE-2206 is a superset of it. Would it be better to do this as a quick interim fix before that? > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Minor > Labels: perfomance > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, > ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163523#comment-13163523 ] jirapos...@reviews.apache.org commented on HIVE-2340: -
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2991/
---
(Updated 2011-12-06 11:02:00.777597)
Review request for hive and Carl Steinbach.
Changes
---
I overwrote the existing optimizer by mistake. Fixed it.
Summary
---
Mostly copied from existing code. Not tested intensively yet, but it seems to be used frequently by us.
This addresses bug HIVE-2340. https://issues.apache.org/jira/browse/HIVE-2340
Diffs (updated)
-
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 82a141d
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java e91b4d5
ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q PRE-CREATION
ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/2991/diff
Testing
---
New test cases added: reduce_deduplicate_extended.q
Thanks, Navis
> optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Priority: Minor > Attachments: HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162689#comment-13162689 ] Navis commented on HIVE-2340: - https://reviews.apache.org/r/2991/ > optimize orderby followed by a groupby > -- > > Key: HIVE-2340 > URL: https://issues.apache.org/jira/browse/HIVE-2340 > Project: Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Navis >Priority: Minor > Attachments: HIVE-2340.1.patch.txt > > > Before implementing optimizer for JOIN-GBY, try to implement RS-GBY > optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira