[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-22 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637860#comment-13637860
 ] 

Phabricator commented on HIVE-2340:
---

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

INLINE COMMENTS
  ql/src/test/queries/clientpositive/reduce_deduplicate.q:26 can you add more 
comments here ??

  who sets trustScript ?
  As a user, how can i do that
  What is supposed to happen.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:124
 Can you add more comments here ?

  Is minReducer only set for order By ?

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, 
> HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, 
> HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, 
> HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-22 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637847#comment-13637847
 ] 

Phabricator commented on HIVE-2340:
---

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

INLINE COMMENTS
  ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q:5 can you 
add some comments in this test ???

  RS + GBY + RS is optimized to remove the last RS etc.
  before the case

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, 
> HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, 
> HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, 
> HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-18 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636073#comment-13636073
 ] 

Phabricator commented on HIVE-2340:
---

navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:122
 When ScriptOperator exists between RSs, it might possible to dedup only if the 
script does not change schema, order of rows and values of the RS related 
columns. It seemed added for that case by He Yongqiang, initial developer of 
this optimizer.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:103
 Added comments
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:359
 ok.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:181
 done.

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, 
> HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, 
> HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, 
> HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-18 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635314#comment-13635314
 ] 

Phabricator commented on HIVE-2340:
---

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:181
 nit: spelling Abstract
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:103
 The order in which the rules are specified matter, since in case of exact 
match for costs,
  the last rule is invoked.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:122
 What are the semantics of trustScript ?
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:359
 can you add more comments ?

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, 
> HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, 
> HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, 
> HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626937#comment-13626937
 ] 

Hudson commented on HIVE-2340:
--

Integrated in Hive-trunk-h0.21 #2053 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2053/])
HIVE-2340 : optimize orderby followed by a groupby (Navis via Ashutosh 
Chauhan) (Revision 1465721)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465721
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/auto_join26.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby2_map_skew.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_cube1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_rollup1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out
* 
/hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/join1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join3.q.xml


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, 
> HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, 
> HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, 
> HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626434#comment-13626434
 ] 

Hudson commented on HIVE-2340:
--

Integrated in Hive-trunk-hadoop2 #147 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/147/])
HIVE-2340 : optimize orderby followed by a groupby (Navis via Ashutosh 
Chauhan) (Revision 1465721)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465721
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/auto_join26.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate.q
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby2_map_skew.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_cube1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_rollup1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/infer_bucket_sort.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ppd2.q.out
* 
/hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/join1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join3.q.xml


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, 
> HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, 
> HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, 
> HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624938#comment-13624938
 ] 

Ashutosh Chauhan commented on HIVE-2340:


Also, HIVE-4302 is resolved now, so workaround put in by Gunther should now 
longer be required. 

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-07 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624794#comment-13624794
 ] 

Navis commented on HIVE-2340:
-

When auto.convert.join is true(default for now), RSDedup cannot try to merge 
following RS, effectively disabling the code part. I think physical planning 
should be postponed to completion time of each task. I'll update test results.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624314#comment-13624314
 ] 

Ashutosh Chauhan commented on HIVE-2340:


Following tests failed on latest patch:
* TestCliDriver.auto_join26.q
* TestCliDriver.index_bitmap3.q
* TestCliDriver.index_bitmap_auto.q
* TestCliDriver.infer_bucket_sort.q
* TestCliDriver.ppd_gby_join.q
* TestCliDriver.semijoin.q
* TestCliDriver.union24.q
* TestCliDriver.cluster.q
* TestMinimrCliDriver.infer_bucket_sort_reducers_power_two.q
* TestParse.join1
* TestParse.join2
* TestParse.join3


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-05 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624131#comment-13624131
 ] 

Harish Butani commented on HIVE-2340:
-

The problem with PTF is in plan generation. In 
SemanticAnalyzer::genReduceSinkPlanForWindowing line 10778, we are setting the 
RowSchema of the ReduceSinkOp that precedes a PTF to be the same one as the Op 
that precedes it. In the e.g. that was failing the preceding Op was another 
PTFOp. The fix is to set both the RowSchema and RowResolver by not pointing to 
the 'input' Op's structures(same issue in genPTFPlanForComponentQuery). Will 
fix in the separate Jira. Will add back 
'schema.getSignature().remove(colInfo);' to ColumnPrunerProcFactory. Sorry 
about this.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623262#comment-13623262
 ] 

Ashutosh Chauhan commented on HIVE-2340:


Cool. Running test on latest patch. Will commit if tests pass.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-04 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623113#comment-13623113
 ] 

Navis commented on HIVE-2340:
-

Got it. I've updated patch (added small comment on this). Thanks.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622946#comment-13622946
 ] 

Ashutosh Chauhan commented on HIVE-2340:


In PTF handling we do keep a copy of schema in Desc object to be used later at 
runtime. This could be improved. Me and Harish are exploring that. But I think 
Gunther's fix gets us around that, so for this patch I would recommend to go 
ahead with Gunther's fix, since this is hanging on for more than 6 months and 
no point in delaying it further. So, Navis if you are on board I will like to 
get .14 patch in trunk.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-04 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622927#comment-13622927
 ] 

Navis commented on HIVE-2340:
-

CP for RS is correctly modifies schema of RS, and it makes CP tests for PTF 
fail. But as suggested by Hagleitner, after making schema of RS intact even 
after CP, those tests are resulted to success. I believe there is some 
assumption on schema in CP for PTF.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622473#comment-13622473
 ] 

Ashutosh Chauhan commented on HIVE-2340:


[~navis] I am not sure which test case you have in mind, but I ran following 
test and it worked for me.
{code}
hive> create table t1 (a1 int, b1 string); 
hive> create table t2 (a1 int, b1 string);
hive> from (select sum(i) over (), s from over10k) tt insert overwrite table t1 
select * insert overwrite table t2 select * ;
hive> select * from t1 limit 3;
hive> select * from t2 limit 3; 
{code}
I got expected results. If you run explain on that query you will see it has 
PTFOperator. You can check windowing_udaf.q to see the schema and data for 
over10k table. Let me know which query have in mind, for which CP for PTF may 
be broken.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-04 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621953#comment-13621953
 ] 

Navis commented on HIVE-2340:
-

I think CP for PTF is broken. Especially for single sourced multi-insert query.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-03 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621819#comment-13621819
 ] 

Navis commented on HIVE-2340:
-

[~hagleitn] Thanks. I'll check that.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, 
> HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-03 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621677#comment-13621677
 ] 

Ashutosh Chauhan commented on HIVE-2340:


Would love to see this make it in 0.11. [~navis] / [~hagleitn] Let me know if 
its ready to go in.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-04-03 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621675#comment-13621675
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

[~navis]: The patch doesn't apply cleanly anymore: Culprit is HIVE-4186, but 
the fix is no longer actually needed with your changes in 
ReduceSinkDeDuplication. 

After rebasing it, I found it to fail "ptf.q" - a new testcase for windowing. 
The problem seems to be that something in the ptf code is hanging on to the 
schema/signature of the reducesink. Before your change we cloned it, now you're 
modifying it. When I added code to clone the signature it passed all windowing 
tests.

Can you take a look?

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-03-12 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600482#comment-13600482
 ] 

Phabricator commented on HIVE-2340:
---

hagleitn has accepted the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

REVISION DETAIL
  https://reviews.facebook.net/D1209

BRANCH
  DPAL-592

ARCANIST PROJECT
  hive

To: JIRA, hagleitn, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-03-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600413#comment-13600413
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

Ran unit tests as well. All passed. Still going through the review.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-03-03 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592046#comment-13592046
 ] 

Navis commented on HIVE-2340:
-

[~hagleitn]: It seemed affected by HIVE-948, which is recently committed. Thank 
for your help. I'll check that.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-03-03 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591997#comment-13591997
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

HIVE-2340.14.patch that is.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-03-03 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591995#comment-13591995
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

[~navis]: I've tried the latest patch (.13) and it doesn't apply cleanly 
anymore. I rebased it, but after doing so I got an NPE from auto_join19.q. 
Something must have changed in the meantime.

I was however able to get the previous version of the patch to apply and work 
with just a minor change. Tests pass for me on that one. I'll upload, in case 
you want to take a look. Feel free to ignore.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
> HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, 
> HIVE-2340.D1209.13.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576343#comment-13576343
 ] 

Navis commented on HIVE-2340:
-

[~hagleitn]: Yes, it was HIVE-2339 and I'm not sure that it's fixed. And.. 
recently added GBY related operations(multi-GBY, cube, etc) are not making 
mappings for key from start. So that should be fixed also.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-11 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576337#comment-13576337
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

[~navis]: Are you suggesting to just revert the changes in 
ColumnPrunerProcFactory? I've tried that, but when I do that 
reduce_deduplicate_extended.q is failing. I've started to debug into that 
problem - it seems backtracking fails at that point because it can't find the 
source of "KEY._col0" etc. Seems there should be entries for that in the 
colExprMap of the RS, but that's not there. Are you getting different results?

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576309#comment-13576309
 ] 

Navis commented on HIVE-2340:
-

[~hagleitn]: Thanks for your help. 

Tests are failed by modifications of ColumnPrunerProcFactory, which was based 
on very old version of it. After removing that all tests have passed except 
infer_bucket_sort.q, which is affected by this optimization. Disabling the 
optimization made it work, too.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-07 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574224#comment-13574224
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

[~navis]: Disabling the join merge when auto.convert is on should work.

I ran unit tests for patch11 and for me these TestCliDriver tests are failing: 
auto_join13.q,auto_join26.q,infer_bucket_sort.q,join13.q,partition_wise_fileformat14.q


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.6.patch, 
> HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
> testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-06 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572273#comment-13572273
 ] 

Phabricator commented on HIVE-2340:
---

navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138
 ok.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
 I wish I could but CommonJoinResolver is a physical optimizer, which means 
there is no RS-RS operator tree which could me merged on that stage.

  I'm thinking of disabling this optimization if user configured 
hive.auto.convert.join=true or hive.auto.convert.join.noconditionaltask=true.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251
 I'll add more explanations on hive-default.xml.template
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99 
For rules with same cost, DefaultRuleDispatcher selects last one, something 
like this,
  {code}
  if ((cost >= 0) && (cost <= minCost)) {
  minCost = cost;
  rule = r;
  }
  {code}
  So R2 will be selected.
  conf/hive-default.xml.template:1034 It's commented on 
https://issues.apache.org/jira/browse/HIVE-2340?focusedCommentId=13568361&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13568361

  This optimization merges two RSs by moving key/parts/num-reducers of child-RS 
to parent-RS, which means if num-reducer of child-RS is fixed (order by or 
forced bucketing) and small, it can resulted to very slow, single MR. For 
preventing this, the configuration makes min threshold for applying this 
optimization. It's not good enough, but I cannot think of better idea.

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572126#comment-13572126
 ] 

Phabricator commented on HIVE-2340:
---

hagleitn has commented on the revision "HIVE-2340 [jira] optimize orderby 
followed by a groupby".

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138
 HashSet?
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251
 I think the "number of reducers" story deserves more comments (similar to what 
you've explained on the jira)
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
 I think if you just run this optimization *after* CommonJoinResolver 
everything should be fine. It will either already have converted joins to 
mapjoins and this optimization won't apply or you still have a regular join and 
you can merge it without worrying about missing out on a mapjoin conversion. 
You could still have the "sorted" flag to express intent, but there isn't any 
optimization that will pull the rug out under you at the moment. Am I missing 
something?

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571177#comment-13571177
 ] 

Phabricator commented on HIVE-2340:
---

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

  Do you think it might be a good idea to get HIVE-3972 first ?

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99 
Isn't it true that R1 and R2 will have the same cost for

  RS -> GBY --> anything --> RS ?

  If yes, how do you know which rule will be fired ?

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-05 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571168#comment-13571168
 ] 

Phabricator commented on HIVE-2340:
---

njain has commented on the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

  A general question ???

  How does it work with hive.optimize.reducededuplication ?

INLINE COMMENTS
  conf/hive-default.xml.template:1034 Sorry for joining late: Can you explain 
this more clearly ?

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn, njain


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-05 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571130#comment-13571130
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

Ah, that's good info. Makes sense now. The patch is useful as is, but is the 
only way to actually optimize the groupby/orderby case to do the ratio thing as 
a conditional task? And if so would that be this or a follow up jira?

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-04 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570813#comment-13570813
 ] 

Navis commented on HIVE-2340:
-

@Gunther Hagleitner: I also considered ratio thing, but number of reducers is 
calculated based on input size just before submitted to hadoop and cannot be 
known in optimizer layer. 
Except those special cases with order by and bucketing, number of reducers for 
both RS is -1. So generally speaking, it's safe.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-02-04 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570645#comment-13570645
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

[~navis]: I think in general the logic should be to copy numReducers from 
parent to child not the other way around. If hive makes a decent estimate of 
reducers for the parent, that's probably the number you want to carry into the 
combined reduce stage, because that means each reducer is doing the desired 
amount of work. Buckets and order by are the only special cases I can think of, 
where the number needs to be fixed.

For those special cases without knowing the cardinalities of join/group 
by/tables, it's indeed difficult to guess if the optimization should be on or 
off. However, what do you think of using a max ratio of parent reducers/child 
reducers instead of a fixed minimum number of reducers for the child? With a 
default of 4 maybe. I.e.: If there are less than 4 times as many reducers in 
the parent than in the child collapse (assuming another job will be more 
expensive than the lower number of reducers), else leave it alone. The 
optimization is only good if the input sizes of the child and parent reducers 
are similar and expressing this as a ratio of number of reducers is probably 
the closest we can get right now.

This would enable the optimization for a larger body of queries (small tables, 
single input split, empty group by expr, etc).

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-31 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568466#comment-13568466
 ] 

Phabricator commented on HIVE-2340:
---

navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed 
by a groupby".

INLINE COMMENTS
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:521 Yes, it is. 
There should be a better configuration if possible.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
 Yes, should be 'fixed' as sorted state. I'll rename it.
  I think there should be a configuration preventing MAPJOINable JOIN from 
being merged with next RS. But seemed hard to know if it's mapjoinable. Any 
idea?
  
ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:136 
Yes, typos. I'll remove this from patch.
  ql/src/test/queries/clientpositive/reduce_deduplicate.q:7 ok.

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-31 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568361#comment-13568361
 ] 

Navis commented on HIVE-2340:
-

@Gunther Hagleitner,
This optimization copies configuration of childRS to parentRS, which can lead 
to performance problems if childRS is run on single or small number of RSs 
(bucketing, etc), especially when parentRS is for JOIN or GBY. Thresholding it 
by "min.reducer" conf seemed not good enough. Do you have some better idea?

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566902#comment-13566902
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

Partial review on phabricator. Biggest question is around 
"hive.optimize.reducededuplication.min.reducer". That basically disables the 
"orderby followed by groupby" optimization which was the original motivation 
for the jira. Navis, can you explain this some more?

Might be another ticket, but would it be possible to optimize group by/sort by 
as well with this?

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-30 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566898#comment-13566898
 ] 

Phabricator commented on HIVE-2340:
---

hagleitn has commented on the revision "HIVE-2340 [jira] optimize orderby 
followed by a groupby".

  Partial review

INLINE COMMENTS
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:521 Not sure why 
this is needed or why this defaults to 4. From comment below it seems this is 
just to avoid the single reducer order-by case for performance reasons, is that 
correct?
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
 Is this required or extra protection? Comment at the top of the file says 
mapjoin optimization happens before this (and probably should for performance 
reasons). Also, if I understand it correctly "joinAndSort" might be a better 
name than "fixed". You're basically saying that if an optimization wants to 
change the join after this they need to make sure the ordering of the keys is 
preserved, right?
  
ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java:136 
seems orthogonal to this patch.
  ql/src/test/queries/clientpositive/reduce_deduplicate.q:7 There are not a lot 
of tests, for min.reducer=1. No order by case for instance. Maybe the 
reduce_deduplicate_extended.q should run with both default and min.reducer=1.

REVISION DETAIL
  https://reviews.facebook.net/D1209

To: JIRA, navis
Cc: hagleitn


> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566810#comment-13566810
 ] 

Gunther Hagleitner commented on HIVE-2340:
--

FYI: Ran all unit tests on patch .9. Failing tests are: 
groupby_distinct_samekey.q,join31.q,reduce_deduplicate_extended.q 
(TestCliDriver). Failures look like outdated golden files (explain output 
changed). Uploaded testclidriver.txt for reference.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
> HIVE-2340.D1209.9.patch
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-22 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560424#comment-13560424
 ] 

Navis commented on HIVE-2340:
-

These are not removed but just merged with other rules. For example RS-*-RS/GBY 
is currently handled by RS-*-RS which is now diverged by type of child of child 
RS. It's more simple and seemed safer because this optimizer removes or 
replaces child RS and following GBY if possible. But this means child RS can be 
removed/replaced before accessed by graph walker.

I've ran about twenty tests before submitting. I wish I had some time to run 
all pending tests but a little busy recently, sorry.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.6.patch
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559914#comment-13559914
 ] 

Ashutosh Chauhan commented on HIVE-2340:


Thanks, Navis for updating the patch. I haven't looked the patch in detail, but 
some comments:

* In the latest patch, you have removed rule {{RS%.*RS%GBY%}} and 
{{RS%GBY%.*RS%GBY%}} and have modified rule {{JOIN%.*RS%GBY%}} to 
{{JOIN%.*%RS%}} Can you shed some light on thinking behind picking these rules? 
Were those rules not stable or you think they are not useful?

* auto_join_26.q.out is incorrect, its generating wrong results. Looks like 
aggregation is not happening correctly.

* I haven't ran full suite, but queries groupby_grouping_sets5.q and 
smb_mapjoin_14.q are failing after applying this patch.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt, 
> HIVE-2340.D1209.6.patch
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-17 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556660#comment-13556660
 ] 

Yin Huai commented on HIVE-2340:


Yes, I agree.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556651#comment-13556651
 ] 

Ashutosh Chauhan commented on HIVE-2340:


Thanks Yin for explaining. Your ASCII art helped in understanding the 
differences : ) I better understand the reason for the fake new operator now. I 
think in cases you have pointed out when there is such kind of trees, this 
reduce deduplication approach won't help, since it looks at linear chain of RS 
and eliminates the one where it could. You would need a fake operator in such 
case because you don't want to modify the GBY or Join operators which make 
sense. I see the merits of Ysmart better now.

Though, on the other hand patch on this jira is still useful and complementary 
to ysmart. Since, it will collapse linear RS, instead of adding fake ones. In 
addition to collapsing of those operators, it will also make the life of ysmart 
easier because than ysmart will be dealing with simpler plans with reduce sinks 
already deduplicated. We need to make sure reducededup rule fires before ysmart 
for both optimizations to play nicely. So, I think we should make progress on 
both these patches.

[~navis] Will you like to refresh this patch?

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-17 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556620#comment-13556620
 ] 

Yin Huai commented on HIVE-2340:


The current implementation of the patch of YSmart covers scenarios when a join 
or aggregation operator share the same partition keys with its all parents 
(join or aggregation operators). 
For example, a single MR job will be generated if all operators in the 
following plan share the same partition keys.
{code}
JOIN
   \  
JOIN  
   /\  
GBY  \
  JOIN
 /
GBY--- -/
{code}


Also, it requires that the bottom join or aggregation operators which will be 
processed in the same MR job take input tables instead of intermediate tables. 
In future, it should be extended to cover scenarios that involve intermediate 
tables, that correlated operators share common partition keys (not exactly the 
same keys), and that a join or aggregation operator share common keys with some 
of its parents. 

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-17 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556608#comment-13556608
 ] 

Yin Huai commented on HIVE-2340:


Let me explain the reason that I introduced the fake RS operator instead of 
just removing the original RS. When I was developing the patch for 2206, I 
found that the aggregation operator (GBY) and the join operator (JOIN) use 
different logic on processing rows forwarded to it. Although they both buffer 
rows, a GBY determines if it need to forward results to its children in 
processOp. While, a JOIN replies on endGroup to know when it should forward 
results. When we have plans like GBY-GBY or JOIN-GBY, that difference on 
processing logic is fine. However, when we have plan like
{code}
GBYGBY
   \  \
JOINor JOIN
   /  /
GBYJOIN---
{code}
We need operators between the child JOIN and parent GBYs and JOINs to make sure 
JOIN process rows in a correct way. This is also the reason that in 
CorrelationLocalSimulativeReduceSinkOperator, it determines when to start the 
group of its children in processOp and leave a empty startGroup and endGroup.

Also, by replacing RSs with those fake RSs, I do not need to touch those GBYs 
and JOINs which will be merged into the same Reduce phase. Since the input of 
the first operator in the Reduce side is in the format of [key, value, tag], so 
I use those fake RSs to generate rows in the same format.

But this part of work was implemented about almost 2 years ago. Definitely let 
me know if anything has been changed and this fake RS is no longer needed.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556513#comment-13556513
 ] 

Ashutosh Chauhan commented on HIVE-2340:


Yeah, correct JOIN-GBY and GBY-GBY are taken care of in ysmart also. Its the 
group-by followed by order-by case which is also of interest to me, which this 
already covers. 

Besides the scenario covered by these two patches, I am also comparing the 
approaches taken in these two. I have just briefly looked at this patch, but 
fundamental difference which I can make out in this approach Vs ysmart approach 
is that here RS is deduplicated that is completely removed from operator 
pipeline, wherever it could be (i.e. when keys of subsequent RS is superset of 
the earlier one) thus fusing multiple MR jobs. Ysmart on the other hand instead 
replaces the second RS with a new operator its introducing 
(LocalSimulatedReduceSink?) which fakes the RS but doesn't let the plan split 
in 2 MR jobs and thus generating one MR job. I haven't thought through 
completely on this, but on initial pass it seems like approach of this patch is 
better than ysmart because:
* Here you don't need a new operator.
* Here you are simplifying the plan by eliminating the operators as oppose to 
ysmart which is replacing the operator thereby increasing the complexity of 
plan (by having a new type of operator)
* In that new operator ysmart currently serializes and deserializes the data 
through that operator, thereby unnecessarily introducing performance penalty. 
Granted this could be improved, but problem doesn't exist in patch proposed on 
this jira to begin with. 

Though there are certainly other scenarios which ysmart can cover (Yin, can you 
list those) which this patch is not covering, but for the scenarios that are 
common this approach seems to be better. 

There might be other differences in the approach, please feel free to raise 
those.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-16 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555803#comment-13555803
 ] 

Yin Huai commented on HIVE-2340:


[~ashutoshc] Can you let me know cases of JOIN-GROUPBY and GBY-GBY which 2206 
does not optimize? Also, does RS refer to those queries with "cluster by", 
"sort by" or "distributed by"? Thanks.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2013-01-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1373#comment-1373
 ] 

Ashutosh Chauhan commented on HIVE-2340:


[~navis] I played with HIVE-2206 over last couple of days and I think your work 
is complimentary and useful because there are several cases which this patch is 
already optimizing which HIVE-2206 doesn't currently. For e.g., order-by 
followed by groupby, RS followed by GBY, JOIN-GROUPBY, RS-RS, GBY-GBY. This 
covers a lot of ground, so I suggest we move forward with this.

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2012-12-30 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541214#comment-13541214
 ] 

Navis commented on HIVE-2340:
-

@Ashutosh
I've abandoned working on this cause HIVE-2206 is superset of it. Would it be 
better to do this as a brief mending before that?

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
>  Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2011-12-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163523#comment-13163523
 ] 

jirapos...@reviews.apache.org commented on HIVE-2340:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2991/
---

(Updated 2011-12-06 11:02:00.777597)


Review request for hive and Carl Steinbach.


Changes
---

I've overrode existing optimizer by mistake. Fixed it.


Summary
---

Mostly copied from existing code. Not tested intensively yet, but it is seemed 
to be used frequently for us.


This addresses bug HIVE-2340.
https://issues.apache.org/jira/browse/HIVE-2340


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
82a141d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java 
e91b4d5 
  ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q PRE-CREATION 
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/2991/diff


Testing
---

new test cases added : reduce_deduplicate_extended.q


Thanks,

Navis



> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Priority: Minor
> Attachments: HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby

2011-12-05 Thread Navis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162689#comment-13162689
 ] 

Navis commented on HIVE-2340:
-

https://reviews.apache.org/r/2991/

> optimize orderby followed by a groupby
> --
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Navis
>Priority: Minor
> Attachments: HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
> optimizer(cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira