[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556634#comment-13556634 ]

Hudson commented on HIVE-3852:

Integrated in Hive-trunk-h0.21 #1919 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1919/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is used twice or more (Navis via namit) (Revision 1434600)

Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out

> Multi-groupby optimization fails when same distinct column is used twice or more
>
> Key: HIVE-3852
> URL: https://issues.apache.org/jira/browse/HIVE-3852
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Trivial
> Fix For: 0.11.0
>
> Attachments: HIVE-3852.D7737.1.patch
>
> {code}
> FROM INPUT
> INSERT OVERWRITE TABLE dest1
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct substr(INPUT.value,5)) GROUP BY INPUT.key
> INSERT OVERWRITE TABLE dest2
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct substr(INPUT.value,5)) GROUP BY INPUT.key;
> {code}
> fails with exception FAILED: IndexOutOfBoundsException Index: 0,Size: 0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556585#comment-13556585 ]

Hudson commented on HIVE-3852:

Integrated in Hive-trunk-hadoop2 #70 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/70/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is used twice or more (Navis via namit) (Revision 1434600)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out
[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556086#comment-13556086 ]

Hudson commented on HIVE-3852:

Integrated in hive-trunk-hadoop1 #20 (See [https://builds.apache.org/job/hive-trunk-hadoop1/20/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is used twice or more (Navis via namit) (Revision 1434600)

Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out
[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555908#comment-13555908 ]

Namit Jain commented on HIVE-3852:

+1
[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555904#comment-13555904 ]

Namit Jain commented on HIVE-3852:

OK, I agree. We may have a scenario in which this is useful. I will review.
[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555191#comment-13555191 ]

Ashutosh Chauhan commented on HIVE-3852:

Namit,
bq. Should we have this optimization now?

I am not sure which particular optimization you are referring to. I assume you mean there is no need for reduce-side group-bys anymore, since we have map-side aggregates. If so, I think those are still required. As Navis pointed out, if the reduction ratio is not high enough, mappers may run out of memory, and then we suggest that users turn off map-side aggregation.
[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553336#comment-13553336 ]

Navis commented on HIVE-3852:

Namit, I don't think I'm the right person to answer this, but IMHO it depends on the reduction ratio achieved by map-side aggregation. If the group-by column is rather distinctive, this optimization could be useful, but if it's not, two (or more) MR tasks would be faster.
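A rough way to picture the reduction ratio discussed here: it can be read as the number of distinct groups a mapper emits divided by the number of rows it reads. The sketch below is a toy model under that assumption only. The function name `map_side_reduction_ratio` and the sample data are hypothetical; nothing here is taken from Hive's source.

```python
from typing import Callable, Hashable, Iterable, TypeVar

Row = TypeVar("Row")


def map_side_reduction_ratio(
    rows: Iterable[Row], key_of: Callable[[Row], Hashable]
) -> float:
    """Toy model (hypothetical helper, not Hive code).

    Returns distinct groups / input rows for one mapper's input.
    A value near 0 means map-side aggregation collapses most rows;
    a value near 1 means almost every row is its own group.
    """
    groups = set()
    total = 0
    for row in rows:
        groups.add(key_of(row))  # one in-memory entry per distinct group key
        total += 1
    # With no input there is nothing to reduce; treat that as "no reduction".
    return len(groups) / total if total else 1.0


if __name__ == "__main__":
    # Few distinct keys: 1000 rows fall into 10 groups.
    print(map_side_reduction_ratio(range(1000), lambda i: i % 10))  # 0.01
    # Highly distinctive key: every row is its own group.
    print(map_side_reduction_ratio(range(1000), lambda i: i))  # 1.0
```

In this model a ratio close to 1 means the mapper's in-memory table grows almost as large as its input, which is the memory-pressure case; a ratio close to 0 means most of the work is already done before the shuffle.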
[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545819#comment-13545819 ]

Namit Jain commented on HIVE-3852:

[~navis], I had a higher-level question.

Should we have this optimization now? I mean, is it really needed with map-side aggregates, or can we remove this code completely?