[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556634#comment-13556634
 ] 

Hudson commented on HIVE-3852:
--

Integrated in Hive-trunk-h0.21 #1919 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1919/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit) (Revision 1434600)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out


> Multi-groupby optimization fails when same distinct column is used twice or more
>
> Key: HIVE-3852
> URL: https://issues.apache.org/jira/browse/HIVE-3852
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Trivial
> Fix For: 0.11.0
>
> Attachments: HIVE-3852.D7737.1.patch
>
> {code}
> FROM INPUT
> INSERT OVERWRITE TABLE dest1
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct substr(INPUT.value,5)) GROUP BY INPUT.key
> INSERT OVERWRITE TABLE dest2
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct substr(INPUT.value,5)) GROUP BY INPUT.key;
> {code}
> fails with exception FAILED: IndexOutOfBoundsException Index: 0,Size: 0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556585#comment-13556585
 ] 

Hudson commented on HIVE-3852:
--

Integrated in Hive-trunk-hadoop2 #70 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/70/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit) (Revision 1434600)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out



[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556086#comment-13556086
 ] 

Hudson commented on HIVE-3852:
--

Integrated in hive-trunk-hadoop1 #20 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/20/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit) (Revision 1434600)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out



[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-16 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555908#comment-13555908
 ] 

Namit Jain commented on HIVE-3852:
--

+1


[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-16 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555904#comment-13555904
 ] 

Namit Jain commented on HIVE-3852:
--

OK, I agree.
We may have a scenario in which this is useful.

I will review.


[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-16 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555191#comment-13555191
 ] 

Ashutosh Chauhan commented on HIVE-3852:


Namit,
bq. Should we have this optimization now ?
I am not sure which particular optimization you are referring to. I assume you 
mean there is no need for reduce-side groupbys anymore, since we have map-side 
aggregates. If so, I think those are still required. As Navis pointed out, if 
the reduction ratio is not high enough, mappers may run out of memory, and then 
we suggest that users turn off map-side aggregation.
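
For readers unfamiliar with the switch being discussed, a minimal sketch of the session settings involved is given here. The property names are the ones I recall from the Hive configuration and are not taken from this thread, so treat them as assumptions and check them against your Hive version.

```sql
-- Sketch only: verify property names and defaults for your Hive version.

-- Turn off map-side (hash) aggregation, so GROUP BY work is done on the
-- reducers. This is the usual workaround when mappers run out of memory.
SET hive.map.aggr=false;

-- Alternative: keep map-side aggregation on, but let Hive abandon the
-- in-memory hash table when it is not shrinking the data enough.
-- 0.5 is, as far as I know, the default threshold.
SET hive.map.aggr=true;
SET hive.map.aggr.hash.min.reduction=0.5;
```

With map-side aggregation off, the reduce-side groupby plan is the only place aggregation happens, which is why that code path is still needed.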



[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-14 Thread Navis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553336#comment-13553336
 ] 

Navis commented on HIVE-3852:
-

Namit,
I don't think I'm the right person to answer this, but IMHO it depends on the 
reduction ratio achieved by map-side aggregation. If the group-by column is 
rather distinctive, this optimization could be useful, but if it's not, two (or 
more) MR tasks would be faster.
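
One rough way to get a feel for that reduction ratio on the table from the issue description is sketched below. It assumes the INPUT table has the `key` and `value` columns used in the failing query. It measures the ratio over the whole table rather than per mapper, so it is only an approximation, and the column alias is made up for illustration.

```sql
-- Sketch only: approximates how many map-side hash entries are kept per
-- input row. With DISTINCT aggregates, the distinct expression is part of
-- the key as well, so it is counted together with the group-by column.
-- Near 0: many rows collapse into few entries (map-side aggregation helps).
-- Near 1: almost every row is its own entry (map-side aggregation saves little).
SELECT count(DISTINCT key, substr(value, 5)) / count(*) AS approx_entries_per_row
FROM INPUT;
```

Rows where `key` or the substring is NULL are not counted by `count(DISTINCT ...)`, so the figure slightly understates the ratio on data with NULLs.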


[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-07 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545819#comment-13545819
 ] 

Namit Jain commented on HIVE-3852:
--

[~navis], I had a higher-level question.
Should we have this optimization now?
I mean, is this really needed with map-side aggregates, or can we remove this 
code completely?

