[Hive] group by over a subquery with a cluster by not optimized
---------------------------------------------------------------
Key: HADOOP-4415
URL: https://issues.apache.org/jira/browse/HADOOP-4415
Project: Hadoop Core
Issue Type: Bug
Components: contrib/hive
Reporter: Namit Jain
Assignee: Namit Jain
Consider the following query:
select x.a, count(x.b) from (select ...... cluster by a) x group by x.a
Even though the user has explicitly asked to cluster by a, the group by still
runs 2 more map-reduce jobs, sorting first by a random number and then by a.
The query therefore takes 3 map-reduce jobs in total, sorting by a, by a random
number, and by a again. This should be optimized: the subquery output is already
clustered by a, so the group by should be able to reuse that clustering.
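To illustrate why the extra jobs look redundant, here is a minimal Python sketch (not Hive code, and not a proposed patch). It assumes that the rows reaching a reducer after the inner "cluster by a" are already grouped by a, in which case count(x.b) per a can be computed in a single streaming pass with no further sort. The function name count_b_per_a and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def count_b_per_a(clustered_rows):
    """Yield (a, count of non-null b) for rows that are already grouped by a.

    Assumption: rows with the same a are adjacent, as they would be in the
    output of one reducer after "cluster by a". No re-sorting happens here.
    """
    for a, rows in groupby(clustered_rows, key=itemgetter(0)):
        # count(b) in SQL ignores NULLs, so skip rows where b is None.
        yield a, sum(1 for _, b in rows if b is not None)


# Hypothetical (a, b) rows as one reducer might see them after "cluster by a".
rows = [(1, "x"), (1, None), (2, "y"), (2, "z"), (3, "w")]
result = list(count_b_per_a(rows))
print(result)
```

The sketch relies entirely on equal values of a being adjacent; on unclustered input, itertools.groupby would emit the same key more than once, which is why the optimization only applies when the subquery really does cluster by the grouping key.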