[
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919023#action_12919023
]
Namit Jain commented on HIVE-474:
---------------------------------
Once HIVE-537 is committed, the general idea follows the example given in
HIVE-537.
Say the query is:
select a, count(distinct b), count(distinct c) from T group by a
and the data is:
a1 b1 c1
a1 b1 c2
a1 b2 c2
a1 b2 c1
a2 ...
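The mapper will emit, for each input row, one record per distinct expression, with a union-typed value (tag 0 for b, tag 1 for c):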
a1 0:b1
a1 1:c1
a1 0:b1
a1 1:c2
a1 0:b2
a1 1:c2
a1 0:b2
a1 1:c1
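A minimal Python sketch of the map step, for illustration only. It is not Hive's operator code. The function name `map_rows` and the `(a, tag, value)` tuple layout are assumptions, chosen to mirror the listing above.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]     # (a, b, c) input row
Record = Tuple[str, int, str]  # (group key a, union tag, value of b or c)

def map_rows(rows: Iterable[Row]) -> List[Record]:
    """Hypothetical mapper: one tagged record per distinct expression per row."""
    out: List[Record] = []
    for a, b, c in rows:
        out.append((a, 0, b))  # tag 0 feeds count(distinct b)
        out.append((a, 1, c))  # tag 1 feeds count(distinct c)
    return out

rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"),
        ("a1", "b2", "c2"), ("a1", "b2", "c1")]
emitted = map_rows(rows)
for a, tag, v in emitted:
    print(a, f"{tag}:{v}")  # same lines as the listing above
```

The sketch shows that each input row doubles into two records. With k distinct expressions, the map output grows k-fold.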
Since the sort key is (a, union_tag, (b|c)), the data will arrive at the reducer in the following order:
a1 0:b1
a1 0:b1
a1 0:b2
a1 0:b2
a1 1:c1
a1 1:c1
a1 1:c2
a1 1:c2
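The ordering above is what a plain lexicographic sort on the composite key produces. This Python sketch simulates that sort step. The sketch assumes, without confirmation from the comment, that records are partitioned across reducers on `a` alone, so that a whole group reaches one reducer. The `sort_key` helper is hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, int, str]  # (a, union_tag, value of b or c)

# Mapper output in emission order, copied from the first listing.
emitted: List[Record] = [
    ("a1", 0, "b1"), ("a1", 1, "c1"),
    ("a1", 0, "b1"), ("a1", 1, "c2"),
    ("a1", 0, "b2"), ("a1", 1, "c2"),
    ("a1", 0, "b2"), ("a1", 1, "c1"),
]

def sort_key(rec: Record) -> Tuple[str, int, str]:
    """Hypothetical composite sort key (a, union_tag, (b|c))."""
    a, tag, value = rec
    return (a, tag, value)

shuffled = sorted(emitted, key=sort_key)
for a, tag, v in shuffled:
    print(a, f"{tag}:{v}")  # same lines as the reducer-side listing
```

Placing the tag ahead of the value keeps all b values contiguous, followed by all c values. Duplicates under each tag end up adjacent.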
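The reducer can then stream the distinct values. Equal values under each tag arrive consecutively, so it only has to compare each value with the previous one. That gives count(distinct b) and count(distinct c) in a single pass, without buffering the group.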
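A Python sketch of that streaming reduce follows, as an illustration of the idea rather than Hive's actual reducer. The sketch assumes input sorted on `(a, tag, value)`. The function name `stream_distinct_counts` and the `{tag: count}` result shape are hypothetical. Between records, it keeps only the current group key, the previous `(tag, value)` pair, and one counter per tag.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Record = Tuple[str, int, str]  # (a, union_tag, value), sorted on all three

def stream_distinct_counts(
        records: Iterable[Record]) -> Iterator[Tuple[str, Dict[int, int]]]:
    """Yield (a, {tag: distinct count}) for each group in sorted input."""
    cur_a: Optional[str] = None
    counts: Dict[int, int] = {}
    prev: Optional[Tuple[int, str]] = None
    for a, tag, value in records:
        if a != cur_a:                      # group boundary: flush previous group
            if cur_a is not None:
                yield cur_a, counts
            cur_a, counts, prev = a, {}, None
        if (tag, value) != prev:            # sorted input: new value iff it changed
            counts[tag] = counts.get(tag, 0) + 1
            prev = (tag, value)
    if cur_a is not None:                   # flush the last group
        yield cur_a, counts

# Reducer input for group a1, as in the listing above.
shuffled = [
    ("a1", 0, "b1"), ("a1", 0, "b1"), ("a1", 0, "b2"), ("a1", 0, "b2"),
    ("a1", 1, "c1"), ("a1", 1, "c1"), ("a1", 1, "c2"), ("a1", 1, "c2"),
]
result = list(stream_distinct_counts(shuffled))
for a, counts in result:
    # count(distinct b) is tag 0, count(distinct c) is tag 1
    print(a, counts.get(0, 0), counts.get(1, 0))
```

Because only adjacent records are compared, memory per group stays constant no matter how many distinct values the group contains.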
> Support for distinct selection on two or more columns
> -----------------------------------------------------
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Alexis Rondeau
> Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch
>
>
> The ability to select distinct on several individual columns, for example:
> select count(distinct user), count(distinct session) from actions;
> Currently returns the following failure:
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns
> not Supported user