[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886428#action_12886428 ]
John Sichi commented on HIVE-287:
---------------------------------

Regarding DISTINCT: I agree with Arvind; this information should be provided to the UDAF so that it can reject invocations that don't make sense. Once this validation is passed, the distinct elimination is still implemented generically inside of Hive (upstream of the UDAF).

Regarding F(*): let's discriminate three cases.

COUNT(*): this really means COUNT(), not COUNT(x,y,z). This is a very important distinction to make from an optimizer perspective, because we want to be able to push down projection to avoid I/O and other processing for columns whose values we will never look at.

SUM(*) and similar ones: these we should disallow.

MY_UDAF(*), or MY_UDAF(t.*): this is similar to Pradeep's case that came up recently on the mailing list, and it needs to expand to MY_UDAF(x,y,z), not MY_UDAF(). I think the patch is currently doing MY_UDAF(), which isn't what he wants.

My recommendation is that we commit Arvind's patch as is, then create a followup JIRA issue to do what Pradeep is looking for (the expansion of * in the semantic analyzer) for both UDF and UDAF, but with a special case for COUNT. UDAF authors will be able to decide whether or not to reject the star syntax, since in the common case of a UDAF expecting a limited number of parameters, the star won't make sense.

> count distinct on multiple columns does not work
> ------------------------------------------------
>
>                 Key: HIVE-287
>                 URL: https://issues.apache.org/jira/browse/HIVE-287
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Arvind Prabhakar
>         Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
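
For illustration only, the three F(*) cases described in the comment above could be sketched roughly as follows. This is not Hive's semantic analyzer code: the function `expand_star`, the `STAR_DISALLOWED` set, and the choice of AVG, MIN and MAX as the "similar ones" next to SUM are all assumptions made for the sketch.

```python
# Hypothetical sketch of the proposed star-expansion rules.
# Nothing here is taken from the Hive code base or from the patch.

# Functions for which F(*) should be rejected outright. SUM comes from the
# comment; the other names are an assumed reading of "similar ones".
STAR_DISALLOWED = {"SUM", "AVG", "MIN", "MAX"}


def expand_star(func_name, table_columns):
    """Return the argument list that F(*) would resolve to."""
    name = func_name.upper()
    if name == "COUNT":
        # COUNT(*) means COUNT(): no column is referenced, so the
        # optimizer stays free to prune every column from the scan.
        return []
    if name in STAR_DISALLOWED:
        raise ValueError(name + "(*) is not allowed")
    # Any other function: expand * to the full column list. The UDAF
    # itself may still reject the resulting number of arguments.
    return list(table_columns)
```

Under these assumptions, `expand_star("MY_UDAF", ["x", "y", "z"])` yields the three column names, `expand_star("COUNT", ["x", "y", "z"])` yields an empty argument list, and `expand_star("SUM", ["x", "y", "z"])` raises an error.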