[ https://issues.apache.org/jira/browse/HIVE-17390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141708#comment-16141708 ]

Khaja Hussain commented on HIVE-17390:
--------------------------------------

Thanks Brian for filing the bug.

> Select count(distinct) returns incorrect results using tez
> ----------------------------------------------------------
>
>                 Key: HIVE-17390
>                 URL: https://issues.apache.org/jira/browse/HIVE-17390
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 1.2.1
>            Reporter: Brian Goerlitz
>
> With the following combination of settings, select count(distinct) will
> return the results of select sum(distinct).
> hive.execution.engine=tez
> hive.optimize.reducededuplication=true
> hive.optimize.reducededuplication.min.reducer=1
> hive.optimize.distinct.rewrite=true
> hive.groupby.skewindata=false
> hive.vectorized.execution.reduce.enabled=true
> STEPS TO REPRODUCE:
> {quote}CREATE TABLE `simple_data`(ppmonth int, sale double);
> INSERT INTO simple_data VALUES
> (501,25000.0),(502,60000.0),(501,40000.0),(502,70000.0),(501,35000.0),(502,60000.0);
> set hive.execution.engine=tez;
> set hive.optimize.reducededuplication=true;
> set hive.optimize.reducededuplication.min.reducer=1;
> set hive.optimize.distinct.rewrite=true;
> set hive.groupby.skewindata=false;
> set hive.vectorized.execution.reduce.enabled=true;
> select count(distinct ppmonth) from simple_data;{quote}
> Returns 1003 rather than 2



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
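
For reference, the expected values for the sample rows in the report can be checked outside Hive. The sketch below is only a stand-in: it assumes Python's built-in sqlite3 module rather than Hive on Tez, so it does not reproduce the bug and says nothing about the Hive settings listed in the report. It only shows that the distinct ppmonth values are 501 and 502, so count(distinct ppmonth) should be 2, while sum(distinct ppmonth) is 1003, the value the report says was returned.

```python
import sqlite3

# SQLite is an assumed stand-in for Hive here. It shows what the two
# aggregates should return for the sample rows; it does not reproduce
# the Tez behaviour described in the report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simple_data (ppmonth int, sale double)")
conn.execute(
    "INSERT INTO simple_data VALUES "
    "(501,25000.0),(502,60000.0),(501,40000.0),"
    "(502,70000.0),(501,35000.0),(502,60000.0)"
)

# Compute both aggregates over the same column in one query.
count_distinct, sum_distinct = conn.execute(
    "SELECT count(DISTINCT ppmonth), sum(DISTINCT ppmonth) FROM simple_data"
).fetchone()
conn.close()

print(count_distinct)  # 2
print(sum_distinct)    # 1003
```

The match between sum(distinct ppmonth) and the reported 1003 is consistent with the description in the report that count(distinct) returns the result of sum(distinct) under these settings.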