[ https://issues.apache.org/jira/browse/DRILL-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victoria Markman closed DRILL-2092.
-----------------------------------

> Incorrect result with count distinct and sum aggregates
> -------------------------------------------------------
>
>                 Key: DRILL-2092
>                 URL: https://issues.apache.org/jira/browse/DRILL-2092
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>   Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.8.0
>
>
> test.json
> {code}
> { "a1" : 10 , "b1" : 10 }
> { "a1" : 20 , "b1" : 20 }
> { "a1" : 20 , "b1" : 20}
> { "a1" : 30 , "b1" : 30 }
> { "a1" : null , "b1": null}
> {code}
> {code}
> 0: jdbc:drill:schema=dfs> select a1, count(distinct a1) from `test.json`
> group by a1;
> +------------+------------+
> |     a1     |   EXPR$1   |
> +------------+------------+
> | 10         | 1          |
> | 20         | 1          |
> | 30         | 1          |
> | null       | 0          |
> +------------+------------+
> 4 rows selected (0.096 seconds)
> {code}
> If I add sum on the same column, I get wrong result (null group is gone):
> {code}
> 0: jdbc:drill:schema=dfs> select a1, count(distinct a1), sum(a1) from
> `test.json` group by a1;
> +------------+------------+------------+
> |     a1     |   EXPR$1   |   EXPR$2   |
> +------------+------------+------------+
> | 10         | 1          | 10         |
> | 20         | 1          | 40         |
> | 30         | 1          | 30         |
> +------------+------------+------------+
> 3 rows selected (0.137 seconds)
> {code}
> Non-distinct count works correctly:
> {code}
> 0: jdbc:drill:schema=dfs> select a1, count(a1), sum(a1) from `test.json`
> group by a1;
> +------------+------------+------------+
> |     a1     |   EXPR$1   |   EXPR$2   |
> +------------+------------+------------+
> | 10         | 1          | 10         |
> | 20         | 2          | 40         |
> | 30         | 1          | 30         |
> | null       | 0          | null       |
> +------------+------------+------------+
> 4 rows selected (0.187 seconds)
> {code}
> Plan for the query with the wrong result:
> {code}
> 00-01      Project(a1=[$0], EXPR$1=[$1], EXPR$2=[$2])
> 00-02        Project(a1=[$0], EXPR$1=[$3], EXPR$2=[$1])
> 00-03          HashJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
> 00-05            HashAgg(group=[{0}], EXPR$2=[SUM($0)])
> 00-07              Scan(groupscan=[EasyGroupScan [selectionRoot=/test.json, numFiles=1, columns=[`a1`], files=[maprfs:/test.json]]])
> 00-04            Project(a10=[$0], EXPR$1=[$1])
> 00-06              HashAgg(group=[{0}], EXPR$1=[COUNT($0)])
> 00-08                HashAgg(group=[{0}])
> 00-09                  Scan(groupscan=[EasyGroupScan [selectionRoot=/test.json, numFiles=1, columns=[`a1`], files=[maprfs:/test.json]]])
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
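
Note on the plan: the sum and the distinct count are computed in separate aggregates and then joined back on a1 with IS NOT DISTINCT FROM, a comparison that treats two NULL keys as equal. The Python sketch below is not Drill code; its helper names are hypothetical and it only models that plan shape on the a1 values from test.json. It shows that a null-safe join keeps the null group, while a join using plain equality drops it and yields the same three rows as the wrong result. The report does not state the root cause, so the plain-equality case is an assumption about one way the group could be lost, not a diagnosis.

```python
# a1 values from test.json in the report
rows = [10, 20, 20, 30, None]


def sum_by_key(values):
    """Model of HashAgg(group=[{0}], SUM($0)): SUM over only-NULL input is NULL."""
    out = {}
    for v in values:
        if v is None:
            out.setdefault(None, None)
        else:
            out[v] = (out.get(v) or 0) + v
    return out


def count_distinct_by_key(values):
    """Model of the two stacked HashAggs: de-duplicate, then COUNT($0) per key.

    COUNT ignores NULLs, so the NULL key gets 0.
    """
    return {v: (0 if v is None else 1) for v in set(values)}


def inner_join(left, right, null_safe):
    """Inner join on the group key, returning (a1, count_distinct, sum) tuples.

    null_safe=True  models IS NOT DISTINCT FROM (NULL matches NULL).
    null_safe=False models plain '=' (a NULL key never matches).
    """
    result = []
    for lk, total in left.items():
        for rk, cnt in right.items():
            if lk is None or rk is None:
                match = null_safe and lk is None and rk is None
            else:
                match = lk == rk
            if match:
                result.append((lk, cnt, total))
    return result


sums = sum_by_key(rows)
counts = count_distinct_by_key(rows)

expected = inner_join(sums, counts, null_safe=True)
observed_shape = inner_join(sums, counts, null_safe=False)
```

Here `expected` has four rows, including `(None, 0, None)` for the null group, while `observed_shape` has only the three non-null groups, matching the row count of the incorrect output quoted above.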