[ https://issues.apache.org/jira/browse/DRILL-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victoria Markman closed DRILL-2247.
-----------------------------------

> SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2247
>                 URL: https://issues.apache.org/jira/browse/DRILL-2247
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Victoria Markman
>             Fix For: 1.1.0
>
>         Attachments: x1.parquet, x2.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from x1;
> +------------+
> |     c1     |
> +------------+
> | 1032.6516  |
> +------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:schema=dfs> select * from x2;
> +------------+
> |     c1     |
> +------------+
> | 1057.3117  |
> | 1090.8299  |
> +------------+
> 2 rows selected (0.085 seconds)
> {code}
> Correct result is NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >     sum(x2.c1) as simple_sum,
> . . . . . . . . . . . . >     sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >     x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >     x1.c1;
> +------------+---------------+
> | simple_sum | sum_with_case |
> +------------+---------------+
> | 0.0000     | null          |
> +------------+---------------+
> 1 row selected (0.173 seconds)
> {code}
> Query plan for correct result:
> {code}
> 00-01      Project(simple_sum=[$0], sum_with_case=[$1])
> 00-02        UnionExchange
> 01-01          Project(simple_sum=[$1], sum_with_case=[$2])
> 01-02            HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Wrong result: sum with case returns 0.0 instead of NULL
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >     count(*) as count_star,
> . . . . . . . . . . . . >     sum(x2.c1) as simle_sum,
> . . . . . . . . . . . . >     round(avg(x2.c1)) as round_avg_x2_c1,
> . . . . . . . . . . . . >     sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >     x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >     x1.c1;
> +------------+------------+-----------------+---------------+
> | count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
> +------------+------------+-----------------+---------------+
> | 1          | null       | null            | 0.0           |
> +------------+------------+-----------------+---------------+
> {code}
> Query plan for the wrong result query:
> {code}
> 00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
> 00-02        UnionExchange
> 01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY)], sum_with_case=[$4])
> 01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Cut/paste version of the query; comment out the round function and the query will return the correct result:
> {code:sql}
> select
>     count(*) as count_star,
>     sum(x2.c1) as simle_sum,
>     round(avg(x2.c1)) as round_avg_x2_c1,
>     sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> from
>     x1 left outer join x2 on x1.c1 = x2.c1
> group by
>     x1.c1;
> {code}
> Not sure how critical this bug is, but the inconsistency is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
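The expected semantics described in the issue can be checked against another SQL engine. Below is a minimal sketch (an editor's addition, not part of the original report) that reproduces the failing query in SQLite via Python's standard sqlite3 module, using the same table names and values as the repro above. For the unmatched left-join row, x2.c1 is NULL on the null-producing side; per standard SQL, the CASE expression propagates that NULL and SUM over an all-NULL group yields NULL, so sum_with_case must be NULL, never 0.0.

```python
import sqlite3

# Minimal sketch (SQLite, not Drill): reproduce the expected semantics.
# x1's single row matches nothing in x2, so the left outer join pads the
# x2 side with NULL; SUM over that all-NULL group must return NULL, and
# the CASE expression must propagate the NULL rather than produce 0.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE x1 (c1 REAL);
    CREATE TABLE x2 (c1 REAL);
    INSERT INTO x1 VALUES (1032.6516);
    INSERT INTO x2 VALUES (1057.3117), (1090.8299);
""")
row = con.execute("""
    SELECT
        COUNT(*)                                   AS count_star,
        SUM(x2.c1)                                 AS simple_sum,
        SUM(CASE WHEN x2.c1 = 0 THEN 100
                 ELSE ROUND(x2.c1 / 12) END)       AS sum_with_case
    FROM x1 LEFT OUTER JOIN x2 ON x1.c1 = x2.c1
    GROUP BY x1.c1
""").fetchone()
print(row)  # (1, None, None): both sums are NULL, matching the expected result
```

This is the same behavior the reporter expects from Drill: count_star is 1 (COUNT(*) counts the padded row), while both aggregates over x2.c1 are NULL.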