[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515987#comment-14515987 ]
Victoria Markman commented on DRILL-2488: ----------------------------------------- {code} Verified fixed in 0.9.0 drillGitId=3689522d4a7035a966f19695a678c6881fdaeba6 {code} Test checked in under: Functional/Passing/aggregation/sanity/q18_1.sql > Wrong result on join between two subqueries with aggregation > ------------------------------------------------------------ > > Key: DRILL-2488 > URL: https://issues.apache.org/jira/browse/DRILL-2488 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators > Affects Versions: 0.8.0 > Reporter: Victoria Markman > Assignee: Aman Sinha > Priority: Critical > Fix For: 0.8.0 > > Attachments: > 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet > > > {code} > 0: jdbc:drill:schema=dfs> select * from t1; > +------------+------------+------------+ > | a1 | b1 | c1 | > +------------+------------+------------+ > | 1 | aaaaa | 2015-01-01 | > | 2 | bbbbb | 2015-01-02 | > | 3 | ccccc | 2015-01-03 | > | 4 | null | 2015-01-04 | > | 5 | eeeee | 2015-01-05 | > | 6 | fffff | 2015-01-06 | > | 7 | ggggg | 2015-01-07 | > | null | hhhhh | 2015-01-08 | > | 9 | iiiii | null | > | 10 | jjjjj | 2015-01-10 | > +------------+------------+------------+ > 10 rows selected (0.15 seconds) > {code} > This result is incorrect, one row is missing > {code} > 0: jdbc:drill:schema=dfs> select * from > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > b1, > . . . . . . . . . . . . > count(distinct a1) > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > group by > . . . . . . . . . . . . > b1 > . . . . . . . . . . . . > order by > . . . . . . . . . . . . > b1 limit 5 offset 1 > . . . . . . . . . . . . > ) as sq1(x1, y1) > . . . . . . . . . . . . > > . . . . . . . . . . . . > inner join > . . . . . . . . . . . . > > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > b1, > . . . . . . . . . . . . > count(distinct a1) > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > group by > . . . . . . . . . . . . > b1 > . . . . . . . . . . . . > order by > . . . . . . . . . . . . > b1 limit 5 offset 1 > . . . . . . . . . . . . > ) as sq2(x1, y1) > . . . . . . . . . . . . > on > . . . . . . . . . . . . > sq1.x1 = sq2.x1 and > . . . . . . . . . . . . > sq2.y1 = sq2.y1 > . . . . . . . . . . . . > ; > +------------+------------+------------+------------+ > | x1 | y1 | x10 | y10 | > +------------+------------+------------+------------+ > | bbbbb | 1 | bbbbb | 1 | > | ccccc | 1 | ccccc | 1 | > | eeeee | 1 | eeeee | 1 | > | fffff | 1 | fffff | 1 | > +------------+------------+------------+------------+ > 4 rows selected (0.28 seconds) > {code} > Explain plan for the wrong result: > {code} > 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) > 00-02 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) > 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) > 00-05 Limit(offset=[1], fetch=[5]) > 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) > 00-09 Sort(sort0=[$0], dir0=[ASC]) > 00-11 StreamAgg(group=[{0, 1}]) > 00-13 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) > 00-15 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], > selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, > `a1`]]]) > 00-04 Project(b10=[$0], EXPR$10=[$1]) > 00-06 SelectionVectorRemover > 00-08 Sort(sort0=[$0], dir0=[ASC]) > 00-10 Filter(condition=[=($1, $1)]) > 00-12 Limit(offset=[1], fetch=[5]) > 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) > 00-16 Sort(sort0=[$0], dir0=[ASC]) > 00-17 StreamAgg(group=[{0, 1}]) > 00-18 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], > dir1=[ASC]) > 00-19 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], > selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, > `a1`]]]) > {code} > If you turn off merge join, query returns correct result: > {code} > 0: jdbc:drill:schema=dfs> select * from > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > b1, > . . . . . . . . . . . . > count(distinct a1) > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > group by > . . . . . . . . . . . . > b1 > . . . . . . . . . . . . > order by > . . . . . . . . . . . . > b1 limit 5 offset 1 > . . . . . . . . . . . . > ) as sq1(x1, y1) > . . . . . . . . . . . . > > . . . . . . . . . . . . > inner join > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > b1, > . . . . . . . . . . . . > count(distinct a1) > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > group by > . . . . . . . . . . . . > b1 > . . . . . . . . . . . . > order by > . . . . . . . . . . . . > b1 limit 5 offset 1 > . . . . . . . . . . . . > ) as sq2(x1, y1) > . . . . . . . . . . . . > on > . . . . . . . . . . . . > sq1.x1 = sq2.x1 and > . . . . . . . . . . . . > sq2.y1 = sq2.y1 > . . . . . . . . . . . . > ; > +------------+------------+------------+------------+ > | x1 | y1 | x10 | y10 | > +------------+------------+------------+------------+ > | bbbbb | 1 | bbbbb | 1 | > | ccccc | 1 | ccccc | 1 | > | eeeee | 1 | eeeee | 1 | > | fffff | 1 | fffff | 1 | > | ggggg | 1 | ggggg | 1 | > +------------+------------+------------+------------+ > 5 rows selected (0.352 seconds) > {code} > cut/paste reproduction > {code:sql} > select * from > ( > select > b1, > count(distinct a1) > from > t1 > group by > b1 > order by > b1 limit 5 offset 1 > ) as sq1(x1, y1) > > inner join > > ( > select > b1, > count(distinct a1) > from > t1 > group by > b1 > order by > b1 limit 5 offset 1 > ) as sq2(x1, y1) > on > sq1.x1 = sq2.x1 and > sq2.y1 = sq2.y1 > ; > {code} > This test has been running and passing in regression test suite until > framework was switched to a new code, where JSON parsing was replaced with > jackson and for a brief period ( I believe Friday afternoon until now ) this > suite was not executed. > We already have a merge join bug DRILL-2010, but this one seems to be > different (small data set) and feels like a recent regression. > For QA: test Functional/Passing/aggregation/sanity/q18.sql will be running > with merge join disabled until this issue is fixed. Will need to remove alter > session from this file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)