[ https://issues.apache.org/jira/browse/DRILL-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621495#comment-14621495 ]
Jinfeng Ni commented on DRILL-2543: ----------------------------------- Looks like now the query will hit a different error in Calcite code ( I could re-produce this on calcite master branch as well). {code} select t1.c1, count(*) from t1 where t1.b1 not in (select b2 from t2 where t1.a1 = t2.a2) group by t1.c1 order by t1.c1; Error: SYSTEM ERROR: ClassCastException: org.apache.calcite.rel.logical.LogicalFilter cannot be cast to org.apache.calcite.rel.core.Join {code} I'm going to file a JIRA on Calcite side. Meanwhile, I'm going to change the target to 1.3 and change it to Major. > Correlated subquery where outer table contains NULL values returns seemingly > wrong result > ------------------------------------------------------------------------------------------ > > Key: DRILL-2543 > URL: https://issues.apache.org/jira/browse/DRILL-2543 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 0.8.0 > Reporter: Victoria Markman > Assignee: Jinfeng Ni > Priority: Critical > Fix For: 1.3.0 > > > {code} > 0: jdbc:drill:schema=dfs> select * from t1; > +------------+------------+------------+ > | a1 | b1 | c1 | > +------------+------------+------------+ > | 1 | 2015-03-01 | aaaaa | > | 2 | 2015-03-02 | bbbbb | > | null | null | null | > +------------+------------+------------+ > 3 rows selected (0.064 seconds) > 0: jdbc:drill:schema=dfs> select * from t2; > +------------+------------+------------+ > | a2 | b2 | c2 | > +------------+------------+------------+ > | 5 | 2017-03-01 | a | > +------------+------------+------------+ > 1 row selected (0.07 seconds) > 0: jdbc:drill:schema=dfs> select t1.c1, count(*) from t1 where t1.b1 not in > (select b2 from t2 where t1.a1 = t2.a2) group by t1.c1 order by t1.c1; > +------------+------------+ > | c1 | EXPR$1 | > +------------+------------+ > | aaaaa | 1 | > | bbbbb | 1 | > +------------+------------+ > 2 rows selected (0.32 seconds) > {code} > Postgres returns row from the outer table where a1 is null. > This is part that I don't understand, because join condition in the subquery > should have eliminated row where a1 IS NULL. To me Drill result looks > correct. Unless there is something different in correlated comparison > semantics that I'm not aware of. > {code} > postgres=# select * from t1; > a1 | b1 | c1 > ----+------------+------- > 1 | 2015-03-01 | aaaaa > 2 | 2015-03-02 | bbbbb > | | > (3 rows) > {code} > Explain plan for the query: > {code} > 00-01 Project(c1=[$0], EXPR$1=[$1]) > 00-02 StreamAgg(group=[{0}], EXPR$1=[COUNT()]) > 00-03 Sort(sort0=[$0], dir0=[ASC]) > 00-04 Project(c1=[$0]) > 00-05 SelectionVectorRemover > 00-06 Filter(condition=[NOT(IS TRUE($3))]) > 00-07 HashJoin(condition=[=($1, $2)], joinType=[left]) > 00-09 Project($f1=[$0], $f3=[$2]) > 00-11 SelectionVectorRemover > 00-13 Filter(condition=[IS NOT NULL($1)]) > 00-15 Project(c1=[$1], b1=[$0], a1=[$2]) > 00-17 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1, > numFiles=1, columns=[`c1`, `b1`, `a1`]]]) > 00-08 Project($f02=[$1], $f2=[$2]) > 00-10 StreamAgg(group=[{0, 1}], agg#0=[MIN($2)]) > 00-12 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], > dir1=[ASC]) > 00-14 Project($f0=[$1], $f02=[$2], $f1=[true]) > 00-16 HashJoin(condition=[=($2, $0)], > joinType=[inner]) > 00-18 StreamAgg(group=[{0}]) > 00-20 Sort(sort0=[$0], dir0=[ASC]) > 00-22 Project($f0=[$1]) > 00-23 SelectionVectorRemover > 00-24 Filter(condition=[IS NOT NULL($0)]) > 00-25 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1, > numFiles=1, columns=[`b1`, `a1`]]]) > 00-19 Project(a2=[$1], b2=[$0]) > 00-21 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/test/t2]], selectionRoot=/test/t2, > numFiles=1, columns=[`a2`, `b2`]]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)