[jira] [Commented] (DRILL-2543) Correlated subquery where outer table contains NULL values returns seemingly wrong result

Jinfeng Ni (JIRA) Thu, 09 Jul 2015 17:12:23 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621495#comment-14621495
 ]


Jinfeng Ni commented on DRILL-2543:
-----------------------------------

It looks like the query now hits a different error in the Calcite code (I could 
reproduce this on the Calcite master branch as well).

{code}
select t1.c1, count(*) from t1 where t1.b1 not in (select b2 from t2 where 
t1.a1 = t2.a2) group by t1.c1 order by t1.c1;
Error: SYSTEM ERROR: ClassCastException: 
org.apache.calcite.rel.logical.LogicalFilter cannot be cast to 
org.apache.calcite.rel.core.Join
{code}

I'm going to file a JIRA on the Calcite side. Meanwhile, I'm going to change the 
target to 1.3 and change the priority to Major.




> Correlated subquery where outer table contains NULL values returns seemingly 
> wrong result
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2543
>                 URL: https://issues.apache.org/jira/browse/DRILL-2543
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | 2015-03-01 | aaaaa      |
> | 2          | 2015-03-02 | bbbbb      |
> | null       | null       | null       |
> +------------+------------+------------+
> 3 rows selected (0.064 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+------------+
> |     a2     |     b2     |     c2     |
> +------------+------------+------------+
> | 5          | 2017-03-01 | a          |
> +------------+------------+------------+
> 1 row selected (0.07 seconds)
> 0: jdbc:drill:schema=dfs> select t1.c1, count(*) from t1 where t1.b1 not in 
> (select b2 from t2 where t1.a1 = t2.a2) group by t1.c1 order by t1.c1;
> +------------+------------+
> |     c1     |   EXPR$1   |
> +------------+------------+
> | aaaaa      | 1          |
> | bbbbb      | 1          |
> +------------+------------+
> 2 rows selected (0.32 seconds)
> {code}
> Postgres returns the row from the outer table where a1 is null.
> This is the part that I don't understand, because the join condition in the 
> subquery should have eliminated the row where a1 IS NULL. To me, the Drill 
> result looks correct, unless there is something different in correlated 
> comparison semantics that I'm not aware of.
> {code}
> postgres=# select * from t1;
>  a1 |     b1     |  c1
> ----+------------+-------
>   1 | 2015-03-01 | aaaaa
>   2 | 2015-03-02 | bbbbb
>     |            |
> (3 rows)
> {code}
> Explain plan for the query:
> {code}
> 00-01      Project(c1=[$0], EXPR$1=[$1])
> 00-02        StreamAgg(group=[{0}], EXPR$1=[COUNT()])
> 00-03          Sort(sort0=[$0], dir0=[ASC])
> 00-04            Project(c1=[$0])
> 00-05              SelectionVectorRemover
> 00-06                Filter(condition=[NOT(IS TRUE($3))])
> 00-07                  HashJoin(condition=[=($1, $2)], joinType=[left])
> 00-09                    Project($f1=[$0], $f3=[$2])
> 00-11                      SelectionVectorRemover
> 00-13                        Filter(condition=[IS NOT NULL($1)])
> 00-15                          Project(c1=[$1], b1=[$0], a1=[$2])
> 00-17                            Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1, 
> numFiles=1, columns=[`c1`, `b1`, `a1`]]])
> 00-08                    Project($f02=[$1], $f2=[$2])
> 00-10                      StreamAgg(group=[{0, 1}], agg#0=[MIN($2)])
> 00-12                        Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
> dir1=[ASC])
> 00-14                          Project($f0=[$1], $f02=[$2], $f1=[true])
> 00-16                            HashJoin(condition=[=($2, $0)], 
> joinType=[inner])
> 00-18                              StreamAgg(group=[{0}])
> 00-20                                Sort(sort0=[$0], dir0=[ASC])
> 00-22                                  Project($f0=[$1])
> 00-23                                    SelectionVectorRemover
> 00-24                                      Filter(condition=[IS NOT NULL($0)])
> 00-25                                        Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1, 
> numFiles=1, columns=[`b1`, `a1`]]])
> 00-19                              Project(a2=[$1], b2=[$0])
> 00-21                                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/test/t2]], selectionRoot=/test/t2, 
> numFiles=1, columns=[`a2`, `b2`]]])
> {code}
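
The semantics question in the description above can be checked outside Drill. The sketch below is not part of the original report: it loads the same three t1 rows and one t2 row into an in-memory SQLite database (an assumption made only because SQLite ships with Python; the column types are guesses) and runs the query unchanged. For the row where a1 is NULL, the correlated predicate t1.a1 = t2.a2 is never true, so the subquery returns no rows, and under standard SQL rules `x NOT IN (<empty set>)` evaluates to TRUE even when x is NULL. That would be consistent with Postgres keeping the row.

```python
import sqlite3

# In-memory stand-in for the tables in the report; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a1 INTEGER, b1 TEXT, c1 TEXT);
    CREATE TABLE t2 (a2 INTEGER, b2 TEXT, c2 TEXT);
    INSERT INTO t1 VALUES (1, '2015-03-01', 'aaaaa'),
                          (2, '2015-03-02', 'bbbbb'),
                          (NULL, NULL, NULL);
    INSERT INTO t2 VALUES (5, '2017-03-01', 'a');
""")

# Same query as in the report.
query = """
    SELECT t1.c1, COUNT(*)
    FROM t1
    WHERE t1.b1 NOT IN (SELECT b2 FROM t2 WHERE t1.a1 = t2.a2)
    GROUP BY t1.c1
    ORDER BY t1.c1
"""
rows = conn.execute(query).fetchall()

# Each t1 row sees an empty subquery result, so NOT IN is TRUE for all three,
# including the all-NULL row, which forms its own group with c1 = NULL.
for row in rows:
    print(row)
```

With this data the query yields three groups, one of them with c1 NULL, rather than the two rows Drill returned. In the explain plan above, the `Filter(condition=[IS NOT NULL($1)])` at 00-13 appears to discard rows with a NULL b1 before the left join, which may be where the row is lost.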



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
