[ 
https://issues.apache.org/jira/browse/IMPALA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188819#comment-17188819
 ] 

Tim Armstrong commented on IMPALA-4373:
---------------------------------------

I think we need to detect when the subquery is inside an expression that does 
not preserve nulls.

Probably need to do a whitelist of exprs that are null-preserving to be safe, 
i.e. where an input null in any argument results in an output null. Then treat 
subqueries that are nested inside exprs that are not known to be 
null-preserving in the same way as other disjunctive queries.

This would mean that until IMPALA-9933 was fixed, that we wouldn't necessarily 
handle subqueries with GROUP BY.

> Wrong results with correlated WHERE-clause subquery inside a NULL-checking 
> conditional function.
> ------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-4373
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4373
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, 
> Impala 2.8.0, Impala 2.9.0
>            Reporter: Alexander Behm
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: correctness
>
> Impala may generate an incorrect plan for queries that have a correlated 
> scalar subquery as a parameter to a NULL-checking conditional function like 
> ISNULL().
> Example query and incorrect plan:
> {code}
> select t1.int_col
> from functional.alltypessmall as t1
> where t1.int_col >= isnull
> (
>    (
>     SELECT 
>      MAX(t2.bigint_col)
>     FROM 
>      functional.alltypestiny AS t2 
>     WHERE 
>      t1.id = t2.id + 10000
>     ),
>    0  
> )
> Fetched 0 row(s) in 1.09s
> Single-node plan:
> +-----------------------------------------------------------------------+
> | Explain String                                                        |
> +-----------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=0B VCores=0                   |
> |                                                                       |
> | PLAN-ROOT SINK                                                        |
> | |                                                                     |
> | 03:HASH JOIN [LEFT SEMI JOIN]                                         |
> | |  hash predicates: t1.id = t2.id + 10000                             |
> | |  other join predicates: t1.int_col >= isnull(max(t2.bigint_col), 0) |
> | |  runtime filters: RF000 <- t2.id + 10000                            |
> | |                                                                     |
> | |--02:AGGREGATE [FINALIZE]                                            |
> | |  |  output: max(t2.bigint_col)                                      |
> | |  |  group by: t2.id                                                 |
> | |  |                                                                  |
> | |  01:SCAN HDFS [functional.alltypestiny t2]                          |
> | |     partitions=4/4 files=4 size=460B                                |
> | |                                                                     |
> | 00:SCAN HDFS [functional.alltypessmall t1]                            |
> |    partitions=4/4 files=4 size=6.32KB                                 |
> |    runtime filters: RF000 -> t1.id                                    |
> +-----------------------------------------------------------------------+
> {code}
> The query returns an empty result set but instead should return all rows from 
> t1 because all invocations of the subquery return NULL, and all rows from t1 
> satisfy "t1.int_col >= 0".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to