[ 
https://issues.apache.org/jira/browse/IMPALA-8386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-8386:
-----------------------------------
    Description: 
skyyws <sky...@163.com> reported a bug [in the mailing 
list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E]
 on the following data set:
{code:java}
table A
+------+
| a_id |
+------+
| 1    |
| 2    |
+------+
table B
+------+--------+
| b_id | amount |
+------+--------+
| 1    | 10     |
| 1    | 20     |
| 2    | NULL   |
+------+--------+
table C
+------+------+
| a_id | b_id |
+------+------+
| 1    | 1    |
| 2    | 2    |
+------+------+{code}
The following query returns a wrong result "1":
{code:java}
select count(1) from (
    select t2.a_id,t2.amount1,t2.amount2
    from( select a_id from a) t1
    left outer join (
        select c.a_id,sum(amount) as amount1,sum(amount) as amount2
        from b join c  on b.b_id = c.b_id group by c.a_id) t2
    on t1.a_id = t2.a_id
) t;
{code}
Removing "t2.amount2" can get the right result "2":
{code:java}
select count(1) from (
    select t2.a_id,t2.amount1
    from( select a_id from a) t1
    left outer join (
        select c.a_id,sum(amount) as amount1,sum(amount) as amount2
        from b join c  on b.b_id = c.b_id group by c.a_id) t2
    on t1.a_id = t2.a_id
) t;
{code}
The problem is that in query 1, Impala generates a wrong predicate "sum(amount) 
= sum(amount)" which rejects nulls:
{code:java}
+-------------------------------------------------------------+
| Explain String                                              |
+-------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 |
| Per-Host Resource Estimates: Memory=143MB                   |
| Codegen disabled by planner                                 |
|                                                             |
| PLAN-ROOT SINK                                              |
| |                                                           |
| 12:AGGREGATE [FINALIZE]                                     |
| |  output: count:merge(*)                                   |
| |  row-size=8B cardinality=1                                |
| |                                                           |
| 11:EXCHANGE [UNPARTITIONED]                                 |
| |                                                           |
| 06:AGGREGATE                                                |
| |  output: count(*)                                         |
| |  row-size=8B cardinality=1                                |
| |                                                           |
| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                |
| |  hash predicates: c.a_id = a_id                           |
| |  other predicates: sum(amount) = sum(amount)      <---------- Wrong 
inferred predicate which incorrectly reject nulls
| |  runtime filters: RF000 <- a_id                           |
| |  row-size=16B cardinality=2                               |
| |                                                           |
| |--10:EXCHANGE [HASH(a_id)]                                 |
| |  |                                                        |
| |  00:SCAN HDFS [default.a]                                 |
| |     partitions=1/1 files=1 size=4B                        |
| |     row-size=4B cardinality=2                             |
| |                                                           |
| 09:AGGREGATE [FINALIZE]                                     |
| |  output: sum:merge(amount)                                |
| |  group by: c.a_id                                         |
| |  row-size=12B cardinality=2                               |
| |                                                           |
| 08:EXCHANGE [HASH(c.a_id)]                                  |
| |                                                           |
| 04:AGGREGATE [STREAMING]                                    |
| |  output: sum(amount)                                      |
| |  group by: c.a_id                                         |
| |  row-size=12B cardinality=2                               |
| |                                                           |
| 03:HASH JOIN [INNER JOIN, BROADCAST]                        |
| |  hash predicates: b.b_id = c.b_id                         |
| |  runtime filters: RF002 <- c.b_id                         |
| |  row-size=16B cardinality=3                               |
| |                                                           |
| |--07:EXCHANGE [BROADCAST]                                  |
| |  |                                                        |
| |  02:SCAN HDFS [default.c]                                 |
| |     partitions=1/1 files=1 size=8B                        |
| |     runtime filters: RF000 -> default.c.a_id              |
| |     row-size=8B cardinality=2                             |
| |                                                           |
| 01:SCAN HDFS [default.b]                                    |
|    partitions=1/1 files=1 size=15B                          |
|    runtime filters: RF002 -> b.b_id                         |
|    row-size=8B cardinality=3                                |
+-------------------------------------------------------------+
{code}

  was:
skyyws <sky...@163.com> reported a bug [in the mailing 
list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E]
 on the following data set:
{code:java}
table A
+------+
| a_id |
+------+
| 1    |
| 2    |
+------+
table B
+------+--------+
| b_id | amount |
+------+--------+
| 1    | 10     |
| 1    | 20     |
| 2    | NULL   |
+------+--------+
table C
+------+------+
| a_id | b_id |
+------+------+
| 1    | 1    |
| 2    | 2    |
+------+------+{code}
The following query returns a wrong result "1":
{code:java}
select count(1) from (
    select t2.a_id,t2.amount1,t2.amount2
    from( select a_id from a) t1
    left outer join (
        select c.a_id,sum(amount) as amount1,sum(amount) as amount2
        from b join c  on b.b_id = c.b_id group by c.a_id) t2
    on t1.a_id = t2.a_id
) t;
{code}
Removing "t2.amount2" can get the right result "2":
{code:java}
select count(1) from (
    select t2.a_id,t2.amount1
    from( select a_id from a) t1
    left outer join (
        select c.a_id,sum(amount) as amount1,sum(amount) as amount2
        from b join c  on b.b_id = c.b_id group by c.a_id) t2
    on t1.a_id = t2.a_id
) t;
{code}


> Incorrect predicate in a left outer join query
> ----------------------------------------------
>
>                 Key: IMPALA-8386
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8386
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0, Impala 3.2.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: correctness
>
> skyyws <sky...@163.com> reported a bug [in the mailing 
> list|https://lists.apache.org/thread.html/0bdbbaa6bb35b552f050ae30587b7d75b78a72ec60007a8bc0a4a8a9@%3Cdev.impala.apache.org%3E]
>  on the following data set:
> {code:java}
> table A
> +------+
> | a_id |
> +------+
> | 1    |
> | 2    |
> +------+
> table B
> +------+--------+
> | b_id | amount |
> +------+--------+
> | 1    | 10     |
> | 1    | 20     |
> | 2    | NULL   |
> +------+--------+
> table C
> +------+------+
> | a_id | b_id |
> +------+------+
> | 1    | 1    |
> | 2    | 2    |
> +------+------+{code}
> The following query returns a wrong result "1":
> {code:java}
> select count(1) from (
>     select t2.a_id,t2.amount1,t2.amount2
>     from( select a_id from a) t1
>     left outer join (
>         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
>         from b join c  on b.b_id = c.b_id group by c.a_id) t2
>     on t1.a_id = t2.a_id
> ) t;
> {code}
> Removing "t2.amount2" can get the right result "2":
> {code:java}
> select count(1) from (
>     select t2.a_id,t2.amount1
>     from( select a_id from a) t1
>     left outer join (
>         select c.a_id,sum(amount) as amount1,sum(amount) as amount2
>         from b join c  on b.b_id = c.b_id group by c.a_id) t2
>     on t1.a_id = t2.a_id
> ) t;
> {code}
> The problem is that in query 1, Impala generates a wrong predicate 
> "sum(amount) = sum(amount)" which rejects nulls:
> {code:java}
> +-------------------------------------------------------------+
> | Explain String                                              |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 |
> | Per-Host Resource Estimates: Memory=143MB                   |
> | Codegen disabled by planner                                 |
> |                                                             |
> | PLAN-ROOT SINK                                              |
> | |                                                           |
> | 12:AGGREGATE [FINALIZE]                                     |
> | |  output: count:merge(*)                                   |
> | |  row-size=8B cardinality=1                                |
> | |                                                           |
> | 11:EXCHANGE [UNPARTITIONED]                                 |
> | |                                                           |
> | 06:AGGREGATE                                                |
> | |  output: count(*)                                         |
> | |  row-size=8B cardinality=1                                |
> | |                                                           |
> | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                |
> | |  hash predicates: c.a_id = a_id                           |
> | |  other predicates: sum(amount) = sum(amount)      <---------- Wrong 
> inferred predicate which incorrectly reject nulls
> | |  runtime filters: RF000 <- a_id                           |
> | |  row-size=16B cardinality=2                               |
> | |                                                           |
> | |--10:EXCHANGE [HASH(a_id)]                                 |
> | |  |                                                        |
> | |  00:SCAN HDFS [default.a]                                 |
> | |     partitions=1/1 files=1 size=4B                        |
> | |     row-size=4B cardinality=2                             |
> | |                                                           |
> | 09:AGGREGATE [FINALIZE]                                     |
> | |  output: sum:merge(amount)                                |
> | |  group by: c.a_id                                         |
> | |  row-size=12B cardinality=2                               |
> | |                                                           |
> | 08:EXCHANGE [HASH(c.a_id)]                                  |
> | |                                                           |
> | 04:AGGREGATE [STREAMING]                                    |
> | |  output: sum(amount)                                      |
> | |  group by: c.a_id                                         |
> | |  row-size=12B cardinality=2                               |
> | |                                                           |
> | 03:HASH JOIN [INNER JOIN, BROADCAST]                        |
> | |  hash predicates: b.b_id = c.b_id                         |
> | |  runtime filters: RF002 <- c.b_id                         |
> | |  row-size=16B cardinality=3                               |
> | |                                                           |
> | |--07:EXCHANGE [BROADCAST]                                  |
> | |  |                                                        |
> | |  02:SCAN HDFS [default.c]                                 |
> | |     partitions=1/1 files=1 size=8B                        |
> | |     runtime filters: RF000 -> default.c.a_id              |
> | |     row-size=8B cardinality=2                             |
> | |                                                           |
> | 01:SCAN HDFS [default.b]                                    |
> |    partitions=1/1 files=1 size=15B                          |
> |    runtime filters: RF002 -> b.b_id                         |
> |    row-size=8B cardinality=3                                |
> +-------------------------------------------------------------+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to