[ 
https://issues.apache.org/jira/browse/HIVE-24033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Klemke updated HIVE-24033:
------------------------------------
    Attachment: failing_query_plan.txt

> full outer join returns wrong number of results if 
> hive.optimize.joinreducededuplication is enabled
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-24033
>                 URL: https://issues.apache.org/jira/browse/HIVE-24033
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sebastian Klemke
>            Priority: Major
>         Attachments: failing_query_plan.txt
>
>
> We encountered a hive query that returns incorrect results, when joining two 
> CTEs on a group by value. The input tables `id_table` and
> `reference_table` are unfortunately too large to share and on smaller tables 
> we have not been able to reproduce.
> {code}
> WITH ids AS (
>     SELECT
>         record.id AS id
>     FROM
>         `id_table`
>     LATERAL VIEW explode(records) r AS record
>     WHERE
>         record.id = '5ef0bad74d325f72f0360c19'
>     LIMIT 1
> ),
> refs AS (
>     SELECT
>         reference['id'] AS referenceId
>     FROM
>         `reference_table`
>     WHERE
>       partition_date = '2020-06-24'
>         AND type = '1b0e9eb5c492d1859815410253dd79b5'
>         AND reference['id'] = '5ef0bad74d325f72f0360c19'
>     GROUP BY
>         reference['id']
> )
> SELECT
>     l.id AS id
>     , r.referenceId AS referenceId
> FROM 
>     ids l
> FULL OUTER JOIN
>     refs r
> ON
>     l.id = r.referenceId
> {code}
> This returns 2 rows, because the join clause misses: 
> {code}
> OK
> 5ef0bad74d325f72f0360c19        NULL
> NULL    5ef0bad74d325f72f0360c19
> {code}
> Instead, a single row should be returned. The correct behavior can be 
> achieved by either 
>  * calling lower() on the refs group by statement (doesn't change the string 
> contents)
>  * setting hive.optimize.joinreducededuplication=false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to