[
https://issues.apache.org/jira/browse/HIVE-25822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltan Haindrich updated HIVE-25822:
------------------------------------
Description:
needed
* outer join
* on condition has at least one condition for one side of the join
* in a single reducer:
** a right hand side only row outputted right before
** >=2 rows on LHS and 1 on RHS matching in the join keys but the first LHS
doesn't satisfies the filter condition
** second LHS row with good filter condition
{code}
with
t_y as (select col1 as id,col2 as s from (VALUES(0,'a'),(1,'y')) as c),
t_xy as (select col1 as id,col2 as s from (VALUES(1,'x'),(1,'y')) as c)
select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y');
{code}
null,null,1,y is a false positive result
{code}
+-------+-------+-------+-------+
| l.id | l.s | r.id | r.s |
+-------+-------+-------+-------+
| NULL | NULL | 0 | a |
| 1 | x | NULL | NULL |
| NULL | NULL | 1 | y |
| 1 | y | 1 | y |
+-------+-------+-------+-------+
{code}
was:
needed
* outer join
* on condition has at least one condition for one side of the join
* in a single reducer:
** a right hand side only row outputted right before
** >=2 rows on LHS and 1 on RHS matching in the join keys but the first LHS
doesn't satisfies the filter condition
** second LHS row with good filter condition
{code}
with
t_y as (select col1 as id,col2 as s from (VALUES(0,'a'),(1,'y')) as c),
t_xy as (select col1 as id,col2 as s from (VALUES(1,'x'),(1,'y')) as c)
select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y');
{code}
null,null,1,y is an unexpected result
{code}
+-------+-------+-------+-------+
| l.id | l.s | r.id | r.s |
+-------+-------+-------+-------+
| NULL | NULL | 0 | a |
| 1 | x | NULL | NULL |
| NULL | NULL | 1 | y |
| 1 | y | 1 | y |
+-------+-------+-------+-------+
{code}
Summary: False positive result rows may be outputter in case outer join
has conditions only affecting one side (was: Unexpected result rows in case of
outer join contains conditions only affecting one side)
bq. Why we are saying unexpected results instead of wrong? I don't thing there
are many ways to interpret the query.
I wanted to highlight that this is not taking away expected results - and thus
its a false positive error and not a false negative.
> False positive result rows may be outputter in case outer join has conditions
> only affecting one side
> -----------------------------------------------------------------------------------------------------
>
> Key: HIVE-25822
> URL: https://issues.apache.org/jira/browse/HIVE-25822
> Project: Hive
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> needed
> * outer join
> * on condition has at least one condition for one side of the join
> * in a single reducer:
> ** a right hand side only row outputted right before
> ** >=2 rows on LHS and 1 on RHS matching in the join keys but the first LHS
> doesn't satisfies the filter condition
> ** second LHS row with good filter condition
> {code}
> with
> t_y as (select col1 as id,col2 as s from (VALUES(0,'a'),(1,'y')) as c),
> t_xy as (select col1 as id,col2 as s from (VALUES(1,'x'),(1,'y')) as c)
> select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y');
> {code}
> null,null,1,y is a false positive result
> {code}
> +-------+-------+-------+-------+
> | l.id | l.s | r.id | r.s |
> +-------+-------+-------+-------+
> | NULL | NULL | 0 | a |
> | 1 | x | NULL | NULL |
> | NULL | NULL | 1 | y |
> | 1 | y | 1 | y |
> +-------+-------+-------+-------+
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)