[jira] [Updated] (HIVE-27997) Incorrect result for Hive join query with NVL and Map Join

2024-01-11 Thread Mergen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mergen updated HIVE-27997:
--
Description: 
Hive returns incorrect result if there is NVL() in an ON clause with Map Join 
enabled.

 

STEPS TO REPRODUCE:
{code:java}
Step 1: Create a table test_nvl
create table test_nvl(a string);

Step 2: Insert null and non-null data into table test_nvl
insert into test_nvl values ('x'), ('y'), (null);
select * from test_nvl;
+-+
| test_nvl.a  |
+-+
| x   |
| y   |
| NULL|
+-+

Step 3 : Execute the following query
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');{code}
 

EXPECTED RESULT:
{code:java}
+---+---+
| x.a   | y.a   |
+---+---+
| x | x |
| y | y |
| NULL  | NULL  |
+---+---+ {code}
 

ACTUAL RESULT:
{code:java}
+---+--+
| x.a   | y.a  |
+---+--+
| x | x|
| y | x|
| NULL  | x|
+---+--+{code}
(Obviously 'y' != 'x' and NULL != 'x' so they should not be in the same line)

 

The query works fine with Map Join disabled:
{code:java}
-- Using Merge Join instead.
set hive.auto.convert.join=false;
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');
+---+---+
| x.a   | y.a   |
+---+---+
| NULL  | NULL  |
| x | x |
| y | y |
+---+---+ {code}
 

  was:
Hive returns incorrect result if there is NVL() in an ON clause with Map Join 
enabled.

 

STEPS TO REPRODUCE:

 
{code:java}
Step 1: Create a table test_nvl
create table test_nvl(a string);

Step 2: Insert null and non-null data into table test_nvl
insert into test_nvl values ('x'), ('y'), (null);
select * from test_nvl;
+-+
| test_nvl.a  |
+-+
| x   |
| y   |
| NULL|
+-+

Step 3 : Execute the following query
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');{code}
 

EXPECTED RESULT:
{code:java}
+---+---+
| x.a   | y.a   |
+---+---+
| x | x |
| y | y |
| NULL  | NULL  |
+---+---+ {code}
 

ACTUAL RESULT:
{code:java}
+---+--+
| x.a   | y.a  |
+---+--+
| x | x|
| y | x|
| NULL  | x|
+---+--+{code}
(Obviously 'y' != 'x' and NULL != 'x' so they should not be in the same line)

 

The query works fine with Map Join disabled:

 
{code:java}
-- Using Merge Join instead.
set hive.auto.convert.join=false;
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');
+---+---+
| x.a   | y.a   |
+---+---+
| NULL  | NULL  |
| x | x |
| y | y |
+---+---+ {code}
 


> Incorrect result for Hive join query with NVL and Map Join
> --
>
> Key: HIVE-27997
> URL: https://issues.apache.org/jira/browse/HIVE-27997
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.3
>Reporter: Mergen
>Priority: Major
>
> Hive returns incorrect result if there is NVL() in an ON clause with Map Join 
> enabled.
>  
> STEPS TO REPRODUCE:
> {code:java}
> Step 1: Create a table test_nvl
> create table test_nvl(a string);
> Step 2: Insert null and non-null data into table test_nvl
> insert into test_nvl values ('x'), ('y'), (null);
> select * from test_nvl;
> +-+
> | test_nvl.a  |
> +-+
> | x   |
> | y   |
> | NULL|
> +-+
> Step 3 : Execute the following query
> select x.a, y.a
> from test_nvl x
> left join test_nvl y
> on nvl(x.a, '') = nvl(y.a, '');{code}
>  
> EXPECTED RESULT:
> {code:java}
> +---+---+
> | x.a   | y.a   |
> +---+---+
> | x | x |
> | y | y |
> | NULL  | NULL  |
> +---+---+ {code}
>  
> ACTUAL RESULT:
> {code:java}
> +---+--+
> | x.a   | y.a  |
> +---+--+
> | x | x|
> | y | x|
> | NULL  | x|
> +---+--+{code}
> (Obviously 'y' != 'x' and NULL != 'x' so they should not be in the same line)
>  
> The query works fine with Map Join disabled:
> {code:java}
> -- Using Merge Join instead.
> set hive.auto.convert.join=false;
> select x.a, y.a
> from test_nvl x
> left join test_nvl y
> on nvl(x.a, '') = nvl(y.a, '');
> +---+---+
> | x.a   | y.a   |
> +---+---+
> | NULL  | NULL  |
> | x | x |
> | y | y |
> +---+---+ {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27997) Incorrect result for Hive join query with NVL and Map Join

2024-01-11 Thread Mergen (Jira)
Mergen created HIVE-27997:
-

 Summary: Incorrect result for Hive join query with NVL and Map Join
 Key: HIVE-27997
 URL: https://issues.apache.org/jira/browse/HIVE-27997
 Project: Hive
  Issue Type: Bug
  Components: Operators
Affects Versions: 3.1.3
Reporter: Mergen


Hive returns incorrect result if there is NVL() in an ON clause with Map Join 
enabled.

 

STEPS TO REPRODUCE:

 
{code:java}
Step 1: Create a table test_nvl
create table test_nvl(a string);

Step 2: Insert null and non-null data into table test_nvl
insert into test_nvl values ('x'), ('y'), (null);
select * from test_nvl;
+-+
| test_nvl.a  |
+-+
| x   |
| y   |
| NULL|
+-+

Step 3 : Execute the following query
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');{code}
 

EXPECTED RESULT:
{code:java}
+---+---+
| x.a   | y.a   |
+---+---+
| x | x |
| y | y |
| NULL  | NULL  |
+---+---+ {code}
 

ACTUAL RESULT:
{code:java}
+---+--+
| x.a   | y.a  |
+---+--+
| x | x|
| y | x|
| NULL  | x|
+---+--+{code}
(Obviously 'y' != 'x' and NULL != 'x' so they should not be in the same line)

 

The query works fine with Map Join disabled:

 
{code:java}
-- Using Merge Join instead.
set hive.auto.convert.join=false;
select x.a, y.a
from test_nvl x
left join test_nvl y
on nvl(x.a, '') = nvl(y.a, '');
+---+---+
| x.a   | y.a   |
+---+---+
| NULL  | NULL  |
| x | x |
| y | y |
+---+---+ {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)