[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-12-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27324:

Flags:   (was: Important)

> Hive query with NOT IN condition is giving incorrect results when the sub 
> query table contains the null value.
> --
>
> Key: HIVE-27324
> URL: https://issues.apache.org/jira/browse/HIVE-27324
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Shobika Selvaraj
>Assignee: Diksha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: sql_queries.png
>
>
> Hive query giving empty results when the sub query table contains the null 
> value. 
> We encountered two issues here. 
> 1) The query - "select * from t3 where age not in (select distinct(age) age 
> from t1);" is giving empty results when the table t1 contains a null value. 
> Disabling cbo didn't helped here.
> 2) Let's consider the table t3 has null value and table t1 doesn't have any 
> null values. Now if we run the above query it is returning other data's but 
> not the null value from t3. If we disable the cbo then it's giving result 
> with null value. 
>  
> *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*
> *FIRST ISSUE:*
> Create two tables and insert data with null values as below:
> ---
> create table t3 (id int,name string, age int);
>  
> insert into t3 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
>  
> create table t1 (id int,name string, age int);
> insert into t1 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> ---
>  
> Then executed the below query: 
> --
> select * from t3
> where age not in (select distinct(age) age from t1);
> --
>  
> The result should be as below:
> {code:java}
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> | 9      | ron      | 3       |
> | 10     | Sam      | 22      |
> | 11     | nick     | 19      |
> | 12     | fed      | 18      |
> | 13     | kong     | 13      |
> ++--+-+
> 5 rows selected (35.897 seconds) {code}
> But when we run the above query it is giving zero records:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t1);
> INFO  : Compiling 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
> select * from t3
> where age not in (select distinct(age) age from t1)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> .
> INFO  : Completed executing 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); 
> Time taken: 10.191 seconds
> INFO  : OK
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> ++--+-+
> No rows selected (12.17 seconds) {code}
> The query works fine when we use nvl function or not null condition. 
>  
> So as a workaround we can use nvl function for both main and sub query as 
> below:
> {code:java}
> select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age 
> from t1); {code}
>  
> *SECOND ISSUE:* 
> Also while testing multiple scenario's i found one more issue as well.
> When the sub query table (t1) doesn't contain any null values then the query 
> is giving result but it is ignoring the null values of the main table(t3) .
>  
> For example: Created another table t4 and inserted the data's without any 
> null values:
> create table t4 (id int,name string, age int);
> insert into t4 
> values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> Now i tested with the below query and it gives 5 records. The count should be 
> six and it omitted the null value of the table t3:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t4);
> INFO  : Compiling 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
> select * from t3
> where age not in (select distinct(age) age from t4)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> ..
> INFO  : Completed executing 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); 
> Time taken: 17.724 

[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-11-03 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-27324:

Fix Version/s: 4.0.0

> Hive query with NOT IN condition is giving incorrect results when the sub 
> query table contains the null value.
> --
>
> Key: HIVE-27324
> URL: https://issues.apache.org/jira/browse/HIVE-27324
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Shobika Selvaraj
>Assignee: Diksha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: sql_queries.png
>
>
> Hive query giving empty results when the sub query table contains the null 
> value. 
> We encountered two issues here. 
> 1) The query - "select * from t3 where age not in (select distinct(age) age 
> from t1);" is giving empty results when the table t1 contains a null value. 
> Disabling cbo didn't helped here.
> 2) Let's consider the table t3 has null value and table t1 doesn't have any 
> null values. Now if we run the above query it is returning other data's but 
> not the null value from t3. If we disable the cbo then it's giving result 
> with null value. 
>  
> *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*
> *FIRST ISSUE:*
> Create two tables and insert data with null values as below:
> ---
> create table t3 (id int,name string, age int);
>  
> insert into t3 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
>  
> create table t1 (id int,name string, age int);
> insert into t1 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> ---
>  
> Then executed the below query: 
> --
> select * from t3
> where age not in (select distinct(age) age from t1);
> --
>  
> The result should be as below:
> {code:java}
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> | 9      | ron      | 3       |
> | 10     | Sam      | 22      |
> | 11     | nick     | 19      |
> | 12     | fed      | 18      |
> | 13     | kong     | 13      |
> ++--+-+
> 5 rows selected (35.897 seconds) {code}
> But when we run the above query it is giving zero records:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t1);
> INFO  : Compiling 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
> select * from t3
> where age not in (select distinct(age) age from t1)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> .
> INFO  : Completed executing 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); 
> Time taken: 10.191 seconds
> INFO  : OK
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> ++--+-+
> No rows selected (12.17 seconds) {code}
> The query works fine when we use nvl function or not null condition. 
>  
> So as a workaround we can use nvl function for both main and sub query as 
> below:
> {code:java}
> select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age 
> from t1); {code}
>  
> *SECOND ISSUE:* 
> Also while testing multiple scenario's i found one more issue as well.
> When the sub query table (t1) doesn't contain any null values then the query 
> is giving result but it is ignoring the null values of the main table(t3) .
>  
> For example: Created another table t4 and inserted the data's without any 
> null values:
> create table t4 (id int,name string, age int);
> insert into t4 
> values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> Now i tested with the below query and it gives 5 records. The count should be 
> six and it omitted the null value of the table t3:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t4);
> INFO  : Compiling 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
> select * from t3
> where age not in (select distinct(age) age from t4)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> ..
> INFO  : Completed executing 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); 
> Time taken: 

[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-11-03 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-27324:

Affects Version/s: 3.1.3
   (was: 3.1.0)
   (was: All Versions)

> Hive query with NOT IN condition is giving incorrect results when the sub 
> query table contains the null value.
> --
>
> Key: HIVE-27324
> URL: https://issues.apache.org/jira/browse/HIVE-27324
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Shobika Selvaraj
>Assignee: Diksha
>Priority: Major
>  Labels: pull-request-available
> Attachments: sql_queries.png
>
>
> Hive query giving empty results when the sub query table contains the null 
> value. 
> We encountered two issues here. 
> 1) The query - "select * from t3 where age not in (select distinct(age) age 
> from t1);" is giving empty results when the table t1 contains a null value. 
> Disabling cbo didn't helped here.
> 2) Let's consider the table t3 has null value and table t1 doesn't have any 
> null values. Now if we run the above query it is returning other data's but 
> not the null value from t3. If we disable the cbo then it's giving result 
> with null value. 
>  
> *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*
> *FIRST ISSUE:*
> Create two tables and insert data with null values as below:
> ---
> create table t3 (id int,name string, age int);
>  
> insert into t3 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
>  
> create table t1 (id int,name string, age int);
> insert into t1 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> ---
>  
> Then executed the below query: 
> --
> select * from t3
> where age not in (select distinct(age) age from t1);
> --
>  
> The result should be as below:
> {code:java}
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> | 9      | ron      | 3       |
> | 10     | Sam      | 22      |
> | 11     | nick     | 19      |
> | 12     | fed      | 18      |
> | 13     | kong     | 13      |
> ++--+-+
> 5 rows selected (35.897 seconds) {code}
> But when we run the above query it is giving zero records:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t1);
> INFO  : Compiling 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
> select * from t3
> where age not in (select distinct(age) age from t1)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> .
> INFO  : Completed executing 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); 
> Time taken: 10.191 seconds
> INFO  : OK
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> ++--+-+
> No rows selected (12.17 seconds) {code}
> The query works fine when we use nvl function or not null condition. 
>  
> So as a workaround we can use nvl function for both main and sub query as 
> below:
> {code:java}
> select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age 
> from t1); {code}
>  
> *SECOND ISSUE:* 
> Also while testing multiple scenario's i found one more issue as well.
> When the sub query table (t1) doesn't contain any null values then the query 
> is giving result but it is ignoring the null values of the main table(t3) .
>  
> For example: Created another table t4 and inserted the data's without any 
> null values:
> create table t4 (id int,name string, age int);
> insert into t4 
> values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> Now i tested with the below query and it gives 5 records. The count should be 
> six and it omitted the null value of the table t3:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t4);
> INFO  : Compiling 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
> select * from t3
> where age not in (select distinct(age) age from t4)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> ..
> INFO  : Completed executing 
> 

[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-09-16 Thread Diksha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diksha updated HIVE-27324:
--
Attachment: sql_queries.png

> Hive query with NOT IN condition is giving incorrect results when the sub 
> query table contains the null value.
> --
>
> Key: HIVE-27324
> URL: https://issues.apache.org/jira/browse/HIVE-27324
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: All Versions, 3.1.0
>Reporter: Shobika Selvaraj
>Assignee: Diksha
>Priority: Major
>  Labels: pull-request-available
> Attachments: sql_queries.png
>
>
> Hive query giving empty results when the sub query table contains the null 
> value. 
> We encountered two issues here. 
> 1) The query - "select * from t3 where age not in (select distinct(age) age 
> from t1);" is giving empty results when the table t1 contains a null value. 
> Disabling cbo didn't helped here.
> 2) Let's consider the table t3 has null value and table t1 doesn't have any 
> null values. Now if we run the above query it is returning other data's but 
> not the null value from t3. If we disable the cbo then it's giving result 
> with null value. 
>  
> *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*
> *FIRST ISSUE:*
> Create two tables and insert data with null values as below:
> ---
> create table t3 (id int,name string, age int);
>  
> insert into t3 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
>  
> create table t1 (id int,name string, age int);
> insert into t1 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> ---
>  
> Then executed the below query: 
> --
> select * from t3
> where age not in (select distinct(age) age from t1);
> --
>  
> The result should be as below:
> {code:java}
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> | 9      | ron      | 3       |
> | 10     | Sam      | 22      |
> | 11     | nick     | 19      |
> | 12     | fed      | 18      |
> | 13     | kong     | 13      |
> ++--+-+
> 5 rows selected (35.897 seconds) {code}
> But when we run the above query it is giving zero records:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t1);
> INFO  : Compiling 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
> select * from t3
> where age not in (select distinct(age) age from t1)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> .
> INFO  : Completed executing 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); 
> Time taken: 10.191 seconds
> INFO  : OK
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> ++--+-+
> No rows selected (12.17 seconds) {code}
> The query works fine when we use nvl function or not null condition. 
>  
> So as a workaround we can use nvl function for both main and sub query as 
> below:
> {code:java}
> select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age 
> from t1); {code}
>  
> *SECOND ISSUE:* 
> Also while testing multiple scenario's i found one more issue as well.
> When the sub query table (t1) doesn't contain any null values then the query 
> is giving result but it is ignoring the null values of the main table(t3) .
>  
> For example: Created another table t4 and inserted the data's without any 
> null values:
> create table t4 (id int,name string, age int);
> insert into t4 
> values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> Now i tested with the below query and it gives 5 records. The count should be 
> six and it omitted the null value of the table t3:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t4);
> INFO  : Compiling 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
> select * from t3
> where age not in (select distinct(age) age from t4)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> ..
> INFO  : Completed executing 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); 
> Time taken: 17.724 seconds
> INFO  : OK
> 

[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-08-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27324:
--
Labels: pull-request-available  (was: )

> Hive query with NOT IN condition is giving incorrect results when the sub 
> query table contains the null value.
> --
>
> Key: HIVE-27324
> URL: https://issues.apache.org/jira/browse/HIVE-27324
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: All Versions, 3.1.0
>Reporter: Shobika Selvaraj
>Assignee: Diksha
>Priority: Major
>  Labels: pull-request-available
>
> Hive query giving empty results when the sub query table contains the null 
> value. 
> We encountered two issues here. 
> 1) The query - "select * from t3 where age not in (select distinct(age) age 
> from t1);" is giving empty results when the table t1 contains a null value. 
> Disabling cbo didn't helped here.
> 2) Let's consider the table t3 has null value and table t1 doesn't have any 
> null values. Now if we run the above query it is returning other data's but 
> not the null value from t3. If we disable the cbo then it's giving result 
> with null value. 
>  
> *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*
> *FIRST ISSUE:*
> Create two tables and insert data with null values as below:
> ---
> create table t3 (id int,name string, age int);
>  
> insert into t3 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
>  
> create table t1 (id int,name string, age int);
> insert into t1 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> ---
>  
> Then executed the below query: 
> --
> select * from t3
> where age not in (select distinct(age) age from t1);
> --
>  
> The result should be as below:
> {code:java}
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> | 9      | ron      | 3       |
> | 10     | Sam      | 22      |
> | 11     | nick     | 19      |
> | 12     | fed      | 18      |
> | 13     | kong     | 13      |
> ++--+-+
> 5 rows selected (35.897 seconds) {code}
> But when we run the above query it is giving zero records:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t1);
> INFO  : Compiling 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
> select * from t3
> where age not in (select distinct(age) age from t1)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> .
> INFO  : Completed executing 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); 
> Time taken: 10.191 seconds
> INFO  : OK
> ++--+-+
> | t3.id  | t3.name  | t3.age  |
> ++--+-+
> ++--+-+
> No rows selected (12.17 seconds) {code}
> The query works fine when we use nvl function or not null condition. 
>  
> So as a workaround we can use nvl function for both main and sub query as 
> below:
> {code:java}
> select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age 
> from t1); {code}
>  
> *SECOND ISSUE:* 
> Also while testing multiple scenario's i found one more issue as well.
> When the sub query table (t1) doesn't contain any null values then the query 
> is giving result but it is ignoring the null values of the main table(t3) .
>  
> For example: Created another table t4 and inserted the data's without any 
> null values:
> create table t4 (id int,name string, age int);
> insert into t4 
> values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> Now i tested with the below query and it gives 5 records. The count should be 
> six and it omitted the null value of the table t3:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t4);
> INFO  : Compiling 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
> select * from t3
> where age not in (select distinct(age) age from t4)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> ..
> INFO  : Completed executing 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); 
> Time taken: 17.724 seconds
> INFO  : OK
> 

[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-05-06 Thread Shobika Selvaraj (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shobika Selvaraj updated HIVE-27324:

Description: 
Hive query giving empty results when the sub query table contains the null 
value. 

We encountered two issues here. 

1) The query - "select * from t3 where age not in (select distinct(age) age 
from t1);" is giving empty results when the table t1 contains a null value. 
Disabling cbo didn't helped here.

2) Let's consider the table t3 has null value and table t1 doesn't have any 
null values. Now if we run the above query it is returning other data's but not 
the null value from t3. If we disable the cbo then it's giving result with null 
value. 

 

*REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*

*FIRST ISSUE:*
Create two tables and insert data with null values as below:
---
create table t3 (id int,name string, age int);

 
insert into t3 
values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
 
create table t1 (id int,name string, age int);

insert into t1 
values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
---
 
Then executed the below query: 
--
select * from t3
where age not in (select distinct(age) age from t1);
--
 
The result should be as below:
{code:java}
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
| 9      | ron      | 3       |
| 10     | Sam      | 22      |
| 11     | nick     | 19      |
| 12     | fed      | 18      |
| 13     | kong     | 13      |
++--+-+
5 rows selected (35.897 seconds) {code}
But when we run the above query it is giving zero records:
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t1);
INFO  : Compiling 
command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
select * from t3
where age not in (select distinct(age) age from t1)
INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
product
..
.
INFO  : Completed executing 
command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); Time 
taken: 10.191 seconds
INFO  : OK
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
++--+-+
No rows selected (12.17 seconds) {code}
The query works fine when we use nvl function or not null condition. 
 
So as a workaround we can use nvl function for both main and sub query as below:
{code:java}
select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age 
from t1); {code}
 
*SECOND ISSUE:* 

Also while testing multiple scenario's i found one more issue as well.

When the sub query table (t1) doesn't contain any null values then the query is 
giving result but it is ignoring the null values of the main table(t3) .
 
For example: Created another table t4 and inserted the data's without any null 
values:

create table t4 (id int,name string, age int);
insert into t4 
values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);

Now i tested with the below query and it gives 5 records. The count should be 
six and it omitted the null value of the table t3:
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t4);
INFO  : Compiling 
command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
select * from t3
where age not in (select distinct(age) age from t4)
INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
product
..
..
INFO  : Completed executing 
command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); Time 
taken: 17.724 seconds
INFO  : OK
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
| 9      | ron      | 3       |
| 10     | Sam      | 22      |
| 11     | nick     | 19      |
| 12     | fed      | 18      |
| 13     | kong     | 13      |
++--+-+
5 rows selected (19.674 seconds) {code}
But disabling CBO fixed this issue:
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> set hive.cbo.enable=false;
No rows affected (0.02 seconds)
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t4);
INFO  : Compiling 
command(queryId=hive_20230427165029_6c3167b0-c706-4230-a32f-2e4bc528be8f): 
select * from t3
where age not in (select distinct(age) age from t4)
INFO  : Warning: Map Join 

[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-05-06 Thread Shobika Selvaraj (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shobika Selvaraj updated HIVE-27324:

Description: 
Hive query giving empty results when the sub query table contains the null 
value. 

We encountered two issues here. 

1) The query - "select * from t3 where age not in (select distinct(age) age 
from t1);" is giving empty results when the table t1 contains a null value. 
Disabling cbo didn't helped here.

2) Let's consider the table t3 has null value and table t1 doesn't have any 
null values. Now if we run the above query it is returning other data's but not 
the null value from t3. If we disable the cbo then it's giving result with null 
value. 

 

*REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*

*FIRST ISSUE:*
Create two tables and insert data with null values as below:
---
create table t3 (id int,name string, age int);

 
insert into t3 
values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
 
create table t1 (id int,name string, age int);

insert into t1 
values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
---
 
Then executed the below query: 
--
select * from t3
where age not in (select distinct(age) age from t1);
--
 
The result should be as below:
{code:java}
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
| 9      | ron      | 3       |
| 10     | Sam      | 22      |
| 11     | nick     | 19      |
| 12     | fed      | 18      |
| 13     | kong     | 13      |
++--+-+
5 rows selected (35.897 seconds) {code}
But when we run the above query it is giving zero records:
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t1);
INFO  : Compiling 
command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
select * from t3
where age not in (select distinct(age) age from t1)
INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
product
..
.
INFO  : Completed executing 
command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); Time 
taken: 10.191 seconds
INFO  : OK
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
++--+-+
No rows selected (12.17 seconds) {code}
The query works fine when we use nvl function or not null condition. 
 
So as a workaround we can use nvl function for both main and sub query as below:
--
*select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age 
from t1);*
--
 
*SECOND ISSUE:* 

Also while testing multiple scenario's i found one more issue as well.

When the sub query table (t1) doesn't contain any null values then the query is 
giving result but it is ignoring the null values of the main table(t3) .
 
For example: Created another table t4 and inserted the data's without any null 
values:

create table t4 (id int,name string, age int);
insert into t4 
values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);

Now i tested with the below query and it gives 5 records. The count should be 
six and it omitted the null value of the table t3:
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t4);
INFO  : Compiling 
command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
select * from t3
where age not in (select distinct(age) age from t4)
INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
product
..
..
INFO  : Completed executing 
command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); Time 
taken: 17.724 seconds
INFO  : OK
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
| 9      | ron      | 3       |
| 10     | Sam      | 22      |
| 11     | nick     | 19      |
| 12     | fed      | 18      |
| 13     | kong     | 13      |
++--+-+
5 rows selected (19.674 seconds) {code}
But disabling CBO fixed this issue:
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> set hive.cbo.enable=false;
No rows affected (0.02 seconds)
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t4);
INFO  : Compiling 
command(queryId=hive_20230427165029_6c3167b0-c706-4230-a32f-2e4bc528be8f): 
select * from t3
where age not in (select distinct(age) age from t4)
INFO  : Warning: Map Join 

[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-05-06 Thread Shobika Selvaraj (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shobika Selvaraj updated HIVE-27324:

Description: 
Hive query giving empty results when the sub query table contains the null 
value. 

We encountered two issues here. 

1) The query - "select * from t3 where age not in (select distinct(age) age 
from t1);" is giving empty results when the table t1 contains a null value. 
Disabling cbo didn't helped here.

2) Let's consider the table t3 has null value and table t1 doesn't have any 
null values. Now if we run the above query it is returning other data's but not 
the null value from t3. If we disable the cbo then it's giving result with null 
value. 

 

*REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*

*FIRST ISSUE:*
Create two tables and insert data with null values as below:
---
create table t3 (id int,name string, age int);

 
insert into t3 
values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
 
create table t1 (id int,name string, age int);

insert into t1 
values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
---
 
Then executed the below query: 
--
select * from t3
where age not in (select distinct(age) age from t1);
--
 
The result should be as below:
{code:java}
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
| 9      | ron      | 3       |
| 10     | Sam      | 22      |
| 11     | nick     | 19      |
| 12     | fed      | 18      |
| 13     | kong     | 13      |
++--+-+
5 rows selected (35.897 seconds) {code}
But when we run the above query it is giving zero records:
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t1);
INFO  : Compiling 
command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
select * from t3
where age not in (select distinct(age) age from t1)
INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
product
..
.
INFO  : Completed executing 
command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); Time 
taken: 10.191 seconds
INFO  : OK
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
++--+-+
No rows selected (12.17 seconds) {code}
The query works fine when we use nvl function or not null condition. 
 
So as a workaround we can use nvl function for both main and sub query as below:
--
*select * from t3 where nvl(age,'{-}') not in (select distinct(nvl(age,'{-}')) 
age from t1);*
--
 
*SECOND ISSUE:* 

Also while testing multiple scenario's i found one more issue as well.

When the sub query table (t1) doesn't contain any null values then the query is 
giving result but it is ignoring the null values of the main table(t3) .
 
For example: Created another table t4 and inserted the data's without any null 
values:

create table t4 (id int,name string, age int);
insert into t4 
values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);


Now i tested with the below query and it gives 5 records. The count should be 
six and it omitted the null value of the table t3:
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t4);
INFO  : Compiling 
command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
select * from t3
where age not in (select distinct(age) age from t4)
INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
product
..
..
INFO  : Completed executing 
command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); Time 
taken: 17.724 seconds
INFO  : OK
++--+-+
| t3.id  | t3.name  | t3.age  |
++--+-+
| 9      | ron      | 3       |
| 10     | Sam      | 22      |
| 11     | nick     | 19      |
| 12     | fed      | 18      |
| 13     | kong     | 13      |
++--+-+
5 rows selected (19.674 seconds) {code}
But disabling CBO fixed this issue:
**
{code:java}
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> set hive.cbo.enable=false;
No rows affected (0.02 seconds)
0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
. . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
distinct(age) age from t4);
INFO  : Compiling 
command(queryId=hive_20230427165029_6c3167b0-c706-4230-a32f-2e4bc528be8f): 
select * from t3
where age not in (select distinct(age) age from t4)
INFO  : Warning: Map Join