[jira] [Assigned] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laljo John Pullokkaran reassigned HIVE-10841:
--------------------------------------------

    Assignee: Laljo John Pullokkaran  (was: Alexander Pivovarov)

[WHERE col is not null] does not work sometimes for queries with many JOIN statements
-------------------------------------------------------------------------------------

                Key: HIVE-10841
                URL: https://issues.apache.org/jira/browse/HIVE-10841
            Project: Hive
         Issue Type: Bug
         Components: Query Planning, Query Processor
  Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
           Reporter: Alexander Pivovarov
           Assignee: Laljo John Pullokkaran
        Attachments: HIVE-10841.1.patch, HIVE-10841.patch

The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive

1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10	122
NULL	NULL
NULL	NULL
{code}

but it should be 1 row
{code}
10	122
{code}

2.1 explain select ... output for hive-1.3.0 MR
{code}
STAGE DEPENDENCIES:
  Stage-12 is a root stage
  Stage-9 depends on stages: Stage-12
  Stage-0 depends on stages: Stage-9

STAGE PLANS:
  Stage: Stage-12
    Map Reduce Local Work
      Alias -> Map Local Tables:
        a
          Fetch Operator
            limit: -1
        acct
          Fetch Operator
            limit: -1
        fr
          Fetch Operator
            limit: -1
        l
          Fetch Operator
            limit: -1
        pi
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        a
          TableScan
            alias: a
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col5 (type: int)
                  1 id (type: int)
                  2 aid (type: int)
        acct
          TableScan
            alias: acct
            Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: aid is not null (type: boolean)
              Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col5 (type: int)
                  1 id (type: int)
                  2 aid (type: int)
        fr
          TableScan
            alias: fr
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (loan_id = 4436) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 4436 (type: int)
                  1 4436 (type: int)
                  2 4436 (type: int)
        l
          TableScan
            alias: l
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (id = 4436) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 4436 (type: int)
                  1 4436 (type: int)
                  2 4436 (type: int)
        pi
          TableScan
            alias: pi
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
[jira] [Assigned] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Pivovarov reassigned HIVE-10841:
-----------------------------------------

    Assignee: Alexander Pivovarov

[WHERE col is not null] does not work sometimes for queries with many JOIN statements
-------------------------------------------------------------------------------------

                Key: HIVE-10841
                URL: https://issues.apache.org/jira/browse/HIVE-10841
            Project: Hive
         Issue Type: Bug
         Components: Query Planning, Query Processor
  Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0
           Reporter: Alexander Pivovarov
           Assignee: Alexander Pivovarov

The result from the following SELECT query is 3 rows but it should be 1 row.
I checked it in MySQL - it returned 1 row.

To reproduce the issue in Hive

1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}

2. run SELECT query
{code}
select
  acct.ACC_N,
  acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE
  L.id = 4436
  and acct.brn is not null;
{code}

the result is 3 rows
{code}
10	122
NULL	NULL
NULL	NULL
{code}

but it should be 1 row
{code}
10	122
{code}

2.1 explain select ... output for hive-1.3.0 MR
{code}
STAGE DEPENDENCIES:
  Stage-12 is a root stage
  Stage-9 depends on stages: Stage-12
  Stage-0 depends on stages: Stage-9

STAGE PLANS:
  Stage: Stage-12
    Map Reduce Local Work
      Alias -> Map Local Tables:
        a
          Fetch Operator
            limit: -1
        acct
          Fetch Operator
            limit: -1
        fr
          Fetch Operator
            limit: -1
        l
          Fetch Operator
            limit: -1
        pi
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        a
          TableScan
            alias: a
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col5 (type: int)
                  1 id (type: int)
                  2 aid (type: int)
        acct
          TableScan
            alias: acct
            Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: aid is not null (type: boolean)
              Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col5 (type: int)
                  1 id (type: int)
                  2 aid (type: int)
        fr
          TableScan
            alias: fr
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (loan_id = 4436) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 4436 (type: int)
                  1 4436 (type: int)
                  2 4436 (type: int)
        l
          TableScan
            alias: l
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (id = 4436) (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 4436 (type: int)
                  1 4436 (type: int)
                  2 4436 (type: int)
        pi
          TableScan
            alias: pi
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col6 (type: int)
                  1 id
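
The reporter cross-checked the expected result in MySQL. As a rough, self-contained way to repeat that kind of cross-check without a database server, the sketch below loads the same six tables into an in-memory SQLite database and runs the same SELECT. SQLite here is only a stand-in chosen for illustration (it is not mentioned in the issue), the function name `expected_rows` is hypothetical, and the Hive-specific `insert into table acct` is rewritten as `insert into acct`. It shows what standard inner-join semantics should return; it does not reproduce the Hive bug itself.

```python
import sqlite3


def expected_rows():
    """Run the repro query on SQLite (an assumed stand-in for the MySQL check)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Same data as in step 1 of the report; only the INSERT syntax is adapted.
    cur.executescript(
        """
        create table L as select 4436 id;
        create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
        create table FR as select 4436 loan_id;
        create table A as select 4748 id;
        create table PI as select 4415 id;
        create table acct as select 4748 aid, 10 acc_n, 122 brn;
        insert into acct values (4748, null, null);
        insert into acct values (4748, null, null);
        """
    )
    # Same query as in step 2 of the report.
    rows = cur.execute(
        """
        select acct.ACC_N, acct.brn
        FROM L
        JOIN LA ON L.id = LA.loan_id
        JOIN FR ON L.id = FR.loan_id
        JOIN A ON LA.aid = A.id
        JOIN PI ON PI.id = LA.pi_id
        JOIN acct ON A.id = acct.aid
        WHERE L.id = 4436 and acct.brn is not null
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in expected_rows():
        print(row)
```

Only the `acct` row with a non-null `brn` survives the `WHERE` clause here, which matches the single row the reporter expects from Hive.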