[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-07-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23438:
--
Labels: pull-request-available  (was: )

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23438.001.branch-2.3.patch, 
> HIVE-23438.branch-2.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: HIVE-23438.001.branch-2.3.patch

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.001.branch-2.3.patch, 
> HIVE-23438.branch-2.3.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Status: In Progress  (was: Patch Available)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.branch-2.3.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: HIVE-23438.branch-2.3.patch
Status: Patch Available  (was: Open)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.branch-2.3.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Status: Open  (was: Patch Available)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: (was: HIVE-23438.branch-2.3.patch)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: HIVE-23438.branch-2.3.patch
Status: Patch Available  (was: Open)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.branch-2.3.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: (was: HIVE-23438.patch)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Status: Open  (was: Patch Available)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: HIVE-23438.patch
Status: Patch Available  (was: Open)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: (was: HIVE-23438.patch)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: HIVE-23438.patch

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Attachment: (was: HIVE-23438.patch)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>
> *Run Test in Patch File*
> {code:java}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
> *Manual Reproduce*
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true; 
> set hive.auto.convert.join.noconditionaltask=true; 
> set hive.auto.convert.join.noconditionaltask.size=1000; 
> set hive.cbo.enable=false;
> set hive.mapjoin.hybridgrace.hashtable=true;
> select *
> from
> (
> select key from src1 group by key
> ) x
> left join src2 z on x.key = z.key
> join
> (
> select key from srcpart y group by key
> ) y on y.key = x.key;
> {code}
> *EXPECTED RESULT***
>  
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 238   NULLNULL238
> 369   NULLNULL369
> 406   406 1val_4061   406
> 273   273 1val_2731   273
> 98NULLNULL98
> 213   213 1val_2131   213
> 255   NULLNULL255
> 401   401 1val_4011   401
> 278   NULLNULL278
> 6666  11val_6611  66
> 224   NULLNULL224
> 311   NULLNULL311
> {code}
>  
> *ACTUAL RESULT*
> {code:java}
> 128   NULLNULL128
> 146   146 1val_1461   146
> 150   150 1val_1501   150
> 213   213 1val_2131   213
> 238   NULLNULL238
> 273   273 1val_2731   273
> 369   NULLNULL369
> 406   406 1val_4061   406
> 98NULLNULL98
> 401   401 1val_4011   401
> 6666  11val_6611  66
> {code}
>  
> *ROOT CAUSE*
> src1 left join src2, src1 is big table and src2 is small table. Join result 
> between big table row and the corresponding hashtable maybe NO_MATCH state, 
> however, these NO_MATCH rows is needed because LEFT OUTER JOIN.
> In addition, these big table rows will not spilled into matchfile related to 
> this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
> Then, these big table rows will be spilled into matchfile in hashtables of 
> table `srcpart`(second small table)
> Finally, when reProcessBigTable, big table rows in matchfile are only read 
> from `firstSmallTable`, some datum are missing.
>  
> *WORKAROUND*
>  configure firstSmallTable in completeInitializationOp and only spill big 
> table row into firstSmallTable when spill matchfile.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Description: 
*Run Test in Patch File*
{code:java}
mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
*Manual Reproduce*

*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true; 
set hive.auto.convert.join.noconditionaltask=true; 
set hive.auto.convert.join.noconditionaltask.size=1000; 
set hive.cbo.enable=false;
set hive.mapjoin.hybridgrace.hashtable=true;

select *
from
(
select key from src1 group by key
) x
left join src2 z on x.key = z.key
join
(
select key from srcpart y group by key
) y on y.key = x.key;
{code}
*EXPECTED RESULT***

 
{code:java}
128 NULLNULL128
146 146 1val_1461   146
150 150 1val_1501   150
238 NULLNULL238
369 NULLNULL369
406 406 1val_4061   406
273 273 1val_2731   273
98  NULLNULL98
213 213 1val_2131   213
255 NULLNULL255
401 401 1val_4011   401
278 NULLNULL278
66  66  11val_6611  66
224 NULLNULL224
311 NULLNULL311
{code}
 

*ACTUAL RESULT*
{code:java}
128 NULLNULL128
146 146 1val_1461   146
150 150 1val_1501   150
213 213 1val_2131   213
238 NULLNULL238
273 273 1val_2731   273
369 NULLNULL369
406 406 1val_4061   406
98  NULLNULL98
401 401 1val_4011   401
66  66  11val_6611  66
{code}
 

*ROOT CAUSE*

src1 left join src2, src1 is big table and src2 is small table. Join result 
between big table row and the corresponding hashtable maybe NO_MATCH state, 
however, these NO_MATCH rows is needed because LEFT OUTER JOIN.

In addition, these big table rows will not spilled into matchfile related to 
this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
Then, these big table rows will be spilled into matchfile in hashtables of 
table `srcpart`(second small table)

Finally, when reProcessBigTable, big table rows in matchfile are only read from 
`firstSmallTable`, some datum are missing.

 

*WORKAROUND*

 configure firstSmallTable in completeInitializationOp and only spill big table 
row into firstSmallTable when spill matchfile.

 

  was:
*Run Test in Patch File*
{code:java}
mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
*Manual Reproduce*

*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true; 
set hive.auto.convert.join.noconditionaltask=true; 
s

[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Description: 
*Run Test in Patch File*
{code:java}
mvn test -Dtest=TestMiniTezCliDriver -Dqfile=hybridgrace_hashjoin_2.q{code}
*Manual Reproduce*

*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true; 
set hive.auto.convert.join.noconditionaltask=true; 
set hive.auto.convert.join.noconditionaltask.size=1000; 
set hive.cbo.enable=false;
set hive.mapjoin.hybridgrace.hashtable=true;EXPLAIN
select *
from
(
select key from src1 group by key
) x
left join src2 z on x.key = z.key
join
(
select key from srcpart y group by key
) y on y.key = x.key;
{code}
*EXPECTED RESULT***

 
{code:java}
128 NULLNULL128
146 146 1val_1461   146
150 150 1val_1501   150
238 NULLNULL238
369 NULLNULL369
406 406 1val_4061   406
273 273 1val_2731   273
98  NULLNULL98
213 213 1val_2131   213
255 NULLNULL255
401 401 1val_4011   401
278 NULLNULL278
66  66  11val_6611  66
224 NULLNULL224
311 NULLNULL311
{code}
 

*ACTUAL RESULT*
{code:java}
//代码占位符
{code}
*Root Cause*

src1 left join src2, src1 is big table and src2 is small table. Join result 
between big table row and the corresponding hashtable maybe NO_MATCH state, 
however, these NO_MATCH rows is needed because LEFT OUTER JOIN.

In addition, these big table rows will not spilled into matchfile related to 
this hashtable on disk because only SPILL state can use `spillBigTableRow`.  
Then, these big table rows will be spilled into matchfile in hashtables of 
table `srcpart`(second small table)

Finally, when reProcessBigTable, big table rows in matchfile are only read from 
`firstSmallTable`, some datum are missing.

 

*WorkAround*

 

 

  was:
*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1000;
set hive.cbo.enable=false;
set hive.mapjoin.hybridgrace.hashtable=true;

select *
from
(
select key from src1 group by key
) x
left join src2 z on x.key = z.key
join
(
select key from srcpart y group by key
) y on y.key = x.key;
{code}


> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components:

[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Description: 
*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1000;
set hive.cbo.enable=false;
set hive.mapjoin.hybridgrace.hashtable=true;

select *
from
(
select key from src1 group by key
) x
left join src2 z on x.key = z.key
join
(
select key from srcpart y group by key
) y on y.key = x.key;
{code}

  was:
*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1000;
set hive.cbo.enable=false;
set hive.mapjoin.hybridgrace.hashtable=true;

select *
from
(
select key from src1 group by key
) x
left join src2 z on x.key = z.key
join
(
select key from srcpart y group by key
) y on y.key = x.key;

{code}


> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*

[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Description: 
*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1000;
set hive.cbo.enable=false;
set hive.mapjoin.hybridgrace.hashtable=true;

select *
from
(
select key from src1 group by key
) x
left join src2 z on x.key = z.key
join
(
select key from srcpart y group by key
) y on y.key = x.key;

{code}

  was:
*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1000;
set hive.cbo.enable=false;

{code}


> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=1000;
> set hive.cbo.enable=false;
> set hi

[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Description: 
*STEP 1. Create test data(q_test_init_tez.sql)*
{code:java}
//create table src1
CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
'default') STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;

//create table src2
CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 'default') 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
TABLE src2;

//create table srcpart
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");

LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
*STEP 2. Run query*
{code:java}
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=1000;
set hive.cbo.enable=false;

{code}

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>
> *STEP 1. Create test data(q_test_init_tez.sql)*
> {code:java}
> //create table src1
> CREATE TABLE src1 (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv3.txt" INTO TABLE src1;
> //create table src2
> CREATE TABLE src2(key STRING COMMENT 'default', value STRING COMMENT 
> 'default') STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv11.txt" OVERWRITE INTO 
> TABLE src2;
> //create table srcpart
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default')
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-08", hr="12");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt"
> OVERWRITE INTO TABLE srcpart PARTITION (ds="2008-04-09", hr="12");{code}
> *STEP 2. Run query*
> {code:java}
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=1000;
> set hive.cbo.enable=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23438) Missing Rows When Left Outer Join In N-way HybridGraceHashJoin

2020-05-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

范宜臻 updated HIVE-23438:
---
Summary: Missing Rows When Left Outer Join In N-way HybridGraceHashJoin  
(was: Missing Rows When Left Outer Join In HybridGraceHashJoin)

> Missing Rows When Left Outer Join In N-way HybridGraceHashJoin
> --
>
> Key: HIVE-23438
> URL: https://issues.apache.org/jira/browse/HIVE-23438
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Tez
>Affects Versions: 2.3.4
>Reporter: 范宜臻
>Priority: Major
> Attachments: HIVE-23438.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)