[jira] [Commented] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2021-08-27 Thread Brahma Reddy Battula (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405970#comment-17405970
 ] 

Brahma Reddy Battula commented on HIVE-22561:
-

Looks like a duplicate of HIVE-22098?

> Data loss on map join for bucketed, partitioned table
> -
>
> Key: HIVE-22561
> URL: https://issues.apache.org/jira/browse/HIVE-22561
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Blocker
> Fix For: 3.1.0, 3.0.0
>
> Attachments: HIVE-22561.1.branch-3.1.patch, 
> HIVE-22561.branch-3.1.patch, HIVE-22561.patch, Screenshot 2019-11-28 at 
> 8.45.17 PM.png, image-2019-11-28-20-46-25-432.png
>
>
> A map join on a column that is involved in neither bucketing nor partitioning
> causes data loss. 
> Steps to reproduce:
> Env: [hive-dev-box|https://github.com/kgyrtkirk/hive-dev-box], Hive 3.1.2.
> Create tables:
>  
> {code:java}
> CREATE TABLE `testj2`(
>   `id` int, 
>   `bn` string, 
>   `cn` string, 
>   `ad` map<string,int>, 
>   `mi` array<int>)
> PARTITIONED BY ( 
>   `br` string)
> CLUSTERED BY ( 
>   bn) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> CREATE TABLE `testj1`(
>   `id` int, 
>   `can` string, 
>   `cn` string, 
>   `ad` map<string,int>, 
>   `av` boolean, 
>   `mi` array<int>)
> PARTITIONED BY ( 
>   `brand` string)
> CLUSTERED BY ( 
>   can) 
> INTO 2 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> TBLPROPERTIES (
>   'bucketing_version'='2');
> {code}
> Insert some data into both:
> {code:java}
> insert into testj1 values (100, 'mes_1', 'customer_1',  map('city1', 560077), 
> false, array(5, 10), 'brand_1'),
> (101, 'mes_2', 'customer_2',  map('city2', 560078), true, array(10, 20), 
> 'brand_2'),
> (102, 'mes_3', 'customer_3',  map('city3', 560079), false, array(15, 30), 
> 'brand_3'),
> (103, 'mes_4', 'customer_4',  map('city4', 560080), true, array(20, 40), 
> 'brand_4'),
> (104, 'mes_5', 'customer_5',  map('city5', 560081), false, array(25, 50), 
> 'brand_5');
> insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', 
> 560076),array(0, 0, 0), 'tv'),
> (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'),
> (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'),
> (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'),
> (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv');
> {code}
> Do a join between them:
> {code:java}
> select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 
> on (t1.id = t2.id) order by t1.id;
> {code}
> Observed results:
> !image-2019-11-28-20-46-25-432.png|width=524,height=100!
> In the plan, I can see a map join. Disabling it gives the correct result.
>  
>  
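A minimal sketch for confirming the behaviour described above, assuming a Hive 3.1.x session where the standard `hive.auto.convert.join` property is what turns the shuffle join into a map join. It first prints the plan with conversion enabled, then reruns the query with conversion disabled (the workaround the report mentions). Since ids 100 to 104 exist in both tables, the disabled run would be expected to return 5 rows.

{code:sql}
-- Assumes testj1/testj2 and the rows inserted in the description above.
-- 1. With automatic map-join conversion on, inspect the plan for a Map Join operator.
SET hive.auto.convert.join=true;
EXPLAIN
SELECT t1.id, t1.can, t1.cn, t2.bn, t2.ad, t2.br
FROM testj1 t1 JOIN testj2 t2 ON (t1.id = t2.id)
ORDER BY t1.id;

-- 2. Reported workaround: disable the conversion so a shuffle join is used instead.
SET hive.auto.convert.join=false;
SELECT t1.id, t1.can, t1.cn, t2.bn, t2.ad, t2.br
FROM testj1 t1 JOIN testj2 t2 ON (t1.id = t2.id)
ORDER BY t1.id;
-- Every id from 100 to 104 appears in both tables, so 5 joined rows are expected here.
{code}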



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-15 Thread Jesus Camacho Rodriguez (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996952#comment-16996952
 ] 

Jesus Camacho Rodriguez commented on HIVE-22561:


[~aditya-shah], I am not sure why it was not triggered... Nevertheless, the 
patch does not apply cleanly on branch-3.1.


[jira] [Commented] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-12 Thread Aditya Shah (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994365#comment-16994365
 ] 

Aditya Shah commented on HIVE-22561:


[~jcamachorodriguez] it seems to me that the profile for branch-3.1 does not 
run even when I submit the patch with that name. Can you please check and 
let me know if I'm missing something here?

Thanks,
Aditya


[jira] [Commented] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-09 Thread Jesus Camacho Rodriguez (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992113#comment-16992113
 ] 

Jesus Camacho Rodriguez commented on HIVE-22561:


[~aditya-shah], can you rebase the patch onto branch-3 and branch-3.1? It does 
not apply cleanly. Thanks


[jira] [Commented] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-12-09 Thread Hive QA (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991697#comment-16991697
 ] 

Hive QA commented on HIVE-22561:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12988312/HIVE-22561.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/19833/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19833/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19833/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-12-09 15:20:31.922
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-19833/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-12-09 15:20:31.925
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at d7a193b HIVE-22598: Fix 
TestCompactor.testDisableCompactionDuringReplLoad flakyness (Peter Vary 
reviewed by Zoltan Haindrich)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at d7a193b HIVE-22598: Fix 
TestCompactor.testDisableCompactionDuringReplLoad flakyness (Peter Vary 
reviewed by Zoltan Haindrich)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-12-09 15:20:33.143
+ rm -rf ../yetus_PreCommit-HIVE-Build-19833
+ mkdir ../yetus_PreCommit-HIVE-Build-19833
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-19833
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-19833/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java:
 does not exist in index
error: a/ql/src/test/queries/clientpositive/bucket_map_join_tez2.q: does not 
exist in index
error: a/ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out: 
does not exist in index
error: a/ql/src/test/results/clientpositive/llap/limit_pushdown.q.out: does not 
exist in index
error: 
a/ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out: 
does not exist in index
error: a/ql/src/test/results/clientpositive/llap/tez_smb_main.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/spark/bucket_map_join_tez2.q.out: 
does not exist in index
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java:110
Falling back to three-way merge...
Applied patch to 
'ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java'
 cleanly.
error: patch failed: 
ql/src/test/queries/clientpositive/bucket_map_join_tez2.q:138
Falling back to three-way merge...
Applied patch to 'ql/src/test/queries/clientpositive/bucket_map_join_tez2.q' 
with conflicts.
error: patch failed: 
ql/src/test/results/clientpositive/llap/limit_pushdown.q.out:923
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/llap/limit_pushdown.q.out' 
with conflicts.
error: patch failed: 
ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out:1317
Falling back to three-way merge...
Applied patch to 
'ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out' with 
conflicts.
error: patch failed: 
ql/src/test/results/clientpositive/llap/tez_smb_main.q.out:592
Falling back to three-way merge...
Applied patch to 'ql/src/test/results/clientpositive/llap/tez_smb_main.q.out'

[jira] [Commented] (HIVE-22561) Data loss on map join for bucketed, partitioned table

2019-11-28 Thread Aditya Shah (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984499#comment-16984499
 ] 

Aditya Shah commented on HIVE-22561:


[~djaiswal] [~prasanth_j] [~jcamachorodriguez] Can you please take a look at 
this? I tried debugging a bit. Some of the observations I made were:
 # The mapjoin operator does not populate the hashtable (hybrid as well as 
normal) completely for each task.
 # The results vary with the number of buckets. 

Is the hashtable distributed in some way according to buckets?
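One way to probe the bucket question is to see which bucket file each row was written to, using Hive's `INPUT__FILE__NAME` virtual column. Because `testj1` is bucketed on `can` and `testj2` on `bn`, rows with equal `id` values are not guaranteed to share a bucket number; if each task loaded only one bucket's slice of the hashtable for a join on `id`, some matches would be missed. That reading is an assumption consistent with observation 1, not a confirmed root cause. The second part is a hypothetical narrower experiment, assuming the Tez bucket map join path (`hive.convert.join.bucket.mapjoin.tez`) is the one being chosen.

{code:sql}
-- Show the bucket file behind each row (INPUT__FILE__NAME is a Hive virtual column).
SELECT INPUT__FILE__NAME, id, can, brand FROM testj1 ORDER BY id;
SELECT INPUT__FILE__NAME, id, bn, br FROM testj2 ORDER BY id;

-- Assumption: the bucket map join conversion on Tez is involved.
-- Keep map joins enabled but turn off only the bucket map join conversion.
SET hive.auto.convert.join=true;
SET hive.convert.join.bucket.mapjoin.tez=false;
SELECT t1.id, t2.id
FROM testj1 t1 JOIN testj2 t2 ON (t1.id = t2.id)
ORDER BY t1.id;
{code}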
