[jira] [Comment Edited] (HIVE-22561) Data loss on map join for bucketed, partitioned table
[ https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992113#comment-16992113 ] Jesus Camacho Rodriguez edited comment on HIVE-22561 at 12/10/19 2:08 AM: -- [~aditya-shah], can you rebase the patch for branch-3 and branch-3.1? It does not apply cleanly. Thanks was (Author: jcamachorodriguez): [~aditya-shah], can you rebase the patch branch-3 and branch-3.1? It does not apply cleanly. Thanks > Data loss on map join for bucketed, partitioned table > - > > Key: HIVE-22561 > URL: https://issues.apache.org/jira/browse/HIVE-22561 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: Aditya Shah >Assignee: Aditya Shah >Priority: Blocker > Fix For: 3.0.0, 3.1.0 > > Attachments: HIVE-22561.patch, Screenshot 2019-11-28 at 8.45.17 > PM.png, image-2019-11-28-20-46-25-432.png > > > A map join on a column (which is neither involved in bucketing and partition) > causes data loss. > Steps to reproduce: > Env: [hive-dev-box|[https://github.com/kgyrtkirk/hive-dev-box]] hive 3.1.2. > Create tables: > > {code:java} > CREATE TABLE `testj2`( > `id` int, > `bn` string, > `cn` string, > `ad` map, > `mi` array) > PARTITIONED BY ( > `br` string) > CLUSTERED BY ( > bn) > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'='2'); > CREATE TABLE `testj1`( > `id` int, > `can` string, > `cn` string, > `ad` map, > `av` boolean, > `mi` array) > PARTITIONED BY ( > `brand` string) > CLUSTERED BY ( > can) > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'='2'); > {code} > insert some data in both: > {code:java} > insert into testj1 values (100, 'mes_1', 'customer_1', map('city1', 560077), > false, array(5, 10), 'brand_1'), > (101, 'mes_2', 'customer_2', map('city2', 560078), true, array(10, 20), > 'brand_2'), > (102, 'mes_3', 'customer_3', map('city3', 560079), false, array(15, 30), > 'brand_3'), > (103, 'mes_4', 'customer_4', map('city4', 560080), true, array(20, 40), > 'brand_4'), > (104, 'mes_5', 'customer_5', map('city5', 560081), false, array(25, 50), > 'brand_5'); > insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', > 560076),array(0, 0, 0), 'tv'), > (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'), > (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'), > (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'), > (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv'); > {code} > Do a join between them: > {code:java} > select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 > on (t1.id = t2.id) order by t1.id; > {code} > Observed results: > !image-2019-11-28-20-46-25-432.png|width=524,height=100! > In the plan, I can see a map join. Disabling it gives the correct result. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-22561) Data loss on map join for bucketed, partitioned table
[ https://issues.apache.org/jira/browse/HIVE-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992113#comment-16992113 ] Jesus Camacho Rodriguez edited comment on HIVE-22561 at 12/10/19 2:07 AM: -- [~aditya-shah], can you rebase the patch branch-3 and branch-3.1? It does not apply cleanly. Thanks was (Author: jcamachorodriguez): [~aditya-shah], can you rebase the patch branch-3 and branch-3.1? It does not apply cleanly. Thanks > Data loss on map join for bucketed, partitioned table > - > > Key: HIVE-22561 > URL: https://issues.apache.org/jira/browse/HIVE-22561 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: Aditya Shah >Assignee: Aditya Shah >Priority: Blocker > Fix For: 3.0.0, 3.1.0 > > Attachments: HIVE-22561.patch, Screenshot 2019-11-28 at 8.45.17 > PM.png, image-2019-11-28-20-46-25-432.png > > > A map join on a column (which is neither involved in bucketing and partition) > causes data loss. > Steps to reproduce: > Env: [hive-dev-box|[https://github.com/kgyrtkirk/hive-dev-box]] hive 3.1.2. > Create tables: > > {code:java} > CREATE TABLE `testj2`( > `id` int, > `bn` string, > `cn` string, > `ad` map, > `mi` array) > PARTITIONED BY ( > `br` string) > CLUSTERED BY ( > bn) > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'='2'); > CREATE TABLE `testj1`( > `id` int, > `can` string, > `cn` string, > `ad` map, > `av` boolean, > `mi` array) > PARTITIONED BY ( > `brand` string) > CLUSTERED BY ( > can) > INTO 2 BUCKETS > ROW FORMAT DELIMITED > FIELDS TERMINATED BY ',' > STORED AS TEXTFILE > TBLPROPERTIES ( > 'bucketing_version'='2'); > {code} > insert some data in both: > {code:java} > insert into testj1 values (100, 'mes_1', 'customer_1', map('city1', 560077), > false, array(5, 10), 'brand_1'), > (101, 'mes_2', 'customer_2', map('city2', 560078), true, array(10, 20), > 'brand_2'), > (102, 'mes_3', 'customer_3', map('city3', 560079), false, array(15, 30), > 'brand_3'), > (103, 'mes_4', 'customer_4', map('city4', 560080), true, array(20, 40), > 'brand_4'), > (104, 'mes_5', 'customer_5', map('city5', 560081), false, array(25, 50), > 'brand_5'); > insert into table testj2 values (100, 'tv_0', 'customer_0', map('city0', > 560076),array(0, 0, 0), 'tv'), > (101, 'tv_1', 'customer_1', map('city1', 560077),array(20, 25, 30), 'tv'), > (102, 'tv_2', 'customer_2', map('city2', 560078),array(40, 50, 60), 'tv'), > (103, 'tv_3', 'customer_3', map('city3', 560079),array(60, 75, 90), 'tv'), > (104, 'tv_4', 'customer_4', map('city4', 560080),array(80, 100, 120), 'tv'); > {code} > Do a join between them: > {code:java} > select t1.id, t1.can, t1.cn, t2.bn,t2.ad, t2.br FROM testj1 t1 JOIN testj2 t2 > on (t1.id = t2.id) order by t1.id; > {code} > Observed results: > !image-2019-11-28-20-46-25-432.png|width=524,height=100! > In the plan, I can see a map join. Disabling it gives the correct result. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)