Naresh P R created HIVE-28213:
---------------------------------

             Summary: Incorrect results after insert-select from similar 
bucketed source & target table
                 Key: HIVE-28213
                 URL: https://issues.apache.org/jira/browse/HIVE-28213
             Project: Hive
          Issue Type: Bug
            Reporter: Naresh P R
         Attachments: test.q

Insert-select does not honor bucketing when both the source and target tables are 
bucketed on the same column.

e.g.,
{code:java}
CREATE EXTERNAL TABLE bucketing_table1 (id INT)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 32 BUCKETS STORED AS TEXTFILE;

INSERT INTO TABLE bucketing_table1 VALUES (1), (2), (3), (4), (5);

CREATE EXTERNAL TABLE bucketing_table2 LIKE bucketing_table1;

INSERT INTO TABLE bucketing_table2 SELECT * FROM bucketing_table1;{code}
For id=1, murmur_hash(1) % 32 maps the row to the 29th bucket file.

bucketing_table1 has id=1 in its 29th bucket file, but bucketing_table2 has no 
29th bucket file because the insert-select did not honor the bucketing.
{code:java}
SELECT count(*) FROM bucketing_table1 WHERE id = 1;
===
1 // correct result

SELECT count(*) FROM bucketing_table2 WHERE id = 1;
===
0 // incorrect result{code}
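To see which file each row landed in, queries along these lines may help. This is a sketch, not part of the attached repro. It relies on Hive's INPUT__FILE__NAME virtual column and has no filter on id, so bucket pruning should not affect the result.
{code:sql}
-- Show the backing file of every row in the source table.
SELECT id, INPUT__FILE__NAME AS file_name FROM bucketing_table1;

-- Same check for the target table populated by insert-select.
SELECT id, INPUT__FILE__NAME AS file_name FROM bucketing_table2;
{code}
If bucketing were honored, each id should appear under the same bucket number in both tables.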
Workaround: set hive.tez.bucket.pruning=false;
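The workaround presumably helps because, with bucket pruning enabled, a filter on the bucketing column lets Tez read only the bucket file that the filter value hashes to, whereas with pruning disabled all files are scanned. A sketch of applying it at session level (the expected count of 1 is inferred from this report, not verified here):
{code:sql}
-- Disable bucket pruning for the current session only.
SET hive.tez.bucket.pruning=false;

-- With pruning off, every file of bucketing_table2 is scanned.
SELECT count(*) FROM bucketing_table2 WHERE id = 1;
{code}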

PS: Attaching repro file [^test.q]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
