Aditya Shah created HIVE-22636:
----------------------------------

             Summary: Data loss on skewjoin for ACID tables.
                 Key: HIVE-22636
                 URL: https://issues.apache.org/jira/browse/HIVE-22636
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: Aditya Shah


I am running a skew join and writing the result into a full ACID table, and the 
results are incorrect. The issue is similar to the one seen for MM tables in 
HIVE-16051, where the fix was to skip the skew join for MM tables. 

Steps to reproduce:

I used a qtest similar to the one in HIVE-16051:
{code:java}
--! qt:dataset:src1
--! qt:dataset:src

-- MASK_LINEAGE
set hive.mapred.mode=nonstrict;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=2;
set hive.optimize.metadataonly=false;

CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties ("transactional"="true");
FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE skewjoin_acid SELECT src1.key, src2.value;
select count(distinct key) from skewjoin_acid;
drop table skewjoin_acid;
{code}
The expected result of the count is 309, but the query returned 173. 
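
One way to narrow this down might be to repeat the same insert with the skew join optimization turned off and compare the counts. The following is only a sketch: the table name skewjoin_acid_baseline is hypothetical, and it assumes the same session settings as the qtest above (transaction manager, concurrency, nonstrict modes). If the skew join path is what drops the rows, the baseline count would be expected to match the 309 mentioned above.
{code:sql}
-- Assumption: all other settings from the qtest above are still in effect.
-- Disable only the skew join optimization for the baseline run.
set hive.optimize.skewjoin=false;

-- Hypothetical baseline table with the same layout as skewjoin_acid.
CREATE TABLE skewjoin_acid_baseline(key INT, value STRING) STORED AS ORC tblproperties ("transactional"="true");
FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE skewjoin_acid_baseline SELECT src1.key, src2.value;

-- Count written through the regular join path.
select count(distinct key) from skewjoin_acid_baseline;

-- Same count computed directly from the join, without writing to an ACID table.
select count(distinct src1.key) from src src1 JOIN src src2 ON (src1.key = src2.key);

drop table skewjoin_acid_baseline;
{code}
If both counts agree with each other but differ from the count on skewjoin_acid, that would point at the write path of the skew join into the ACID table rather than at the join itself.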

--
This message was sent by Atlassian Jira
(v8.3.4#803005)