Aditya Shah created HIVE-22636:
----------------------------------

             Summary: Data loss on skewjoin for ACID tables.
                 Key: HIVE-22636
                 URL: https://issues.apache.org/jira/browse/HIVE-22636
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: Aditya Shah

I am running a skewjoin and writing the result into a full ACID table. The results are incorrect: rows are missing from the target table. The issue is similar to the one seen for MM (insert-only) tables in HIVE-16051, where the fix was to skip the skewjoin for MM tables.

Steps to reproduce:

I used a qtest similar to the one in HIVE-16051:

{code:sql}
--! qt:dataset:src1
--! qt:dataset:src
-- MASK_LINEAGE
set hive.mapred.mode=nonstrict;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=2;
set hive.optimize.metadataonly=false;

CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties ("transactional"="true");

FROM src src1 JOIN src src2 ON (src1.key = src2.key)
INSERT into TABLE skewjoin_acid SELECT src1.key, src2.value;

select count(distinct key) from skewjoin_acid;

drop table skewjoin_acid;
{code}

The expected count is 309, but the query returns 173.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)