Aditya Shah created HIVE-22636:
----------------------------------
Summary: Data loss on skewjoin for ACID tables.
Key: HIVE-22636
URL: https://issues.apache.org/jira/browse/HIVE-22636
Project: Hive
Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Aditya Shah
I am running a skewjoin and writing the result into a full ACID table, and the
results are incorrect. The issue is similar to the one seen for MM tables in
HIVE-16051, where the fix was to skip the skewjoin for MM tables.
Steps to reproduce:
I used a qtest similar to the one in HIVE-16051:
{code:java}
--! qt:dataset:src1
--! qt:dataset:src
-- MASK_LINEAGE
set hive.mapred.mode=nonstrict;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=2;
set hive.optimize.metadataonly=false;
CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC
tblproperties ("transactional"="true");
FROM src src1 JOIN src src2 ON (src1.key = src2.key)
INSERT into TABLE skewjoin_acid SELECT src1.key, src2.value;
select count(distinct key) from skewjoin_acid;
drop table skewjoin_acid;
{code}
The expected result of the count query was 309, but the actual result was 173.
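A control run with the skewjoin optimization turned off may help confirm that the skewjoin path is what drops the rows. The sketch below has not been run as part of this report. It assumes the same session settings and datasets as the reproduction script above, and the table name noskew_acid is a hypothetical name chosen only for illustration.
{code:java}
-- Control run: same join and insert, but with the skewjoin optimization disabled.
-- Assumes the other settings from the reproduction script are still in effect.
set hive.optimize.skewjoin=false;
-- noskew_acid is a hypothetical table name used only for this comparison.
CREATE TABLE noskew_acid(key INT, value STRING) STORED AS ORC
tblproperties ("transactional"="true");
FROM src src1 JOIN src src2 ON (src1.key = src2.key)
INSERT into TABLE noskew_acid SELECT src1.key, src2.value;
-- Compare this count with the one returned for skewjoin_acid.
select count(distinct key) from noskew_acid;
drop table noskew_acid;
{code}
If the skewjoin is responsible, this query should return the expected 309 rather than 173.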