[ https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Turoczy reopened HIVE-22636:
-----------------------------------

> Data loss on skewjoin for ACID tables.
> --------------------------------------
>
>                 Key: HIVE-22636
>                 URL: https://issues.apache.org/jira/browse/HIVE-22636
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Aditya Shah
>            Priority: Blocker
>              Labels: check, hive-4.0.0-must
>
> I am trying to do a skew join and write the result into a full ACID table.
> The results are incorrect. The issue is similar to the one seen for MM tables
> in HIVE-16051, where the fix was to skip the skew join for MM tables.
> Steps to reproduce:
> I used a qtest similar to the one in HIVE-16051:
> {code:sql}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC
> tblproperties ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key)
> INSERT INTO TABLE skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected result of the count was 309, but the query returned 173.
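> As a possible cross-check (a sketch, assuming the same session settings as
> the qtest above; the table name skewjoin_acid_baseline is illustrative only),
> the same insert can be rerun with the skew join disabled, mirroring what
> HIVE-16051 did for MM tables. If the skew join path is what drops rows, this
> count should come back as the expected 309:
> {code:sql}
> -- Turn off runtime skew join handling for this session (illustrative baseline run).
> set hive.optimize.skewjoin=false;
> CREATE TABLE skewjoin_acid_baseline(key INT, value STRING) STORED AS ORC
> tblproperties ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key)
> INSERT INTO TABLE skewjoin_acid_baseline SELECT src1.key, src2.value;
> -- Compare this result with the count from the skew join run above.
> select count(distinct key) from skewjoin_acid_baseline;
> drop table skewjoin_acid_baseline;
> {code}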
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
