[ https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849421#comment-17849421 ]
Sungwoo Park commented on HIVE-26018:
-------------------------------------
Currently uniquejoin.q passes because it runs on the MapReduce execution
engine. If the Tez execution engine is used instead, uniquejoin.q fails for the
same reason described in this JIRA.
The difference in the outcome is due to the different ways MapReduce and Tez
represent the absence of rows. If there is no row for the given key:
1. MapReduce's JoinOperator: the storage is empty.
2. Tez's MapJoinOperator/CommonMergeJoinOperator: the storage contains a dummy
row.
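As a toy illustration of why the two representations above can lead to
different results, here is a minimal Python sketch. It is not Hive's actual
code: DUMMY_ROW, mr_style_storage, tez_style_storage and has_rows are
hypothetical names, and the sketch assumes the join decides whether a table
contributed rows for a key using a plain emptiness check.

```python
# Toy model only, NOT Hive's implementation. All names here are hypothetical.
DUMMY_ROW = (None, None)


def mr_style_storage(rows):
    """No matching rows for the key: the storage is simply empty."""
    return list(rows)


def tez_style_storage(rows):
    """No matching rows for the key: the storage holds a single dummy row."""
    rows = list(rows)
    return rows if rows else [DUMMY_ROW]


def has_rows(storage):
    # Assumed emptiness check: it only counts entries, so it cannot tell
    # a dummy row apart from a real one.
    return len(storage) > 0


# Key 'ddd' from the example in this issue: absent from T1_n1x, present in T2_n1x.
t1_rows_for_ddd = []
t2_rows_for_ddd = [("ddd", "444")]

mr_inputs = [mr_style_storage(t1_rows_for_ddd), mr_style_storage(t2_rows_for_ddd)]
tez_inputs = [tez_style_storage(t1_rows_for_ddd), tez_style_storage(t2_rows_for_ddd)]

# Does every input appear to have rows for this key?
mr_all_present = all(has_rows(s) for s in mr_inputs)
tez_all_present = all(has_rows(s) for s in tez_inputs)

print(mr_all_present, tez_all_present)
```

Under these assumptions, the MapReduce-style inputs report a missing side for
key 'ddd', while the Tez-style inputs look complete because the dummy row is
counted as data.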
Does anyone still use UNIQUEJOIN in production? This is a correctness issue, so
we would like to investigate further if UNIQUEJOIN is still used.
cc. [~seonggon]
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> -----------------------------------------------------------------------
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
> Issue Type: Bug
> Components: Tez
> Affects Versions: 3.1.0, 4.0.0
> Reporter: GuangMing Lu
> Priority: Major
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and
> the result is not correct. For example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE
> T2_n1x b (b.key);
> Hive on Tez result: wrong
> +-------+-------+
> | a.key | b.key |
> +-------+-------+
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> | a.key | b.key |
> +-------+-------+
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> +-------+-------+
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> +-------+-------+
> | a.key | b.key |
> +-------+-------+
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> | a.key | b.key |
> +-------+-------+
> | aaa   | aaa   |
> | ccc   | ccc   |
> +-------+-------+
>
>