Krisztian Kasa created HIVE-26452:
-------------------------------------
Summary: NPE when converting join to mapjoin and join column
referenced more than once
Key: HIVE-26452
URL: https://issues.apache.org/jira/browse/HIVE-26452
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
{code}
explain
select count(*)
from LU_CUSTOMER pa11
join ORDER_FACT a15
on (pa11.CUSTOMER_ID = a15.CUSTOMER_ID)
join LU_CUSTOMER a16
on (a15.CUSTOMER_ID = a16.CUSTOMER_ID and pa11.CUSTOMER_ID = a16.CUSTOMER_ID);
{code}
{{a16.CUSTOMER_ID}} is referenced more than once in the join condition.
Hive generates ReduceSink (RS) operators for the join's children, and the row
schema of one of these RS operators contains only one instance of the join key
(customer_id), even though its column expression map has two key entries.
{code}
RS[13]
result = {HashMap@16092} size = 2
"KEY.reducesinkkey0" -> {ExprNodeColumnDesc@16083} "Column[_col0]"
"KEY.reducesinkkey1" -> {ExprNodeColumnDesc@16102} "Column[_col0]"
result = {RowSchema@16104} "(KEY.reducesinkkey0: int|{$hdt$_2}customer_id)"
signature = {ArrayList@16110} size = 1
0 = {ColumnInfo@16087} "KEY.reducesinkkey0: int"
{code}
{{KEY.reducesinkkey1}} is missing from the schema.
When converting the join to a mapjoin, the converter algorithm fails while
looking up both join key column instances, because only one of them is present
in the row schema:
https://github.com/apache/hive/blob/2aaba3c79e740ef27fc263b5a8ff33ad679c5a12/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java#L538
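The mismatch can be illustrated outside Hive with a minimal sketch. This is not the code from ExprNodeDescUtils: the class {{MissingJoinKeySketch}} and its methods are hypothetical, and the row schema is modelled as a plain map from column name to type name. Only the key names and the type come from the RS[13] dump above. It shows that a lookup driven by the column expression map finds no schema entry for {{KEY.reducesinkkey1}}, and that dereferencing the result without a null check throws an NPE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MissingJoinKeySketch {

    // Column expression map as in the RS[13] dump: two keys, both backed by _col0.
    static Map<String, String> exampleColExprMap() {
        Map<String, String> colExprMap = new LinkedHashMap<>();
        colExprMap.put("KEY.reducesinkkey0", "Column[_col0]");
        colExprMap.put("KEY.reducesinkkey1", "Column[_col0]");
        return colExprMap;
    }

    // Hypothetical schema lookup: returns the column's type name, or null when absent.
    static String lookupType(Map<String, String> schema, String columnName) {
        return schema.get(columnName);
    }

    // Lists the keys of the expression map that have no entry in the schema.
    static List<String> missingFromSchema(Map<String, String> colExprMap,
                                          Map<String, String> schema) {
        List<String> missing = new ArrayList<>();
        for (String key : colExprMap.keySet()) {
            if (!schema.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Row schema as in the dump: only the first key is present.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("KEY.reducesinkkey0", "int");

        System.out.println("missing: " + missingFromSchema(exampleColExprMap(), schema));

        try {
            // Dereferencing the lookup result without a null check fails
            // for the key that the schema does not contain.
            int len = lookupType(schema, "KEY.reducesinkkey1").length();
            System.out.println(len);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for KEY.reducesinkkey1");
        }
    }
}
```

A check like {{missingFromSchema}} only makes the inconsistency visible. Whether the proper fix is to add the second key to the RS row schema or to make the lookup tolerate duplicated join columns is not decided by this sketch.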