Thomas Rebele created HIVE-29164:
------------------------------------
Summary: AntiJoin rule fails to apply with an extra constant in
the SELECT clause
Key: HIVE-29164
URL: https://issues.apache.org/jira/browse/HIVE-29164
Project: Hive
Issue Type: Bug
Reporter: Thomas Rebele
Assignee: Thomas Rebele
Adding a constant to the SELECT clause prevents Hive from applying the
HiveAntiSemiJoinRule.
Example:
{code:java}
CREATE EXTERNAL TABLE tab(fieldA string, fieldB string);
explain cbo select t1.fieldA
from tab t1
left join tab t2
on t2.fieldA=t1.fieldA
where t2.fieldA is null;
{code}
results in a plan
{code:java}
HiveAntiJoin(condition=[=($1, $0)], joinType=[anti])
HiveProject(fielda=[$0])
HiveTableScan(table=[[default, tab]], table:alias=[t1])
HiveProject(fielda=[$0])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, tab]], table:alias=[t2])
{code}
However, including a constant column "abc" in the result
{code:java}
CREATE EXTERNAL TABLE tab(fieldA string, fieldB string);
explain cbo select t1.fieldA, "abc"
from tab t1
left join tab t2
on t2.fieldA=t1.fieldA
where t2.fieldA is null;
{code}
leads to a plan without an anti-join:
{code:java}
HiveProject(fielda=[$0], _o__c1=[_UTF-16LE'abc':VARCHAR(2147483647) CHARACTER
SET "UTF-16LE"])
HiveFilter(condition=[IS NULL($1)])
HiveJoin(condition=[=($1, $0)], joinType=[left], algorithm=[none],
cost=[not available])
HiveProject(fielda=[$0])
HiveTableScan(table=[[default, tab]], table:alias=[t1])
HiveProject(fielda=[$0])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, tab]], table:alias=[t2])
{code}
The cause is a bug in
org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil#hasAnyExpressionFromRightSide.
The condition {{rightBitmap.contains(inputBits)}} returns true if no field
from the RHS of the join is used at all.
Thanks to [~Dayakar] and [~krisztiankasa] for finding and discussing the issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)