Thomas Rebele created HIVE-29164:
------------------------------------

             Summary: AntiJoin rule fails to apply with an extra constant in 
the SELECT clause
                 Key: HIVE-29164
                 URL: https://issues.apache.org/jira/browse/HIVE-29164
             Project: Hive
          Issue Type: Bug
            Reporter: Thomas Rebele
            Assignee: Thomas Rebele


Adding a constant to the SELECT clause prevents Hive from applying the 
HiveAntiSemiJoinRule.

Example:
{code:java}
CREATE EXTERNAL TABLE tab(fieldA string, fieldB string);

explain cbo select t1.fieldA
from tab t1
left join tab t2
on t2.fieldA=t1.fieldA
where t2.fieldA is null;
{code}
results in a plan
{code:java}
HiveAntiJoin(condition=[=($1, $0)], joinType=[anti])
  HiveProject(fielda=[$0])
    HiveTableScan(table=[[default, tab]], table:alias=[t1])
  HiveProject(fielda=[$0])
    HiveFilter(condition=[IS NOT NULL($0)])
      HiveTableScan(table=[[default, tab]], table:alias=[t2])
{code}
However, including a constant column "abc" in the result
{code:java}
CREATE EXTERNAL TABLE tab(fieldA string, fieldB string);

explain cbo select t1.fieldA, "abc"
from tab t1
left join tab t2
on t2.fieldA=t1.fieldA
where t2.fieldA is null;
{code}
leads to a plan without an anti-join:
{code:java}
HiveProject(fielda=[$0], _o__c1=[_UTF-16LE'abc':VARCHAR(2147483647) CHARACTER 
SET "UTF-16LE"])
  HiveFilter(condition=[IS NULL($1)])
    HiveJoin(condition=[=($1, $0)], joinType=[left], algorithm=[none], 
cost=[not available])
      HiveProject(fielda=[$0])
        HiveTableScan(table=[[default, tab]], table:alias=[t1])
      HiveProject(fielda=[$0])
        HiveFilter(condition=[IS NOT NULL($0)])
          HiveTableScan(table=[[default, tab]], table:alias=[t2])
{code}
The cause is a bug in 
org.apache.hadoop.hive.ql.optimizer.calcite.HiveCalciteUtil#hasAnyExpressionFromRightSide.
 The condition {{rightBitmap.contains(inputBits)}} returns true if no field 
from the RHS of the join is used at all.

Thanks to [~Dayakar] and [~krisztiankasa] for finding and discussing the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to