[ 
https://issues.apache.org/jira/browse/HIVE-29688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18092676#comment-18092676
 ] 

Stamatis Zampetakis commented on HIVE-29688:
--------------------------------------------

The problem is triggered by *correlated semijoins (or antijoins)*. In Hive, a 
semijoin is generated when a WHERE clause contains an IN or EXISTS 
subquery:
{code:sql}
SELECT x.p_partkey
FROM part x
WHERE x.p_brand IN (SELECT 'Brand#32');

SELECT x.p_partkey
FROM part x
WHERE x.p_brand IN (SELECT y.p_brand FROM part y);

SELECT x.p_partkey
FROM part x
WHERE EXISTS (SELECT 1 FROM part y);

SELECT x.p_partkey
FROM part x
WHERE EXISTS (SELECT 1 FROM part y WHERE x.p_brand = y.p_brand);
{code}

The queries above are common cases in which a semijoin is introduced. 
These semijoins are not correlated, so on their own they don’t trigger the 
problem.
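
For reference, the second and fourth queries above can also be written with 
Hive's explicit {{LEFT SEMI JOIN}} syntax. This is only a sketch of the 
equivalent query shape, not necessarily the plan that the optimizer produces:
{code:sql}
-- Returns the same rows as the IN / EXISTS forms above:
-- a row of x is kept (at most once) if some y has the same p_brand.
SELECT x.p_partkey
FROM part x
LEFT SEMI JOIN part y ON (x.p_brand = y.p_brand);
{code}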

A correlated semijoin occurs when these patterns are associated with an outer 
query block:
{code:sql}
SELECT x.p_partkey
FROM part x
WHERE EXISTS (SELECT 1
              FROM part y
              WHERE x.p_name = y.p_name
                AND y.p_brand IN (SELECT 'Brand#32'));
{code}
The condition {{x.p_name = y.p_name}} is a correlated condition: it associates 
the outer {{x}} table with the {{y}} table that forms the semijoin.

In other words, to trigger the problem the WHERE clause must contain an 
IN/EXISTS predicate combined (via AND) with a condition that references both 
the inner and the outer table.
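
The distinction can be sketched with the {{part}} table used above. Going by 
the description in this comment, the first query should be harmless because 
nothing inside the EXISTS references {{x}}. The second query is an assumed 
illustration of the antijoin case (NOT IN in place of IN); neither query was 
run against a Hive build for this sketch:
{code:sql}
-- Semijoin inside EXISTS, but no condition referencing x: not correlated.
SELECT x.p_partkey
FROM part x
WHERE EXISTS (SELECT 1
              FROM part y
              WHERE y.p_brand IN (SELECT 'Brand#32'));

-- Assumed antijoin variant of the correlated pattern (illustration only).
SELECT x.p_partkey
FROM part x
WHERE EXISTS (SELECT 1
              FROM part y
              WHERE x.p_name = y.p_name
                AND y.p_brand NOT IN (SELECT 'Brand#32'));
{code}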

> IndexOutOfBoundsException when WHERE clause contains IN/EXISTS subqueries AND 
> correlated conditions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29688
>                 URL: https://issues.apache.org/jira/browse/HIVE-29688
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Thomas Rebele
>            Assignee: Thomas Rebele
>            Priority: Major
>              Labels: pull-request-available
>
> The following q file test fails with an exception:
> {code:sql}
> drop table if exists `table1`;
> CREATE EXTERNAL TABLE `table1`(
>   `f1` string,
>   `f2` string,
>   `f3` string,
>   `f4` string,
>   `f5` string,
>   `f6` string);
> SELECT 1
> FROM table1 a
> WHERE a.f4 IN ('1', '2')
>     AND EXISTS (
>         SELECT 1
>         FROM table1 b
>         WHERE a.f6 = b.f1 AND b.f3 IN (SELECT 1)
>     );
> {code}
> Steps to reproduce:
> {code}
> mvn clean install -DskipTests -Denforcer.skip=true -T 1C
> mvn test -pl ql,itests/qtest,itests/test-serde,itests/util -Pitests \
>   -Dtest=TestMiniLlapLocalCliDriver -Dqfile=decorrelate-semi-join.q \
>   -Dtest.output.overwrite=true -Denforcer.skip=true
> {code}
> The exception was:
> {code:java}
> java.lang.IndexOutOfBoundsException: Index 3 out of bounds for length 1
>     at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
>     at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
>     at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
>     at java.base/java.util.Objects.checkIndex(Objects.java:385)
>     at java.base/java.util.ArrayList.get(ArrayList.java:427)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitInputRef(ASTConverter.java:853)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitInputRef(ASTConverter.java:808)
>     at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:113)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:1107)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:808)
>     at org.apache.calcite.rex.RexCall.accept(RexCall.java:189)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:283)
>     at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:136)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:605)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13230)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:476)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:359)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:499)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:451)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:415)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:234)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:790)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:760)
>     at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>     at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:139)
>     at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
> {code}



