[ 
https://issues.apache.org/jira/browse/SPARK-49646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-49646:
--------------------------------------

    Assignee:     (was: Apache Spark)

> fix subquery decorrelation for union / set operations when 
> parentOuterReferences has references not covered in 
> collectedChildOuterReferences
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-49646
>                 URL: https://issues.apache.org/jira/browse/SPARK-49646
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 4.0.0
>            Reporter: Avery Qi
>            Priority: Major
>              Labels: pull-request-available
>
> Spark currently cannot handle queries like:
> ```
> create table IF NOT EXISTS t(t1 INT,t2 int) using json;
> CREATE TABLE IF NOT EXISTS a (a1 INT) using json;
> select 1
> from t as t_outer
> left join
>    lateral(
>        select b1,b2
>        from
>        (
>            select
>                a.a1 as b1,
>                1 as b2
>            from a
>            union
>            select t_outer.t1 as b1,
>                   null as b2
>        ) as t_inner
>        where (t_inner.b1 < t_outer.t2 or t_inner.b1 is null) and t_inner.b1 = t_outer.t1
>        order by t_inner.b1,t_inner.b2 desc limit 1
>    ) as lateral_table
> ```
> The resulting stack trace is:
> org.apache.spark.SparkException: <Redacted Exception Message>
>   at org.apache.spark.SparkException$.internalError(SparkException.scala:97)
>   at org.apache.spark.SparkException$.internalError(SparkException.scala:101)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:447)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
>   at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:87)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$5(DecorrelateInnerQuery.scala:453)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:744)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:451)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
>   at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:1470)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
>   at org.apache.spark.sql.catalyst.plans.logical.Filter.mapChildren(basicLogicalOperators.scala:344)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
>   at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
> ...
>  
> See this investigation doc for more context:
> [https://docs.google.com/document/d/1HtBDPKVD6pgGntTXdPVX27xH7PdcKTYNyQJLnwr7T-U/edit?usp=sharing]
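
The condition named in the issue title can be pictured with the query from the description. The sketch below is only a toy model: it treats outer references as plain strings and assumes that `parentOuterReferences` means the outer columns used by the operator above the union, and that `collectedChildOuterReferences` means the outer columns gathered from the union branches. It is not Spark's actual `DecorrelateInnerQuery` code, and the variable names are illustrative.

```python
# Toy model only: outer references are plain strings, not Catalyst attributes.

# Outer columns referenced by each branch of the UNION in the query above.
union_branch_outer_refs = [
    set(),            # select a.a1 as b1, 1 as b2 from a
    {"t_outer.t1"},   # select t_outer.t1 as b1, null as b2
]

# Outer columns referenced by the filter sitting above the union:
# (b1 < t_outer.t2 or b1 is null) and b1 = t_outer.t1
parent_outer_refs = {"t_outer.t1", "t_outer.t2"}

# Assumed meaning of the two names in the issue title.
collected_child_outer_refs = set().union(*union_branch_outer_refs)
uncovered = parent_outer_refs - collected_child_outer_refs

print(sorted(uncovered))  # ['t_outer.t2']
```

In this reading, `t_outer.t2` is used by the parent filter but by neither union branch, which appears to be the shape of query the issue title describes.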



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

