[ https://issues.apache.org/jira/browse/SPARK-49646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot reassigned SPARK-49646:
--------------------------------------

    Assignee:     (was: Apache Spark)

> fix subquery decorrelation for union / set operations when parentOuterReferences has references not covered in collectedChildOuterReferences
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-49646
>                 URL: https://issues.apache.org/jira/browse/SPARK-49646
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 4.0.0
>            Reporter: Avery Qi
>            Priority: Major
>              Labels: pull-request-available
>
> Spark currently fails with an internal error on queries like:
> ```
> create table IF NOT EXISTS t(t1 INT,t2 int) using json;
> CREATE TABLE IF NOT EXISTS a (a1 INT) using json;
> select 1
> from t as t_outer
> left join
>   lateral(
>     select b1,b2
>     from
>     (
>       select
>         a.a1 as b1,
>         1 as b2
>       from a
>       union
>       select t_outer.t1 as b1,
>              null as b2
>     ) as t_inner
>     where (t_inner.b1 < t_outer.t2 or t_inner.b1 is null) and t_inner.b1 = t_outer.t1
>     order by t_inner.b1,t_inner.b2 desc limit 1
>   ) as lateral_table
> ```
> The resulting stack trace is:
> org.apache.spark.SparkException: <Redacted Exception Message>
> 	at org.apache.spark.SparkException$.internalError(SparkException.scala:97)
> 	at org.apache.spark.SparkException$.internalError(SparkException.scala:101)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:447)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
> 	at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:87)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$5(DecorrelateInnerQuery.scala:453)
> 	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> 	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> 	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> 	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> 	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> 	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:744)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:451)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
> 	at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:1470)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
> 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
> 	at org.apache.spark.sql.catalyst.plans.logical.Filter.mapChildren(basicLogicalOperators.scala:344)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
> 	at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
> 	...
>
> See this investigation doc for more context:
> [https://docs.google.com/document/d/1HtBDPKVD6pgGntTXdPVX27xH7PdcKTYNyQJLnwr7T-U/edit?usp=sharing]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
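The title of SPARK-49646 describes a gap between two sets of outer references. As a rough illustration only, the Scala sketch below models that gap for the repro query using plain strings in place of Catalyst attributes.

This is not Spark's code, and several things in it are assumptions:

- The mapping of `parentOuterReferences` to the outer columns used in the WHERE clause above the UNION is our reading of the title.
- Likewise, the mapping of `collectedChildOuterReferences` to the outer columns found inside the UNION branches is our interpretation, not taken from the `DecorrelateInnerQuery` source.
- The names `OuterRefCoverage`, `unionBranchOuterRefs` and `uncovered` are hypothetical.

```scala
// Hypothetical, simplified model of the condition named in the SPARK-49646 title.
// Plain strings stand in for Catalyst attributes; this is NOT Spark's implementation.
object OuterRefCoverage {
  // Outer references appearing in each UNION branch of the repro query:
  // branch 1 (select a.a1, 1 from a) has none; branch 2 selects t_outer.t1.
  val unionBranchOuterRefs: Seq[Set[String]] = Seq(Set.empty, Set("t_outer.t1"))

  // Assumption: "collectedChildOuterReferences" = all outer refs found below the UNION.
  val collectedChildOuterReferences: Set[String] = unionBranchOuterRefs.reduce(_ union _)

  // Assumption: "parentOuterReferences" = outer refs used above the UNION,
  // here by the WHERE clause (t_outer.t2 and t_outer.t1).
  val parentOuterReferences: Set[String] = Set("t_outer.t1", "t_outer.t2")

  // Outer references the parent relies on that no UNION child accounts for.
  def uncovered(parent: Set[String], collected: Set[String]): Set[String] =
    parent.diff(collected)

  def main(args: Array[String]): Unit = {
    val missing = uncovered(parentOuterReferences, collectedChildOuterReferences)
    // Prints: uncovered outer references: t_outer.t2
    println(s"uncovered outer references: ${missing.mkString(", ")}")
  }
}
```

Under these assumptions, `t_outer.t2` is needed above the UNION but is referenced by neither branch. That is the kind of "parentOuterReferences has references not covered in collectedChildOuterReferences" situation the title describes.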