[ https://issues.apache.org/jira/browse/SPARK-51884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-51884.
---------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

Issue resolved by pull request 50838
[https://github.com/apache/spark/pull/50838]

> Subquery Definition Changes For Adding Support For Nested Correlated Subqueries
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-51884
>                 URL: https://issues.apache.org/jira/browse/SPARK-51884
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Optimizer, SQL
>    Affects Versions: 4.1.0
>            Reporter: Avery Qi
>            Assignee: Avery Qi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Newly added argument for SubqueryExpression: OuterScopeAttrs.
> `OuterScopeAttrs` contains attributes that cannot be resolved in the subquery
> itself or in the query that contains it, but can be resolved in the whole
> query plan.
>
> h2. The task design includes:
> * Add `OuterScopeAttrs` and the related getter and setter methods to
>   SubqueryExpression
> * All attributes in `OuterScopeAttrs` must be contained in the `OuterAttrs`
>   AttributeSet of SubqueryExpression
> * Update the usages of SubqueryExpression and of the classes extending
>   SubqueryExpression
> * Update SubqueryExpression.references to be
>   `AttributeSet(outerAttrs) -- AttributeSet(outerScopeAttrs)`
>
> h2. Why do we need the change?
> Spark currently supports only one level of correlation and does not support
> nested correlation.
> For example,
>
>   SELECT col1 FROM VALUES (1, 2) t1 (col1, col2) WHERE EXISTS (
>     SELECT col1 FROM VALUES (1, 2) t2 (col1, col2) WHERE t2.col2 == MAX(t1.col2)
>   ) GROUP BY col1;
>
> is supported and
>
>   SELECT col1 FROM VALUES (1, 2) t1 (col1, col2) WHERE EXISTS (
>     SELECT col1 FROM VALUES (1, 2) t2 (col1, col2) WHERE t2.col2 == (
>       SELECT MAX(t1.col2)
>     )
>   ) GROUP BY col1;
>
> is not supported.
> Spark does not support the second query because the Analyzer and Optimizer
> resolve and plan subqueries recursively.
> The definition change to SubqueryExpression adds the `OuterScopeAttrs`
> metadata, which supports later rewrites in the Analyzer and Optimizer.
>
> h2. High Level Design
> [https://docs.google.com/document/d/1EGB48ArLQ04OZvb-zx_VVTJ8roIoCPwSRw4vTVuDY7o/edit?usp=sharing]
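
The set relationship described in the task design can be sketched with a small toy model. This is not Spark's Scala code: the class `SubqueryExpressionModel`, its field names, the helper `with_outer_scope_attrs`, and the use of plain strings for attributes are all hypothetical, chosen only to illustrate the invariant (`OuterScopeAttrs` is a subset of `OuterAttrs`) and the `references` rule. The assignment of `t1.col2` to each subquery below is one plausible reading of the description, not something the issue states explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubqueryExpressionModel:
    """Hypothetical stand-in for a subquery expression (not Spark's class)."""
    name: str
    outer_attrs: frozenset = frozenset()
    outer_scope_attrs: frozenset = frozenset()

    def __post_init__(self):
        # Invariant from the task design: every attribute in
        # outer_scope_attrs must also be contained in outer_attrs.
        missing = self.outer_scope_attrs - self.outer_attrs
        if missing:
            raise ValueError(
                f"outer_scope_attrs not in outer_attrs: {sorted(missing)}"
            )

    @property
    def references(self):
        # Mirrors AttributeSet(outerAttrs) -- AttributeSet(outerScopeAttrs).
        return self.outer_attrs - self.outer_scope_attrs

    def with_outer_scope_attrs(self, attrs):
        # Setter-style helper that returns a validated copy.
        return SubqueryExpressionModel(
            self.name, self.outer_attrs, frozenset(attrs)
        )


# Assumed reading of the unsupported query: the innermost scalar subquery
# (SELECT MAX(t1.col2)) cannot resolve t1.col2 in itself or in the t2 query
# that contains it, so t1.col2 is treated as an outer-scope attribute there.
scalar = SubqueryExpressionModel(
    "scalar", outer_attrs=frozenset({"t1.col2"})
).with_outer_scope_attrs({"t1.col2"})

# The EXISTS subquery is contained directly in the t1 query, so t1.col2 is
# an ordinary outer attribute with no outer-scope entry.
exists = SubqueryExpressionModel("exists", outer_attrs=frozenset({"t1.col2"}))

print(sorted(scalar.references))
print(sorted(exists.references))
```

Under these assumptions the scalar subquery reports no references of its own, while the enclosing EXISTS subquery still reports `t1.col2`, so the correlation surfaces at the level where it can actually be resolved.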