[ 
https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42753:
------------------------------------

    Assignee:     (was: Apache Spark)

> ReusedExchange refers to non-existent node
> ------------------------------------------
>
>                 Key: SPARK-42753
>                 URL: https://issues.apache.org/jira/browse/SPARK-42753
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Web UI
>    Affects Versions: 3.4.0
>            Reporter: Steven Chen
>            Priority: Major
>
> During AQE planning, an Exchange that is being reused can be replaced in the
> plan tree. When the query plan is then printed, the ReusedExchange refers to
> an "unknown" Exchange. An example is shown below:
>  
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]
>  Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code}
>  
>  
> Below is an example to demonstrate the root cause:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A
>           |-- SomeNode Y
>               |-- Exchange B
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C
>           |-- SomeNode N
>               |-- Exchange D
> {code}
>  
>  
> Step 1: Exchange B is materialized and the QueryStage is added to stage cache
> Step 2: Exchange D reuses Exchange B
> Step 3: Exchange C is materialized and the QueryStage is added to stage cache
> Step 4: Exchange A reuses Exchange C
>  
> Then the final plan looks like:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A -> ReusedExchange (reuses Exchange C)
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C -> PhotonShuffleMapStage ....
>           |-- SomeNode N
>               |-- Exchange D -> ReusedExchange (reuses Exchange B)
> {code}
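
The four steps above can be replayed on a toy plan tree to see where the dangling reference comes from. This is only a minimal sketch, not Spark's implementation: `Node`, `replace_with_reuse`, and `reuse_report` are hypothetical names, and the model assumes that a plan printer can only assign IDs to nodes it reaches by walking children.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity-based equality/hash, so nodes can live in a set
class Node:
    name: str
    children: List[Node] = field(default_factory=list)
    reuses: Optional[Node] = None  # set only on ReusedExchange leaves


def walk(root: Node):
    """Yield every node reachable through children, as a plan printer would."""
    yield root
    for child in root.children:
        yield from walk(child)


def replace_with_reuse(parent: Node, index: int, target: Node) -> None:
    """Swap a child (and its whole subtree) for a ReusedExchange leaf."""
    old = parent.children[index]
    parent.children[index] = Node(f"ReusedExchange({old.name})", reuses=target)


def reuse_report(roots: List[Node]) -> Dict[str, str]:
    """Map each ReusedExchange to its target's name, or 'unknown' if the
    target is no longer reachable from any printed plan."""
    nodes = [n for r in roots for n in walk(r)]
    visible = set(nodes)
    return {
        n.name: (n.reuses.name if n.reuses in visible else "unknown")
        for n in nodes
        if n.reuses is not None
    }


# Main plan: X -> A -> Y -> B.  Subquery plan: M -> C -> N -> D.
b = Node("Exchange B")
x = Node("SomeNode X", [Node("Exchange A", [Node("SomeNode Y", [b])])])
n = Node("SomeNode N", [Node("Exchange D")])
c = Node("Exchange C", [n])
m = Node("SomeNode M", [c])

stage_cache = {}
stage_cache["B/D"] = b                         # Step 1: B is materialized
replace_with_reuse(n, 0, stage_cache["B/D"])   # Step 2: D reuses B
stage_cache["A/C"] = c                         # Step 3: C is materialized
replace_with_reuse(x, 0, stage_cache["A/C"])   # Step 4: A reuses C, dropping Y and B

report = reuse_report([x, m])
print(report)
```

In this model, step 4 removes Exchange B from the main plan together with the rest of Exchange A's subtree, so the ReusedExchange that replaced Exchange D points at a node no printed plan contains, while the one that replaced Exchange A still resolves to Exchange C.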
>  
>  
> As a result, the ReusedExchange (reuses Exchange B) refers to a non-existent
> node, because Exchange B was dropped from the plan when Exchange A was
> replaced. This *DOES NOT* affect query execution, but it causes the query
> visualization to malfunction in the following ways:
>  # The ReusedExchange child subtree will still appear in the Spark UI graph 
> but will contain no node IDs.
>  # The ReusedExchange node details in the Explain plan will refer to an
> unknown node. Example below.
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
>  # The child exchange and its subtree may be missing from the Explain text
> completely, with no node details or tree string shown.
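
To check whether a given plan shows the second symptom, one option is to scan captured formatted explain text for the marker quoted in the report. The sketch below assumes the text has already been captured as a string and that affected entries match the `(id) ReusedExchange [Reuses operator id: unknown]` shape shown above. `find_dangling_reuses` is a hypothetical helper, and the second sample entry (operator 2776 reusing id 12) is made up for contrast.

```python
import re
from typing import List

# Matches node-detail headers such as:
#   (2775) ReusedExchange [Reuses operator id: unknown]
UNKNOWN_REUSE = re.compile(
    r"^\((\d+)\)\s+ReusedExchange\s+\[Reuses operator id: unknown\]",
    re.MULTILINE,
)


def find_dangling_reuses(explain_text: str) -> List[int]:
    """Return operator ids of ReusedExchange entries whose target is unknown."""
    return [int(match.group(1)) for match in UNKNOWN_REUSE.finditer(explain_text)]


sample = """(2775) ReusedExchange [Reuses operator id: unknown]
Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]

(2776) ReusedExchange [Reuses operator id: 12]
"""

dangling = find_dangling_reuses(sample)
print(dangling)
```

Only operator 2775 is reported for this sample. The check covers the explain-text symptom only, not the missing node IDs in the Spark UI graph.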



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

