[ https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-42753:
------------------------------------

    Assignee: Apache Spark

> ReusedExchange refers to non-existent node
> ------------------------------------------
>
>                 Key: SPARK-42753
>                 URL: https://issues.apache.org/jira/browse/SPARK-42753
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Web UI
>    Affects Versions: 3.4.0
>            Reporter: Steven Chen
>            Assignee: Apache Spark
>            Priority: Major
>
> There is an AQE issue where, during AQE planning, the Exchange that is being
> reused could be replaced in the plan tree. So, when we print the query
> plan, the ReusedExchange will refer to an "unknown" Exchange. An example
> below:
>
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]
> Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code}
>
>
> Below is an example to demonstrate the root cause:
>
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A
>           |-- SomeNode Y
>               |-- Exchange B
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C
>           |-- SomeNode N
>               |-- Exchange D
> {code}
>
>
> Step 1: Exchange B is materialized and the QueryStage is added to the stage cache
> Step 2: Exchange D reuses Exchange B
> Step 3: Exchange C is materialized and the QueryStage is added to the stage cache
> Step 4: Exchange A reuses Exchange C
>
> Then the final plan looks like:
>
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A -> ReusedExchange (reuses Exchange C)
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C -> PhotonShuffleMapStage ....
>           |-- SomeNode N
>               |-- Exchange D -> ReusedExchange (reuses Exchange B)
> {code}
>
>
> As a result, the ReusedExchange (reuses Exchange B) will refer to a
> non-existent node, because Exchange B was dropped from the plan tree when
> Exchange A was replaced.
> This *DOES NOT* affect query execution, but it will cause the query
> visualization to malfunction in the following ways:
> # The ReusedExchange child subtree will still appear in the Spark UI graph
> but will contain no node IDs.
> # The ReusedExchange node details in the Explain plan will refer to an
> UNKNOWN node. Example below.
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
> # The child exchange and its subtree may be missing from the Explain text
> completely. No node details or tree string are shown.
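
The four steps in the description can be replayed with a small toy model. The sketch below is not Spark code: `Node`, `materialize_or_reuse`, `dangling_reuses`, and the canonical keys `k1`/`k2` are hypothetical names introduced only for illustration, and it assumes that Exchange A is equivalent to Exchange C, that Exchange B is equivalent to Exchange D, and that replacing an exchange with a ReusedExchange drops its child subtree from the printed plan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    key: Optional[str] = None          # hypothetical canonical key; set on exchanges only
    children: List["Node"] = field(default_factory=list)
    reuses: Optional[str] = None       # name of the reused exchange, if replaced


def find(node: Node, name: str) -> Optional[Node]:
    """Depth-first lookup of a node by name."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def materialize_or_reuse(root: Node, name: str, cache: Dict[str, str]) -> None:
    """Toy stage cache: the first exchange with a given key is materialized,
    later equivalent exchanges become ReusedExchange leaves."""
    node = find(root, name)
    if node.key in cache:
        node.reuses = cache[node.key]
        node.children = []             # assumption: the replaced subtree is dropped
    else:
        cache[node.key] = node.name


def names(node: Node) -> set:
    """All node names still reachable in a plan tree."""
    out = {node.name}
    for child in node.children:
        out |= names(child)
    return out


def dangling_reuses(roots: List[Node]) -> List[Tuple[str, str]]:
    """Return (reusing node, missing target) pairs across all plans."""
    present = set().union(*(names(r) for r in roots))
    found: List[Tuple[str, str]] = []

    def walk(n: Node) -> None:
        if n.reuses is not None and n.reuses not in present:
            found.append((n.name, n.reuses))
        for child in n.children:
            walk(child)

    for r in roots:
        walk(r)
    return found


# Main plan: X -> A -> Y -> B; subquery plan: M -> C -> N -> D.
main = Node("X", children=[Node("A", key="k1", children=[
    Node("Y", children=[Node("B", key="k2")])])])
sub = Node("M", children=[Node("C", key="k1", children=[
    Node("N", children=[Node("D", key="k2")])])])

cache: Dict[str, str] = {}
materialize_or_reuse(main, "B", cache)   # Step 1: B is materialized and cached
materialize_or_reuse(sub, "D", cache)    # Step 2: D reuses B
materialize_or_reuse(sub, "C", cache)    # Step 3: C is materialized and cached
materialize_or_reuse(main, "A", cache)   # Step 4: A reuses C, dropping Y and B

dangling = dangling_reuses([main, sub])
for node_name, target in dangling:
    print(f"Exchange {node_name} reuses Exchange {target}, which is no longer in any plan")
```

Under these assumptions the only dangling reference is D's reuse of B, which corresponds to the `Reuses operator id: unknown` entry in the Explain output; A's reuse of C still resolves because C remains in the subquery plan.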