Steven Chen created SPARK-42753:
-----------------------------------

             Summary: ReusedExchange refers to non-existent node
                 Key: SPARK-42753
                 URL: https://issues.apache.org/jira/browse/SPARK-42753
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Web UI
    Affects Versions: 3.4.0
            Reporter: Steven Chen


There is an AQE issue where, during AQE planning, the Exchange that is being 
reused can be replaced in the plan tree. As a result, when we print the query 
plan, the ReusedExchange refers to an "unknown" Exchange. An example is below:
{code:java}
(2775) ReusedExchange [Reuses operator id: unknown]
Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]
{code}

Below is an example to demonstrate the root cause:

{code:java}
AdaptiveSparkPlan
  |-- SomeNode X (subquery xxx)
      |-- Exchange A
          |-- SomeNode Y
              |-- Exchange B

Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
AdaptiveSparkPlan
  |-- SomeNode M
      |-- Exchange C
          |-- SomeNode N
              |-- Exchange D
{code}

Step 1: Exchange B is materialized and the QueryStage is added to stage cache

Step 2: Exchange D reuses Exchange B

Step 3: Exchange C is materialized and the QueryStage is added to stage cache

Step 4: Exchange A reuses Exchange C
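
The four steps above can be replayed on a toy plan tree to see how the dangling reference arises. This is only a sketch for illustration: the {{Node}} class and the {{reuse}}, {{operator_ids}} and {{describe_reuse}} helpers are hypothetical names, not Spark classes or APIs, and the numbering is a simplified stand-in for how operator ids are assigned when the plan is printed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Node:
    """Hypothetical stand-in for a plan node (not a Spark class)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    reuses: Optional[str] = None  # name of the exchange this node reuses, if any


def reuse(node: Node, target: Node) -> None:
    """Turn `node` into a ReusedExchange of `target`, dropping its own subtree."""
    node.reuses = target.name
    node.children = []


def operator_ids(roots: List[Node]) -> Dict[str, int]:
    """Number every node that is still reachable from the plan roots."""
    ids: Dict[str, int] = {}
    stack = list(reversed(roots))
    while stack:
        node = stack.pop()
        ids[node.name] = len(ids) + 1
        stack.extend(reversed(node.children))
    return ids


def describe_reuse(roots: List[Node]) -> Dict[str, Union[int, str]]:
    """Map each ReusedExchange to the id of its target, or "unknown"."""
    ids = operator_ids(roots)
    result: Dict[str, Union[int, str]] = {}
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node.reuses is not None:
            result[node.name] = ids.get(node.reuses, "unknown")
        stack.extend(node.children)
    return result


# Main plan: X -> A -> Y -> B
b = Node("Exchange B")
a = Node("Exchange A", [Node("SomeNode Y", [b])])
x = Node("SomeNode X", [a])

# Subquery plan: M -> C -> N -> D
d = Node("Exchange D")
c = Node("Exchange C", [Node("SomeNode N", [d])])
m = Node("SomeNode M", [c])

reuse(d, b)  # Step 2: Exchange D reuses Exchange B
reuse(a, c)  # Step 4: Exchange A reuses Exchange C, which drops Y and B

result = describe_reuse([x, m])
print(result)
```

In this model, Exchange A resolves to a real id because Exchange C is still in the tree, while Exchange D maps to "unknown" because Exchange B was dropped along with the rest of Exchange A's subtree in Step 4.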

 

Then the final plan looks like:

{code:java}
AdaptiveSparkPlan
  |-- SomeNode X (subquery xxx)
      |-- Exchange A -> ReusedExchange (reuses Exchange C)


Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
AdaptiveSparkPlan
  |-- SomeNode M
      |-- Exchange C -> PhotonShuffleMapStage ....
          |-- SomeNode N
              |-- Exchange D -> ReusedExchange (reuses Exchange B)
{code}

As a result, the ReusedExchange (reuses Exchange B) refers to a non-existent 
node. This *DOES NOT* affect query execution, but it causes the query 
visualization to malfunction in the following ways:
 # The ReusedExchange child subtree still appears in the Spark UI graph but 
contains no node IDs.
 # The ReusedExchange node details in the Explain plan refer to an unknown 
node. Example below.

{code:java}
(2775) ReusedExchange [Reuses operator id: unknown]{code}

 # The child exchange and its subtree may be missing from the Explain text 
completely. No node details or tree string shown.
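
To check whether a given plan is affected, one option is to scan the Explain text for ReusedExchange entries whose target id is "unknown". The snippet below is a minimal sketch that assumes the text has already been captured as a string and that the line format matches the example above. The function name {{find_dangling_reused_exchanges}} is hypothetical.

```python
import re
from typing import List

# Matches lines such as: (2775) ReusedExchange [Reuses operator id: unknown]
UNKNOWN_REUSE = re.compile(
    r"\((\d+)\)\s+ReusedExchange\s+\[Reuses operator id: unknown\]"
)


def find_dangling_reused_exchanges(explain_text: str) -> List[int]:
    """Return the operator ids of ReusedExchange nodes with an unknown target."""
    return [int(match.group(1)) for match in UNKNOWN_REUSE.finditer(explain_text)]


sample = """(2775) ReusedExchange [Reuses operator id: unknown]
Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]"""

dangling = find_dangling_reused_exchanges(sample)
print(dangling)
```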



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
