Eren Avsarogullari created SPARK-55052:
------------------------------------------

             Summary: Add AQEShuffleRead properties to Physical Plan Tree
                 Key: SPARK-55052
                 URL: https://issues.apache.org/jira/browse/SPARK-55052
             Project: Spark
          Issue Type: Task
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Eren Avsarogullari


AQEShuffleRead can have one of the following properties when reading shuffle 
files: *local*, *coalesced*, *skewed*, or *coalesced and skewed*. When the 
physical plan tree is complex, it is hard to tell which AQEShuffleRead node 
performs a local read or reads skewed partitions, because the tree has to be 
correlated with the per-node AQEShuffleRead details. For example, in the skew 
SMJ case below, showing the property in the tree makes it clear which SMJ leg 
has the AQEShuffleRead with skew. The plan tree details section already shows 
these properties for each AQEShuffleRead node, but when the query plan tree is 
very large (e.g. 1000+ physical nodes), correlating the tree with that section 
is difficult. This change adds the properties at the physical plan tree level 
to help with such use cases.
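
The labeling described above can be sketched as follows. This is only an illustration of the idea, not Spark's implementation: the function names and the three boolean flags are hypothetical, and the precedence (a local read wins over the other labels) is an assumption.

```python
def shuffle_read_label(is_local: bool, has_coalesced: bool, has_skewed: bool) -> str:
    """Return the property label for one AQEShuffleRead node (hypothetical helper)."""
    if is_local:
        # Assumption: a local read is reported on its own.
        return "local"
    if has_coalesced and has_skewed:
        return "coalesced and skewed"
    if has_coalesced:
        return "coalesced"
    if has_skewed:
        return "skewed"
    return ""


def render_node(node_id: int, label: str) -> str:
    """Render the plan tree entry, appending the label only when there is one."""
    base = f"AQEShuffleRead ({node_id})"
    return f"{base}, {label}" if label else base


if __name__ == "__main__":
    print(render_node(6, shuffle_read_label(False, True, False)))
    print(render_node(13, shuffle_read_label(False, True, True)))
```

With the flags used here, the two printed lines match the AQEShuffleRead entries proposed for nodes (6) and (13).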

*Current Physical Plan Tree:*
{code:java}
== Physical Plan ==
AdaptiveSparkPlan (24)
+- == Final Plan ==
   ResultQueryStage (17), Statistics(sizeInBytes=8.0 EiB)
   +- * Project (16)
      +- * SortMergeJoin(skew=true) Inner (15)
         :- * Sort (7)
         :  +- AQEShuffleRead (6)
         :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
         :        +- Exchange (4)
         :           +- * Project (3)
         :              +- * Filter (2)
         :                 +- * Range (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * Range (8){code}
*New Physical Plan Tree:*
{code:java}
== Physical Plan ==
AdaptiveSparkPlan (24)
+- == Final Plan ==
   ResultQueryStage (17), Statistics(sizeInBytes=8.0 EiB)
   +- * Project (16)
      +- * SortMergeJoin(skew=true) Inner (15)
         :- * Sort (7)
         :  +- AQEShuffleRead (6), coalesced
         :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
         :        +- Exchange (4)
         :           +- * Project (3)
         :              +- * Filter (2)
         :                 +- * Range (1)
         +- * Sort (14)
            +- AQEShuffleRead (13), coalesced and skewed
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * Range (8){code}
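
Once the properties appear on the tree lines, they can be collected from the plan text without consulting the details section. The sketch below assumes the `AQEShuffleRead (N), <properties>` line format shown in the new plan tree; the helper name and the regular expression are illustrative and not part of Spark.

```python
import re

# Matches "AQEShuffleRead (<id>)" optionally followed by ", <properties>".
NODE_RE = re.compile(r"AQEShuffleRead \((\d+)\)(?:, (.+))?$")


def shuffle_read_properties(plan_text: str) -> dict:
    """Map each AQEShuffleRead node id to its property label ('' if none)."""
    result = {}
    for line in plan_text.splitlines():
        match = NODE_RE.search(line.rstrip())
        if match:
            result[int(match.group(1))] = match.group(2) or ""
    return result


if __name__ == "__main__":
    sample = """\
         :- * Sort (7)
         :  +- AQEShuffleRead (6), coalesced
         +- * Sort (14)
            +- AQEShuffleRead (13), coalesced and skewed
"""
    print(shuffle_read_properties(sample))
    # prints {6: 'coalesced', 13: 'coalesced and skewed'}
```

On a plan with many nodes, filtering this mapping for labels containing "skewed" gives the node ids of the skewed reads directly.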



