Eren Avsarogullari created SPARK-55052:
------------------------------------------
Summary: Add AQEShuffleRead properties to Physical Plan Tree
Key: SPARK-55052
URL: https://issues.apache.org/jira/browse/SPARK-55052
Project: Spark
Issue Type: Task
Components: SQL
Affects Versions: 4.2.0
Reporter: Eren Avsarogullari
AQEShuffleRead can have one of the following properties when reading shuffle
files: *local*, *coalesced*, *skewed*, or *coalesced and skewed*. The plan
details section already shows these properties for each AQEShuffleRead node,
but when the physical plan tree is complex (e.g. composed of 1000+ physical
nodes), it is hard to correlate each node in the tree with its details entry
to find out, for example, which AQEShuffleRead does a local read or has skewed
partitions. This change shows the properties directly in the physical plan
tree. For example, in the following skewed SMJ (SortMergeJoin) case, the tree
alone shows which SMJ leg has an AQEShuffleRead with skew.
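As a rough illustration of the labelling described above, the following Python sketch maps three boolean flags to the label appended to a tree line. The function names and the precedence of *local* over the other properties are assumptions made for this example; this is not the Spark implementation.

```python
def shuffle_read_label(is_local: bool, has_coalesced: bool, has_skewed: bool) -> str:
    """Return the property label for one AQEShuffleRead node ('' if none applies).

    Hypothetical helper: the flag names and the 'local first' ordering are
    assumptions for illustration only.
    """
    if is_local:
        return "local"
    if has_coalesced and has_skewed:
        return "coalesced and skewed"
    if has_coalesced:
        return "coalesced"
    if has_skewed:
        return "skewed"
    return ""


def tree_line(node_id: int, label: str) -> str:
    """Format a plan-tree line, appending the label only when it is non-empty."""
    base = f"AQEShuffleRead ({node_id})"
    return f"{base}, {label}" if label else base
```

With these helpers, `tree_line(13, shuffle_read_label(False, True, True))` yields a line in the same shape as the proposed tree output, and a node without any property keeps its current form.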
*Current Physical Plan Tree:*
{code:java}
== Physical Plan ==
AdaptiveSparkPlan (24)
+- == Final Plan ==
   ResultQueryStage (17), Statistics(sizeInBytes=8.0 EiB)
   +- * Project (16)
      +- * SortMergeJoin(skew=true) Inner (15)
         :- * Sort (7)
         :  +- AQEShuffleRead (6)
         :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
         :        +- Exchange (4)
         :           +- * Project (3)
         :              +- * Filter (2)
         :                 +- * Range (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * Range (8)
{code}
*New Physical Plan Tree:*
{code:java}
== Physical Plan ==
AdaptiveSparkPlan (24)
+- == Final Plan ==
   ResultQueryStage (17), Statistics(sizeInBytes=8.0 EiB)
   +- * Project (16)
      +- * SortMergeJoin(skew=true) Inner (15)
         :- * Sort (7)
         :  +- AQEShuffleRead (6), coalesced
         :     +- ShuffleQueryStage (5), Statistics(sizeInBytes=15.6 KiB, rowCount=1.00E+3)
         :        +- Exchange (4)
         :           +- * Project (3)
         :              +- * Filter (2)
         :                 +- * Range (1)
         +- * Sort (14)
            +- AQEShuffleRead (13), coalesced and skewed
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=3.1 KiB, rowCount=200)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * Range (8)
{code}
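With the properties on the tree lines themselves, they can also be collected from the plan text without consulting the details section. The sketch below is a hypothetical Python helper (the name `shuffle_read_properties` and the regular expression are assumptions for this example) that scans plan-tree text in the format shown above and maps each AQEShuffleRead node id to its label.

```python
import re

# Matches e.g. "AQEShuffleRead (13), coalesced and skewed"; the label is optional.
NODE = re.compile(r"AQEShuffleRead \((\d+)\)(?:, (.+))?$")


def shuffle_read_properties(plan_text: str) -> dict[int, str]:
    """Map each AQEShuffleRead node id to its label ('' when no label is shown)."""
    props = {}
    for line in plan_text.splitlines():
        match = NODE.search(line.rstrip())
        if match:
            props[int(match.group(1))] = match.group(2) or ""
    return props


# Excerpt of the new plan tree from this issue.
plan = """\
+- * SortMergeJoin(skew=true) Inner (15)
   :- * Sort (7)
   :  +- AQEShuffleRead (6), coalesced
   +- * Sort (14)
      +- AQEShuffleRead (13), coalesced and skewed
"""

print(shuffle_read_properties(plan))
```

For the excerpt above this reports node 6 as coalesced and node 13 as coalesced and skewed, which identifies the skewed SMJ leg from the tree alone.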