Clarification on what "[id=#]" refers to in Physical Plan Exchange hashpartitioning

2024-04-04 Thread Tahj Anderson
Hello,

While looking through the Spark physical plans recorded in the Spark history 
server log to find bottlenecks in my code, I came across an ID that shows up 
in a partitioning step.
My goal is to use the history server log to produce a meaningful analysis of 
my Spark system's performance. With this goal in mind, I am trying to connect 
Spark physical plans to stage IDs, which carry useful information that I can 
tie back to my code. Below is a snippet from one of the physical plans.
+- *(2) Sort [Column#46 ASC NULLS FIRST], true, 0
   +- Exchange hashpartitioning(ColumnId#329, 200), ENSURE_REQUIREMENTS, [id=#278]


What exactly does [id=#278] refer to?
I have seen examples that say this ID refers to a specific partition, a 
stage ID, or a plan_id, but I have not been able to confirm which one it is.
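
For context, here is a rough sketch of the kind of linking described above, 
assuming the event log is one JSON object per line. The field names used here 
("executionId", "physicalPlanDescription", "Stage IDs", and the 
"spark.sql.execution.id" job property) are assumptions about the log layout 
and should be checked against the actual log; link_plans_to_stages is a 
hypothetical helper, not part of Spark. It only places the bracketed IDs next 
to the stage IDs of the same SQL execution; it does not settle what [id=#278] 
means.

```python
import json
import re
from collections import defaultdict

# Matches the "[id=#278]" suffix printed on Exchange nodes in the plan text.
EXCHANGE_ID = re.compile(r"\[id=#(\d+)\]")

# Assumed event name for the start of a SQL execution in the event log.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"


def link_plans_to_stages(lines):
    """Group bracketed plan IDs and stage IDs by SQL execution ID.

    `lines` is any iterable of JSON strings, e.g. an open event log file.
    """
    plans = {}                 # execution ID -> physical plan text
    stages = defaultdict(set)  # execution ID -> stage IDs of its jobs

    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("Event")
        if kind == SQL_START:
            plans[int(event["executionId"])] = event.get(
                "physicalPlanDescription", "")
        elif kind == "SparkListenerJobStart":
            # Assumption: jobs launched by a SQL execution carry its ID
            # in their properties under this key.
            props = event.get("Properties") or {}
            exec_id = props.get("spark.sql.execution.id")
            if exec_id is not None:
                stages[int(exec_id)].update(event.get("Stage IDs", []))

    return {
        exec_id: {
            "exchange_ids": [int(m) for m in EXCHANGE_ID.findall(plan)],
            "stage_ids": sorted(stages.get(exec_id, ())),
        }
        for exec_id, plan in plans.items()
    }
```

The function could be called as link_plans_to_stages(open(path)) on an 
uncompressed event log, where path is wherever the history server stores it.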

Thank you,
Tahj