[ 
https://issues.apache.org/jira/browse/SPARK-47017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844145#comment-17844145
 ] 

Eric Yang commented on SPARK-47017:
-----------------------------------

I'm preparing a PR for it. 

> Show metrics of the physical plan of RDDScanExec's internal RDD in the 
> history server
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-47017
>                 URL: https://issues.apache.org/jira/browse/SPARK-47017
>             Project: Spark
>          Issue Type: New Feature
>          Components: Web UI
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Eric Yang
>            Priority: Major
>         Attachments: ScanExistingRDD.jpg, eventLogs-local-1708032228180.zip, 
> simple2.scala
>
>
> The RDDScanExec wraps an internal RDD (see below). In our environment, this 
> RDD is usually produced by a very large physical plan containing many 
> physical nodes. Those nodes may expose various metrics that would help us 
> understand how the execution behaves and where there is room for 
> optimization.
>  
> {code:java}
> case class RDDScanExec(
>     output: Seq[Attribute],
>     rdd: RDD[InternalRow],     // <-- this field
>     name: String, {code}
>  
> However, neither the physical plan nor the metrics are visible in the SQL DAG 
> in the Spark History Server. As it is an "existing RDD", the physical plan may 
> be found in some previous SQL, but the metrics are not visible in that 
> previous SQL either. This is because the "definitions" of these metrics are 
> reported along with the SparkListenerSQLExecutionStart event of the "previous 
> SQL" (which contains the physical plan of RDDScanExec.rdd), but the metric 
> values are reported by the SparkListenerTaskEnd events of the tasks that 
> are attached to the SQL containing the RDDScanExec.
> !ScanExistingRDD.jpg|width=336,height=296!
>  
> Could we consider showing the physical plan and metrics of RDDScanExec.rdd 
> for the "Scan Existing RDD" node in the above DAG? For example, they could be 
> shown as a "leg" (similar to, but not the same as, a child) in the DAG, or in 
> some other way that exposes the physical plan and metrics.
>  
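
The attribution gap described in the quoted issue can be illustrated with a small, self-contained sketch. It is a simplified model, not Spark's actual listener code: the case classes `ExecutionStart` and `TaskEnd` and the function `visibleMetrics` are hypothetical stand-ins for the SparkListenerSQLExecutionStart and SparkListenerTaskEnd events, and the metric ids are made up. The assumption is that a UI only keeps metric values whose definitions were announced by the execution that owns the task.

```scala
// Hypothetical, simplified stand-ins for the two events named in the issue.
// These are NOT Spark's classes; they only model "definitions" vs. "values".
final case class ExecutionStart(executionId: Long, metricIds: Set[Long])
final case class TaskEnd(executionId: Long, accumUpdates: Map[Long, Long])

// Assumed aggregation rule: for each task, keep only the updates whose metric id
// was defined by the execution the task is attached to, and sum them per metric.
def visibleMetrics(
    starts: Seq[ExecutionStart],
    tasks: Seq[TaskEnd]): Map[Long, Map[Long, Long]] = {
  val defined: Map[Long, Set[Long]] =
    starts.map(s => s.executionId -> s.metricIds).toMap
  tasks.foldLeft(Map.empty[Long, Map[Long, Long]]) { (acc, task) =>
    val known = defined.getOrElse(task.executionId, Set.empty[Long])
    val kept = task.accumUpdates.filter { case (id, _) => known.contains(id) }
    val merged = kept.foldLeft(acc.getOrElse(task.executionId, Map.empty[Long, Long])) {
      case (m, (id, v)) => m.updated(id, m.getOrElse(id, 0L) + v)
    }
    acc.updated(task.executionId, merged)
  }
}

// Execution 0 is the "previous SQL": it defines metrics 10 and 11 of the large plan.
// Execution 1 is the SQL containing RDDScanExec: it defines only metric 20.
val starts = Seq(ExecutionStart(0L, Set(10L, 11L)), ExecutionStart(1L, Set(20L)))

// The task runs under execution 1 but also updates the metrics defined by execution 0.
val tasks = Seq(TaskEnd(1L, Map(10L -> 500L, 11L -> 42L, 20L -> 7L)))

val visible = visibleMetrics(starts, tasks)
```

In this model, `visible` holds only metric 20 for execution 1. The updates to metrics 10 and 11 are dropped because execution 1 never defined them, and execution 0 receives no values because no task is attached to it. That mirrors the situation the issue describes, where the metrics show up under neither SQL.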



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
