[ https://issues.apache.org/jira/browse/SPARK-47017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844145#comment-17844145 ]
Eric Yang commented on SPARK-47017:
-----------------------------------

I'm preparing a PR for it.

> Show metrics of the physical plan of RDDScanExec's internal RDD in the
> history server
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-47017
>                 URL: https://issues.apache.org/jira/browse/SPARK-47017
>             Project: Spark
>          Issue Type: New Feature
>          Components: Web UI
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Eric Yang
>            Priority: Major
>         Attachments: ScanExistingRDD.jpg, eventLogs-local-1708032228180.zip,
> simple2.scala
>
>
> The RDDScanExec wraps an internal RDD (as below). In our environment, we find
> that this RDD is usually produced by very large physical plans that contain
> quite a few physical nodes. Those nodes may have various metrics that would
> help us understand what the execution looks like and where there is room for
> optimization.
>
> {code:java}
> case class RDDScanExec(
>     output: Seq[Attribute],
>     rdd: RDD[InternalRow], // <-- this field
>     name: String,
> {code}
>
> However, the physical plan and the metrics are invisible in the SQL DAG in
> the Spark History Server. As it is an "existing RDD", the physical plan may
> be found in some previous SQL. The metrics are not visible from that
> previous SQL either. This is because the "definition" of these metrics is
> reported along with the SparkListenerSQLExecutionStart event of the "previous
> SQL" (which contains the physical plan of RDDScanExec.rdd), but the metric
> values are reported in the SparkListenerTaskEnd events of the tasks that
> are attached to the SQL with the RDDScanExec.
> !ScanExistingRDD.jpg|width=336,height=296!
>
> Could we consider showing the physical plan and metrics of the RDDScanExec.rdd
> (the "Scan Existing RDD" node in the above DAG)? For example, it may be shown
> as a "leg" (similar to but not the same as a child) in the DAG, or something
> else that may show the physical plan and metrics.
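
The attribution mismatch described in the issue can be illustrated with a small toy model. This is only a sketch: the case classes, `MetricAttribution`, and `visibleMetrics` below are hypothetical stand-ins written in plain Scala, not Spark classes or the History Server's actual logic. It assumes that a UI only displays a task's metric update when the accumulator id was defined by the start event of the same execution the task is attached to.

```scala
// Toy model only: hypothetical stand-ins, not Spark's listener classes.
final case class MetricDef(accumulatorId: Long, name: String)
// Models the metric "definitions" carried by an execution-start event.
final case class ExecutionStart(executionId: Long, metricDefs: Seq[MetricDef])
// Models the metric values carried by a task-end event (accumulator id -> update).
final case class TaskEnd(executionId: Long, updates: Map[Long, Long])

object MetricAttribution {
  // For each execution, sum only those task updates whose accumulator id
  // was defined by that same execution's start event.
  def visibleMetrics(
      starts: Seq[ExecutionStart],
      taskEnds: Seq[TaskEnd]): Map[Long, Map[String, Long]] = {
    starts.map { start =>
      val names = start.metricDefs.map(d => d.accumulatorId -> d.name).toMap
      val totals = taskEnds
        .filter(_.executionId == start.executionId)
        .flatMap(_.updates)
        .collect { case (id, v) if names.contains(id) => names(id) -> v }
        .groupBy(_._1)
        .map { case (name, vs) => name -> vs.map(_._2).sum }
      start.executionId -> totals
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Execution 0 (the "previous SQL") defines accumulator 10.
    // Execution 1 (the SQL with the scan node) defines accumulator 20.
    val starts = Seq(
      ExecutionStart(0L, Seq(MetricDef(10L, "filter: number of output rows"))),
      ExecutionStart(1L, Seq(MetricDef(20L, "scan: number of output rows"))))
    // The tasks are attached to execution 1 but update both accumulators.
    val taskEnds = Seq(
      TaskEnd(1L, Map(10L -> 300L, 20L -> 300L)),
      TaskEnd(1L, Map(10L -> 200L, 20L -> 200L)))
    println(visibleMetrics(starts, taskEnds))
  }
}
```

In this model the updates to accumulator 10 are dropped for both executions: execution 0 has the definition but no matching tasks, and execution 1 has the tasks but no definition. That is the gap the issue asks to close.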
--
This message was sent by Atlassian Jira
(v8.20.10#820010)