jiwen624 opened a new pull request, #46714:
URL: https://github.com/apache/spark/pull/46714

   ### What changes were proposed in this pull request?
   For `FileFormatDataWriter`, we currently record the "task commit time" and "job commit time" metrics in `org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker#metrics`:
   ```
         TASK_COMMIT_TIME -> SQLMetrics.createTimingMetric(sparkContext, "task commit time"),
         JOB_COMMIT_TIME -> SQLMetrics.createTimingMetric(sparkContext, "job commit time"),
   ```
   This PR additionally records the time spent on the data write itself (together with the time spent producing records from the iterator), which is usually one of the major parts of a write operation's total duration.
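
   A minimal sketch of the timing pattern such a metric implies (the names `DataWriteTimingSketch`, `dataWriteTimeMs`, and `writeRecord` are illustrative, not the PR's actual identifiers; in Spark the accumulated milliseconds would go into a `SQLMetric` created via `SQLMetrics.createTimingMetric`):
   ```scala
   object DataWriteTimingSketch {
     // Accumulated "data write time" in milliseconds, mirroring how a
     // timing SQLMetric would be updated by the writer.
     var dataWriteTimeMs: Long = 0L

     // Hypothetical per-record write; stands in for the real file writer call.
     def writeRecord(record: String): Unit = { /* write to the output file */ }

     def writeAll(records: Iterator[String]): Unit = {
       while (records.hasNext) {
         val start = System.nanoTime()
         val record = records.next() // time producing the record from the iterator...
         writeRecord(record)         // ...together with the write itself
         dataWriteTimeMs += (System.nanoTime() - start) / 1000000L
       }
     }
   }
   ```
   Timing the `next()` call as well as the write matches the PR's note that record production from the iterator is included in the measured span.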
   
   ### Why are the changes needed?
   It helps identify bottlenecks and time skew during data writes, and also aids general performance tuning.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. On the SQL page of the Spark History Server (and the live UI), a new "data write time" metric is shown on the data write command/operation nodes. For example:
   <img width="492" alt="image" src="https://github.com/apache/spark/assets/14141331/bd06b4a6-e1b8-4b36-967a-376abd793fa7">
   
   ### How was this patch tested?
   Unit test cases and manual tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

