Github user edwinalu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r179978102

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
         }
       }

    -  // No-op because logging every update would be overkill
    -  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
    +  /**
    +   * Log if there is a new peak value for one of the memory metrics for the given executor.
    +   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
    +   * log new peak memory metric values per executor per stage.
    +   */
    +  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
    --- End diff --

    For a longer-running stage, once it ramps up, hopefully there wouldn't be many new peak values. Looking at a subset of our applications, the extra logging overhead has mostly been between 0.25% and 1%, though it can reach 8%. By logging each peak value at the time it occurs (and reinitializing when a stage starts), it's possible to tell which stages are active at that time, and these changes could potentially be graphed on a timeline -- this information wouldn't be available if the metrics were only logged at stage end, since the times would be lost. Logging at stage end would limit the amount of extra logging. If we add more metrics (such as for off-heap memory), there could be more new peaks and more extra logging with the current approach. Excess logging is a concern, and I can move to stage end if the overhead is too much.
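The peak-tracking scheme described above -- log a metric only when it exceeds the executor's previous peak, and reset the peaks when a new stage starts -- can be sketched roughly as follows. This is a simplified illustration, not Spark's actual implementation; the `PeakTracker` object and its method names are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical sketch of per-executor peak tracking for one memory metric.
// In the real listener this state would live inside EventLoggingListener and
// cover several metrics per executor.
object PeakTracker {
  // Current peak value per executor; cleared when a new stage starts.
  private val peaks = mutable.Map.empty[String, Long]

  // Called on each executor metrics update. Returns true only when the value
  // is a new peak, i.e. only then would an event actually be logged.
  def updateAndCheckPeak(executorId: String, value: Long): Boolean = {
    val prev = peaks.getOrElse(executorId, Long.MinValue)
    if (value > prev) {
      peaks(executorId) = value
      true
    } else {
      false
    }
  }

  // Called from onStageSubmitted to reinitialize peaks for the new stage,
  // so peaks are recorded per executor per stage.
  def reset(): Unit = peaks.clear()
}
```

Because values only pass the check when they exceed the running maximum, a stage that has ramped up produces few log entries, which is the basis for the low overhead observed above.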