Github user edwinalu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r180179243

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
         }
       }

    -  // No-op because logging every update would be overkill
    -  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
    +  /**
    +   * Log if there is a new peak value for one of the memory metrics for the given executor.
    +   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
    +   * log new peak memory metric values per executor per stage.
    +   */
    +  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
    --- End diff --

We're not constructing a timeline currently, and yes, with only peak values, it could be rather sparse. We would get new values with new stages, but would not see decreases in memory during a stage.

The situation where there is a stage 1 with peak X, and then stage 2 starts with a peak Y > X is interesting, though, and it would be useful to have this information, since we would then know to focus on stage 2 for memory consumption, even though both stages 1 and 2 could have the same peak values. Logging at stage start would double the amount of logging and would be trickier, so I'd prefer either keeping the current approach or only logging at stage end. The concern about the higher logging volume was for smaller applications (and smaller logs).
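To illustrate the per-executor, per-stage peak tracking described above, here is a minimal, self-contained Scala sketch (not the actual Spark implementation; `PeakTracker`, `updatePeak`, and `clear` are hypothetical names): an update is worth logging only when it sets a new peak for that executor, and clearing at stage boundaries makes the peaks per-stage.

```scala
// Hypothetical sketch, not Spark code: track per-executor peak memory values
// and report whether an update sets a new peak, so only new peaks are logged.
object PeakTracker {
  // executorId -> current peak metric value for the running stage
  private val peaks = scala.collection.mutable.Map.empty[String, Long]

  // Returns true (and records the value) if `value` is a new peak
  // for `executorId`; such updates are the only ones logged.
  def updatePeak(executorId: String, value: Long): Boolean = {
    val isNewPeak = value > peaks.getOrElse(executorId, Long.MinValue)
    if (isNewPeak) peaks(executorId) = value
    isNewPeak
  }

  // Called when a new stage starts (analogous to clearing in
  // onStageSubmitted), so peaks are tracked per executor per stage.
  def clear(): Unit = peaks.clear()
}
```

With this scheme, a drop in memory during a stage is never recorded (the peak stands), which is why the resulting timeline would be sparse, as noted above.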