Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r179796239

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -93,6 +94,9 @@ private[spark] class EventLoggingListener(
       // Visible for tests only.
       private[scheduler] val logPath = getLogPath(logBaseDir, appId, appAttemptId, compressionCodecName)

    +  // Peak metric values for each executor
    +  private var peakExecutorMetrics = new mutable.HashMap[String, PeakExecutorMetrics]()
    --- End diff --

    You need to handle overlapping stages. I think you actually need to key on both executor and stage, and on stage end, only clear the metrics for that stage.

    EDIT: OK, after going through everything, I think I see how this works -- since you log on every new peak, you'll also get a logged message for the earlier update. But as I mention below, this strategy seems like it'll result in a lot of extra logging. Maybe I'm wrong, though; it would be great to see how much the logs grow this way.
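To illustrate the suggestion above, here is a minimal, hypothetical sketch of keying peak metrics on both executor and stage, so that overlapping stages track independent peaks and a stage end clears only that stage's entries. All names (`PeakMetrics`, `PeakTracker`, `onMetricsUpdate`, `onStageEnd`) are invented for this sketch and are not the PR's actual classes; `PeakMetrics` stands in for the PR's `PeakExecutorMetrics`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the PR's PeakExecutorMetrics, tracking a single metric.
case class PeakMetrics(var peakMemory: Long = 0L) {
  // Returns true if a new peak was recorded -- the point at which the
  // listener in the PR would log an update.
  def update(memory: Long): Boolean = {
    if (memory > peakMemory) { peakMemory = memory; true } else false
  }
}

class PeakTracker {
  // Keyed on (executorId, stageId) rather than executorId alone, so that
  // concurrently running stages do not clobber each other's peaks.
  private val peaks = mutable.HashMap.empty[(String, Int), PeakMetrics]

  def onMetricsUpdate(executorId: String, stageId: Int, memory: Long): Boolean =
    peaks.getOrElseUpdate((executorId, stageId), PeakMetrics()).update(memory)

  // On stage end, remove only that stage's entries; any other running
  // stage keeps its accumulated peaks.
  def onStageEnd(stageId: Int): Unit = {
    val ended = peaks.keys.filter(_._2 == stageId).toSeq
    ended.foreach(peaks.remove)
  }

  def trackedStages: Set[Int] = peaks.keySet.map(_._2).toSet
}
```

With this shape, two overlapping stages on the same executor each accumulate their own peak, and ending one stage leaves the other's state untouched.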