Github user edwinalu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r180179243
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
         }
       }
     
    -  // No-op because logging every update would be overkill
    -  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
    +  /**
    +   * Log if there is a new peak value for one of the memory metrics for the given executor.
    +   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
    +   * log new peak memory metric values per executor per stage.
    +   */
    +  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
    --- End diff --
    
    We're not constructing a timeline currently, and yes, with only peak values it could be rather sparse: we would get new values when new stages start, but would not see decreases in memory within a stage.
    
    The situation where stage 1 has peak X and then stage 2 starts with peak Y > X is interesting, though, and it would be useful to have this information, since we would then know to focus on stage 2 for memory consumption, even though stages 1 and 2 could have the same peak values. Logging at stage start would double the amount of logging and would be trickier, so I'd prefer either keeping the current approach or only logging at stage end.
    
    The higher logging frequency was intended for smaller applications (and smaller logs).


