Github user edwinalu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r179978102
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
         }
       }
     
    -  // No-op because logging every update would be overkill
    -  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
    +  /**
    +   * Log if there is a new peak value for one of the memory metrics for the given executor.
    +   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
    +   * log new peak memory metric values per executor per stage.
    +   */
    +  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
    --- End diff --
    
    For a longer-running stage, once it ramps up, hopefully there wouldn't be many new peak values. Looking at a subset of our applications, the extra logging overhead has mostly been between 0.25% and 1%, though it can reach 8%.
    
    By logging each peak value at the time it occurs (and reinitializing when a stage starts), it's possible to tell which stages were active at that time, and these changes could potentially be graphed on a timeline. That information wouldn't be available if the metrics were only logged at stage end, since the times would be lost.
    
    Logging at stage end would limit the amount of extra logging. If we add more metrics (such as for off-heap memory), then there could be more new peaks, and more extra logging, with the current approach. Excess logging is a concern, and I can move to logging at stage end if the overhead is too high.
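    
    To make the idea concrete, here is a minimal, self-contained sketch of the "log only new peaks, reset per stage" logic. It is not the actual patch: PeakTracker, MetricSnapshot, and logNewPeak are hypothetical names, and the real listener reacts to SparkListenerExecutorMetricsUpdate and SparkListenerStageSubmitted events and appends to the event log rather than printing.
    
    import scala.collection.mutable
    
    // One metric sample for one executor, e.g. JVM heap used in bytes.
    case class MetricSnapshot(executorId: String, metric: String, value: Long)
    
    // Tracks the highest value seen per (executor, metric) pair since the last
    // reset, and invokes logNewPeak only when a sample exceeds the recorded peak.
    class PeakTracker(logNewPeak: MetricSnapshot => Unit) {
      private val peaks = mutable.Map.empty[(String, String), Long]
    
      // Analogous to onStageSubmitted: clear the peaks so each stage gets its
      // own set of per-executor peak values.
      def resetForNewStage(): Unit = peaks.clear()
    
      // Analogous to onExecutorMetricsUpdate: log only if this is a new peak.
      def update(s: MetricSnapshot): Unit = {
        val key = (s.executorId, s.metric)
        if (s.value > peaks.getOrElse(key, Long.MinValue)) {
          peaks(key) = s.value
          logNewPeak(s) // the real listener would append an event to the event log
        }
      }
    }
    
    object PeakTrackerDemo extends App {
      val tracker = new PeakTracker(s => println(s"new peak: $s"))
      tracker.update(MetricSnapshot("1", "jvmHeapUsed", 100L)) // logged
      tracker.update(MetricSnapshot("1", "jvmHeapUsed", 80L))  // skipped, below current peak
      tracker.update(MetricSnapshot("1", "jvmHeapUsed", 150L)) // logged, new peak
      tracker.resetForNewStage()                               // new stage starts
      tracker.update(MetricSnapshot("1", "jvmHeapUsed", 90L))  // logged again after reset
    }
    
    With this scheme, events are emitted only when a peak is exceeded, so once a stage ramps up the volume should stay small, while each event still carries a timestamp that ties the peak to the stages active at the time.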

