Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20940#discussion_r181896603
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
         }
       }
     
    -  // No-op because logging every update would be overkill
    -  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
    +  /**
    +   * Log if there is a new peak value for one of the memory metrics for the given executor.
    +   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
    +   * log new peak memory metric values per executor per stage.
    +   */
    +  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
    --- End diff --
    
    Is your concern the size of the message from the executors to the driver?
    
    That certainly is a valid concern, but if it's going to come up regularly, we should think a bit harder about it now, since I expect we'll want to add more metrics over time.
    
    One possibility is for the executor to do the peak calculation itself and only send an update for the metrics that have a new peak. That would also let us send just the peaks with task end events (see the rough sketch below).
    
    I'm just brainstorming at the moment, not saying it should be changed one way or the other ... I need to think about it more.

