Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20532#discussion_r166852617
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -53,10 +53,21 @@ package object config {
           .booleanConf
           .createWithDefault(false)
     
    -  private[spark] val EVENT_LOG_BLOCK_UPDATES =
    -    ConfigBuilder("spark.eventLog.logBlockUpdates.enabled")
    -      .booleanConf
    -      .createWithDefault(false)
    +  private[spark] val EVENT_LOG_BLOCK_UPDATES_FRACTION =
    +    ConfigBuilder("spark.eventLog.logBlockUpdates.fraction")
    +      .doc("Expected number of times each blockUpdated event is chosen to log, " +
    +        "fraction must be [0, 1]. 0 by default, means disabled")
    +      .doubleConf
    +      .checkValue(_ >= 0, "The fraction must not be negative")
    --- End diff --
    
    >how about control the max number of events recorded per time split?
    
    I think this approach still makes it hard to balance the user's requirements against the event log size: Spark could end up dropping exactly the events the user needs at a particular point in time.
    
    IMO, a simple "true"/"false" switch might be a feasible solution: either dump all such events or ignore them entirely. For normal users the default (false) should be enough, but if you want further analysis you can enable it and accept the risk of a large event log file.
    
    For the configuration, I think we could use something like "spark.eventLog.logVerboseEvent.enabled" as a single switch controlling whether all of these verbose events get dumped.
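    
    As a rough sketch only (the constant name and doc wording are placeholders, following the ConfigBuilder pattern already used in package.scala), such a flag could look like:
    
        private[spark] val EVENT_LOG_VERBOSE_EVENTS =
          ConfigBuilder("spark.eventLog.logVerboseEvent.enabled")
            .doc("Whether to log verbose events such as blockUpdated to the event log. " +
              "Disabled by default to keep event logs small; enable it for further " +
              "analysis at the cost of potentially large event log files.")
            .booleanConf
            .createWithDefault(false)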

