Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20952#discussion_r180197210

    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -417,15 +419,23 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           .filter { entry =>
             try {
               val info = listing.read(classOf[LogInfo], entry.getPath().toString())
    -          if (info.fileSize < entry.getLen()) {
    -            // Log size has changed, it should be parsed.
    -            true
    -          } else {
    +
    +          if (info.appId.isDefined) {
                 // If the SHS view has a valid application, update the time the file was last seen so
                 // that the entry is not deleted from the SHS listing.
    -            if (info.appId.isDefined) {
    -              listing.write(info.copy(lastProcessed = newLastScanTime))
    +            listing.write(info.copy(lastProcessed = newLastScanTime))
    +          }
    +
    +          if (info.fileSize < entry.getLen()) {
    +            if (info.appId.isDefined && fastInProgressParsing) {
    +              // When fast in-progress parsing is on, we don't need to re-parse when the
    +              // size changes, but we do need to invalidate any existing UIs.
    +              invalidateUI(info.appId.get, info.attemptId)
    +              false
    --- End diff --

    Do you need to update `info.fileSize` here too? Because you skip the re-parse, it never gets updated in `mergeApplicationListings`. So I think once you hit this condition once, you'll keep invalidating the UI on every subsequent scan.
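As a minimal sketch of that suggestion (not the actual fix in the PR), assuming the names shown in the diff above (info, entry, listing, invalidateUI, fastInProgressParsing, newLastScanTime) behave as shown there, the fast-parsing branch could persist the new length when it skips the re-parse so the fileSize comparison stops firing on later scans:

    if (info.fileSize < entry.getLen()) {
      if (info.appId.isDefined && fastInProgressParsing) {
        // When fast in-progress parsing is on, skip the re-parse but invalidate
        // any cached UI so it is rebuilt with fresh data on the next access.
        invalidateUI(info.appId.get, info.attemptId)
        // Also record the new length, and keep the refreshed lastProcessed from
        // above (copy() starts from the stale `info` that was read earlier).
        // Otherwise the stored fileSize stays stale and this branch invalidates
        // the UI again on every scan.
        listing.write(info.copy(fileSize = entry.getLen(), lastProcessed = newLastScanTime))
        false
      } else {
        // Size changed and fast parsing is off: re-parse, which updates fileSize
        // via mergeApplicationListings.
        true
      }
    }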