Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r237164158
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
    @@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
         val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
     
         def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
    -      Utils.tryWithResource(
    -        store.view(classOf[TaskDataWrapper])
    -          .parent(stageKey)
    -          .index(index)
    -          .first(0L)
    -          .closeableIterator()
    -      ) { it =>
    -        var last = Double.NaN
    -        var currentIdx = -1L
    -        indices.map { idx =>
    -          if (idx == currentIdx) {
    -            last
    -          } else {
    -            val diff = idx - currentIdx
    -            currentIdx = idx
    -            if (it.skip(diff - 1)) {
    -              last = fn(it.next()).toDouble
    -              last
    -            } else {
    -              Double.NaN
    -            }
    -          }
    -        }.toIndexedSeq
    +      val quantileTasks = store.view(classOf[TaskDataWrapper])
    --- End diff --
    
    I see, so there's no real way to do this correctly without deserializing everything, because that's the only way to know for sure which tasks were successful.
    
    Leaving it as-is isn't so bad; if failed tasks are infrequent, the quantiles are still approximately right.
    
    I can think of fudges like searching ahead from the current index for the next successful task and using that instead. For the common case, that might make the quantiles a little more accurate: iterate ahead, incrementing the index, and give up on reaching the next quantile index.
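
    For what it's worth, the search-ahead fudge could look roughly like this. This is a hypothetical, simplified sketch: it works on an in-memory sorted sequence with a toy `Task` type standing in for `TaskDataWrapper`, rather than the real `KVStore` view/iterator, and the names are made up for illustration:

    ```scala
    // Toy stand-in for TaskDataWrapper: a metric value plus a success flag.
    case class Task(metric: Long, successful: Boolean)

    // tasks must be sorted by the metric of interest. For each quantile index,
    // if the task there failed, scan forward for the next successful task,
    // giving up once we hit the next quantile index.
    def quantileValues(tasks: IndexedSeq[Task], quantiles: Seq[Double]): Seq[Double] = {
      val count = tasks.length
      val indices = quantiles.map { q => math.min((q * count).toLong, count - 1L) }
      // Pair each quantile index with the next one (the last is bounded by count).
      indices.zipAll(indices.drop(1), 0L, count.toLong).map { case (idx, nextIdx) =>
        var i = idx
        while (i < nextIdx && !tasks(i.toInt).successful) i += 1
        if (i < count && tasks(i.toInt).successful) tasks(i.toInt).metric.toDouble
        else Double.NaN  // no successful task before the next quantile index
      }
    }
    ```

    The common case (few failures) would yield a nearby successful task's value; a run of failures spanning a whole quantile bucket would still degrade to NaN, same as the existing skip-based scan.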
    
    Hm, @vanzin, is that too complex compared to taking the hit and deserializing it all? Or should we just punt on this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
