Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23088#discussion_r237154542
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
    @@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
         val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
     
         def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
    -      Utils.tryWithResource(
    -        store.view(classOf[TaskDataWrapper])
    -          .parent(stageKey)
    -          .index(index)
    -          .first(0L)
    -          .closeableIterator()
    -      ) { it =>
    -        var last = Double.NaN
    -        var currentIdx = -1L
    -        indices.map { idx =>
    -          if (idx == currentIdx) {
    -            last
    -          } else {
    -            val diff = idx - currentIdx
    -            currentIdx = idx
    -            if (it.skip(diff - 1)) {
    -              last = fn(it.next()).toDouble
    -              last
    -            } else {
    -              Double.NaN
    -            }
    -          }
    -        }.toIndexedSeq
    +      val quantileTasks = store.view(classOf[TaskDataWrapper])
    --- End diff --
    
    @srowen  Yes, everything is loaded in sorted order based on the index, and then we do the filtering. For the in-memory case this doesn't cause any issue, but for the diskStore there is extra deserialization overhead.
    
    Maybe one possible solution could be, for the diskStore case, to bring the data only the first time and sort it based on the corresponding indices to compute the quantiles.
    
    If that solution seems too complicated, then we can tell the user that the summary metrics display the quantile summary of all the tasks, instead of only the completed ones.
    
    Correct me if I am wrong.
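    For reference, the single-pass skip-based scan that this diff removes can be sketched in standalone form as below. This is a simplified illustration, not Spark code: `scanQuantiles` is a hypothetical helper operating on a plain `Iterator[Long]` standing in for the store's index-sorted `closeableIterator()`, so the `KVStore` view, `stageKey`, and resource handling are omitted.

    ```scala
    object QuantileScan {
      /** Computes quantile values in one pass over an already
        * index-sorted iterator by skipping between quantile positions,
        * mirroring the skip(diff - 1) logic in the removed code. */
      def scanQuantiles(
          sorted: Iterator[Long],
          count: Long,
          quantiles: Seq[Double]): IndexedSeq[Double] = {
        // Map each quantile q to a position in the sorted data.
        val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
        var last = Double.NaN
        var currentIdx = -1L
        indices.map { idx =>
          if (idx == currentIdx) {
            // Same position as the previous quantile: reuse the value.
            last
          } else {
            val diff = idx - currentIdx
            currentIdx = idx
            // Skip forward (diff - 1) elements, then read the element at idx.
            var skipped = 0L
            while (skipped < diff - 1 && sorted.hasNext) {
              sorted.next()
              skipped += 1
            }
            if (sorted.hasNext) {
              last = sorted.next().toDouble
              last
            } else {
              Double.NaN
            }
          }
        }.toIndexedSeq
      }
    }
    ```

    For example, `QuantileScan.scanQuantiles((1L to 100L).iterator, 100L, Seq(0.0, 0.5, 1.0))` returns `Vector(1.0, 51.0, 100.0)`. The point of the discussion above is that this scan only gives correct quantiles when the iterator is already sorted by the metric; filtering to completed tasks after the fact is cheap in memory but forces extra deserialization on the diskStore path.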


---
