Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22335#discussion_r215360659
  
    --- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -710,9 +719,21 @@ private[spark] class AppStatusListener(
         val executorId = event.blockUpdatedInfo.blockManagerId.executorId
     
         // Whether values are being added to or removed from the existing 
accounting.
    +    // BlockManager always send empty block status message when user try 
to remove rdd block,
    +    // so we try to get this removed block size from rdd partition to get 
accurate memory/disk storage size.
         val storageLevel = event.blockUpdatedInfo.storageLevel
    -    val diskDelta = event.blockUpdatedInfo.diskSize * (if 
(storageLevel.useDisk) 1 else -1)
    -    val memoryDelta = event.blockUpdatedInfo.memSize * (if 
(storageLevel.useMemory) 1 else -1)
    +    val diskDelta: Long = storageLevel != StorageLevel.NONE match {
    --- End diff --
    
    A simple `if` is much clearer (and probably faster) here. But see later 
comment. I believe it's better for this code to be in the "unpersist" callback, 
and leave the block manager code unchanged.
    
    This code is also wrong the way it is. Because if you're changing the 
storage level of an RDD (e.g. if it was cached on disk, but after the update, 
it's not) then this is doing the wrong thing now.
    
    (So, another argument for treating the "unpersist" event differently, and 
in the appropriate callback.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to