Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19476#discussion_r144775481
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
    @@ -1552,4 +1582,65 @@ private[spark] object BlockManager {
         override val metricRegistry = new MetricRegistry
         metricRegistry.registerAll(metricSet)
       }
    +
    +  class RemoteBlockTempFileManager(blockManager: BlockManager)
    +      extends TempFileManager with Logging {
    +
    +    private class ReferenceWithCleanup(file: File, referenceQueue: 
JReferenceQueue[File])
    +        extends WeakReference[File](file, referenceQueue) {
    +      private val filePath = file.getAbsolutePath
    +
    +      def cleanUp(): Unit = {
    +        logDebug(s"Clean up file $filePath")
    +
    +        if (!new File(filePath).delete()) {
    +          logDebug(s"Fail to delete file $filePath")
    +        }
    +      }
    +    }
    --- End diff --
    
    My concern is that: for shuffle part, since there's a explicit API to 
`cleanup` temp files, so it's not so necessary to track again with weak 
reference. Also weak reference is triggered with GC, and shuffle operations are 
usually much more frequent and heavier, using weak reference to track temp 
shuffle files may increase the overhead of GC probably. Whereas, compared to 
shuffle, fetching remote blocks are happened occasionally when block is not 
cached in local, so using weak reference may not increase the overhead a lot.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to