Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19476#discussion_r144775481 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1552,4 +1582,65 @@ private[spark] object BlockManager { override val metricRegistry = new MetricRegistry metricRegistry.registerAll(metricSet) } + + class RemoteBlockTempFileManager(blockManager: BlockManager) + extends TempFileManager with Logging { + + private class ReferenceWithCleanup(file: File, referenceQueue: JReferenceQueue[File]) + extends WeakReference[File](file, referenceQueue) { + private val filePath = file.getAbsolutePath + + def cleanUp(): Unit = { + logDebug(s"Clean up file $filePath") + + if (!new File(filePath).delete()) { + logDebug(s"Fail to delete file $filePath") + } + } + } --- End diff -- My concern is that: for shuffle part, since there's a explicit API to `cleanup` temp files, so it's not so necessary to track again with weak reference. Also weak reference is triggered with GC, and shuffle operations are usually much more frequent and heavier, using weak reference to track temp shuffle files may increase the overhead of GC probably. Whereas, compared to shuffle, fetching remote blocks are happened occasionally when block is not cached in local, so using weak reference may not increase the overhead a lot.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org