baibaichen commented on code in PR #55953:
URL: https://github.com/apache/spark/pull/55953#discussion_r3259968278


##########
core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala:
##########
@@ -70,6 +70,16 @@ private[memory] class StorageMemoryPool(
     acquireMemory(blockId, numBytes, numBytesToFree)
   }
 
+  /**
+   * Acquire `numBytes` for a [[ManagedConsumer]]: external bytes that never 
enter
+   * [[memoryStore]]'s `entries`, falling back to LRU eviction for any 
deficit. Caller is
+   * responsible for [[MemoryManager.shrinkExternal]] BEFORE this call; 
self-exclusion is
+   * handled in [[MemoryManager.acquireStorageMemory(self:ManagedConsumer,*]].
+   */
+  def acquireMemoryForManagedConsumer(numBytes: Long): Boolean = 
lock.synchronized {
+    acquireMemoryInternal(None, numBytes, math.max(0L, numBytes - memoryFree))
+  }

Review Comment:
   Thanks for the look! Two points:
   
   **1. `UnmanagedMemoryConsumer` is not real-time — wrong shape for caches.**
   
   UMC is purely pull-mode: [`getMemBytesUsed: 
Long`](https://github.com/apache/spark/blob/c8528a73d4b/core/src/main/scala/org/apache/spark/memory/UnmanagedMemoryConsumer.scala#L71)
 is polled every `spark.memory.unmanagedMemoryPollingInterval` (default `0s` = 
disabled) and [subtracted from 
`effectiveMaxMemory`](https://github.com/apache/spark/blob/c8528a73d4b/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala#L224-L225).
 Readings lag real usage, and there is no callback to ask the consumer to 
release. External caches need to evict synchronously under pressure and grow 
synchronously when storage is idle — interval-bounded staleness can't deliver 
that.
   
   **2. The storage pool is the master ledger — must bridge into it.**
   
   `MemoryStore`, execution borrow-from-storage, and `maybeGrowExecutionPool` 
all arbitrate against storage pool accounting. To share 
`spark.memory.offHeap.size` with `MemoryStore` and participate in the same 
borrow/spill/evict ordering, an external cache has to be visible **in** the 
pool, not "outside it" via UMC subtraction.
   
   On "global, not per-task" — agreed, that's what this SPI delivers: 
`ManagedConsumer` is registered on `UnifiedMemoryManager` (executor singleton), 
no per-task accounting.
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to