This is an automated email from the ASF dual-hosted git repository.

janardhan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/main by this push:
     new d9f6c6d9be [DOC] add documentation for the OOCEvictionManager design (#2370)
d9f6c6d9be is described below

commit d9f6c6d9beba4ca0dbf3f664855e39661ae6cb82
Author: Janardhan Pulivarthi <[email protected]>
AuthorDate: Sat Dec 6 20:26:28 2025 +0530

    [DOC] add documentation for the OOCEvictionManager design (#2370)
---
 .../instructions/ooc/OOCEvictionManager.java       | 86 ++++++++++++++--------
 1 file changed, 54 insertions(+), 32 deletions(-)

diff --git a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
index f5ae7573b0..099a26ebd9 100644
--- a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
+++ b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
@@ -46,41 +46,63 @@ import java.util.concurrent.locks.Condition;
 import java.util.concurrent.locks.ReentrantLock;
 
 /**
- * Eviction Manager for the Out-Of-Core stream cache
- * This is the base implementation for LRU, FIFO
- *
- * Design choice 1: Pure JVM-memory cache
- * What: Store MatrixBlock objects in a synchronized in-memory cache
- *   (Map + Deque for LRU/FIFO). Spill to disk by serializing MatrixBlock
- *   only when evicting.
- * Pros: Simple to implement; no off-heap management; easy to debug;
- *   no serialization race since you serialize only when evicting;
- *   fast cache hits (direct object access).
- * Cons: Heap usage counted roughly via serialized-size estimate — actual
- *   JVM object overhead not accounted; risk of GC pressure and OOM if
- *   estimates are off or if many small objects cause fragmentation;
- *   eviction may be more expensive (serialize on eviction).
- * <p>
- * Design choice 2:
- * <p>
- * This manager runtime memory management by caching serialized
- * ByteBuffers and spilling them to disk when needed.
+ * Eviction Manager for the Out-Of-Core (OOC) stream cache.
  * <p>
- * * core function: Caches ByteBuffers (off-heap/direct) and
- * spills them to disk
- * * Eviction: Evicts a ByteBuffer by writing its contents to a file
- * * Granularity: Evicts one IndexedMatrixValue block at a time
- * * Data replay: get() will always return the data either from memory or
- *   by falling back to the disk
- * * Memory: Since the datablocks are off-heap (in ByteBuffer) or disk,
- *   there won't be OOM.
+ * This manager implements a high-performance, thread-safe buffer pool designed
+ * to handle intermediate results that exceed available heap memory. It employs
+ * a <b>partitioned eviction</b> strategy to maximize disk throughput and a
+ * <b>lock-striped</b> concurrency model to minimize thread contention.
+ *
+ * <h2>1. Purpose</h2>
+ * Provides a bounded cache for {@code MatrixBlock}s produced and consumed by OOC
+ * streaming operators (e.g., {@code tsmm}, {@code ba+*}). When memory pressure
+ * exceeds a configured limit, blocks are transparently evicted to disk and restored
+ * on demand, allowing execution of operations larger than RAM.
+ *
+ * <h2>2. Lifecycle Management</h2>
+ * Blocks transition atomically through three states to ensure data consistency:
+ * <ul>
+ * <li><b>HOT:</b> The block is pinned in the JVM heap ({@code value != null}).</li>
+ * <li><b>EVICTING:</b> A transition state. The block is currently being written to disk.
+ * Concurrent readers must wait on the entry's condition variable.</li>
+ * <li><b>COLD:</b> The block is persisted on disk. The heap reference is nulled out
+ * to free memory, but the container (metadata) remains in the cache map.</li>
+ * </ul>
  *
- * Pros: Avoids heap OOM by keeping large data off-heap; predictable
- *   memory usage; good for very large blocks.
- * Cons: More complex synchronization; need robust off-heap allocator/free;
- *   must ensure serialization finishes before adding to queue or make evict
- *   wait on serialization; careful with native memory leaks.
+ * <h2>3. Eviction Strategy (Partitioned I/O)</h2>
+ * To mitigate I/O thrashing caused by writing thousands of small blocks:
+ * <ul>
+ * <li>Eviction is <b>partition-based</b>: Groups of "HOT" blocks are gathered into
+ * batches (e.g., 64MB) and written sequentially to a single partition file.</li>
+ * <li>This converts random I/O into high-throughput sequential I/O.</li>
+ * <li>A separate metadata map tracks the {@code (partitionId, offset)} for every
+ * evicted block, allowing random-access reloading.</li>
+ * </ul>
+ *
+ * <h2>4. Data Integrity (Re-hydration)</h2>
+ * To prevent index corruption during serialization/deserialization cycles, this manager
+ * uses a "re-hydration" model. The {@code IndexedMatrixValue} container is <b>never</b>
+ * removed from the cache structure. Eviction only nulls the data payload. Loading
+ * restores the data into the existing container, preserving the original {@code MatrixIndexes}.
+ *
+ * <h2>5. Concurrency Model (Fine-Grained Locking)</h2>
+ * <ul>
+ * <li><b>Global Structure Lock:</b> A coarse-grained lock ({@code _cacheLock}) guards
+ * the {@code LinkedHashMap} structure against concurrent insertions, deletions,
+ * and iteration during eviction selection.</li>
+ *
+ * <li><b>Per-Block Locks:</b> Each {@code BlockEntry} owns an independent
+ * {@code ReentrantLock}. This decouples I/O operations, allowing a reader to load
+ * "Block A" from disk while the evictor writes "Block B" to disk simultaneously,
+ * maximizing throughput.</li>
+ *
+ * <li><b>Condition Queues:</b> To handle read-write races, the system uses atomic
+ * state transitions. If a reader attempts to access a block in the {@code EVICTING}
+ * state, it waits on the entry's {@code Condition} variable until the writer
+ * signals that the block is safely {@code COLD} (persisted).</li>
+ * </ul>
  */
+
 public class OOCEvictionManager {
 
        // Configuration: OOC buffer limit as percentage of heap
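
For illustration, here is a minimal sketch of the per-block state machine and
reader path implied by sections 2 and 5 of the Javadoc above. The names
BlockEntry, State, and readFromPartition are hypothetical and not taken from
this commit; the sketch only shows a reader waiting out an in-flight eviction
on the entry's condition variable and re-hydrating a COLD block in place.

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-block container with HOT/EVICTING/COLD states; names are
// illustrative and do not reflect the actual SystemDS implementation.
class BlockEntry {
	enum State { HOT, EVICTING, COLD }

	final ReentrantLock lock = new ReentrantLock();     // per-block lock (section 5)
	final Condition stateChanged = lock.newCondition(); // readers wait here
	State state = State.HOT;
	Object value;          // in-heap payload; nulled out when COLD
	int partitionId = -1;  // disk location, valid once COLD (section 3)
	long offset = -1;

	// Reader path: wait out an in-flight eviction, then reload if COLD.
	Object get() throws InterruptedException {
		lock.lock();
		try {
			while (state == State.EVICTING)
				stateChanged.await(); // writer signals once the block is safely COLD
			if (state == State.COLD) {
				value = readFromPartition(partitionId, offset); // re-hydrate in place
				state = State.HOT;
			}
			return value;
		} finally {
			lock.unlock();
		}
	}

	private Object readFromPartition(int pid, long off) {
		return new Object(); // placeholder for a random-access partition read
	}
}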

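Next, a sketch of the partition-based eviction described in section 3, under
the assumption of a simple length-prefixed on-disk layout: a batch of
serialized blocks is written back-to-back into one partition file, and a
metadata map records the (partitionId, offset) pair per block for later
random-access reloading. PartitionWriter, PARTITION_SIZE, and the file naming
are hypothetical.

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batch writer: converts many small random writes into one
// sequential write per partition file (section 3).
class PartitionWriter {
	// Callers would gather HOT blocks until a batch approaches this size.
	static final long PARTITION_SIZE = 64L * 1024 * 1024; // e.g., 64MB batches

	final Map<Long, long[]> location = new HashMap<>(); // blockId -> {partitionId, offset}
	int nextPartitionId = 0;

	void evictBatch(List<Long> blockIds, Map<Long, byte[]> serialized) throws IOException {
		int pid = nextPartitionId++;
		try (DataOutputStream out = new DataOutputStream(
				new FileOutputStream("ooc-partition-" + pid + ".bin"))) {
			long offset = 0;
			for (Long id : blockIds) {
				byte[] data = serialized.get(id);
				out.writeInt(data.length); // length header enables reloading
				out.write(data);
				location.put(id, new long[] {pid, offset});
				offset += 4 + data.length;
			}
		}
	}
}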
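
Finally, a hypothetical writer-side counterpart to BlockEntry.get() above,
illustrating the atomic state transitions of sections 2, 4, and 5: the evictor
flips the entry to EVICTING, serializes without holding the per-block lock,
then nulls only the payload (the container itself stays in the cache map) and
signals waiting readers once the entry is COLD. All names remain illustrative.

// Hypothetical evictor path; assumes the BlockEntry sketch above.
class Evictor {
	void evict(BlockEntry e) {
		e.lock.lock();
		try {
			if (e.state != BlockEntry.State.HOT)
				return; // already being evicted, or already on disk
			e.state = BlockEntry.State.EVICTING; // readers now block in get()
		} finally {
			e.lock.unlock();
		}

		byte[] bytes = serialize(e.value); // I/O outside the per-block lock,
		                                   // handed to the partition writer

		e.lock.lock();
		try {
			e.partitionId = 0; // placeholder: location assigned by the partition writer
			e.offset = 0L;
			e.value = null;    // re-hydration model (section 4): null the payload only
			e.state = BlockEntry.State.COLD;
			e.stateChanged.signalAll(); // release readers waiting on the condition
		} finally {
			e.lock.unlock();
		}
	}

	private byte[] serialize(Object v) {
		return new byte[0]; // placeholder for MatrixBlock serialization
	}
}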