Joerg Hoh created OAK-12212:
-------------------------------

             Summary: Drifts in PersistentDiskCache.cacheSize counter
                 Key: OAK-12212
                 URL: https://issues.apache.org/jira/browse/OAK-12212
             Project: Jackrabbit Oak
          Issue Type: Task
          Components: segment-azure
    Affects Versions: 2.0.0
            Reporter: Joerg Hoh


h2. Observation
A heap dump of a long-running instance shows:

* {{PersistentDiskCache.maxCacheSizeBytes}} ≈ 20 GiB (matches the configured value)
* {{AbstractPersistentCache.cacheSize}} (an {{AtomicLong}}, inherited) ≈ 80 GiB, 
roughly 4× the configured maximum

The actual cache directory on disk stays at or below the configured limit; only 
the in-memory counter has run away.


h2. Root cause
{{PersistentDiskCache.writeSegment(...)}} adds {{fileSize}} to the in-memory 
{{cacheSize}} on every invocation that reaches the write body, but when the 
same segment id is written more than once, the corresponding file on disk is 
replaced rather than added. The {{writesPending}} guard inside 
{{writeSegment}} only prevents concurrently running tasks for the same id; it 
does not prevent sequentially submitted tasks. On POSIX file systems, 
{{Files.move(..., ATOMIC_MOVE)}} maps to {{rename(2)}} and silently replaces 
the destination, so the second and subsequent writes leave the directory 
unchanged in size while still incrementing the counter.
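
The effect can be reproduced outside Oak with a small stand-in. The sketch below is not the Oak implementation: the class {{CounterDriftDemo}} and its methods are hypothetical, and it only assumes the pattern described above (write to a temporary file, {{ATOMIC_MOVE}} it over the target, add the size unconditionally) on a POSIX file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

/** Hypothetical stand-in for the accounting pattern described above; not Oak code. */
final class CounterDriftDemo {
    final AtomicLong cacheSize = new AtomicLong(); // mirrors the in-memory counter
    final Path directory;

    CounterDriftDemo(Path directory) {
        this.directory = directory;
    }

    /** Writes to a temp file, moves it over the target, then adds the size unconditionally. */
    void writeSegment(String segmentId, byte[] data) throws IOException {
        Path tempFile = Files.createTempFile(directory, segmentId, ".part");
        Files.write(tempFile, data);
        Path segmentFile = directory.resolve(segmentId);
        // Assumption: POSIX file system, where this is rename(2) and an
        // existing segmentFile is silently replaced.
        Files.move(tempFile, segmentFile, StandardCopyOption.ATOMIC_MOVE);
        cacheSize.addAndGet(data.length);
    }

    /** Sums the real size of all regular files in the cache directory. */
    long bytesOnDisk() throws IOException {
        try (Stream<Path> files = Files.list(directory)) {
            return files.filter(Files::isRegularFile)
                    .mapToLong(p -> p.toFile().length())
                    .sum();
        }
    }

    /** Writes the same id twice in sequence; returns {counter, bytesOnDisk}. */
    static long[] demo() {
        try {
            Path dir = Files.createTempDirectory("cache-drift");
            CounterDriftDemo cache = new CounterDriftDemo(dir);
            byte[] segment = new byte[1024];
            cache.writeSegment("segment-a", segment);
            cache.writeSegment("segment-a", segment); // sequential rewrite of the same id
            return new long[] {cache.cacheSize.get(), cache.bytesOnDisk()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After {{demo()}} the counter has been incremented twice while the directory holds a single 1024-byte file, which is the divergence seen in the heap dump at a much smaller scale.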

The eviction loop ({{cleanUpInternal}}) walks the directory and subtracts the 
actual length of each deleted file once. The "phantom" bytes contributed by 
redundant writes are therefore never repaid and accumulate monotonically over 
the lifetime of the JVM.
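
Why eviction never repays the surplus can be shown with a purely in-memory model. This is an illustrative sketch only: {{PhantomBytesModel}} is a hypothetical class, a map stands in for the cache directory, and {{evictAll()}} is a simplification of what {{cleanUpInternal}} is described as doing (subtracting each deleted file's actual length once).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory model of the accounting; the map stands in for the cache directory. Not Oak code. */
final class PhantomBytesModel {
    final Map<String, Long> directory = new LinkedHashMap<>(); // segment id -> file length
    final AtomicLong cacheSize = new AtomicLong();

    /** Models a write: the file is replaced, but the counter always grows. */
    void write(String segmentId, long fileSize) {
        directory.put(segmentId, fileSize);
        cacheSize.addAndGet(fileSize);
    }

    /** Models eviction: each deleted file's actual length is subtracted exactly once. */
    void evictAll() {
        Iterator<Map.Entry<String, Long>> it = directory.entrySet().iterator();
        while (it.hasNext()) {
            cacheSize.addAndGet(-it.next().getValue());
            it.remove();
        }
    }

    /** Writes one id the given number of times, evicts everything, returns the leftover counter. */
    static long residualAfterEviction(int writes, long fileSize) {
        PhantomBytesModel model = new PhantomBytesModel();
        for (int i = 0; i < writes; i++) {
            model.write("segment-a", fileSize);
        }
        model.evictAll();
        return model.cacheSize.get();
    }
}
```

In this model, n writes of the same id followed by a full eviction leave (n - 1) × fileSize in the counter even though the directory is empty.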

In addition, two smaller contributing factors keep the drift unidirectional 
(upward):

* {{cacheSize}} is initialized to 0 and is never reconciled against the existing 
cache directory at startup; it relies entirely on incremental accounting being 
correct.
* The error branch of {{writeSegment}} deletes {{segmentFile}} on any 
{{Files.move}} failure but does not decrement the counter for whatever 
contribution that file previously made.
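
For illustration, reconciling the counter against the directory could look roughly like the sketch below. It is not taken from Oak: {{CacheSizeReconciler}} is a hypothetical helper, and it assumes a flat cache directory in which every regular file is a cached segment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper; assumes a flat directory where every regular file is a cached segment. */
final class CacheSizeReconciler {

    /** Returns the total length in bytes of the regular files directly inside the directory. */
    static long bytesOnDisk(Path directory) {
        try (Stream<Path> files = Files.list(directory)) {
            return files.filter(Files::isRegularFile)
                    .mapToLong(p -> p.toFile().length())
                    .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temp directory with a 10-byte and a 20-byte file and measures it. */
    static long selfCheck() {
        try {
            Path dir = Files.createTempDirectory("cache-reconcile");
            Files.write(dir.resolve("segment-a"), new byte[10]);
            Files.write(dir.resolve("segment-b"), new byte[20]);
            return bytesOnDisk(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller holding the {{AtomicLong}} could then seed it with {{cacheSize.set(CacheSizeReconciler.bytesOnDisk(cacheDirectory))}}, where {{cacheDirectory}} is a placeholder for the configured cache path.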


h2. Triggering workloads
Any workload that produces multiple writes for the same segment id over time: 
concurrent cache misses on the same segment (e.g. compaction, online GC, 
indexing, mass traversal, standby replication, warm-up after restart). The 
probability of such repeated writes per workload determines the rate at which 
the counter diverges; instances that run for weeks or months will drift by tens 
of GiB regardless of how the workload looks at any given moment.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
