Joerg Hoh created OAK-12212:
-------------------------------
Summary: Drifts in PersistentDiskCache.cacheSize counter
Key: OAK-12212
URL: https://issues.apache.org/jira/browse/OAK-12212
Project: Jackrabbit Oak
Issue Type: Task
Components: segment-azure
Affects Versions: 2.0.0
Reporter: Joerg Hoh
h2. Observation
A heap dump of a long-running instance shows:
* {{PersistentDiskCache.maxCacheSizeBytes}} ≈ 20 GiB (matches the configured value)
* {{AbstractPersistentCache.cacheSize}} (an {{AtomicLong}}, inherited) ≈ 80 GiB,
roughly 4× the configured maximum
The actual cache directory on disk stays at or below the configured limit; only
the in-memory counter has run away.
h2. Root cause
{{PersistentDiskCache.writeSegment(...)}} adds {{fileSize}} to the in-memory
{{cacheSize}} on every invocation that reaches the write body, but the
corresponding file on disk is replaced, not added, when the same segment id
is written more than once. The {{writesPending}} guard inside {{writeSegment}}
only prevents concurrently running tasks for the same id; it does not prevent
sequentially submitted tasks. On POSIX file systems, {{Files.move(...,
ATOMIC_MOVE)}} maps to {{rename(2)}} and silently replaces the destination, so
the second and any subsequent write leaves the directory unchanged in size
while still incrementing the counter.
The eviction loop ({{cleanUpInternal}}) walks the directory and subtracts the
actual length of each deleted file once. The "phantom" bytes contributed by
redundant writes are therefore never repaid and accumulate monotonically over
the lifetime of the JVM.
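The mechanism can be reproduced in isolation. The following is a minimal sketch, not Oak code: the class {{CounterDriftDemo}} and its methods are hypothetical stand-ins that mimic the accounting described above (unconditional increment after an atomic rename, a single decrement on eviction). It assumes a POSIX-like file system where {{ATOMIC_MOVE}} replaces an existing target.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the accounting described above; this is not Oak code.
final class CounterDriftDemo {

    // Mimics the described write path: write a temp file, rename it over the
    // target, then add the file size to the counter unconditionally.
    static void writeSegment(Path cacheDir, String segmentId, byte[] data, AtomicLong cacheSize)
            throws IOException {
        Path tmp = Files.createTempFile(cacheDir, "segment-", ".part");
        Files.write(tmp, data);
        // Assumption: POSIX-like file system, where this replaces an existing target.
        Files.move(tmp, cacheDir.resolve(segmentId), StandardCopyOption.ATOMIC_MOVE);
        cacheSize.addAndGet(data.length);
    }

    // Sums the real size of all regular files in the cache directory.
    static long bytesOnDisk(Path cacheDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(cacheDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    total += Files.size(file);
                }
            }
        }
        return total;
    }

    // Returns {counter before eviction, bytes on disk before eviction, counter after eviction}.
    static long[] simulate(int writes, int segmentBytes) throws IOException {
        Path cacheDir = Files.createTempDirectory("cache-drift");
        AtomicLong cacheSize = new AtomicLong();
        for (int i = 0; i < writes; i++) {
            writeSegment(cacheDir, "segment-a", new byte[segmentBytes], cacheSize);
        }
        long counterBefore = cacheSize.get();
        long diskBefore = bytesOnDisk(cacheDir);

        // Eviction as described: delete the file and subtract its actual length once.
        Path segment = cacheDir.resolve("segment-a");
        long length = Files.size(segment);
        Files.delete(segment);
        cacheSize.addAndGet(-length);

        Files.delete(cacheDir); // directory is empty again at this point
        return new long[] {counterBefore, diskBefore, cacheSize.get()};
    }

    public static void main(String[] args) throws IOException {
        long[] r = simulate(4, 1024);
        System.out.printf("counter=%d disk=%d counterAfterEviction=%d%n", r[0], r[1], r[2]);
    }
}
```

Under the stated assumption, four writes of 1024 bytes to the same id leave 1024 bytes on disk while the counter reads 4096; after the single eviction the directory is empty but the counter still holds 3072 phantom bytes.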
In addition, two smaller contributing factors keep the drift unidirectional
(upward):
* {{cacheSize}} is initialized to 0 and is never reconciled against the existing
cache directory at startup; it relies entirely on incremental accounting being
correct.
* The error branch of {{writeSegment}} deletes {{segmentFile}} on any
{{Files.move}} failure but does not decrement the counter for whatever
contribution that file previously made.
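For illustration only, the two gaps above could be closed along these lines. The issue does not prescribe a fix; {{CacheSizeReconciler}} and its method names are hypothetical. The sketch assumes a flat cache directory (one file per segment, no subdirectories) and ignores concurrency with in-flight writes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

// Hypothetical helpers; names and structure are illustrative, not Oak code.
final class CacheSizeReconciler {

    // Sums the length of all regular files directly inside the cache directory.
    // Assumption: the cache directory is flat (no nested directories).
    static long sizeOnDisk(Path cacheDir) throws IOException {
        try (Stream<Path> files = Files.list(cacheDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    // Startup reconciliation: seed the counter from the directory instead of 0.
    static void reconcile(Path cacheDir, AtomicLong cacheSize) throws IOException {
        cacheSize.set(sizeOnDisk(cacheDir));
    }

    // Error-path cleanup: subtract what the file contributed, then only if it
    // was actually deleted. Not safe against concurrent writers to the same file.
    static void deleteAndSubtract(Path file, AtomicLong cacheSize) throws IOException {
        long previous = Files.exists(file) ? Files.size(file) : 0L;
        if (Files.deleteIfExists(file)) {
            cacheSize.addAndGet(-previous);
        }
    }
}
```

Calling {{reconcile}} periodically, not only at startup, would also bound any drift that has already accumulated, at the cost of one directory walk per call.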
h2. Triggering workloads
Any workload that writes the same segment id more than once over time,
typically through concurrent cache misses on the same segment (e.g.
compaction, online GC, indexing, mass traversal, standby replication, warm-up
after restart). How often a workload produces such repeated writes determines
the rate at which the counter diverges; instances that run for weeks or months
will drift by tens of GiB regardless of how the workload looks at any given
moment.