[
https://issues.apache.org/jira/browse/NIFI-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Handermann resolved NIFI-15570.
-------------------------------------
Fix Version/s: 2.9.0
Resolution: Fixed
> Partial defragmentation of Content Repository via tail-claim truncation
> -----------------------------------------------------------------------
>
> Key: NIFI-15570
> URL: https://issues.apache.org/jira/browse/NIFI-15570
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 2.9.0
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> h3. Problem
> NiFi's FileSystemRepository uses a slab-allocation strategy for storing
> FlowFile content: multiple FlowFiles are written sequentially into a single
> ResourceClaim file on disk. This is efficient because it avoids the overhead
> of creating and deleting huge numbers of small files. However, it introduces
> a fragmentation problem.
>
> When any FlowFile still references a ResourceClaim, the entire file must be
> kept on disk — even if the vast majority of its bytes belong to FlowFiles
> that have already been removed. Consider a ResourceClaim that contains five
> ContentClaims of sizes 1 KB, 2 KB, 4 KB, 3 KB, and 1 GB. If only the 1 KB
> FlowFile remains, the full ~1 GB file stays on disk. At scale, this leads to
> disk exhaustion.
>
> A full defragmentation (rewriting live claims into new ResourceClaim files,
> updating all references, and deleting the originals) would be extremely
> complex and expensive. But it turns out we can solve the vast majority of
> the problem without it.
> h3. Key Insight
> With NiFi's slab allocation, there are three possible positions for a large
> ContentClaim within a ResourceClaim:
>
>
> {{<> = Small FlowFile}}
> {{[................] = Large FlowFile}}
>
> 1. Beginning: [................]<><><><><><><>
>
> 2. Middle: <><><><>[................]<><><>
>
> 3. End: <><><><><><><><>[................]
>
> NiFi already prevents cases 1 and 2. The
> nifi.content.claim.max.appendable.size property (default: 50 KB) causes the
> repository to stop appending to a ResourceClaim once it exceeds that
> threshold. Since a "large" ContentClaim is by definition larger than this
> threshold, the act of writing it will push the ResourceClaim past the (soft)
> limit, causing the ResourceClaim to be closed for further appending. No
> additional ContentClaims can be written after the large one.
>
> This means a large ContentClaim can only ever appear at the tail of a
> ResourceClaim. And truncating a file from the tail requires no data movement
> — it is a single FileChannel.truncate() call.
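> The tail-truncation operation itself is a one-liner. A minimal sketch
> (the helper method, file name, and sizes below are illustrative, not taken
> from the NiFi codebase):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of tail-claim truncation: trim a slab file back to the byte offset
// where the removed tail ContentClaim began. No data is moved or rewritten;
// FileChannel.truncate() simply shortens the file.
public class TailTruncateDemo {

    // Discard everything at or beyond claimOffset in the ResourceClaim file.
    static void truncateTailClaim(Path resourceClaimFile, long claimOffset) throws IOException {
        try (FileChannel channel = FileChannel.open(resourceClaimFile, StandardOpenOption.WRITE)) {
            channel.truncate(claimOffset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path slab = Files.createTempFile("resource-claim", ".bin");
        // Pretend: 10 bytes of small claims followed by a 1000-byte tail claim.
        Files.write(slab, new byte[1010]);

        // The large claim started at offset 10; once its last FlowFile is
        // removed, the file can be trimmed back to that offset.
        truncateTailClaim(slab, 10);

        System.out.println(Files.size(slab)); // prints 10
        Files.delete(slab);
    }
}
```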
> h3. Solution
> This change implements "partial defragmentation" by truncating ResourceClaim
> files from the tail when the last (large) ContentClaim is removed. The
> approach consists of several coordinated components:
>
> Marking truncation candidates at write time — When a ContentClaim is closed
> in FileSystemRepository, the repository checks whether it is both (a) large
> (exceeding a threshold) and (b) at a non-zero offset (i.e., not the only
> claim in the file). If both conditions hold, the claim is flagged as a
> truncation candidate via StandardContentClaim.setTruncationCandidate(true).
> If the claim is later cloned (claimant count incremented), the flag is
> cleared, since truncation is only safe when the claim has a single owner.
>
> Routing truncatable claims through the FlowFile Repository — When a FlowFile
> is deleted or its content is replaced,
> WriteAheadFlowFileRepository.updateContentClaims() checks if the released
> ContentClaim is a truncation candidate. If so (and the ResourceClaim itself
> is not already fully destructable), the claim is queued in
> claimsAwaitingTruncation. On the next WAL checkpoint or sync, these claims
> are drained to ResourceClaimManager.markTruncatable().
>
> Background truncation in FileSystemRepository — A scheduled TruncateClaims
> task periodically drains truncatable claims from the ResourceClaimManager.
> Before truncating, it checks whether truncation is active for the claim's
> container (archive must be cleared on the last cleanup run and disk usage
> must exceed the configured threshold). If conditions are met, the file is
> truncated to the claim's offset via FileChannel.truncate(). If conditions
> are not met, the claims are saved in a TruncationClaimManager and retried on
> subsequent runs, ensuring no truncation opportunity is lost.
>
> Recovery — On restart, WriteAheadFlowFileRepository.restoreFlowFiles()
> re-derives truncation eligibility by scanning all recovered FlowFiles,
> identifying large claims at non-zero offsets that are at the tail of their
> ResourceClaim and are not shared by multiple FlowFiles.
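> The write-time eligibility check from the first component above can be
> sketched as a simple predicate. The class, method names, and threshold
> constant here are illustrative assumptions; the actual logic lives in
> FileSystemRepository and StandardContentClaim:

```java
// Illustrative sketch of the write-time truncation-candidate check.
// Names and threshold are assumptions, not the actual NiFi code.
public class TruncationCandidateCheck {

    // Mirrors the nifi.content.claim.max.appendable.size default (50 KB).
    static final long MAX_APPENDABLE_SIZE = 50 * 1024;

    /**
     * A claim is a truncation candidate when it is "large" (bigger than the
     * appendable-size threshold, so it must be the last claim in its slab),
     * sits at a non-zero offset (so there are earlier claims worth keeping),
     * and has exactly one owner (cloning clears eligibility).
     */
    static boolean isTruncationCandidate(long claimLength, long claimOffset, int claimantCount) {
        return claimLength > MAX_APPENDABLE_SIZE
                && claimOffset > 0
                && claimantCount == 1;
    }

    public static void main(String[] args) {
        // 1 GB claim at offset 10 KB, single owner: eligible.
        System.out.println(isTruncationCandidate(1_000_000_000L, 10_240, 1)); // true
        // Same claim after cloning: truncation is no longer safe.
        System.out.println(isTruncationCandidate(1_000_000_000L, 10_240, 2)); // false
        // Large claim at offset 0 (only claim in the file): normal
        // archive/delete handling already covers it.
        System.out.println(isTruncationCandidate(1_000_000_000L, 0, 1)); // false
    }
}
```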
> h3. Example
> Before truncation — a 1 GB FlowFile was removed but the ResourceClaim
> persists because small FlowFiles still reference it:
>
>
> {{ResourceClaim file (1,000,010 KB on disk):}}
> {{  [1 KB] [2 KB] [4 KB] [3 KB] [1,000,000 KB (removed)]}}
>
> After truncation — the file is truncated at the offset where the large
> claim began:
>
> {{ResourceClaim file (10 KB on disk):}}
> {{  [1 KB] [2 KB] [4 KB] [3 KB]}}
>
> The small FlowFiles remain fully readable. The 1 GB of wasted space is
> reclaimed instantly with a single syscall.
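> A scaled-down sketch of this example (sizes in bytes rather than KB, all
> names illustrative) confirming that bytes before the truncation offset are
> untouched and the small claims remain readable:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Small claims packed before a large tail claim stay byte-for-byte readable
// after the tail is truncated away.
public class TruncationReadbackDemo {
    public static void main(String[] args) throws IOException {
        Path slab = Files.createTempFile("resource-claim", ".bin");

        byte[] smallClaims = "1KB|2KB|4KB|3KB".getBytes(); // stand-in for the four small claims
        byte[] whole = new byte[smallClaims.length + 1_000]; // 1000 zero bytes stand in for the 1 GB tail
        System.arraycopy(smallClaims, 0, whole, 0, smallClaims.length);
        Files.write(slab, whole);

        // Reclaim the tail with a single call.
        try (FileChannel ch = FileChannel.open(slab, StandardOpenOption.WRITE)) {
            ch.truncate(smallClaims.length);
        }

        // The surviving prefix is identical to what was originally written.
        byte[] after = Files.readAllBytes(slab);
        System.out.println(Arrays.equals(after, smallClaims)); // prints true
        Files.delete(slab);
    }
}
```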
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)