Re: [PR] NIFI-15570: Track count of Content Claim References for claims that could be truncated to ensure no premature truncation [nifi]

via GitHub Fri, 17 Apr 2026 12:22:48 -0700


markap14 commented on code in PR #11034:
URL: https://github.com/apache/nifi/pull/11034#discussion_r3102737641



##########
nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/FileSystemRepository.java:
##########
@@ -1032,6 +1048,128 @@ public void purge() {
         resourceClaimManager.purge();
     }
 
+    private class TruncateClaims implements Runnable {
+
+        @Override
+        public void run() {
+            final Map<String, Boolean> truncationActivationCache = new HashMap<>();
+
+            // Go through any known truncation claims and truncate them now if truncation is enabled for their container.
+            for (final String container : containerNames) {
+                if (isTruncationActiveForContainer(container, truncationActivationCache)) {
+                    final List<ContentClaim> toTruncate = truncationClaimManager.removeTruncationClaims(container);
+                    if (toTruncate.isEmpty()) {
+                        continue;
+                    }
+
+                    truncateClaims(toTruncate, truncationActivationCache);
+                }
+            }
+
+            // Drain any Truncation Claims from the Resource Claim Manager.
+            // If able, truncate those claims. Otherwise, save those claims in the Truncation Claim Manager to be truncated on the next run.
+            // This prevents us from having a case where we could truncate a big claim but we don't because we're not yet running out of disk space,
+            // but then we later start to run out of disk space and lost the opportunity to truncate that big claim.
+            // Loop to drain the entire queue in a single invocation rather than waiting for the next scheduled run. Because the default
+            // interval is 1 minute, waiting for the next run could delay truncation on a disk that is already under pressure and increases
+            // the risk of having too many claims that the queue overflows (in which case we would lose some optimization).
+            while (true) {
+                final List<ContentClaim> toTruncate = new ArrayList<>();
+                resourceClaimManager.drainTruncatableClaims(toTruncate, 10_000);
+                if (toTruncate.isEmpty()) {
+                    return;
+                }
+
+                truncateClaims(toTruncate, truncationActivationCache);
+            }
+        }
+
+        private void truncateClaims(final List<ContentClaim> toTruncate, final Map<String, Boolean> truncationActivationCache) {
+            final Map<String, List<ContentClaim>> claimsSkipped = new HashMap<>();
+
+            for (final ContentClaim claim : toTruncate) {
+                final String container = claim.getResourceClaim().getContainer();
+                if (!isTruncationActiveForContainer(container, truncationActivationCache)) {
+                    LOG.debug("Will not truncate {} because truncation is not active for container {}; will save for later truncation", claim, container);
+                    claimsSkipped.computeIfAbsent(container, key -> new ArrayList<>()).add(claim);
+                    continue;
+                }
+
+                if (claim.isTruncationCandidate()) {
+                    final int truncationReferenceCount = resourceClaimManager.getTruncationReferenceCount(claim);
+                    if (truncationReferenceCount > 0) {
+                        LOG.debug("Skipping truncation of {} because truncation reference count is {}", claim, truncationReferenceCount);
+                        continue;
+                    }
+
+                    truncate(claim);
+                }
+            }
+
+            claimsSkipped.forEach(truncationClaimManager::addTruncationClaims);
+        }
+
+        private boolean isTruncationActiveForContainer(final String container, final Map<String, Boolean> activationCache) {
+            // If not archiving data, we consider truncation always active.
+            if (!archiveData) {
+                return true;

Review Comment:
   I don't think having a small number of return statements should be a goal. Personally, I find this method very clear.
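
   For readers who do not have the rest of the hunk in front of them, the trade-off under discussion can be sketched as follows. This is only an illustration, not the PR's code: the class `TruncationCheck`, the `usedRatio` function, and the `THRESHOLD` value are hypothetical stand-ins for whatever the real method checks after the `archiveData` guard. Only the `archiveData` flag, the per-container cache, and the early `return true` come from the diff above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class TruncationCheck {
    // Hypothetical threshold: fraction of disk in use at which truncation becomes active.
    private static final double THRESHOLD = 0.5;

    private final boolean archiveData;
    // Hypothetical: maps a container name to the fraction of its disk currently in use.
    private final ToDoubleFunction<String> usedRatio;

    public TruncationCheck(final boolean archiveData, final ToDoubleFunction<String> usedRatio) {
        this.archiveData = archiveData;
        this.usedRatio = usedRatio;
    }

    // Guard-clause style: each special case returns as soon as it is decided.
    public boolean isActiveEarlyReturn(final String container, final Map<String, Boolean> cache) {
        // If not archiving data, truncation is always considered active.
        if (!archiveData) {
            return true;
        }

        final Boolean cached = cache.get(container);
        if (cached != null) {
            return cached;
        }

        final boolean active = usedRatio.applyAsDouble(container) >= THRESHOLD;
        cache.put(container, active);
        return active;
    }

    // Single-return style: same behavior, but every case funnels into one exit point.
    public boolean isActiveSingleReturn(final String container, final Map<String, Boolean> cache) {
        boolean result;
        if (!archiveData) {
            result = true;
        } else {
            Boolean cached = cache.get(container);
            if (cached == null) {
                cached = usedRatio.applyAsDouble(container) >= THRESHOLD;
                cache.put(container, cached);
            }
            result = cached;
        }
        return result;
    }
}
```

   Both variants return the same values and populate the cache the same way. The single-return version pays for its one exit point with an extra local variable and a deeper level of nesting.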



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
