Re: [PR] NIFI-15570: Track count of Content Claim References for claims that could be truncated to ensure no premature truncation [nifi]

via GitHub Thu, 23 Apr 2026 07:30:00 -0700


exceptionfactory commented on code in PR #11034:
URL: https://github.com/apache/nifi/pull/11034#discussion_r3131561116



##########
nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/FileSystemRepository.java:
##########
@@ -1032,6 +1048,128 @@ public void purge() {
         resourceClaimManager.purge();
     }
 
+    private class TruncateClaims implements Runnable {
+
+        @Override
+        public void run() {
+            final Map<String, Boolean> truncationActivationCache = new 
HashMap<>();
+
+            // Go through any known truncation claims and truncate them now if 
truncation is enabled for their container.
+            for (final String container : containerNames) {
+                if (isTruncationActiveForContainer(container, 
truncationActivationCache)) {
+                    final List<ContentClaim> toTruncate = 
truncationClaimManager.removeTruncationClaims(container);
+                    if (toTruncate.isEmpty()) {
+                        continue;
+                    }
+
+                    truncateClaims(toTruncate, truncationActivationCache);
+                }
+            }
+
+            // Drain any Truncation Claims from the Resource Claim Manager.
+            // If able, truncate those claims. Otherwise, save those claims in 
the Truncation Claim Manager to be truncated on the next run.
+            // This prevents us from having a case where we could truncate a 
big claim but we don't because we're not yet running out of disk space,
+            // but then we later start to run out of disk space and lost the 
opportunity to truncate that big claim.
+            // Loop to drain the entire queue in a single invocation rather 
than waiting for the next scheduled run. Because the default
+            // interval is 1 minute, waiting for the next run could delay 
truncation on a disk that is already under pressure and increases
+            // the risk of having too many claims that the queue overflows (in 
which case we would lose some optimization).
+            while (true) {
+                final List<ContentClaim> toTruncate = new ArrayList<>();
+                resourceClaimManager.drainTruncatableClaims(toTruncate, 
10_000);
+                if (toTruncate.isEmpty()) {
+                    return;
+                }
+
+                truncateClaims(toTruncate, truncationActivationCache);
+            }
+        }
+
+        private void truncateClaims(final List<ContentClaim> toTruncate, final 
Map<String, Boolean> truncationActivationCache) {
+            final Map<String, List<ContentClaim>> claimsSkipped = new 
HashMap<>();
+
+            for (final ContentClaim claim : toTruncate) {
+                final String container = 
claim.getResourceClaim().getContainer();
+                if (!isTruncationActiveForContainer(container, 
truncationActivationCache)) {
+                    LOG.debug("Will not truncate {} because truncation is not 
active for container {}; will save for later truncation", claim, container);
+                    claimsSkipped.computeIfAbsent(container, key -> new 
ArrayList<>()).add(claim);
+                    continue;
+                }
+
+                if (claim.isTruncationCandidate()) {
+                    final int truncationReferenceCount = 
resourceClaimManager.getTruncationReferenceCount(claim);
+                    if (truncationReferenceCount > 0) {
+                        LOG.debug("Skipping truncation of {} because 
truncation reference count is {}", claim, truncationReferenceCount);
+                        continue;
+                    }
+
+                    truncate(claim);
+                }
+            }
+
+            claimsSkipped.forEach(truncationClaimManager::addTruncationClaims);
+        }
+
+        private boolean isTruncationActiveForContainer(final String container, 
final Map<String, Boolean> activationCache) {
+            // If not archiving data, we consider truncation always active.
+            if (!archiveData) {
+                return true;

Review Comment:
   Thanks for the reply, I think it depends on the implementation, and it being 
generally helpful to avoid short-circuit reasons, although I acknowledge that 
may not be the best approach in every situation. I could go either way on this 
particular one, so leaving it sounds good.



##########
nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/FileSystemRepository.java:
##########
@@ -1032,6 +1048,128 @@ public void purge() {
         resourceClaimManager.purge();
     }
 
+    private class TruncateClaims implements Runnable {
+
+        @Override
+        public void run() {
+            final Map<String, Boolean> truncationActivationCache = new 
HashMap<>();
+
+            // Go through any known truncation claims and truncate them now if 
truncation is enabled for their container.
+            for (final String container : containerNames) {
+                if (isTruncationActiveForContainer(container, 
truncationActivationCache)) {
+                    final List<ContentClaim> toTruncate = 
truncationClaimManager.removeTruncationClaims(container);
+                    if (toTruncate.isEmpty()) {
+                        continue;
+                    }
+
+                    truncateClaims(toTruncate, truncationActivationCache);
+                }
+            }
+
+            // Drain any Truncation Claims from the Resource Claim Manager.
+            // If able, truncate those claims. Otherwise, save those claims in 
the Truncation Claim Manager to be truncated on the next run.
+            // This prevents us from having a case where we could truncate a 
big claim but we don't because we're not yet running out of disk space,
+            // but then we later start to run out of disk space and lost the 
opportunity to truncate that big claim.
+            // Loop to drain the entire queue in a single invocation rather 
than waiting for the next scheduled run. Because the default
+            // interval is 1 minute, waiting for the next run could delay 
truncation on a disk that is already under pressure and increases
+            // the risk of having too many claims that the queue overflows (in 
which case we would lose some optimization).
+            while (true) {
+                final List<ContentClaim> toTruncate = new ArrayList<>();
+                resourceClaimManager.drainTruncatableClaims(toTruncate, 
10_000);
+                if (toTruncate.isEmpty()) {
+                    return;
+                }
+
+                truncateClaims(toTruncate, truncationActivationCache);
+            }
+        }
+
+        private void truncateClaims(final List<ContentClaim> toTruncate, final 
Map<String, Boolean> truncationActivationCache) {
+            final Map<String, List<ContentClaim>> claimsSkipped = new 
HashMap<>();
+
+            for (final ContentClaim claim : toTruncate) {
+                final String container = 
claim.getResourceClaim().getContainer();
+                if (!isTruncationActiveForContainer(container, 
truncationActivationCache)) {
+                    LOG.debug("Will not truncate {} because truncation is not 
active for container {}; will save for later truncation", claim, container);
+                    claimsSkipped.computeIfAbsent(container, key -> new 
ArrayList<>()).add(claim);
+                    continue;
+                }
+
+                if (claim.isTruncationCandidate()) {
+                    final int truncationReferenceCount = 
resourceClaimManager.getTruncationReferenceCount(claim);
+                    if (truncationReferenceCount > 0) {
+                        LOG.debug("Skipping truncation of {} because 
truncation reference count is {}", claim, truncationReferenceCount);
+                        continue;
+                    }
+
+                    truncate(claim);
+                }
+            }
+
+            claimsSkipped.forEach(truncationClaimManager::addTruncationClaims);
+        }
+
+        private boolean isTruncationActiveForContainer(final String container, 
final Map<String, Boolean> activationCache) {
+            // If not archiving data, we consider truncation always active.
+            if (!archiveData) {
+                return true;

Review Comment:
   Thanks for the reply, I think it depends on the implementation, and it being 
generally helpful to avoid short-circuit returns, although I acknowledge that 
may not be the best approach in every situation. I could go either way on this 
particular one, so leaving it sounds good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] NIFI-15570: Track count of Content Claim References for claims that could be truncated to ensure no premature truncation [nifi]

Reply via email to