Re: [PR] HDDS-13432. Accelerating Namespace Usage Calculation in Recon using - Materialised Approach [ozone]

via GitHub Mon, 04 Aug 2025 03:22:12 -0700


ArafatKhan2198 commented on code in PR #8797:
URL: https://github.com/apache/ozone/pull/8797#discussion_r2251055515



##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskDbEventHandler.java:
##########
@@ -199,4 +230,47 @@ protected boolean flushAndCommitNSToDB(Map<Long, NSSummary> nsSummaryMap) {
     }
     return true;
   }
+
+  /**
+   * Propagates size and count changes upwards through the parent chain.
+   * This ensures that when files are added/deleted, all ancestor directories
+   * reflect the total changes in their sizeOfFiles and numOfFiles fields.
+   */
+  protected void propagateSizeUpwards(long objectId, long sizeChange,
+                                       long countChange, Map<Long, NSSummary> nsSummaryMap)
+                                       throws IOException {
+    // Get the current directory's NSSummary
+    NSSummary nsSummary = nsSummaryMap.get(objectId);
+    if (nsSummary == null) {
+      nsSummary = reconNamespaceSummaryManager.getNSSummary(objectId);
+    }
+    if (nsSummary == null) {
+      return; // No more parents to update
+    }
+
+    // Continue propagating to parent
+    long parentId = nsSummary.getParentId();
+    if (parentId != 0) {
+      // Get parent's NSSummary
+      NSSummary parentSummary = nsSummaryMap.get(parentId);
+      if (parentSummary == null) {
+        parentSummary = reconNamespaceSummaryManager.getNSSummary(parentId);
+      }
+      if (parentSummary != null) {
+        // Update parent's totals
+        parentSummary.setSizeOfFiles(parentSummary.getSizeOfFiles() + sizeChange);
+        parentSummary.setNumOfFiles(parentSummary.getNumOfFiles() + (int)countChange);
+        int[] fileBucket = parentSummary.getFileSizeBucket();
+        int binIndex = ReconUtils.getFileSizeBinIndex(Math.abs(sizeChange));
+        ++fileBucket[binIndex];

Review Comment:
   Thanks for the catch! Fixed it.



##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskDbEventHandler.java:
##########
@@ -199,4 +230,47 @@ protected boolean flushAndCommitNSToDB(Map<Long, NSSummary> nsSummaryMap) {
     }
     return true;
   }
+
+  /**
+   * Propagates size and count changes upwards through the parent chain.
+   * This ensures that when files are added/deleted, all ancestor directories
+   * reflect the total changes in their sizeOfFiles and numOfFiles fields.
+   */
+  protected void propagateSizeUpwards(long objectId, long sizeChange,
+                                       long countChange, Map<Long, NSSummary> nsSummaryMap)
+                                       throws IOException {
+    // Get the current directory's NSSummary
+    NSSummary nsSummary = nsSummaryMap.get(objectId);
+    if (nsSummary == null) {
+      nsSummary = reconNamespaceSummaryManager.getNSSummary(objectId);
+    }
+    if (nsSummary == null) {
+      return; // No more parents to update
+    }
+
+    // Continue propagating to parent
+    long parentId = nsSummary.getParentId();
+    if (parentId != 0) {
+      // Get parent's NSSummary
+      NSSummary parentSummary = nsSummaryMap.get(parentId);
+      if (parentSummary == null) {
+        parentSummary = reconNamespaceSummaryManager.getNSSummary(parentId);
+      }
+      if (parentSummary != null) {
+        // Update parent's totals
+        parentSummary.setSizeOfFiles(parentSummary.getSizeOfFiles() + sizeChange);
+        parentSummary.setNumOfFiles(parentSummary.getNumOfFiles() + (int)countChange);
+        int[] fileBucket = parentSummary.getFileSizeBucket();
+        int binIndex = ReconUtils.getFileSizeBinIndex(Math.abs(sizeChange));

Review Comment:
   
   **@devesh** We don't need the effective size here, because file size buckets track individual file sizes, not directory totals.
   
   **Example:**
   ```
   Directory contains:
   - file1.txt: 10MB
   - file2.txt: 50MB
   ```
   
   **When we add file3.txt (20MB):**
   ```java
   long sizeChange = 20L * 1024 * 1024;  // size of the new file being added (20MB)
   int binIndex = ReconUtils.getFileSizeBinIndex(sizeChange);  // classifies the 20MB file by its own size
   ```
   
   **Why this is correct:** File size distribution answers "How many files of each size range exist?" We want to know that we have one 10MB file, one 50MB file, and one 20MB file, not that the directory totals 80MB.
   
   The current implementation correctly uses the individual file size (sizeChange) because that is what determines which size bucket the file belongs to. Using directory totals would break the semantic meaning of file size distribution analytics.
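
To make the point concrete, here is a minimal standalone sketch of the idea. It does not use the real `ReconUtils.getFileSizeBinIndex`; the `binIndex` helper, the `NUM_BINS` constant, and the power-of-two bin layout starting at 1KB are assumptions made only for illustration, and the class name `FileSizeBucketSketch` is hypothetical.

```java
import java.util.Arrays;

public class FileSizeBucketSketch {
  // Hypothetical layout, for illustration only: bin 0 holds files up to 1KB,
  // each following bin doubles the upper bound, and the last bin is a catch-all.
  static final int NUM_BINS = 20;

  // Maps a single file's size (in bytes) to its bin under the assumed layout.
  static int binIndex(long sizeBytes) {
    long upper = 1024L;
    int index = 0;
    while (sizeBytes > upper && index < NUM_BINS - 1) {
      upper <<= 1;
      index++;
    }
    return index;
  }

  // Records one added file: the bin is chosen from the file's own size,
  // never from the running directory total.
  static void addFile(int[] buckets, long fileSizeBytes) {
    buckets[binIndex(fileSizeBytes)]++;
  }

  public static void main(String[] args) {
    final long mb = 1024L * 1024L;
    int[] buckets = new int[NUM_BINS];

    addFile(buckets, 10 * mb);  // file1.txt
    addFile(buckets, 50 * mb);  // file2.txt
    addFile(buckets, 20 * mb);  // file3.txt, the newly added file

    // Three files land in three different bins; the bin that an 80MB
    // "directory total" would map to stays empty.
    System.out.println(Arrays.toString(buckets));
    System.out.println("80MB total bin count: " + buckets[binIndex(80 * mb)]);
  }
}
```

With this layout, each of the three files is counted exactly once in the bin matching its own size, which is the behaviour the histogram is meant to capture.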



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

