aryangupta1998 commented on code in PR #9423:
URL: https://github.com/apache/ozone/pull/9423#discussion_r2602576239


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/key/OMDirectoriesPurgeResponseWithFSO.java:
##########
@@ -147,22 +148,22 @@ public void processPaths(
         
deletedSpaceOmMetadataManager.getDeletedDirTable().putWithBatch(deletedSpaceBatchOperation,
             ozoneDeleteKey, keyInfo);
 
-        
keySpaceOmMetadataManager.getDirectoryTable().deleteWithBatch(keySpaceBatchOperation,
-            ozoneDbKey);
-
         if (LOG.isDebugEnabled()) {
           LOG.debug("markDeletedDirList KeyName: {}, DBKey: {}",
               keyInfo.getKeyName(), ozoneDbKey);
         }
       }
 
+      for (HddsProtos.KeyValue keyRanges : path.getDeleteRangeSubDirsList()) {
+        keySpaceOmMetadataManager.getDirectoryTable()
+            .deleteRangeWithBatch(keySpaceBatchOperation, keyRanges.getKey(), 
keyRanges.getValue());

Review Comment:
   @sumitagrawl,
   
   Using delete ranges is safe for files because all file keys inside a 
directory share the directory’s full
       path prefix, and RocksDB keeps keys in lexicographic order. This means a 
range like:
           `["/vol/buck/dir1/file1", "/vol/buck/dir1/file9")`
       will include only the files that actually belong to "dir1". Even if a 
new file(e.g., "/vol/buck/dir1/file22")
       is created at runtime, its key will also fall within the correct 
prefixed range for "dir1", and is expected to be
       deleted as part of that directory cleanup. There is no risk of 
accidentally deleting files from other
       directories because their keys have different prefixes(e.g., 
"/vol/buck/dir2/..."), which fall outside this range.
       
   **But I found a case where the delete range may be unsafe for subdirs.**
   Consider the example below as a dir structure in the directory table:
   
   > dirA/
   > dirA/dir1
   > dirA/dir2
   > dirAA/
   > otherDir/
   > temp/
   
   If you delete `dirA`, the naive delete-range would be:
   
   > start = "dirA/"
   > end   = "dirA0"     // or something lexicographically just after "dirA/"
   
   But this range will include `dirAA/`, because:
   
   > "dirA" < "dirAA" < "dirA0"
   
   Hence, dirAA would be deleted unintentionally. So directory deletion must be 
prefix-bounded, not lexicographical-bounded.
   
   **Hence, I feel we should use delete range only for sub files, not for sub 
dirs, as in some corner cases it can be erroneous.**
   
   cc @swamirishi 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to