rakeshadr commented on a change in pull request #1815:
URL: https://github.com/apache/ozone/pull/1815#discussion_r561997122



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -2312,17 +2311,40 @@ private void listStatusFindKeyInTableCache(
     return fileStatusList;
   }
 
+  @SuppressWarnings("methodlength")
   public List<OzoneFileStatus> listStatusV1(OmKeyArgs args, boolean recursive,
       String startKey, long numEntries, String clientAddress)
           throws IOException {
     Preconditions.checkNotNull(args, "Key args can not be null");
 
     // unsorted OMKeyInfo list contains combine results from TableCache and DB.
     List<OzoneFileStatus> fileStatusFinalList = new ArrayList<>();
-    LinkedHashSet<OzoneFileStatus> fileStatusList = new LinkedHashSet<>();
+
     if (numEntries <= 0) {
       return fileStatusFinalList;
     }
+
+    /**
+     * A map sorted by OmKey to combine results from TableCache and DB for
+     * each entity - Dir & File.
+     *
+     * Two separate maps are required because the order of seek -> (1)Seek
+     * files in fileTable (2)Seek dirs in dirTable.
+     *
+     * StartKey should be added to the final listStatuses, so if we combine
+     * files and dirs into a single map then directory with lower precedence
+     * will appear at the top of the list even if the startKey is given as
+     * fileName.
+     *
+     * For example, startKey="a/file1". As per the seek order, first fetches
+     * all the files and then it will start seeking all the directories.
+     * Assume a directory name exists "a/b". With one map, the sorted list will
+     * be ["a/b", "a/file1"]. But the expected list is: ["a/file1", "a/b"],
+     * startKey element should always be at the top of the listStatuses.

Review comment:
       For example, assume the following keys exist in the FS. Keys are
actually stored in `<parentID>/file-0` form, but for ease of discussion I have
written the full path names in this example.
   **Files stored in FileTable**
   a/file-0
   a/file-00
   a/file-1
   **Intermediate dirs in DirTable**
   a
   a/b
   
   Assume listStatus batchSize/numEntries=3. In the V1 code,
fs#listStatus("a/") returns the result in two iterations.
   Batch-1) [a/file-0, a/file-00, a/file-1]. BasicOzoneFS#listStatus then
invokes OM again with startKey="a/file-1".
   Batch-2) returns [a/file-1, a/b], because BasicOzoneFS#listStatus on the
client side expects the first element of the batch to be the startKey element.
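
The two V1 batches above can be reproduced with a small stand-alone simulation. This is only a sketch under assumptions: `ListStatusV1Sketch`, `FILES`, `DIRS`, `listV1` and `mergedIntoOneSortedSet` are hypothetical names, full path names stand in for the `<parentID>/name` keys, and none of it is taken from KeyManagerImpl. It also shows what a single sorted map would do to Batch-2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical simulation of the V1 seek order; not the KeyManagerImpl code.
class ListStatusV1Sketch {

  // Stand-ins for the FileTable and DirTable entries under "a/".
  static final TreeSet<String> FILES =
      new TreeSet<>(Arrays.asList("a/file-0", "a/file-00", "a/file-1"));
  static final TreeSet<String> DIRS =
      new TreeSet<>(Arrays.asList("a/b"));

  // Seek files first, then dirs. startKey is inclusive; null means
  // "start from the beginning".
  static List<String> listV1(String startKey, int numEntries) {
    List<String> result = new ArrayList<>();
    boolean startIsDir = startKey != null && DIRS.contains(startKey);

    // (1) Seek files, unless the startKey already points into the dirs.
    if (!startIsDir) {
      Iterable<String> files =
          (startKey == null) ? FILES : FILES.tailSet(startKey, true);
      for (String file : files) {
        if (result.size() >= numEntries) {
          return result;
        }
        result.add(file);
      }
    }

    // (2) Seek dirs, appended after the files.
    Iterable<String> dirs = startIsDir ? DIRS.tailSet(startKey, true) : DIRS;
    for (String dir : dirs) {
      if (result.size() >= numEntries) {
        return result;
      }
      result.add(dir);
    }
    return result;
  }

  // What one sorted map would do to a batch: plain string order.
  static List<String> mergedIntoOneSortedSet(List<String> batch) {
    return new ArrayList<>(new TreeSet<>(batch));
  }

  public static void main(String[] args) {
    List<String> batch1 = listV1(null, 3);
    // The client resumes from the last element of the previous batch.
    String startKey = batch1.get(batch1.size() - 1);
    List<String> batch2 = listV1(startKey, 3);

    System.out.println("Batch-1) " + batch1);
    System.out.println("Batch-2) " + batch2);
    System.out.println("Single map: " + mergedIntoOneSortedSet(batch2));
  }
}
```

With a single sorted map, Batch-2 comes out as [a/b, a/file-1], so the startKey is no longer the first element. Keeping files and dirs in two separate maps avoids that.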
   
   Earlier, in the master code, as I said above, the elements are returned in
sorted order because all entries are stored in the KeyTable as shown below, so
string order is easily maintained across batches.
   a/b
   a/file-0
   a/file-00
   a/file-1
   
   Batch-1) [a/b, a/file-0, a/file-00]. BasicOzoneFS#listStatus then invokes
OM again with startKey="a/file-00".
   Batch-2) returns [a/file-00, a/file-1].
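
For comparison, the master behaviour can be sketched the same way. Again this is a simplified illustration, not the actual OM code: `ListStatusMasterSketch`, `KEY_TABLE` and `listMaster` are hypothetical names, and a `TreeSet` of full path names stands in for the KeyTable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical simulation of listing from one sorted table (master layout).
class ListStatusMasterSketch {

  // Files and dirs live in the same table, so one string order covers both.
  static final TreeSet<String> KEY_TABLE = new TreeSet<>(
      Arrays.asList("a/b", "a/file-0", "a/file-00", "a/file-1"));

  // startKey is inclusive; null means "start from the beginning".
  static List<String> listMaster(String startKey, int numEntries) {
    List<String> result = new ArrayList<>();
    Iterable<String> keys =
        (startKey == null) ? KEY_TABLE : KEY_TABLE.tailSet(startKey, true);
    for (String key : keys) {
      if (result.size() >= numEntries) {
        break;
      }
      result.add(key);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> batch1 = listMaster(null, 3);
    // The client resumes from the last element of the previous batch.
    String startKey = batch1.get(batch1.size() - 1);
    List<String> batch2 = listMaster(startKey, 3);

    System.out.println("Batch-1) " + batch1);
    System.out.println("Batch-2) " + batch2);
  }
}
```

Here the startKey is always the first element of the next batch without any special handling, because a single seek over one sorted table is inclusive of the startKey.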
   
   
   
   



