[GitHub] [hadoop-ozone] rakeshadr commented on a change in pull request #1503: HDDS-4332: ListFileStatus - do lookup in directory and file tables

GitBox Tue, 27 Oct 2020 00:06:02 -0700


rakeshadr commented on a change in pull request #1503:
URL: https://github.com/apache/hadoop-ozone/pull/1503#discussion_r512452877




##########
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
##########
@@ -2205,6 +2276,318 @@ private void listStatusFindKeyInTableCache(
     return fileStatusList;
   }
 
+  public List<OzoneFileStatus> listStatusV1(OmKeyArgs args, boolean recursive,
+      String startKey, long numEntries, String clientAddress)
+          throws IOException {
+    Preconditions.checkNotNull(args, "Key args can not be null");
+
+    // unsorted OMKeyInfo list contains combine results from TableCache and DB.
+    List<OzoneFileStatus> fileStatusFinalList = new ArrayList<>();
+    LinkedHashSet<OzoneFileStatus> fileStatusList = new LinkedHashSet<>();
+    if (numEntries <= 0) {
+      return fileStatusFinalList;
+    }
+
+    String volumeName = args.getVolumeName();
+    String bucketName = args.getBucketName();
+    String keyName = args.getKeyName();
+    String seekFileInDB;
+    String seekDirInDB;
+    long prefixKeyInDB;
+    String prefixPath = keyName;
+
+    int countEntries = 0;
+
+    metadataManager.getLock().acquireReadLock(BUCKET_LOCK, volumeName,
+            bucketName);
+    try {
+      if (Strings.isNullOrEmpty(startKey)) {
+        OzoneFileStatus fileStatus = getFileStatus(args, clientAddress);
+        if (fileStatus.isFile()) {
+          return Collections.singletonList(fileStatus);
+        }
+
+        // Not required to search in DeletedTable because all the deleted
+        // keys will be marked directly in dirTable or in keyTable by
+        // breaking the pointer to its sub-dirs. So, there is no issue of
+        // inconsistency.
+
+        /*
+         * keyName is a directory.
+         * Say, "/a" is the dir name and its objectID is 1024, then seek
+         * will be doing with "1024/" to get all immediate descendants.
+         */
+        if (fileStatus.getKeyInfo() != null) {
+          prefixKeyInDB = fileStatus.getKeyInfo().getObjectID();
+        } else {
+          // list root directory.
+          String bucketKey = metadataManager.getBucketKey(volumeName,
+                  bucketName);
+          OmBucketInfo omBucketInfo =
+                  metadataManager.getBucketTable().get(bucketKey);
+          prefixKeyInDB = omBucketInfo.getObjectID();
+        }
+        seekFileInDB = metadataManager.getOzonePathKey(prefixKeyInDB, "");
+        seekDirInDB = metadataManager.getOzonePathKey(prefixKeyInDB, "");
+
+        // TODO: recursive flag=true will be handled in HDDS-4360 jira.
+        // Order of seek -> (1)Seek dirs in dirTable (2)Seek files in fileTable
+        // 1. Seek the given key in key table.
+        countEntries = getFilesFromDirectory(fileStatusList, seekFileInDB,
+                prefixPath, prefixKeyInDB, startKey, countEntries, numEntries);
+        // 2. Seek the given key in dir table.
+        getDirectories(recursive, startKey, numEntries, fileStatusList,
+                volumeName, bucketName, seekDirInDB, prefixKeyInDB,
+                prefixPath, countEntries);
+      } else {
+        /*
+         * startKey will be used in iterator seek and sets the beginning point
+         * for key traversal.
+         *
+         * key name will be used as parentID where the user has requested to
+         * list the keys from.
+         *
+         * When recursive flag=false, parentID won't change between two pages.
+         * For example: OM has a namespace like,
+         *    /a/1...1M files and /a/b/1...1M files.
+         *    /a/1...1M directories and /a/b/1...1M directories.
+         * Listing "/a", will always have the parentID as "a" irrespective of
+         * the startKey value.
+         */
+        // TODO: recursive flag=true will be handled in HDDS-4360 jira.
+        OzoneFileStatus fileStatusInfo = getOzoneFileStatusV1(volumeName,
+                bucketName, startKey, false, null, true);
+
+        if (fileStatusInfo != null) {
+          prefixKeyInDB = fileStatusInfo.getKeyInfo().getParentObjectID();
+          if(fileStatusInfo.isDirectory()){
+            seekDirInDB = metadataManager.getOzonePathKey(prefixKeyInDB,
+                    fileStatusInfo.getKeyInfo().getFileName());
+
+            // Order of seek -> (1) Seek dirs in dirTable. In OM, always the
+            // order of search is, first seek into fileTable and then dirTable.
+            // So, its not required to search again into the fileTable.
+
+            // Seek the given key in dirTable.
+            getDirectories(recursive, startKey, numEntries,
+                    fileStatusList, volumeName, bucketName, seekDirInDB,
+                    prefixKeyInDB, prefixPath, countEntries);
+
+          } else {
+            seekFileInDB = metadataManager.getOzonePathKey(prefixKeyInDB,

Review comment:
       As we have to perform seek in two tables to finish the listing, I'm 
maintaining a seek order 
   1) Seek all the files from **FileTable**, then
   2) Start seeking dirs from **DirTable**.
   
   Thats the reason for continuing seek the given key in **DirTable** using 
logic:
   ```
               // begins from the first sub-dir under the parent dir
               seekDirInDB = metadataManager.getOzonePathKey(prefixKeyInDB, "");
   ```
   
   ```
   For example: OM has a namespace like,
   /a/1...1M files and /a/dir1...dirM directories
   
   keyName='/a' and its objectId is "1027/"
   ```
   
   Here, iterate to finish all 1M files using the startKey indexes and then 
move the parent pointer to '1027/' to seek the contents from DirTable. Now, 
onwards the startKey represents directory entires. Once all dirs are listed 
thats the break point and come out to send response back to the user #lisStatus 
API call.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org

[GitHub] [hadoop-ozone] rakeshadr commented on a change in pull request #1503: HDDS-4332: ListFileStatus - do lookup in directory and file tables

Reply via email to