[jira] [Commented] (HADOOP-19474) ABFS: [FnsOverBlob] Listing Optimizations to avoid multiple iteration over list response.

ASF GitHub Bot (Jira) Wed, 05 Mar 2025 22:55:36 -0800


    [ https://issues.apache.org/jira/browse/HADOOP-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932862#comment-17932862 ]


ASF GitHub Bot commented on HADOOP-19474:
-----------------------------------------

anmolanmol1234 commented on code in PR #7421:
URL: https://github.com/apache/hadoop/pull/7421#discussion_r1982791170


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -1903,39 +1916,57 @@ private List<AbfsHttpHeader> getMetadataHeadersList(final Hashtable<String, Stri
    * This is to handle duplicate listing entries returned by Blob Endpoint for
    * implicit paths that also has a marker file created for them.
    * This will retain entry corresponding to marker file and remove the BlobPrefix entry.
+   * This will also filter out all the rename pending json files in listing output.
    * @param listResultSchema List of entries returned by Blob Endpoint.
+   * @param uri URI to be used for path conversion.
    * @return List of entries after removing duplicates.
    */
-  private BlobListResultSchema removeDuplicateEntries(BlobListResultSchema listResultSchema) {
-    List<BlobListResultEntrySchema> uniqueEntries = new ArrayList<>();
+  private ListResponseData filterDuplicateEntriesAndRenamePendingFiles(
+      BlobListResultSchema listResultSchema, URI uri) throws IOException {
+    List<FileStatus> fileStatuses = new ArrayList<>();
+    Map<Path, Integer> renamePendingJsonPaths = new HashMap<>();
     TreeMap<String, BlobListResultEntrySchema> nameToEntryMap = new TreeMap<>();
 
     for (BlobListResultEntrySchema entry : listResultSchema.paths()) {
       if (StringUtils.isNotEmpty(entry.eTag())) {
         // This is a blob entry. It is either a file or a marker blob.
         // In both cases we will add this.
         nameToEntryMap.put(entry.name(), entry);
+        fileStatuses.add(getVersionedFileStatusFromEntry(entry, uri));
+
+        if (isRenamePendingJsonPathEntry(entry)) {
+          renamePendingJsonPaths.put(entry.path(), entry.contentLength().intValue());
+        }
       } else {
         // This is a BlobPrefix entry. It is a directory with file inside
         // This might have already been added as a marker blob.
         if (!nameToEntryMap.containsKey(entry.name())) {
           nameToEntryMap.put(entry.name(), entry);
+          fileStatuses.add(getVersionedFileStatusFromEntry(entry, uri));
         }
       }
     }
 
-    uniqueEntries.addAll(nameToEntryMap.values());
-    listResultSchema.withPaths(uniqueEntries);
-    return listResultSchema;
+    ListResponseData listResponseData = new ListResponseData();
+    listResponseData.setFileStatusList(fileStatuses);
+    listResponseData.setRenamePendingJsonPaths(renamePendingJsonPaths);
+    listResponseData.setContinuationToken(listResultSchema.getNextMarker());
+    return listResponseData;
+  }
+
+  private boolean isRenamePendingJsonPathEntry(BlobListResultEntrySchema entry) {

Review Comment:
   Missing Javadoc for `isRenamePendingJsonPathEntry`.





> ABFS: [FnsOverBlob] Listing Optimizations to avoid multiple iteration over 
> list response.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19474
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19474
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.5.0, 3.4.1
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Major
>              Labels: pull-request-available
>
> On the blob endpoint, several pieces of handling have to be done on the 
> client side.
> This involves:
>  # Parsing the XML response and converting it to a VersionedFileStatus list
>  # Removing duplicate entries for non-empty explicit directories, which arise 
> from the presence of marker files
>  # Triggering rename recovery for a previously failed rename, indicated by 
> the presence of a pending JSON file.
> Currently each of the three is done in a separate iteration over the whole 
> list. This change brings them together in one place so that a single 
> iteration over the list response can handle all three.
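
The three steps in the description can be folded into one loop. The following is only a minimal sketch of that idea, not the code in the PR: `SinglePassListingSketch`, `Entry`, `Result` and the `-RenamePending.json` suffix are assumptions made for illustration, and a plain `String` stands in for `VersionedFileStatus`. Like the diff above, it keeps a BlobPrefix entry only when no entry with the same name has been seen earlier in the response.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the ABFS listing code; not the real Hadoop classes.
class SinglePassListingSketch {

  // Assumed suffix of rename-pending JSON files.
  static final String RENAME_PENDING_SUFFIX = "-RenamePending.json";

  // One raw list entry. Blobs carry an eTag; BlobPrefix entries do not.
  static final class Entry {
    final String name;
    final String eTag;
    final long contentLength;

    Entry(String name, String eTag, long contentLength) {
      this.name = name;
      this.eTag = eTag;
      this.contentLength = contentLength;
    }
  }

  // Everything the caller needs, collected in one walk over the response.
  static final class Result {
    final List<String> statuses = new ArrayList<>();
    final Map<String, Long> renamePendingJsonPaths = new HashMap<>();
  }

  static Result process(List<Entry> entries) {
    Result result = new Result();
    Set<String> seenNames = new HashSet<>();

    for (Entry entry : entries) {
      boolean isBlob = entry.eTag != null && !entry.eTag.isEmpty();
      if (isBlob) {
        // Step 1: a file or marker blob is always converted and kept.
        seenNames.add(entry.name);
        result.statuses.add(entry.name);
        // Step 3: remember rename-pending files so recovery can run later.
        if (entry.name.endsWith(RENAME_PENDING_SUFFIX)) {
          result.renamePendingJsonPaths.put(entry.name, entry.contentLength);
        }
      } else if (seenNames.add(entry.name)) {
        // Step 2: a BlobPrefix entry is kept only if its name is new.
        result.statuses.add(entry.name);
      }
    }
    return result;
  }
}
```

Calling `SinglePassListingSketch.process(entries)` once per list response yields both the deduplicated status list and the map of rename-pending files, so no second or third pass over the entries is needed.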



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
