[
https://issues.apache.org/jira/browse/HADOOP-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932862#comment-17932862
]
ASF GitHub Bot commented on HADOOP-19474:
-----------------------------------------
anmolanmol1234 commented on code in PR #7421:
URL: https://github.com/apache/hadoop/pull/7421#discussion_r1982791170
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java:
##########
@@ -1903,39 +1916,57 @@ private List<AbfsHttpHeader> getMetadataHeadersList(final Hashtable<String, Stri
* This is to handle duplicate listing entries returned by Blob Endpoint for
* implicit paths that also has a marker file created for them.
* This will retain entry corresponding to marker file and remove the BlobPrefix entry.
+ * This will also filter out all the rename pending json files in listing output.
* @param listResultSchema List of entries returned by Blob Endpoint.
+ * @param uri URI to be used for path conversion.
* @return List of entries after removing duplicates.
*/
- private BlobListResultSchema removeDuplicateEntries(BlobListResultSchema listResultSchema) {
- List<BlobListResultEntrySchema> uniqueEntries = new ArrayList<>();
+ private ListResponseData filterDuplicateEntriesAndRenamePendingFiles(
+ BlobListResultSchema listResultSchema, URI uri) throws IOException {
+ List<FileStatus> fileStatuses = new ArrayList<>();
+ Map<Path, Integer> renamePendingJsonPaths = new HashMap<>();
TreeMap<String, BlobListResultEntrySchema> nameToEntryMap = new TreeMap<>();
for (BlobListResultEntrySchema entry : listResultSchema.paths()) {
if (StringUtils.isNotEmpty(entry.eTag())) {
// This is a blob entry. It is either a file or a marker blob.
// In both cases we will add this.
nameToEntryMap.put(entry.name(), entry);
+ fileStatuses.add(getVersionedFileStatusFromEntry(entry, uri));
+
+ if (isRenamePendingJsonPathEntry(entry)) {
+ renamePendingJsonPaths.put(entry.path(), entry.contentLength().intValue());
+ }
} else {
// This is a BlobPrefix entry. It is a directory with file inside
// This might have already been added as a marker blob.
if (!nameToEntryMap.containsKey(entry.name())) {
nameToEntryMap.put(entry.name(), entry);
+ fileStatuses.add(getVersionedFileStatusFromEntry(entry, uri));
}
}
}
- uniqueEntries.addAll(nameToEntryMap.values());
- listResultSchema.withPaths(uniqueEntries);
- return listResultSchema;
+ ListResponseData listResponseData = new ListResponseData();
+ listResponseData.setFileStatusList(fileStatuses);
+ listResponseData.setRenamePendingJsonPaths(renamePendingJsonPaths);
+ listResponseData.setContinuationToken(listResultSchema.getNextMarker());
+ return listResponseData;
+ }
+
+ private boolean isRenamePendingJsonPathEntry(BlobListResultEntrySchema entry) {
Review Comment:
Missing Javadoc for isRenamePendingJsonPathEntry.
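For illustration only, the kind of Javadoc being asked for might look like the sketch below. The body of the method is not visible in this hunk, so everything here is an assumption: the class name, the String-based signature (the PR method takes a BlobListResultEntrySchema), and the "-RenamePending.json" suffix are all hypothetical stand-ins.

```java
final class RenamePendingCheckSketch {

  /** Assumed suffix of a rename pending JSON file; not confirmed by this diff. */
  static final String RENAME_PENDING_JSON_SUFFIX = "-RenamePending.json";

  /**
   * Checks whether a listed entry name refers to a rename pending JSON file.
   * Hypothetical helper: it works on a plain name, not on the PR's entry type.
   *
   * @param name name of the listed entry; may be null.
   * @return true if the name ends with the assumed rename pending JSON suffix,
   *         false otherwise (including for a null name).
   */
  static boolean isRenamePendingJsonName(String name) {
    return name != null && name.endsWith(RENAME_PENDING_JSON_SUFFIX);
  }

  public static void main(String[] args) {
    System.out.println(isRenamePendingJsonName("a/b/src-RenamePending.json"));
    System.out.println(isRenamePendingJsonName("a/b/data.csv"));
  }
}
```

The point is the comment shape (summary line, @param, @return), not the check itself.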
> ABFS: [FnsOverBlob] Listing Optimizations to avoid multiple iteration over
> list response.
> -----------------------------------------------------------------------------------------
>
> Key: HADOOP-19474
> URL: https://issues.apache.org/jira/browse/HADOOP-19474
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.5.0, 3.4.1
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> On the blob endpoint, some handling needs to be done on the
> client side.
> This involves:
> # Parsing the XML response and converting it to a VersionedFileStatus list
> # Removing duplicate entries for non-empty explicit directories that arise
> from the presence of marker files
> # Triggering rename recovery for a previously failed rename, indicated by
> the presence of a pending JSON file.
> Currently each of the three is done in a separate iteration over the whole
> list. This change brings them to a common place so that a single iteration
> over the list response can handle all three.
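The single-pass idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: Entry and Result are hypothetical stand-ins for BlobListResultEntrySchema and ListResponseData, names stand in for FileStatus objects, and the "-RenamePending.json" suffix is an assumption. It also assumes, as the diff does, that a marker blob is seen before the BlobPrefix entry of the same name, and it follows the Javadoc's wording by keeping pending JSON files out of the visible listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SinglePassListingSketch {

  /** Assumed suffix of a rename pending JSON file. */
  static final String RENAME_PENDING_SUFFIX = "-RenamePending.json";

  /** Hypothetical list entry; an empty eTag marks a BlobPrefix entry. */
  static final class Entry {
    final String name;
    final String eTag;
    final long contentLength;

    Entry(String name, String eTag, long contentLength) {
      this.name = name;
      this.eTag = eTag;
      this.contentLength = contentLength;
    }
  }

  /** Hypothetical result holder: visible names plus pending JSON sizes. */
  static final class Result {
    final List<String> statuses = new ArrayList<>();
    final Map<String, Long> renamePending = new HashMap<>();
  }

  /** Converts, de-duplicates and collects pending JSON files in one loop. */
  static Result process(List<Entry> entries) {
    Result result = new Result();
    Set<String> seen = new HashSet<>();
    for (Entry e : entries) {
      boolean isBlob = e.eTag != null && !e.eTag.isEmpty();
      if (isBlob) {
        // A file or a marker blob: always recorded as seen.
        seen.add(e.name);
        if (e.name.endsWith(RENAME_PENDING_SUFFIX)) {
          // Remembered for rename recovery, hidden from the listing.
          result.renamePending.put(e.name, e.contentLength);
        } else {
          result.statuses.add(e.name);
        }
      } else if (seen.add(e.name)) {
        // BlobPrefix entry not already covered by a marker blob.
        result.statuses.add(e.name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Result r = process(Arrays.asList(
        new Entry("dir", "0x1", 0),   // marker blob
        new Entry("dir", "", 0),      // duplicate BlobPrefix
        new Entry("src-RenamePending.json", "0x2", 42)));
    System.out.println(r.statuses + " " + r.renamePending);
  }
}
```

A caller would then trigger recovery for each key in renamePending after the loop, without walking the list a second time.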