[ https://issues.apache.org/jira/browse/HADOOP-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678845#comment-17678845 ]
Thomas Newton commented on HADOOP-18599:
----------------------------------------

Admittedly the functionality I'm interested in is already available on the `AzureBlobFileSystemStore`, but as I said, I got the distinct impression that the store is intended to always be used through a `FileSystem`. I particularly got this from code like [https://github.com/apache/hadoop/blob/cf7b7b961035d433b2c89f8dcd53016830d4d1a5/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingContext.java#L54|https://github.com/apache/hadoop/blob/72b760130aee907de12db09d1123880b9935523f/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingContext.java#L54]

> Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`
> -------------------------------------------------------------------------
>
> Key: HADOOP-18599
> URL: https://issues.apache.org/jira/browse/HADOOP-18599
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.3.2, 3.3.4
> Reporter: Thomas Newton
> Priority: Major
>
> When working with Azure blob storage, listing operations can often be quite
> slow, even on storage accounts with the hierarchical namespace.
> This can be mitigated by listing only a specific subset of directories using
> a function like
> [https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.html#listStatus-org.apache.hadoop.fs.Path-java.lang.String-org.apache.hadoop.fs.azurebfs.utils.TracingContext-]
> which accepts a `startFrom` argument and lists all files in order, starting
> from there.
> I'm wondering if we could add a method to the `AzureBlobFileSystem`,
> something like:
> ```
> public FileStatus[] listStatus(final Path f, final String startFrom)
>     throws IOException
> ```
> This exposes the functionality that already exists on the underlying
> `AzureBlobFileSystemStore`. My understanding from reading a bit of the code
> is that users should mainly be dealing with `AzureBlobFileSystem`s, and
> `AzureBlobFileSystem` seems easier to use to me, hence the benefit of
> exposing it on the `AzureBlobFileSystem`.
>
> I'm very unfamiliar with Java, but I'm told that keeping strictly to
> interfaces is strongly preferred. However, I can see some examples already on
> `AzureBlobFileSystem` that do not belong to any interface (e.g. `breakLease`),
> so I'm hoping it's acceptable to add a method like the one I described only
> for the one `FileSystem` implementation.
>
> The specific motivation for this is to unblock
> [https://github.com/delta-io/delta/issues/1568]
> I would be willing to contribute this if maintainers think the plan is
> reasonable.
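
For readers unfamiliar with the idea, the `startFrom` behaviour described in the issue (list entries in order, beginning at a given name) can be illustrated without any Hadoop dependency. The sketch below is not the ABFS implementation: the class `StartFromListingSketch` and the method `listFrom` are hypothetical names, the directory is modelled as a sorted set of strings, and treating `startFrom` as an inclusive lower bound is an assumption rather than something the issue states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: models a directory as a sorted set of names.
class StartFromListingSketch {

    /**
     * Returns names in lexicographic order, beginning at the first name that
     * is greater than or equal to startFrom (inclusive bound is an assumption).
     * A null or empty startFrom returns the full listing.
     */
    static List<String> listFrom(TreeSet<String> names, String startFrom) {
        if (startFrom == null || startFrom.isEmpty()) {
            return new ArrayList<>(names);
        }
        // tailSet(x, true) is a view of all elements >= x in sorted order.
        return new ArrayList<>(names.tailSet(startFrom, true));
    }

    public static void main(String[] args) {
        TreeSet<String> names = new TreeSet<>(Arrays.asList(
                "date=2023-01-01",
                "date=2023-01-02",
                "date=2023-01-03",
                "date=2023-01-04"));

        // Only entries at or after the given name are returned, so older
        // partitions never have to be enumerated by the caller.
        System.out.println(listFrom(names, "date=2023-01-03"));
    }
}
```

With date-partitioned directory names like the ones above, a caller that only needs recent partitions can skip everything that sorts before the chosen name, which is the kind of saving the issue is after.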