Stephen O'Donnell created HDFS-14663:
----------------------------------------

             Summary: HTTPFS ListStatus_Batch does not return batches as expected
                 Key: HDFS-14663
                 URL: https://issues.apache.org/jira/browse/HDFS-14663
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: httpfs
    Affects Versions: 3.3.0
            Reporter: Stephen O'Donnell


The webhdfs protocol supports a LISTSTATUS_BATCH operation, which retrieves 
the file listing for a large directory in chunks.

When using the webhdfs service embedded in the namenode, this works as 
expected, but when using HTTPFS, any call to LISTSTATUS_BATCH simply returns 
the entire listing rather than batches, working effectively like LISTSTATUS 
instead.
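
For reference, a minimal sketch of how the two request URLs might be built. 
The host names and directory path are made up, the ports (9870 for the 
namenode, 14000 for HTTPFS) are only the usual defaults, and authentication 
parameters are left out. The class and method names are hypothetical. The 
startAfter parameter names the last child returned by the previous batch.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ListStatusBatchUrls {

    // Builds a LISTSTATUS_BATCH request URI.
    // hostPort and dirPath are illustrative; dirPath is assumed to be URL-safe.
    // startAfter is null for the first batch, otherwise the last child name
    // seen in the previous batch.
    static URI batchUri(String hostPort, String dirPath, String startAfter) {
        StringBuilder sb = new StringBuilder("http://")
                .append(hostPort)
                .append("/webhdfs/v1")
                .append(dirPath)
                .append("?op=LISTSTATUS_BATCH");
        if (startAfter != null) {
            sb.append("&startAfter=")
              .append(URLEncoder.encode(startAfter, StandardCharsets.UTF_8));
        }
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        // Same operation against the namenode's embedded webhdfs and against HTTPFS.
        System.out.println(batchUri("namenode.example.com:9870", "/data/big", null));
        System.out.println(batchUri("httpfs.example.com:14000", "/data/big", "file-00999"));
    }
}
```

Against the namenode, each response should carry only part of the listing, 
so a client repeats the call with startAfter until nothing remains. Against 
HTTPFS, the first response already contains every entry.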

This seems to be because HTTPFS falls back to the method 
org.apache.hadoop.fs.FileSystem#listStatusBatch, which is intended to be 
overridden. The file system implementation used in HTTPFS does not override 
it, leading to this limitation.

This feature (LISTSTATUS_BATCH) was added to HTTPFS by HDFS-10823, but based on 
my testing it does not work as intended. I suspect this is because the 
listStatusBatch operation was added to WebHdfsFileSystem and 
HttpFSFileSystem as part of the above Jira, but behind the scenes HTTPFS seems 
to use DistributedFileSystem and hence it falls back to the default 
implementation "org.apache.hadoop.fs.FileSystem#listStatusBatch", which returns 
all entries in a single batch.
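
The suspected fallback can be illustrated with a simplified, self-contained 
model. This is not Hadoop code: BaseFs, PagingFs, Entries and countBatches 
are hypothetical stand-ins, and file names stand in for FileStatus objects. 
The base class returns the whole listing as one batch and ignores the token, 
while a subclass that overrides the method pages through the listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchListingSketch {

    /** One batch of child names, a resume token, and a "more remain" flag. */
    static final class Entries {
        final List<String> names;
        final String token;    // last name returned, or null when no more batches
        final boolean hasMore;

        Entries(List<String> names, String token, boolean hasMore) {
            this.names = names;
            this.token = token;
            this.hasMore = hasMore;
        }
    }

    /** Stand-in for a file system that keeps the non-batching default. */
    static class BaseFs {
        protected final List<String> children;

        BaseFs(List<String> children) {
            this.children = children;
        }

        // Default behaviour: the entire listing in a single batch, token ignored.
        Entries listStatusBatch(String token) {
            return new Entries(children, null, false);
        }
    }

    /** Stand-in for a file system that overrides the default with real paging. */
    static class PagingFs extends BaseFs {
        private final int batchSize;

        PagingFs(List<String> children, int batchSize) {
            super(children);
            this.batchSize = batchSize;
        }

        @Override
        Entries listStatusBatch(String token) {
            // Resume just after the token, or from the start when there is none.
            int start = (token == null) ? 0 : children.indexOf(token) + 1;
            int end = Math.min(start + batchSize, children.size());
            List<String> page = new ArrayList<>(children.subList(start, end));
            boolean more = end < children.size();
            String next = more ? page.get(page.size() - 1) : null;
            return new Entries(page, next, more);
        }
    }

    /** Drains a listing the way a batching client would, counting round trips. */
    static int countBatches(BaseFs fs) {
        int batches = 0;
        String token = null;
        Entries e;
        do {
            e = fs.listStatusBatch(token);
            token = e.token;
            batches++;
        } while (e.hasMore);
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(countBatches(new BaseFs(files)));      // prints 1
        System.out.println(countBatches(new PagingFs(files, 2))); // prints 3
    }
}
```

In this model the caller's loop is identical in both cases. Only the 
presence of an override decides whether the listing arrives in one batch or 
several, which matches the behaviour described above.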



