[ 
https://issues.apache.org/jira/browse/HADOOP-17281?focusedWorklogId=493671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493671
 ]

ASF GitHub Bot logged work on HADOOP-17281:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Oct/20 20:12
            Start Date: 01/Oct/20 20:12
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on pull request #2354:
URL: https://github.com/apache/hadoop/pull/2354#issuecomment-702371522


   Looks good. It's annoying that the return types force you to do that 
wrapping/casting. Can't you just forcibly cast the return type of the inner 
iterator? After all, type erasure means all type info will be lost in the 
actual compiled binary. I'd prefer that, as it will give you automatic 
passthrough of the inner iterator's IOStatistics.
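A minimal sketch of the suggested cast, assuming the inner iterator yields a subclass of the declared element type. To stay self-contained it defines a local `RemoteIterator` with the same shape as `org.apache.hadoop.fs.RemoteIterator`, plus hypothetical `Status`, `LocatedStatus` and `StatisticsSource` types (the last standing in for something like `IOStatisticsSource`). A direct cast from `RemoteIterator<LocatedStatus>` to `RemoteIterator<Status>` does not compile because Java generics are invariant, so the sketch goes through a wildcard parameter and suppresses the unchecked warning.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IteratorCastSketch {
  // Local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator.
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  // Hypothetical stand-in for a statistics-exposing interface such as IOStatisticsSource.
  interface StatisticsSource {
    long served();
  }

  // Hypothetical status types: the inner listing yields the more specific subclass.
  static class Status {
    final String path;
    Status(String path) { this.path = path; }
  }

  static class LocatedStatus extends Status {
    LocatedStatus(String path) { super(path); }
  }

  // Inner iterator that yields LocatedStatus and also counts the entries it served.
  static class InnerIterator implements RemoteIterator<LocatedStatus>, StatisticsSource {
    private final Iterator<LocatedStatus> it;
    private long served;

    InnerIterator(List<LocatedStatus> entries) { this.it = entries.iterator(); }

    @Override public boolean hasNext() { return it.hasNext(); }

    @Override public LocatedStatus next() {
      if (!it.hasNext()) {
        throw new NoSuchElementException();
      }
      served++;
      return it.next();
    }

    @Override public long served() { return served; }
  }

  // Unchecked cast instead of a wrapper. Generics are erased at runtime, so no
  // check happens; it is sound here because RemoteIterator only produces E.
  @SuppressWarnings("unchecked")
  static RemoteIterator<Status> upcast(RemoteIterator<? extends Status> inner) {
    return (RemoteIterator<Status>) inner;
  }

  public static void main(String[] args) throws IOException {
    InnerIterator inner = new InnerIterator(
        List.of(new LocatedStatus("/dir/a"), new LocatedStatus("/dir/b")));
    RemoteIterator<Status> listing = upcast(inner);
    while (listing.hasNext()) {
      System.out.println(listing.next().path);
    }
    // The returned object is the inner iterator itself, so its extra interface survives.
    System.out.println(listing instanceof StatisticsSource);
    System.out.println(((StatisticsSource) listing).served());
  }
}
```

Because no wrapper object is created, anything else the inner iterator implements stays visible to callers. The cast is only safe because the iterator never accepts values of type `E`; the same trick on a type that consumes `E` would risk heap pollution.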
   
   Add text to filesystem.md which: 
   
   * specifies that the result is exactly the same as listStatus, provided no 
other caller updates the directory during the list
   * declares that it's not atomic and that high-performance implementations 
will page
   * states that if a path isn't there, that fact may not surface until 
next()/hasNext(); that is, we do lazy evaluation for all file IO
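The semantics proposed for filesystem.md can be illustrated with a minimal sketch, assuming a hypothetical in-memory `Store` whose `listPage()` stands in for one paged LIST request. The iterator issues no request when it is created, so a missing directory only surfaces as a `FileNotFoundException` from the first `hasNext()`; pages are fetched on demand, and with no concurrent changes the collected result matches a one-shot listing. None of these names are Hadoop or S3A APIs.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class LazyPagedListingSketch {
  // Local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator.
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  // Hypothetical in-memory store: directory path -> ordered child names.
  static class Store {
    final Map<String, List<String>> dirs;
    int listCalls;  // number of simulated LIST requests issued

    Store(Map<String, List<String>> dirs) { this.dirs = dirs; }

    // One simulated LIST request: at most pageSize children, starting at offset.
    List<String> listPage(String dir, int offset, int pageSize) throws FileNotFoundException {
      listCalls++;
      List<String> children = dirs.get(dir);
      if (children == null) {
        throw new FileNotFoundException(dir);
      }
      int end = Math.min(children.size(), offset + pageSize);
      return new ArrayList<>(children.subList(Math.min(offset, end), end));
    }
  }

  // Lazy, paged iterator: no request is made until hasNext()/next() is called.
  static class PagedIterator implements RemoteIterator<String> {
    private final Store store;
    private final String dir;
    private final int pageSize;
    private List<String> page = List.of();
    private int index;        // position within the current page
    private int offset;       // offset of the next page to request
    private boolean exhausted;

    PagedIterator(Store store, String dir, int pageSize) {
      if (pageSize <= 0) {
        throw new IllegalArgumentException("pageSize must be positive");
      }
      this.store = store;
      this.dir = dir;
      this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() throws IOException {
      // Fetch the next page only when the current one is used up.
      while (index >= page.size() && !exhausted) {
        page = store.listPage(dir, offset, pageSize);
        offset += page.size();
        index = 0;
        exhausted = page.size() < pageSize;  // a short page is the last one
      }
      return index < page.size();
    }

    @Override
    public String next() throws IOException {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return page.get(index++);
    }
  }

  static List<String> collect(RemoteIterator<String> it) throws IOException {
    List<String> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    Store store = new Store(Map.of("/dir", List.of("a", "b", "c", "d", "e")));
    // Creating an iterator over a missing path does not fail...
    RemoteIterator<String> missing = new PagedIterator(store, "/missing", 2);
    try {
      missing.hasNext();  // ...the FileNotFoundException surfaces here instead.
    } catch (FileNotFoundException e) {
      System.out.println("not found: " + e.getMessage());
    }
    // With no concurrent changes, the paged result equals a one-shot listing.
    System.out.println(collect(new PagedIterator(store, "/dir", 2)));
  }
}
```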
   
   
   We need similar new contract tests in AbstractContractGetFileStatusTest 
for all to use:
   
   * that in a dir with files and subdirectories, you get both returned in the 
listing
   * that you can iterate through with next() until it raises 
NoSuchElementException, as well as with hasNext()/next(), and get the same results
   * listStatusIterator(file) returns the file
   * listStatusIterator("/") gives you a listing of root (put that in 
AbstractContractRootDirectoryTest)
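The first group of contract tests could take roughly this shape. This is a sketch against a hypothetical in-memory namespace, not `AbstractContractGetFileStatusTest` itself: `listStatusIterator()` here is a local stand-in that lists a file as itself, and `viaNextUntilFailure()` assumes `next()` throws `NoSuchElementException` once the listing is exhausted.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

public class ListingContractSketch {
  // Local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator.
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  // Hypothetical namespace: directories map to their children; files are listed separately.
  static final Map<String, List<String>> DIRS = Map.of(
      "/", List.of("/dir"),
      "/dir", List.of("/dir/file1", "/dir/sub"),
      "/dir/sub", List.of());
  static final Set<String> FILES = Set.of("/dir/file1");

  // Hypothetical listStatusIterator(): a file lists as itself, a directory as its children.
  static RemoteIterator<String> listStatusIterator(String path) throws FileNotFoundException {
    List<String> entries;
    if (FILES.contains(path)) {
      entries = List.of(path);
    } else if (DIRS.containsKey(path)) {
      entries = DIRS.get(path);
    } else {
      throw new FileNotFoundException(path);
    }
    Iterator<String> it = entries.iterator();
    return new RemoteIterator<String>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      // java.util.Iterator.next() throws NoSuchElementException when exhausted.
      @Override public String next() { return it.next(); }
    };
  }

  static List<String> viaHasNext(RemoteIterator<String> it) throws IOException {
    List<String> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }

  // Calls next() until it fails, treating NoSuchElementException as the end of the listing.
  static List<String> viaNextUntilFailure(RemoteIterator<String> it) throws IOException {
    List<String> out = new ArrayList<>();
    try {
      while (true) {
        out.add(it.next());
      }
    } catch (NoSuchElementException endOfListing) {
      // expected termination
    }
    return out;
  }

  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> listing = viaHasNext(listStatusIterator("/dir"));
    check(listing.containsAll(List.of("/dir/file1", "/dir/sub")), "file and subdirectory both listed");
    // Same order here because both runs read the same in-memory list.
    check(listing.equals(viaNextUntilFailure(listStatusIterator("/dir"))), "both traversals agree");
    check(viaHasNext(listStatusIterator("/dir/file1")).equals(List.of("/dir/file1")), "file lists as itself");
    check(!viaHasNext(listStatusIterator("/")).isEmpty(), "root can be listed");
    System.out.println("all checks passed");
  }
}
```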
   
   And two for changes partway through the iteration:
   
   * one that changes the directory during a list to add/delete files
   * one that deletes the actual path
   
   These tests can't assert on what will happen, and with paged IO they aren't 
likely to pick up on changes; they're there just to show it can be done and to 
pick up on any major issues with implementations.
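For the tests that change the directory mid-listing, only weak assertions are available. The sketch below assumes a hypothetical paged iterator that resumes each page after the last key it returned (similar in spirit to a start-after marker) over a mutable `TreeSet`. Entries added behind the marker are missed, entries added ahead of it are seen, and a deleted entry not yet fetched simply disappears; the checks only confirm there are no duplicates and nothing that never existed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;

public class ConcurrentChangeSketch {
  // Hypothetical paged listing over a mutable, sorted key space. Each page resumes
  // after the last key returned, similar in spirit to a start-after marker.
  static class MarkerPagedIterator implements Iterator<String> {
    private final TreeSet<String> store;
    private final int pageSize;
    private List<String> page = new ArrayList<>();
    private int index;
    private String lastKey;     // null until the first page is fetched
    private boolean exhausted;

    MarkerPagedIterator(TreeSet<String> store, int pageSize) {
      if (pageSize <= 0) {
        throw new IllegalArgumentException("pageSize must be positive");
      }
      this.store = store;
      this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
      while (index >= page.size() && !exhausted) {
        // Read only what sorts after the marker, as the store looks right now.
        Set<String> rest = lastKey == null ? store : store.tailSet(lastKey, false);
        page = new ArrayList<>();
        for (String key : rest) {
          if (page.size() == pageSize) {
            break;
          }
          page.add(key);
        }
        index = 0;
        exhausted = page.size() < pageSize;
        if (!page.isEmpty()) {
          lastKey = page.get(page.size() - 1);
        }
      }
      return index < page.size();
    }

    @Override
    public String next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return page.get(index++);
    }
  }

  public static void main(String[] args) {
    TreeSet<String> store = new TreeSet<>(List.of("f1", "f2", "f3", "f4", "f5", "f6"));
    Set<String> everExisted = new HashSet<>(store);
    MarkerPagedIterator it = new MarkerPagedIterator(store, 2);
    List<String> seen = new ArrayList<>();
    while (it.hasNext()) {
      String key = it.next();
      seen.add(key);
      if (key.equals("f2")) {
        // Mutate the "directory" partway through the listing.
        store.remove("f5");
        store.add("f0");  // sorts before the marker: will not be seen
        store.add("f7");  // sorts after the marker: will be seen
        everExisted.add("f0");
        everExisted.add("f7");
      }
    }
    // Only weak assertions are possible: no duplicates, nothing invented.
    System.out.println(seen.size() == new HashSet<>(seen).size());
    System.out.println(everExisted.containsAll(seen));
    System.out.println(seen);
  }
}
```

Running `main` prints `true`, `true`, then `[f1, f2, f3, f4, f6, f7]`: `f5` was deleted before its page was read, and `f0` landed behind the marker.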
   
   
   



Issue Time Tracking
-------------------

    Worklog Id:     (was: 493671)
    Time Spent: 20m  (was: 10m)

> Implement FileSystem.listStatusIterator() in S3AFileSystem
> ----------------------------------------------------------
>
>                 Key: HADOOP-17281
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17281
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Mukund Thakur
>            Assignee: Mukund Thakur
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently S3AFileSystem only implements the listStatus() API, which returns an 
> array. Once we implement listStatusIterator(), clients can benefit from 
> the asynchronous listing added recently in 
> https://issues.apache.org/jira/browse/HADOOP-17074 by performing tasks on 
> files while iterating over them.
>  
> CC [~stevel]
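The benefit described in the issue comes from overlapping per-file work with the next listing request. The following is a generic prefetching sketch, not the HADOOP-17074 implementation: `fetchPage` is a hypothetical function returning page n (an empty page marks the end), and the iterator requests page n+1 on an executor as soon as page n is handed over.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

public class PrefetchingIteratorSketch {
  // Requests page n+1 in the background as soon as page n is handed over, so the
  // caller's per-entry work overlaps the next (simulated) listing request.
  static class PrefetchingIterator implements Iterator<String> {
    private final IntFunction<List<String>> fetchPage;  // hypothetical: page n, empty at the end
    private final ExecutorService executor;
    private CompletableFuture<List<String>> pending;    // null once the end has been seen
    private List<String> current = List.of();
    private int index;
    private int nextPageNumber;

    PrefetchingIterator(IntFunction<List<String>> fetchPage, ExecutorService executor) {
      this.fetchPage = fetchPage;
      this.executor = executor;
      this.pending = request();  // start the first request immediately
    }

    private CompletableFuture<List<String>> request() {
      final int n = nextPageNumber++;
      return CompletableFuture.supplyAsync(() -> fetchPage.apply(n), executor);
    }

    @Override
    public boolean hasNext() {
      while (index >= current.size()) {
        if (pending == null) {
          return false;
        }
        current = pending.join();  // wait for the page already in flight
        index = 0;
        // Kick off the following request before returning entries to the caller.
        pending = current.isEmpty() ? null : request();
      }
      return true;
    }

    @Override
    public String next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return current.get(index++);
    }
  }

  public static void main(String[] args) {
    List<List<String>> pages = List.of(List.of("a", "b"), List.of("c"), List.of());
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      PrefetchingIterator it = new PrefetchingIterator(
          n -> n < pages.size() ? pages.get(n) : List.<String>of(), executor);
      List<String> processed = new ArrayList<>();
      while (it.hasNext()) {
        // Per-file work here runs while the next page is being fetched.
        processed.add(it.next().toUpperCase());
      }
      System.out.println(processed);
    } finally {
      executor.shutdown();
    }
  }
}
```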



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
