[GitHub] [hadoop] steveloughran commented on pull request #2548: DRAFT PR: Implementing ListStatusRemoteIterator
steveloughran commented on pull request #2548:
URL: https://github.com/apache/hadoop/pull/2548#issuecomment-760912523

Please give it a JIRA ID, so that when I look at my notifications I know what it is about; the same goes for when I search my inbox. I don't want to have to put in that effort myself.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on pull request #2548: DRAFT PR: Implementing ListStatusRemoteIterator
steveloughran commented on pull request #2548:
URL: https://github.com/apache/hadoop/pull/2548#issuecomment-759367197

1. What is the JIRA ID?
2. As discussed, you need an uber-JIRA to cover the whole set of list optimisations you can do, including an overall goal. I would recommend something like "ABFS listing to support asynchronous prefetch and optimise for incremental listing of large directories and deep/wide directory trees".

That is: if it hurts the performance of listing empty directories, or of calling the listX calls against files, that is acceptable.
[GitHub] [hadoop] steveloughran commented on pull request #2548: DRAFT PR: Implementing ListStatusRemoteIterator
steveloughran commented on pull request #2548:
URL: https://github.com/apache/hadoop/pull/2548#issuecomment-753991353

I need you to use `org.apache.hadoop.util.functional.RemoteIterators` as the wrapper iterators. These are only in trunk but will be backported with the rest of HADOOP-16380 after a few days of stabilisation.

These iterators propagate the IOStatisticsSource interface, so when the innermost iterator collects cost/count of list calls, the stats will be visible to and collectable by callers.
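
To illustrate the propagation idea in the comment above, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: `RemoteIter`, `StatsSource`, `PagedListing` and `Mapping` are hypothetical stand-ins invented for this example, and the single `listCallCount()` method is a simplification of what a statistics source might expose. It only shows why a wrapper that passes statistics through lets a caller read counts gathered by the innermost iterator.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class StatsPropagationSketch {

  /** Hypothetical stand-in for a remote iterator: iteration may raise IOException. */
  interface RemoteIter<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /** Hypothetical stand-in for a statistics source (simplified to one counter). */
  interface StatsSource {
    long listCallCount();
  }

  /** Innermost iterator: serves entries page by page, counting each simulated list call. */
  static final class PagedListing implements RemoteIter<String>, StatsSource {
    private final Iterator<List<String>> pages;
    private Iterator<String> current = Collections.emptyIterator();
    private long listCalls;

    PagedListing(List<List<String>> pages) {
      this.pages = pages.iterator();
    }

    @Override
    public boolean hasNext() {
      // Fetch the next page only when the current one is exhausted.
      while (!current.hasNext() && pages.hasNext()) {
        current = pages.next().iterator();
        listCalls++;
      }
      return current.hasNext();
    }

    @Override
    public String next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return current.next();
    }

    @Override
    public long listCallCount() {
      return listCalls;
    }
  }

  /** Wrapper: maps each element and passes through the wrapped iterator's statistics. */
  static final class Mapping<S, T> implements RemoteIter<T>, StatsSource {
    private final RemoteIter<S> source;
    private final Function<S, T> mapper;

    Mapping(RemoteIter<S> source, Function<S, T> mapper) {
      this.source = source;
      this.mapper = mapper;
    }

    @Override
    public boolean hasNext() throws IOException {
      return source.hasNext();
    }

    @Override
    public T next() throws IOException {
      return mapper.apply(source.next());
    }

    @Override
    public long listCallCount() {
      // Report the inner count if the wrapped iterator exposes one, else zero.
      return source instanceof StatsSource ? ((StatsSource) source).listCallCount() : 0L;
    }
  }

  /** Drains a wrapped two-page listing and returns the count seen through the wrapper. */
  static long countListCalls() {
    RemoteIter<String> it = new Mapping<String, String>(
        new PagedListing(Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"))),
        String::toUpperCase);
    try {
      while (it.hasNext()) {
        it.next();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return ((StatsSource) it).listCallCount();
  }

  public static void main(String[] args) {
    System.out.println(countListCalls());
  }
}
```

In this sketch the caller only holds the outer `Mapping` wrapper, yet it can still read the number of page fetches made by the inner `PagedListing`. A wrapper that did not forward the statistics would hide that count from the caller.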
[GitHub] [hadoop] steveloughran commented on pull request #2548: DRAFT PR: Implementing ListStatusRemoteIterator
steveloughran commented on pull request #2548:
URL: https://github.com/apache/hadoop/pull/2548#issuecomment-748226755

We should talk about this in 2021. For now:

* see #2553 for IOStatistics collection *including in remote iterators*, and a class *RemoteIterators* to help you wrap them
* look at mukund's work in HADOOP-17400, including the issue of when to report failures

I think it makes sense to have an overall "optimise abfs incremental listings" JIRA and create issues underneath, as a lot of the work is shared.