[jira] [Work logged] (HADOOP-13704) S3A getContentSummary() to move to listFiles(recursive) to count children; instrument use

ASF GitHub Bot (Jira) Wed, 09 Mar 2022 14:43:05 -0800


     [ https://issues.apache.org/jira/browse/HADOOP-13704?focusedWorklogId=739174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-739174 ]


ASF GitHub Bot logged work on HADOOP-13704:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Mar/22 22:42
            Start Date: 09/Mar/22 22:42
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on pull request #3978:
URL: https://github.com/apache/hadoop/pull/3978#issuecomment-1063453062


   I'm not deliberately ignoring this, just falling behind on reviews while I
   try to get my manifest committer out the door. Reviews there are welcome,
   even though it targets ABFS and GCS: #2971

   I will pull some of this back into the S3A committer afterwards, including
   stat names and some IO enhancements to get Parquet files writing faster
   (disabling existence/overwrite checks in __magic dirs).

   #3289


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 739174)
    Time Spent: 2h 20m  (was: 2h 10m)

> S3A getContentSummary() to move to listFiles(recursive) to count children; 
> instrument use
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13704
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13704
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Hive and, to a lesser extent, Spark use {{getContentSummary()}} to get summary
> statistics for a path. This is very expensive on S3A (and on any other object
> store), especially as the base implementation does a recursive tree walk.
> Because of HADOOP-13208, listFiles(recursive) gives us a full enumeration of
> files under a path without per-directory costs, so S3A can and should switch
> to it to speed up the places where the operation is called.
> Also:
> * the API call needs an FS specification and contract tests
> * S3A could instrument invocations, so that real-world use can be measured
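The cost difference described in the issue can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the S3A code: `FakeObjectStore`, its methods, and the `getContentSummary` counter key are hypothetical. The store is modelled as a flat key-to-size map; a delimited one-level LIST is counted as one request per directory (paging ignored); directory marker objects are not modelled, so empty directories are not counted. The tree walk issues one LIST per directory, while the flat listing pages through the whole subtree and infers directories from each key's parent paths.

```python
from collections import Counter
from dataclasses import dataclass
from posixpath import dirname


@dataclass(frozen=True)
class ContentSummary:
    length: int           # total bytes in all files
    file_count: int
    directory_count: int  # includes the starting directory itself


class FakeObjectStore:
    """Hypothetical in-memory object store: a flat map of key -> size.

    Counts LIST requests so the two summary strategies can be compared.
    Directory marker objects are not modelled, so empty dirs are invisible.
    """

    def __init__(self, objects, page_size=1000):
        self.objects = dict(objects)
        self.page_size = page_size
        self.list_requests = 0

    def list_one_level(self, prefix):
        """One delimited LIST: files and child 'directories' directly under prefix.

        Paging is ignored: each call is counted as a single request.
        """
        self.list_requests += 1
        files, subdirs = [], set()
        for key, size in self.objects.items():
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if "/" in rest:
                subdirs.add(prefix + rest.split("/", 1)[0] + "/")
            else:
                files.append((key, size))
        return files, sorted(subdirs)

    def list_flat(self, prefix):
        """Flat (undelimited) LIST of every key under prefix, one request per page."""
        keys = sorted(k for k in self.objects if k.startswith(prefix))
        for start in range(0, max(len(keys), 1), self.page_size):
            self.list_requests += 1
            for key in keys[start:start + self.page_size]:
                yield key, self.objects[key]


def summary_treewalk(store, path):
    """Recursive tree walk: one LIST per directory. `path` is a prefix ending in '/'."""
    files, subdirs = store.list_one_level(path)
    length = sum(size for _, size in files)
    file_count = len(files)
    dir_count = 1  # count `path` itself
    for sub in subdirs:
        child = summary_treewalk(store, sub)
        length += child.length
        file_count += child.file_count
        dir_count += child.directory_count
    return ContentSummary(length, file_count, dir_count)


def summary_flat(store, path):
    """Single flat listing; directories are inferred from each key's parents."""
    length = file_count = 0
    dirs = {path}
    for key, size in store.list_flat(path):
        length += size
        file_count += 1
        parent = dirname(key) + "/"
        # Walk up until reaching a directory already seen; its ancestors
        # were recorded when it was first added, and `path` is always present.
        while parent not in dirs:
            dirs.add(parent)
            parent = dirname(parent.rstrip("/")) + "/"
    return ContentSummary(length, file_count, len(dirs))


# Hypothetical invocation counter, standing in for S3A instrumentation.
invocations = Counter()


def get_content_summary(store, path):
    """Entry point: record the call, then use the flat-listing strategy."""
    invocations["getContentSummary"] += 1
    return summary_flat(store, path)
```

With the five objects `data/f0`, `data/a/f1`, `data/a/b/f2`, `data/a/b/f3` and `data/c/f4`, both strategies report the same totals (5 files, 4 directories including `data/`), but the tree walk makes 4 LIST requests while the flat listing makes 1. In this model the tree walk's request count grows with the number of directories, whereas the flat listing's grows only with the number of result pages.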



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
