[jira] [Resolved] (HADOOP-18650) improve s3a committer stats collected

Steve Loughran (Jira) Wed, 15 Jan 2025 09:37:06 -0800


     [ 
https://issues.apache.org/jira/browse/HADOOP-18650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran resolved HADOOP-18650.
-------------------------------------
    Resolution: Won't Fix

Don't think this is worth the effort

> improve s3a committer stats collected
> -------------------------------------
>
>                 Key: HADOOP-18650
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18650
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.5
>            Reporter: Steve Loughran
>            Priority: Major
>
> We can improve the stats collected in the s3a committer and saved to the JSON.
> Key ones:
> # of task manifests read; duration of loads
> # size of each manifest
> I think we would also benefit if we could set the commit thread pools to be 
> big, but then shared across all jobs (i.e. a demand-created thread pool in the 
> s3a fs). That would allow for a pool size of, say, 500, but still support many 
> jobs actively committing at the same time (busy Spark driver).
> Finally: should the file commit pool size be > the size of the pool of 
> manifest readers? I think it could be, but the ratio should be fairly low.
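
The shared-pool idea in the description can be sketched roughly as follows. This is only an illustration using plain java.util.concurrent, not code from the s3a committer: the names SharedCommitPool and JobSubmitter are hypothetical, the limit of 500 is the "say, 500" figure from the description, and the per-job cap is an assumed way of stopping one job from monopolising the shared threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one large, lazily created pool shared by all jobs. */
final class SharedCommitPool {
  // Assumed upper bound, taken from the "say, 500" in the description.
  private static final int MAX_THREADS = 500;
  private static ThreadPoolExecutor shared;

  private SharedCommitPool() {}

  /** Create the pool on first use; idle threads exit after 60 seconds. */
  static synchronized ExecutorService pool() {
    if (shared == null) {
      shared = new ThreadPoolExecutor(MAX_THREADS, MAX_THREADS,
          60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
      // Let even core threads time out, so an idle pool holds no threads.
      shared.allowCoreThreadTimeOut(true);
    }
    return shared;
  }

  /** Per-job view that caps how many shared threads one job may occupy. */
  static final class JobSubmitter {
    private final Semaphore permits;

    JobSubmitter(int maxActive) {
      this.permits = new Semaphore(maxActive);
    }

    /** Blocks the caller until this job has a free slot, then submits. */
    void submit(Runnable task) throws InterruptedException {
      permits.acquire();
      try {
        pool().execute(() -> {
          try {
            task.run();
          } finally {
            permits.release();
          }
        });
      } catch (RuntimeException e) {
        // Submission was rejected, so hand the slot back.
        permits.release();
        throw e;
      }
    }
  }
}
```

Each committing job would create its own JobSubmitter with a modest limit and push its manifest loads or file commits through it, so many jobs can commit concurrently while the total thread count stays bounded by the shared pool. The pool's threads are not daemon threads, so a real implementation would need to shut the pool down when the owning filesystem instance is closed.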



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

