[ 
https://issues.apache.org/jira/browse/HADOOP-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222690#comment-16222690
 ] 

Steve Loughran commented on HADOOP-14973:
-----------------------------------------

If your customers want to know how to make effective use of a hadoop cluster in 
AWS then I believe we can assist with performance tuning: just send them our 
way, we'll help :)

Tips of the professionals: 
* you can configure S3 buckets to log accesses to another bucket
* you can use the UA settings (fs.s3a.user.agent) to declare what 
application/workflow is talking to the bucket
* you can use big data analysis tools to go through the logs.


> [s3a] Log StorageStatistics
> ---------------------------
>
>                 Key: HADOOP-14973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14973
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1, 2.8.1
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> S3A is currently storing much more detailed metrics via StorageStatistics 
> than are logged in a MapReduce job. Eventually, it would be nice to get 
> Spark, MapReduce and other workloads to retrieve and store these metrics, but 
> it may be some time before they all do that. I'd like to consider having S3A 
> publish the metrics itself in some form. This is tricky, as S3A has no daemon 
> but lives inside various other processes.
> Perhaps writing to a log file at some configurable interval and on close() 
> would be the best we could do. Other ideas would be welcome.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to