[ https://issues.apache.org/jira/browse/HADOOP-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222690#comment-16222690 ]
Steve Loughran commented on HADOOP-14973:
-----------------------------------------

If your customers want to know how to make effective use of a Hadoop cluster in AWS, then I believe we can assist with performance tuning: just send them our way, we'll help :)

Tips from the professionals:
* you can configure S3 buckets to log accesses to another bucket
* you can use the UA settings (fs.s3a.user.agent.prefix) to declare what application/workflow is talking to the bucket
* you can use big data analysis tools to go through the logs.

> [s3a] Log StorageStatistics
> ---------------------------
>
>                 Key: HADOOP-14973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14973
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1, 2.8.1
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> S3A is currently storing much more detailed metrics via StorageStatistics
> than are logged in a MapReduce job. Eventually, it would be nice to get
> Spark, MapReduce and other workloads to retrieve and store these metrics, but
> it may be some time before they all do that. I'd like to consider having S3A
> publish the metrics itself in some form. This is tricky, as S3A has no daemon
> but lives inside various other processes.
> Perhaps writing to a log file at some configurable interval and on close()
> would be the best we could do. Other ideas would be welcome.
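As a rough illustration of the log-analysis tip in the comment above, here is a minimal sketch that counts requests per User-Agent in S3 server access log lines. It assumes the standard space-delimited S3 access log layout, in which the quoted User-Agent is the 17th field; the function names are hypothetical, and a real pipeline would more likely run the same grouping in Spark or Hive over the log bucket. User-Agent values containing escaped quotes are not handled.

```python
import re
from collections import Counter

# One token is a [bracketed timestamp], a "quoted string", or a run of non-spaces.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# Zero-based position of the User-Agent field (assumed from the S3 access log format).
USER_AGENT_FIELD = 16


def user_agent(line):
    """Return the User-Agent of one access log line, or None if the line is too short."""
    tokens = _TOKEN.findall(line)
    if len(tokens) <= USER_AGENT_FIELD:
        return None
    return tokens[USER_AGENT_FIELD].strip('"')


def requests_by_user_agent(lines):
    """Count log records per User-Agent string, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        agent = user_agent(line)
        if agent is not None:
            counts[agent] += 1
    return counts
```

If each workflow sets its own User-Agent prefix, the resulting counts show which application is generating the traffic against a bucket.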