[ https://issues.apache.org/jira/browse/HADOOP-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222142#comment-16222142 ]
Steve Loughran commented on HADOOP-14973:
-----------------------------------------

w.r.t. compute engines: they can ask for the storage statistics of the source/destination filesystems, collect them from the workers, and aggregate them at the driver. On a multi-tenant Spark cluster you don't want to trawl through the logs of 20 processes to work out which streams belonged to which job; you want a summary of input and output stream performance attached to the job itself. That is something the committers could help with, at the FileOutputFormat and Spark commit-protocol level.

> [s3a] Log StorageStatistics
> ---------------------------
>
>                 Key: HADOOP-14973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14973
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1, 2.8.1
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> S3A currently stores much more detailed metrics via StorageStatistics than are logged in a MapReduce job. Eventually it would be nice to get Spark, MapReduce, and other workloads to retrieve and store these metrics, but it may be some time before they all do. I'd like to consider having S3A publish the metrics itself in some form. This is tricky, as S3A has no daemon but lives inside various other processes.
> Perhaps writing to a log file at some configurable interval and on close() would be the best we could do. Other ideas would be welcome.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
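A minimal sketch of the worker-to-driver aggregation described in the comment. In Hadoop, the per-filesystem counters would come from FileSystem.getStorageStatistics(); to keep this runnable without a Hadoop dependency, each worker's snapshot is modeled as a plain counter map, and the driver merges them into one job-level summary. The class name, method names, and the sample counter keys are illustrative assumptions, not any Hadoop or Spark API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StorageStatsAggregator {

    /**
     * Sum a list of per-worker counter snapshots into one job-wide summary.
     * Counters missing from a worker's snapshot simply contribute nothing.
     */
    public static Map<String, Long> aggregate(List<Map<String, Long>> workerSnapshots) {
        Map<String, Long> summary = new HashMap<>();
        for (Map<String, Long> snapshot : workerSnapshots) {
            // merge() adds this worker's value to any existing total for the counter.
            snapshot.forEach((counter, value) -> summary.merge(counter, value, Long::sum));
        }
        return summary;
    }

    public static void main(String[] args) {
        // Two workers reporting S3A-style counters for the same job
        // (counter names are illustrative).
        Map<String, Long> worker1 = Map.of(
                "stream_read_operations", 120L,
                "stream_write_operations", 30L);
        Map<String, Long> worker2 = Map.of(
                "stream_read_operations", 80L,
                "object_put_requests", 5L);

        Map<String, Long> summary = aggregate(List.of(worker1, worker2));
        System.out.println(summary.get("stream_read_operations"));  // 200
        System.out.println(summary.get("stream_write_operations")); // 30
        System.out.println(summary.get("object_put_requests"));     // 5
    }
}
```

The same merge step works whether the snapshots arrive via a Spark accumulator, a task-commit message, or log scraping; only the transport changes, not the aggregation.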