[ https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yongjia Wang updated SPARK-10912:
---------------------------------
    Attachment: s3a_metrics.patch

Adding s3a is fairly straightforward. I suspect the reason it is not included is that s3a support (via hadoop-aws.jar) is not part of the default Hadoop distribution, due to licensing issues. I created a patch that enables s3a metrics on both the executors and the driver. Reporting shuffle statistics requires more thought, although all the numbers are already collected in TaskMetrics.scala (input, output, shuffle, local, remote, spill, records, bytes, etc.). I think it would make sense to report the aggregated metrics per executor across all tasks, so it is easy to get an overall sense of disk I/O and network traffic.

> Improve Spark metrics executor.filesystem
> -----------------------------------------
>
>                 Key: SPARK-10912
>                 URL: https://issues.apache.org/jira/browse/SPARK-10912
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 1.5.0
>            Reporter: Yongjia Wang
>            Priority: Minor
>         Attachments: s3a_metrics.patch
>
>
> org.apache.spark.executor.ExecutorSource has two filesystem metrics:
> "hdfs" and "file". I started using S3 as the persistent storage with a Spark
> standalone cluster in EC2, and S3 read/write metrics do not appear anywhere.
> The "file" metric appears to cover only the driver reading local files. It
> would also be nice to report shuffle read/write metrics, since they can help
> with optimization.
> I think these two things (S3 and shuffle) are very useful and would cover all
> the missing information about Spark I/O, especially for an S3 setup.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
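[Editor's sketch] For context, the change described above boils down to registering one more filesystem scheme in ExecutorSource. The Scala sketch below follows the gauge-per-(scheme, statistic) pattern that ExecutorSource uses, backed by Hadoop's FileSystem.Statistics; the helper names (`fileStats`, `registerFileSystemStat`) mirror that class, but this is an illustration of the idea, not the attached s3a_metrics.patch, and the exact shape may differ by Spark version:

```scala
import scala.collection.JavaConverters._
import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.hadoop.fs.FileSystem

val metricRegistry = new MetricRegistry()

// Look up the accumulated statistics for a filesystem scheme
// ("hdfs", "file", "s3a", ...), if any filesystem of that scheme was used.
def fileStats(scheme: String): Option[FileSystem.Statistics] =
  FileSystem.getAllStatistics.asScala.find(_.getScheme == scheme)

// Register a gauge that reads one statistic for one scheme, falling back
// to a default value while no filesystem of that scheme exists yet.
def registerFileSystemStat[T](scheme: String, name: String,
    f: FileSystem.Statistics => T, defaultValue: T): Unit = {
  metricRegistry.register(MetricRegistry.name("filesystem", scheme, name),
    new Gauge[T] {
      override def getValue: T = fileStats(scheme).map(f).getOrElse(defaultValue)
    })
}

// Adding "s3a" here is essentially the whole patch. It only reports values
// when hadoop-aws.jar is on the classpath so the s3a filesystem can be used.
for (scheme <- Seq("hdfs", "file", "s3a")) {
  registerFileSystemStat(scheme, "read_bytes", _.getBytesRead, 0L)
  registerFileSystemStat(scheme, "write_bytes", _.getBytesWritten, 0L)
  registerFileSystemStat(scheme, "read_ops", _.getReadOps, 0)
  registerFileSystemStat(scheme, "largeRead_ops", _.getLargeReadOps, 0)
  registerFileSystemStat(scheme, "write_ops", _.getWriteOps, 0)
}
```

Because the gauges are lazy (evaluated at report time), registering an unused scheme is harmless: the gauge simply returns the default until the first s3a filesystem is instantiated.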