[jira] [Updated] (SPARK-10912) Improve Spark metrics executor.filesystem

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-10912:
---------------------------------
Labels: bulk-closed  (was: )

> Improve Spark metrics executor.filesystem
> -----------------------------------------
>
>                 Key: SPARK-10912
>                 URL: https://issues.apache.org/jira/browse/SPARK-10912
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 1.5.0
>            Reporter: Yongjia Wang
>            Priority: Minor
>              Labels: bulk-closed
>         Attachments: s3a_metrics.patch
>
>
> org.apache.spark.executor.ExecutorSource exposes only two filesystem
> metrics: "hdfs" and "file". I started using s3 as the persistent storage
> with a Spark standalone cluster in EC2, and s3 read/write metrics do not
> appear anywhere. The "file" metric appears to cover only the driver
> reading local files. It would be nice to also report shuffle read/write
> metrics, which would help with optimization.
> I think these two additions (s3 and shuffle) would be very useful and
> would cover the missing information about Spark IO, especially for an
> s3 setup.
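
For reference, the two existing metrics come from a per-scheme registration
loop in ExecutorSource. The sketch below is reconstructed from memory of the
Spark 1.x code rather than copied from the repository, so treat names and
signatures as approximate; the point is that the scheme list at the bottom is
where s3a would be added.

    import scala.collection.JavaConverters._
    import com.codahale.metrics.{Gauge, MetricRegistry}
    import org.apache.hadoop.fs.FileSystem

    // Sketch of the per-scheme filesystem gauges, reconstructed from
    // memory of the Spark 1.x ExecutorSource (approximate, not copied).
    class FileSystemMetricsSketch {
      val metricRegistry = new MetricRegistry()

      // Hadoop keeps one Statistics object per filesystem scheme in use.
      private def fileStats(scheme: String): Option[FileSystem.Statistics] =
        FileSystem.getAllStatistics().asScala.find(_.getScheme == scheme)

      private def registerFileSystemStat[T](
          scheme: String, name: String,
          f: FileSystem.Statistics => T, defaultValue: T): Unit = {
        metricRegistry.register(MetricRegistry.name("filesystem", scheme, name),
          new Gauge[T] {
            // Statistics are absent until that scheme's FileSystem is first used.
            override def getValue: T = fileStats(scheme).map(f).getOrElse(defaultValue)
          })
      }

      // Only "hdfs" and "file" are registered today; adding "s3a" to this
      // list would surface s3a traffic (assuming hadoop-aws is on the
      // classpath so the s3a FileSystem can be instantiated at all).
      for (scheme <- Seq("hdfs", "file" /* , "s3a" */)) {
        registerFileSystemStat(scheme, "read_bytes", _.getBytesRead, 0L)
        registerFileSystemStat(scheme, "write_bytes", _.getBytesWritten, 0L)
        registerFileSystemStat(scheme, "read_ops", _.getReadOps, 0)
        registerFileSystemStat(scheme, "write_ops", _.getWriteOps, 0)
      }
    }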






[jira] [Updated] (SPARK-10912) Improve Spark metrics executor.filesystem

2015-10-05 Thread Yongjia Wang (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjia Wang updated SPARK-10912:
---------------------------------
Attachment: s3a_metrics.patch

Adding s3a is fairly straightforward. I guess the reason it's not included is 
that s3a support (via hadoop-aws.jar) is not part of the default Hadoop 
distribution due to licensing issues. I created a patch to enable s3a metrics, 
both on the executors and on the driver. Reporting shuffle statistics requires 
more thought, although all the numbers are already collected in 
TaskMetrics.scala (input, output, shuffle, local, remote, spill, records, 
bytes, etc.). I think it would make sense to report the aggregated metrics per 
executor across all tasks, so it is easy to get an overall sense of disk I/O 
and network traffic.
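
The attached s3a_metrics.patch is not reproduced in this thread. To make the
aggregation idea concrete, here is a hypothetical sketch; the class name
ShuffleAggregateSource, the recordTask hook, and the wiring are inventions for
illustration only, not code from the patch or from Spark itself.

    import java.util.concurrent.atomic.AtomicLong
    import com.codahale.metrics.{Gauge, MetricRegistry}

    // Hypothetical executor-level rollup of per-task shuffle numbers;
    // neither this class nor its wiring exists in Spark or in the patch.
    class ShuffleAggregateSource {
      val sourceName = "executor.shuffle"
      val metricRegistry = new MetricRegistry()

      private val readBytes = new AtomicLong(0L)
      private val writeBytes = new AtomicLong(0L)

      // The executor would call this once per completed task, passing the
      // totals that TaskMetrics already collects for that task.
      def recordTask(shuffleReadBytes: Long, shuffleWriteBytes: Long): Unit = {
        readBytes.addAndGet(shuffleReadBytes)
        writeBytes.addAndGet(shuffleWriteBytes)
      }

      metricRegistry.register(MetricRegistry.name("shuffle", "read_bytes"),
        new Gauge[Long] { override def getValue: Long = readBytes.get() })
      metricRegistry.register(MetricRegistry.name("shuffle", "write_bytes"),
        new Gauge[Long] { override def getValue: Long = writeBytes.get() })
    }

Any configured sink (JMX, Graphite, Ganglia) would then report cumulative
shuffle I/O per executor, giving the overall sense of disk and network
traffic described above.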

> Improve Spark metrics executor.filesystem
> -----------------------------------------
>
>                 Key: SPARK-10912
>                 URL: https://issues.apache.org/jira/browse/SPARK-10912
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 1.5.0
>            Reporter: Yongjia Wang
>            Priority: Minor
>         Attachments: s3a_metrics.patch
>
>
> org.apache.spark.executor.ExecutorSource exposes only two filesystem
> metrics: "hdfs" and "file". I started using s3 as the persistent storage
> with a Spark standalone cluster in EC2, and s3 read/write metrics do not
> appear anywhere. The "file" metric appears to cover only the driver
> reading local files. It would be nice to also report shuffle read/write
> metrics, which would help with optimization.
> I think these two additions (s3 and shuffle) would be very useful and
> would cover the missing information about Spark IO, especially for an
> s3 setup.






[jira] [Updated] (SPARK-10912) Improve Spark metrics executor.filesystem

2015-10-03 Thread Sean Owen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-10912:
------------------------------
    Priority: Minor  (was: Major)
 Component/s: Deploy

> Improve Spark metrics executor.filesystem
> -----------------------------------------
>
>                 Key: SPARK-10912
>                 URL: https://issues.apache.org/jira/browse/SPARK-10912
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 1.5.0
>            Reporter: Yongjia Wang
>            Priority: Minor
>
>
> org.apache.spark.executor.ExecutorSource exposes only two filesystem
> metrics: "hdfs" and "file". I started using s3 as the persistent storage
> with a Spark standalone cluster in EC2, and s3 read/write metrics do not
> appear anywhere. The "file" metric appears to cover only the driver
> reading local files. It would be nice to also report shuffle read/write
> metrics, which would help with optimization.
> I think these two additions (s3 and shuffle) would be very useful and
> would cover the missing information about Spark IO, especially for an
> s3 setup.






[jira] [Updated] (SPARK-10912) Improve Spark metrics executor.filesystem

2015-10-02 Thread Yongjia Wang (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjia Wang updated SPARK-10912:
---------------------------------
Description: 
org.apache.spark.executor.ExecutorSource exposes only two filesystem metrics: 
"hdfs" and "file". I started using s3 as the persistent storage with a Spark 
standalone cluster in EC2, and s3 read/write metrics do not appear anywhere. 
The "file" metric appears to cover only the driver reading local files. It 
would be nice to also report shuffle read/write metrics, which would help 
with optimization.
I think these two additions (s3 and shuffle) would be very useful and would 
cover the missing information about Spark IO, especially for an s3 setup.

  was:
org.apache.spark.executor.ExecutorSource exposes only two filesystem metrics: 
"hdfs" and "file". I started using s3 as the persistent storage with a Spark 
standalone cluster in EC2, and s3 read/write metrics do not appear anywhere. 
The "file" metric appears to cover only the driver reading local files. It 
would be nice to also report shuffle read/write metrics, which would help in 
understanding things like whether a Spark job becomes IO bound.
I think these two additions (s3 and shuffle) would be very useful and would 
cover the missing information about Spark IO, especially for an s3 setup.


> Improve Spark metrics executor.filesystem
> -----------------------------------------
>
>                 Key: SPARK-10912
>                 URL: https://issues.apache.org/jira/browse/SPARK-10912
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 1.5.0
>            Reporter: Yongjia Wang
>
>
> org.apache.spark.executor.ExecutorSource exposes only two filesystem
> metrics: "hdfs" and "file". I started using s3 as the persistent storage
> with a Spark standalone cluster in EC2, and s3 read/write metrics do not
> appear anywhere. The "file" metric appears to cover only the driver
> reading local files. It would be nice to also report shuffle read/write
> metrics, which would help with optimization.
> I think these two additions (s3 and shuffle) would be very useful and
> would cover the missing information about Spark IO, especially for an
> s3 setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org