[jira] [Commented] (SPARK-9004) Add s3 bytes read/written metrics

2016-10-13 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572414#comment-15572414 ]

Steve Loughran commented on SPARK-9004:
---

HADOOP-13605 added a whole new set of counters for HDFS, S3 and hopefully soon 
Azure; there's an API call on the FS {{getStorageStatistics()}} to query these.

One problem though: this isn't shipping in Hadoop branch-2 yet, so you can't 
write code that uses it unless there's some introspection/plugin mechanism.

All the stats are just {{name: String -> value: Long}}, so something that 
collects a {{Map[String, Long]}} would work.
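
For when that API is available, a minimal sketch of what the collection could 
look like, assuming the HADOOP-13605 {{StorageStatistics}} interface; the 
{{collectStorageStats}} helper name is made up for illustration:

{code}
import scala.collection.JavaConverters._
import org.apache.hadoop.fs.FileSystem

// Sketch: flatten a filesystem's storage statistics into name -> value
// pairs, assuming the Hadoop 2.8+/HADOOP-13605 StorageStatistics API.
def collectStorageStats(fs: FileSystem): Map[String, Long] = {
  fs.getStorageStatistics()      // per-instance StorageStatistics
    .getLongStatistics()         // java.util.Iterator[LongStatistic]
    .asScala
    .map(s => s.getName -> s.getValue)
    .toMap
}
{code}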

> Add s3 bytes read/written metrics
> -
>
> Key: SPARK-9004
> URL: https://issues.apache.org/jira/browse/SPARK-9004
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Reporter: Abhishek Modi
>Priority: Minor
>
> s3 read/write metrics can be pretty useful in finding the total aggregate 
> data processed






[jira] [Commented] (SPARK-9004) Add s3 bytes read/written metrics

2016-08-23 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432462#comment-15432462 ]

Steve Loughran commented on SPARK-9004:
---

If you know the filesystem, you can get summary stats from 
{{FileSystem.getStatistics()}}; they'd have to be collected across all the 
executors.

These counters are per-JVM, not isolated to individual jobs.
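
A rough sketch of that per-JVM read, assuming the static 
{{FileSystem.getStatistics(scheme, class)}} lookup and hadoop-aws on the 
classpath:

{code}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.s3a.S3AFileSystem

// Sketch: read the cumulative, JVM-wide s3a counters. These cover every
// job that has touched s3a in this JVM, not just the current one.
val s3aStats = FileSystem.getStatistics("s3a", classOf[S3AFileSystem])
val bytesRead = s3aStats.getBytesRead
val bytesWritten = s3aStats.getBytesWritten
{code}

Running something like this inside a {{mapPartitions}} pass would be one way 
to pull the numbers off each executor, but that aggregation is the part that 
still needs designing.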







[jira] [Commented] (SPARK-9004) Add s3 bytes read/written metrics

2015-07-12 Thread Abhishek Modi (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623834#comment-14623834 ]

Abhishek Modi commented on SPARK-9004:
--

Hadoop separates HDFS bytes, local-filesystem bytes and S3 bytes into distinct 
counters. Spark combines all of them in its metrics. Separating them could give 
a better idea of the I/O distribution.

Here's how it works in MR: 

1. The client creates a Job object ({{org.apache.hadoop.mapreduce.Job}}) and 
submits it to the RM, which then launches the AM etc.
2. After job submission, the client continuously monitors the job to see if it 
has finished. 
3. Once the job is finished, the client gets its counters via the 
{{getCounters()}} method. 
4. It logs them on the client in a "Counters=" format (see the sketch after 
this list).
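
A hedged sketch of that client-side flow; the job configuration is elided, and 
the "s3a" scheme and job name here are just example values:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{FileSystemCounter, Job}

// Sketch of the MR client flow described above:
// submit, monitor until completion, then read the job's counters.
val job = Job.getInstance(new Configuration(), "s3-bytes-example")
// ... input/output formats, paths, mapper/reducer elided ...
job.waitForCompletion(true)     // steps 1-2: submit to the RM and monitor
val counters = job.getCounters  // step 3: fetch counters once finished
val s3BytesRead =
  counters.findCounter("s3a", FileSystemCounter.BYTES_READ).getValue
println(s"Counters=$counters")  // step 4: log them on the client
{code}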

I don't really know how to implement it in Spark. Could it be done by modifying 
{{NewHadoopRDD}}? I guess that's where the Job object is being used.




