[ 
https://issues.apache.org/jira/browse/SPARK-28091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871839#comment-16871839
 ] 

Steve Loughran commented on SPARK-28091:
----------------------------------------

The real source of metrics for S3A (and the Azure and Google Cloud stores) is 
actually the per-instance StorageStatistics you can get from 
{{getStorageStatistics()}} on a filesystem instance; look at 
{{org.apache.hadoop.fs.s3a.Statistic}} for what is collected there... pretty 
much every operation has its own counter.
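
For reference, a minimal sketch of dumping those per-instance counters (the 
bucket name is a placeholder and Hadoop 2.8+ is assumed):

{code:scala}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import scala.collection.JavaConverters._

// Placeholder bucket; any S3A URI works the same way.
val fs = FileSystem.get(new URI("s3a://some-bucket/"), new Configuration())

// Per-instance statistics: one counter per operation, with the names
// defined in org.apache.hadoop.fs.s3a.Statistic.
fs.getStorageStatistics.getLongStatistics.asScala.foreach { s =>
  println(s"${s.getName} = ${s.getValue}")
}
{code}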

It'd be interesting to see what your collection code does here. It'd also be 
interesting to think about whether people could extend it to get low-level 
stats from the CPUs themselves.

One issue I have found with trying to collect per-query stats is that the 
filesystem-instance counters can't be tied down to specific queries, as they 
aren't per-thread. No good solutions there, at least nothing under dev, though 
the Impala team have been asking for stuff (primarily collecting input stream 
stats on seek cost).

> Extend Spark metrics system with executor plugin metrics
> --------------------------------------------------------
>
>                 Key: SPARK-28091
>                 URL: https://issues.apache.org/jira/browse/SPARK-28091
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Luca Canali
>            Priority: Minor
>
> This proposes to improve Spark instrumentation by adding a hook for Spark 
> executor plugin metrics to the Spark metrics system, implemented with the 
> Dropwizard/Codahale library.
> Context: The Spark metrics system provides a large variety of metrics (see 
> also SPARK-26890) useful to monitor and troubleshoot Spark workloads. A 
> typical workflow is to sink the metrics to a storage system and build 
> dashboards on top of that.
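> As an illustration, a typical sink configuration in metrics.properties 
> (the Graphite host and port below are placeholders) looks like:
> {code}
> *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
> *.sink.graphite.host=graphite.example.org
> *.sink.graphite.port=2003
> *.sink.graphite.period=10
> *.sink.graphite.unit=seconds
> {code}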
> Improvement: The original goal of this work was to add instrumentation for S3 
> filesystem access metrics per Spark job. Currently, {{ExecutorSource}} 
> instruments HDFS and local filesystem metrics. Rather than extending the code 
> there, we propose to add a metrics plugin system, which is more flexible and 
> of more general use.
> Advantages:
>  * The metrics plugin system makes it easy to implement instrumentation for 
> S3 access by Spark jobs.
>  * The metrics plugin system allows for easy extension of how Spark collects 
> HDFS-related workload metrics. This is currently done using the Hadoop 
> FileSystem {{getAllStatistics}} method, which is deprecated in recent 
> versions of Hadoop; these instead recommend {{getGlobalStorageStatistics}}, 
> which also provides several additional metrics. {{getGlobalStorageStatistics}} 
> is not available in Hadoop 2.7 (it was introduced in Hadoop 2.8). A metrics 
> plugin would give those deploying suitable Hadoop versions an easy way to 
> “opt in” to such new API calls (see the sketch after this list).
>  * We also have the use case of adding Hadoop filesystem monitoring for a 
> custom Hadoop-compliant filesystem in use in our organization (EOS, accessed 
> using the XRootD protocol). The metrics plugin infrastructure makes this easy 
> to do. Others may have similar use cases.
>  * More generally, this approach makes it straightforward to plug filesystem 
> and other metrics into the Spark monitoring system. Future work on plugin 
> implementations can extend monitoring to measure usage of external resources 
> (OS, filesystem, network, accelerator cards, etc.) that might not normally be 
> considered general enough for inclusion in Apache Spark code, but that can 
> nevertheless be useful for specialized use cases, tests or troubleshooting.
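> To make the API difference concrete, here is a minimal sketch of reading the 
> newer global statistics (Hadoop 2.8+ only; the output format is illustrative):
> {code:scala}
> import org.apache.hadoop.fs.FileSystem
> import scala.collection.JavaConverters._
> 
> // getAllStatistics is deprecated; getGlobalStorageStatistics (Hadoop 2.8+)
> // exposes per-filesystem StorageStatistics with many more counters.
> FileSystem.getGlobalStorageStatistics.iterator().asScala.foreach { stats =>
>   stats.getLongStatistics.asScala.foreach { s =>
>     println(s"${stats.getName}.${s.getName} = ${s.getValue}")
>   }
> }
> {code}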
> Implementation:
> The proposed implementation is currently a WIP open for comments and 
> improvements. It is based on the Executor Plugin work of SPARK-24918 and 
> builds on recent work on extending Spark executor metrics, such as 
> SPARK-25228.
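> As a rough illustration of the intended plugin shape (the class name and the 
> init(registry) hand-off below are hypothetical stand-ins for the WIP hook; 
> the Dropwizard calls are the standard Codahale API):
> {code:scala}
> import com.codahale.metrics.{Gauge, MetricRegistry}
> import org.apache.hadoop.fs.FileSystem
> 
> // Hypothetical plugin: assumes the framework hands each executor plugin
> // the executor's Dropwizard MetricRegistry at startup.
> class S3AStatsPlugin {
>   def init(registry: MetricRegistry): Unit = {
>     registry.register(
>       MetricRegistry.name("plugin", "s3a", "bytesRead"),
>       new Gauge[Long] {
>         override def getValue: Long =
>           // "s3a" and "stream_bytes_read" are illustrative names; the
>           // real keys come from org.apache.hadoop.fs.s3a.Statistic.
>           Option(FileSystem.getGlobalStorageStatistics.get("s3a"))
>             .flatMap(st => Option(st.getLong("stream_bytes_read")))
>             .map(_.longValue())
>             .getOrElse(0L)
>       })
>   }
> }
> {code}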
> Tests and examples:
> This has so far been manually tested by running Spark on YARN and K8S 
> clusters, in particular for monitoring S3 and for extending HDFS 
> instrumentation with the Hadoop FileSystem {{getGlobalStorageStatistics}} 
> metrics. An executor metric plugin example and the code used for testing are 
> available.


