[jira] [Commented] (SPARK-5847) Allow for configuring MetricsSystem's use of app ID to namespace all metrics
[ https://issues.apache.org/jira/browse/SPARK-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15476885#comment-15476885 ]

Apache Spark commented on SPARK-5847:
-------------------------------------

User 'AnthonyTruchet' has created a pull request for this issue:
https://github.com/apache/spark/pull/15023

> Allow for configuring MetricsSystem's use of app ID to namespace all metrics
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-5847
>                 URL: https://issues.apache.org/jira/browse/SPARK-5847
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.2.1
>            Reporter: Ryan Williams
>            Assignee: Mark Grover
>            Priority: Minor
>             Fix For: 2.1.0
>
> {{MetricsSystem}} [currently prepends the app ID to all metrics|https://github.com/apache/spark/blob/c51ab37faddf4ede23243058dfb388e74a192552/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L131].
> When reading Spark metrics in Graphite, I've found this to not always be desirable. Graphite is designed to track a mostly-unchanging set of metrics over time; it allocates large zeroed-out files for each metric it sees, and [by default rate-limits itself from creating many of these|https://github.com/graphite-project/carbon/blob/79158ffde5949b4056eb7fdb5e9b6b583fe21ea4/conf/carbon.conf.example#L61-L68].
> App-ID namespacing means that Graphite allocates disk space for every "metric" of every job it sees, when in reality some metrics may correspond to others across jobs (e.g. driver JVM stats).
> Some common Spark usage flows would be better modeled by e.g. namespacing metrics by {{spark.app.name}}, so that successive runs of a given job would share "metrics" from a storage perspective, as well as allowing for monitoring aspects of a job's performance over time / many runs.
> There's not likely a one-size-fits-all solution here, so I'd propose having the metrics config file let users specify whether they'd like metrics namespaced by {{spark.app.id}}, {{spark.app.name}}, or some other config param.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
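The thread itself doesn't spell out how the eventual fix (Fix For: 2.1.0, via the linked pull requests) works, so here is only a sketch of the kind of mechanism the proposal describes: a namespace template resolved against the application's configuration, defaulting to the app ID. The `spark.metrics.namespace` key and the `${...}` substitution syntax below are illustrative assumptions, not necessarily what the pull requests implement.

```scala
import scala.util.matching.Regex

// Sketch: resolve a metrics-namespace template against the app's conf.
// Key name "spark.metrics.namespace" and ${...} expansion are assumed
// for illustration; the default preserves today's app-ID behavior.
object MetricsNamespace {
  private val Placeholder: Regex = """\$\{([^}]+)\}""".r

  def resolve(conf: Map[String, String]): Option[String] = {
    val template = conf.getOrElse("spark.metrics.namespace", "${spark.app.id}")
    // Substitute each ${key} with the corresponding conf value, or "" if unset.
    val resolved = Placeholder.replaceAllIn(template, m =>
      Regex.quoteReplacement(conf.getOrElse(m.group(1), "")))
    if (resolved.nonEmpty) Some(resolved) else None
  }
}
```

Under this scheme, setting the template to `${spark.app.name}` would make successive runs of the same job report under the same Graphite paths, addressing the storage and continuity concerns above.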
[jira] [Commented] (SPARK-5847) Allow for configuring MetricsSystem's use of app ID to namespace all metrics
[ https://issues.apache.org/jira/browse/SPARK-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384739#comment-15384739 ]

Apache Spark commented on SPARK-5847:
-------------------------------------

User 'markgrover' has created a pull request for this issue:
https://github.com/apache/spark/pull/14270
[jira] [Commented] (SPARK-5847) Allow for configuring MetricsSystem's use of app ID to namespace all metrics
[ https://issues.apache.org/jira/browse/SPARK-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323340#comment-14323340 ]

Apache Spark commented on SPARK-5847:
-------------------------------------

User 'ryan-williams' has created a pull request for this issue:
https://github.com/apache/spark/pull/4632