[ 
https://issues.apache.org/jira/browse/SPARK-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997215#comment-14997215
 ] 

Steve Loughran commented on SPARK-11373:
----------------------------------------

[~charlesyeh] I've just put up a pull request of what I had in mind, with those 
basic fs metrics, and JVM & thread info. 

I couldn't hook this up to the Spark metrics system, as there wasn't one that 
could be used; for now I've just gone direct to the Codahale servlets and 
classes for registration.

Your suggestion of a new history metrics system would be the right thing to 
do, but I would really like those metrics to be fetchable as bits of JSON at 
the end of URLs, both for enumerating the whole set and for reading specific 
values. Why?

# it lets me ask for performance stats from anyone with a web browser to hand: 
you can say "do a curl history:1800/metrics/metrics > metrics.json" and I've 
got something I can attach to bug reports.
# it lets me write tests which query the metrics for the state of the 
provider, e.g. check that a seconds-since-last-successful-update counter is 
between 0 and 60 before trying to list the applications and expecting them to 
be found. Or, after mocking a connectivity failure, verify that the failure 
counts have gone up.
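
As a rough illustration of the second point, here is a minimal Python sketch 
of such a test probe. It is not taken from the pull request. The metric names 
are hypothetical, and the JSON layout (a "gauges" map holding "value" entries 
and a "counters" map holding "count" entries) is an assumption about what the 
Codahale metrics servlet returns. The sample documents are hand-written 
stand-ins for two successive fetches; against a live server they would come 
from fetch_metrics() pointed at whatever URL the servlet is registered under.

```python
import json
from urllib.request import urlopen


def fetch_metrics(url, timeout=10.0):
    """Fetch the metrics document at `url` and parse it as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def gauge(metrics, name):
    """Value of gauge `name`, assuming {"gauges": {name: {"value": v}}}."""
    return metrics["gauges"][name]["value"]


def counter(metrics, name):
    """Count of counter `name`, assuming {"counters": {name: {"count": n}}}."""
    return metrics["counters"][name]["count"]


def updated_recently(metrics, name, max_age_seconds=60):
    """True if the seconds-since-update gauge lies in [0, max_age_seconds]."""
    return 0 <= gauge(metrics, name) <= max_age_seconds


# Hypothetical metric names, for illustration only.
UPDATE_AGE = "history.provider.seconds.since.update"
FAILURES = "history.provider.connectivity.failures"

# Hand-written samples: one taken before a mocked connectivity failure,
# one taken after it.
before = {"gauges": {UPDATE_AGE: {"value": 12}},
          "counters": {FAILURES: {"count": 0}}}
after = {"gauges": {UPDATE_AGE: {"value": 95}},
         "counters": {FAILURES: {"count": 2}}}

# The provider updated recently, so listing applications should succeed.
assert updated_recently(before, UPDATE_AGE)

# After the mocked failure, the failure count has gone up.
assert counter(after, FAILURES) > counter(before, FAILURES)
```

The same helpers work on a metrics.json file attached to a bug report: load 
it with json.load() and pass the result in place of the fetched document.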

Anyway: the draft is up, and I won't be working on it again for the next 
couple of weeks. If, after reviewing my patch, you could take it and do a real 
Spark history metrics system, that'd really progress it. And again, that's 
where the servlets would help: testing the metrics system itself.

> Add metrics to the History Server and providers
> -----------------------------------------------
>
>                 Key: SPARK-11373
>                 URL: https://issues.apache.org/jira/browse/SPARK-11373
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Steve Loughran
>
> The History server doesn't publish metrics about JVM load or anything from 
> the history provider plugins. This means that performance problems from 
> massive job histories aren't visible to management tools, and nor are any 
> provider-generated metrics such as time to load histories, failed history 
> loads, the number of connectivity failures talking to remote services, etc.
> If the history server set up a metrics registry and offered the option to 
> publish its metrics, then management tools could view this data.
> # the metrics registry would need to be passed down to the instantiated 
> {{ApplicationHistoryProvider}}, in order for it to register its metrics.
> # if the codahale metrics servlet were registered under a path such as 
> {{/metrics}}, the values would be visible as HTML and JSON, without the need 
> for management tools.
> # Integration tests could also retrieve the JSON-formatted data and use it as 
> part of the test suites.



