GitHub user mareksimunek opened a pull request:
https://github.com/apache/spark/pull/22485
[SPARK-18364][YARN] Expose metrics for YarnShuffleService
## What changes were proposed in this pull request?
This PR is a follow-up to the closed https://github.com/apache/spark/pull/17401,
which was closed only due to inactivity; it is still a nice feature to have.
The review by @jerryshao has been taken into consideration, with these changes:
- removed `@VisibleForTesting` because of dependency conflicts
- removed unnecessary reflection for `MetricsSystemImpl`
- added support for more gauge value types
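To illustrate the last point, here is a minimal, self-contained sketch of dispatching on a gauge's runtime value type. It is not the code in this PR: `GaugeSink`, `RecordingSink` and `collectGauge` are hypothetical stand-ins for a metrics record builder with typed gauge overloads, and the set of supported types (Integer, Long, Float, Double) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class GaugeDispatchSketch {
    /** Hypothetical stand-in for a record builder with typed gauge overloads. */
    interface GaugeSink {
        void addGauge(String name, int value);
        void addGauge(String name, long value);
        void addGauge(String name, float value);
        void addGauge(String name, double value);
    }

    /** Forwards the gauge value to the overload matching its runtime type. */
    static void collectGauge(String name, Object gaugeValue, GaugeSink sink) {
        if (gaugeValue instanceof Integer) {
            sink.addGauge(name, ((Integer) gaugeValue).intValue());
        } else if (gaugeValue instanceof Long) {
            sink.addGauge(name, ((Long) gaugeValue).longValue());
        } else if (gaugeValue instanceof Float) {
            sink.addGauge(name, ((Float) gaugeValue).floatValue());
        } else if (gaugeValue instanceof Double) {
            sink.addGauge(name, ((Double) gaugeValue).doubleValue());
        } else {
            // Fail loudly instead of silently dropping an unsupported gauge.
            String type = gaugeValue == null ? "null" : gaugeValue.getClass().getName();
            throw new IllegalStateException("Unsupported gauge type for " + name + ": " + type);
        }
    }

    /** Hypothetical sink that records which overload was selected. */
    static final class RecordingSink implements GaugeSink {
        final List<String> calls = new ArrayList<>();
        public void addGauge(String name, int value) { calls.add(name + ":int:" + value); }
        public void addGauge(String name, long value) { calls.add(name + ":long:" + value); }
        public void addGauge(String name, float value) { calls.add(name + ":float:" + value); }
        public void addGauge(String name, double value) { calls.add(name + ":double:" + value); }
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        collectGauge("registeredexecutorssize", 2, sink);
        collectGauge("exampleLongGauge", 5L, sink); // hypothetical metric name
        System.out.println(sink.calls);
    }
}
```

Throwing on an unknown type is one possible design choice; skipping the gauge with a warning would be another.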
## How was this patch tested?
Manually deployed the new yarn-shuffle jar into a Node Manager and verified
that the metrics appear in the Node Manager-standard location. This is JMX with
a query endpoint running on `hostname:port`.
The resulting metrics look like this:
```
curl -sk -XGET hostname:port | grep -v '#' | grep 'shuffleService'
hadoop_nodemanager_openblockrequestlatencymillis_rate15{name="shuffleService",} 0.31428910657834713
hadoop_nodemanager_blocktransferratebytes_rate15{name="shuffleService",} 566144.9983653595
hadoop_nodemanager_blocktransferratebytes_ratemean{name="shuffleService",} 2464409.9678099006
hadoop_nodemanager_openblockrequestlatencymillis_rate1{name="shuffleService",} 1.2893844732240272
hadoop_nodemanager_registeredexecutorssize{name="shuffleService",} 2.0
hadoop_nodemanager_openblockrequestlatencymillis_ratemean{name="shuffleService",} 1.255574678369966
hadoop_nodemanager_openblockrequestlatencymillis_count{name="shuffleService",} 315.0
hadoop_nodemanager_openblockrequestlatencymillis_rate5{name="shuffleService",} 0.7661929192569739
hadoop_nodemanager_registerexecutorrequestlatencymillis_ratemean{name="shuffleService",} 0.0
hadoop_nodemanager_registerexecutorrequestlatencymillis_count{name="shuffleService",} 0.0
hadoop_nodemanager_registerexecutorrequestlatencymillis_rate1{name="shuffleService",} 0.0
hadoop_nodemanager_registerexecutorrequestlatencymillis_rate5{name="shuffleService",} 0.0
hadoop_nodemanager_blocktransferratebytes_count{name="shuffleService",} 6.18271213E8
hadoop_nodemanager_registerexecutorrequestlatencymillis_rate15{name="shuffleService",} 0.0
hadoop_nodemanager_blocktransferratebytes_rate5{name="shuffleService",} 1154114.4881816586
hadoop_nodemanager_blocktransferratebytes_rate1{name="shuffleService",} 574745.0749848988
```
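For anyone who wants to script this check rather than eyeball the `curl` output, here is a rough sketch of the same filtering in Java. It assumes each sample occupies one line in the form `name{labels} value`; the class name `ShuffleMetricsCheck` and the `parse` helper are made up for illustration and are not part of this patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShuffleMetricsCheck {
    /**
     * Parses lines shaped like: metric_name{name="shuffleService",} 2.0
     * Comment lines, blank lines, and metrics without the shuffleService
     * label are skipped, mirroring the grep filters in the manual test.
     */
    static Map<String, Double> parse(String body) {
        Map<String, Double> metrics = new LinkedHashMap<>();
        for (String line : body.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            if (!trimmed.contains("name=\"shuffleService\"")) {
                continue;
            }
            int brace = trimmed.indexOf('{');
            int space = trimmed.lastIndexOf(' ');
            if (brace < 0 || space < brace) {
                continue; // not in the expected name{labels} value shape
            }
            String name = trimmed.substring(0, brace);
            metrics.put(name, Double.parseDouble(trimmed.substring(space + 1)));
        }
        return metrics;
    }

    public static void main(String[] args) {
        // Sample input copied from the output shown in this PR description.
        String sample = String.join("\n",
            "# a comment line",
            "hadoop_nodemanager_registeredexecutorssize{name=\"shuffleService\",} 2.0",
            "hadoop_nodemanager_blocktransferratebytes_count{name=\"shuffleService\",} 6.18271213E8");
        Map<String, Double> metrics = parse(sample);
        System.out.println(metrics.get("hadoop_nodemanager_registeredexecutorssize"));
    }
}
```

A check like this could assert that the expected metric names are present after deploying the jar, without depending on their exact values.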
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mareksimunek/spark SPARK-18364
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22485.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22485
----
commit 5d9434bbb2fa650e987bc2e68d183aea691f9ac5
Author: Andrew Ash <andrew@...>
Date: 2017-03-23T02:59:38Z
[SPARK-18364][YARN] Expose metrics for YarnShuffleService
Registers the shuffle server's metrics with the Hadoop Node Manager's
DefaultMetricsSystem.
Test metric collector gets right converted calls
camel-case shuffleService
Pass scalastyle
Reformat and organize imports
With import order specified at http://spark.apache.org/contributing.html
commit 6c96397536af57a8bbe8dd2529547427f643512b
Author: marek.simunek <marek.simunek@...>
Date: 2018-09-19T15:17:53Z
[SPARK-18364][YARN] YarnShuffleService metrics correction
----