GitHub user attilapiros opened a pull request: https://github.com/apache/spark/pull/21635
[SPARK-24594][YARN] Introducing metrics for YARN executor allocation problems

## What changes were proposed in this pull request?

In this PR, metrics are introduced for YARN allocation failures. As there were no metrics in the YARN module until now, a new metric system is created with the name "yarn". To support both client and cluster mode, the metric system lifecycle is bound to the AM.

## How was this patch tested?

Both client and cluster mode were tested manually. Before the test, spark-core was removed from one of the YARN nodes to cause the allocation failure.

Spark was started as follows (in the case of client mode):

```
spark2-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf "spark.yarn.blacklist.executor.launch.blacklisting.enabled=true" \
  --conf "spark.blacklist.application.maxFailedExecutorsPerNode=2" \
  --conf "spark.dynamicAllocation.enabled=true" \
  --conf "spark.metrics.conf.*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink" \
  --master yarn \
  --deploy-mode client \
  original-spark-examples_2.11-2.4.0-SNAPSHOT.jar \
  1000
```

In both cases, the YARN logs contained the new metrics:

```
$ yarn logs --applicationId application_1529926424933_0015 | grep -A1 -B1 yarn.numFailedExecutors
18/06/25 07:08:29 INFO client.RMProxy: Connecting to ResourceManager at ...
-- Gauges ----------------------------------------------------------------------
yarn.numFailedExecutors
value = 0
--
-- Gauges ----------------------------------------------------------------------
yarn.numFailedExecutors
value = 3
--
-- Gauges ----------------------------------------------------------------------
yarn.numFailedExecutors
value = 3
--
-- Gauges ----------------------------------------------------------------------
yarn.numFailedExecutors
value = 3
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/attilapiros/spark SPARK-24594

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21635.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21635

----
commit 9b033ccfa572c93d7c2dc7bca06f9be1e363f88a
Author: “attilapiros” <piros.attila.zsolt@...>
Date:   2018-06-19T19:40:20Z

    Initial commit (yarn metrics)

----
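
For repeated manual checks, the last reported gauge value can be pulled out of a saved log instead of being read from the `grep` output. The following is a minimal sketch and not part of this PR: `latest_failed_executors` is a hypothetical helper name, and it assumes that the aggregated log was saved to a file and that the console sink prints the gauge name on one line followed by a `value = N` line, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print the last value reported for
# the yarn.numFailedExecutors gauge in a saved log file.
# Assumption: the gauge name appears on one line and the next line has the
# form "value = N", as in the console sink output quoted in this PR.
#
# Usage sketch:
#   yarn logs --applicationId <application id> > app.log
#   latest_failed_executors app.log
latest_failed_executors() {
  awk '
    # Gauge name seen: the next line is expected to carry its value.
    /yarn\.numFailedExecutors/ { expect = 1; next }
    # Remember the value only when it directly follows the gauge name.
    expect && $1 == "value" && $2 == "=" { last = $3 }
    # Any other line ends the expectation.
    { expect = 0 }
    # Print nothing when the gauge never appeared in the file.
    END { if (last != "") print last }
  ' "$1"
}
```

The helper prints nothing when the gauge is absent, so an empty result is distinguishable from a reported value of 0.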