GitHub user attilapiros opened a pull request:

    https://github.com/apache/spark/pull/21635

    [SPARK-24594][YARN] Introducing metrics for YARN executor allocation problems

    ## What changes were proposed in this pull request?
    
    This PR introduces metrics for YARN allocation failures. Since there were previously no metrics in the YARN module, a new metrics system named "yarn" is created.
    To support both client and cluster mode, the metrics system lifecycle is bound to the AM.
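
    For context, a minimal sketch of what such a metrics source could look like is shown below. It assumes Spark's DropWizard-based `Source` trait and the allocator's `getNumExecutorsFailed` accessor; the class name and wiring are illustrative, not necessarily what the patch itself does.

    ```
    import com.codahale.metrics.{Gauge, MetricRegistry}

    import org.apache.spark.deploy.yarn.YarnAllocator
    import org.apache.spark.metrics.source.Source

    // Hypothetical sketch: a metrics source named "yarn" that exposes the
    // allocator's failed-executor count as a gauge.
    private[spark] class ApplicationMasterSource(allocator: YarnAllocator) extends Source {

      override val sourceName: String = "yarn"
      override val metricRegistry: MetricRegistry = new MetricRegistry()

      // The gauge is re-evaluated on every report, so the sink always sees the
      // allocator's current failure count.
      metricRegistry.register(MetricRegistry.name("numFailedExecutors"), new Gauge[Int] {
        override def getValue: Int = allocator.getNumExecutorsFailed
      })
    }
    ```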
    
    ## How was this patch tested?
    
    Both client and cluster mode were tested manually.
    Before the test, spark-core was removed from one of the YARN nodes to cause the allocation failure.
    Spark was started as follows (in client mode):
    
    ```
    spark2-submit \
      --class org.apache.spark.examples.SparkPi \
      --conf "spark.yarn.blacklist.executor.launch.blacklisting.enabled=true" \
      --conf "spark.blacklist.application.maxFailedExecutorsPerNode=2" \
      --conf "spark.dynamicAllocation.enabled=true" \
      --conf "spark.metrics.conf.*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink" \
      --master yarn \
      --deploy-mode client \
      original-spark-examples_2.11-2.4.0-SNAPSHOT.jar \
      1000
    ```
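
    The console sink above is enabled through `spark.metrics.conf.*` properties on the command line; the same sink could alternatively be configured in `conf/metrics.properties` (Spark's standard metrics configuration file), for example:

    ```
    # Report all metric sources, including the new "yarn" source, to the console sink.
    *.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
    *.sink.console.period=10
    *.sink.console.unit=seconds
    ```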
    
    In both cases the YARN logs contained the new metrics, for example:
    
    ```
    $ yarn logs --applicationId application_1529926424933_0015 | grep -A1 -B1 yarn.numFailedExecutors
    18/06/25 07:08:29 INFO client.RMProxy: Connecting to ResourceManager at ...
    -- Gauges ----------------------------------------------------------------------
    yarn.numFailedExecutors
                 value = 0
    --
    -- Gauges ----------------------------------------------------------------------
    yarn.numFailedExecutors
                 value = 3
    --
    -- Gauges ----------------------------------------------------------------------
    yarn.numFailedExecutors
                 value = 3
    --
    -- Gauges ----------------------------------------------------------------------
    yarn.numFailedExecutors
                 value = 3
    ``` 
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/attilapiros/spark SPARK-24594

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21635.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21635
    
----
commit 9b033ccfa572c93d7c2dc7bca06f9be1e363f88a
Author: “attilapiros” <piros.attila.zsolt@...>
Date:   2018-06-19T19:40:20Z

    Initial commit (yarn metrics)

----


---
