[jira] [Commented] (SPARK-17667) Make locking fine grained in YarnAllocator#enqueueGetLossReasonRequest

Apache Spark (JIRA) Tue, 27 Sep 2016 12:19:10 -0700

    [ https://issues.apache.org/jira/browse/SPARK-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527128#comment-15527128 ]


Apache Spark commented on SPARK-17667:
--------------------------------------

User 'ashwinshankar77' has created a pull request for this issue:
https://github.com/apache/spark/pull/15267

> Make locking fine grained in YarnAllocator#enqueueGetLossReasonRequest
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17667
>                 URL: https://issues.apache.org/jira/browse/SPARK-17667
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Ashwin Shankar
>
> Following up on the discussion in SPARK-15725, one of the reasons the AM 
> hangs with dynamic allocation (DA) is the way locking is done in 
> YarnAllocator. We noticed that when executors go down during the shrink phase 
> of DA, the AM gets locked up. A thread dump shows threads trying to get the 
> loss reason via YarnAllocator#enqueueGetLossReasonRequest, all BLOCKED 
> waiting for the lock held by the allocate call. This gets worse when the 
> number of executors going down is in the thousands, and I've seen the AM hang 
> on the order of minutes. This JIRA was created to make the locking a little 
> more fine grained by remembering the executors that were killed via the AM 
> and then serving the GetExecutorLossReason requests from that information.
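
The idea in the description (remember which executors the AM killed, then answer loss-reason requests from that record instead of waiting on the allocator lock) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the pull request: it is written in Java rather than Spark's Scala, and the class name KilledExecutorTracker, its method names, and the reason string are all hypothetical.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not Spark's actual YarnAllocator code.
final class KilledExecutorTracker {
    // Executor IDs that the AM itself asked YARN to release (assumed bookkeeping).
    // A concurrent set lets readers and writers proceed without a shared coarse lock.
    private final Set<String> killedByAm = ConcurrentHashMap.newKeySet();

    // Called on the kill path, when the AM releases an executor's container.
    void recordKilledByAm(String executorId) {
        killedByAm.add(executorId);
    }

    // Fast path: answers a loss-reason request without taking the allocator lock.
    // Returns empty when the AM did not kill the executor, in which case the
    // caller would fall back to the slower, lock-protected lookup.
    Optional<String> lossReasonIfKilledByAm(String executorId) {
        // remove() is atomic, so each recorded kill is reported at most once.
        if (killedByAm.remove(executorId)) {
            // Assumed wording; the real reason message may differ.
            return Optional.of("Executor " + executorId + " killed by driver.");
        }
        return Optional.empty();
    }
}
```

With something like this, only requests for executors that the AM did not kill would still need to contend with the allocate call for the lock, which is the case the shrink phase of DA mostly avoids.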



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
