[ https://issues.apache.org/jira/browse/SPARK-19941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923750#comment-15923750 ]

Sean Owen commented on SPARK-19941:
-----------------------------------

In this scenario YARN is not trying to preempt applications, right? Then it 
should wait for the executor to finish. I don't see that this state means the 
app should stop; the whole point of a decommissioning state is to not just 
tell apps they need to stop.

It makes a bit more sense with dynamic allocation, because the app sometimes 
has permission to stop an executor and start a replacement. But there too, if 
Spark is actively using an executor, the decommissioning NM can wait for it to 
be done.

It does make sense for the driver to stop scheduling work on an executor that 
YARN is going to shut down, if it can get a heads-up.
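
A rough sketch of what that could look like on the driver side, purely as an 
illustration (none of these names are existing Spark APIs): keep a set of 
decommissioning hosts and filter scheduling offers against it.

{code:scala}
import scala.collection.mutable

// Hypothetical sketch, not actual Spark internals: remember which hosts'
// NodeManagers are decommissioning and hide their executors from scheduling.
case class WorkerOffer(executorId: String, host: String, cores: Int)

class DecommissionAwareOffers {
  private val decommissioningHosts = mutable.HashSet.empty[String]

  // Called whenever the AM learns of a node state change; how it learns
  // that is the open question in this ticket.
  def nodeUpdated(host: String, decommissioning: Boolean): Unit = synchronized {
    if (decommissioning) decommissioningHosts += host
    else decommissioningHosts -= host
  }

  // Drop offers on decommissioning hosts before the task scheduler
  // assigns any new tasks to them.
  def usableOffers(offers: Seq[WorkerOffer]): Seq[WorkerOffer] = synchronized {
    offers.filterNot(o => decommissioningHosts.contains(o.host))
  }
}
{code}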

This sounds like something that would only work on Hadoop 2.8, so it would not 
be usable without some reflection, or until 2.8 becomes the minimum version 
required here.
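
For example, a reflective lookup along these lines (a sketch, assuming only 
that NodeState gains a DECOMMISSIONING constant in 2.8) would let the same 
build keep working on older Hadoop:

{code:scala}
// Sketch: resolve NodeState.DECOMMISSIONING reflectively so this compiles
// against Hadoop 2.6/2.7, where the constant does not exist.
object DecommissioningState {
  val state: Option[AnyRef] =
    try {
      val cls = Class.forName("org.apache.hadoop.yarn.api.records.NodeState")
      // Enums expose a generated static valueOf(String); on pre-2.8 Hadoop
      // this throws because DECOMMISSIONING is not a member of the enum.
      Some(cls.getMethod("valueOf", classOf[String]).invoke(null, "DECOMMISSIONING"))
    } catch {
      case _: ReflectiveOperationException => None // Hadoop < 2.8
    }

  def isDecommissioning(nodeState: AnyRef): Boolean = state.contains(nodeState)
}
{code}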

> Spark should not schedule tasks on executors on decommissioning YARN nodes
> --------------------------------------------------------------------------
>
>                 Key: SPARK-19941
>                 URL: https://issues.apache.org/jira/browse/SPARK-19941
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, YARN
>    Affects Versions: 2.1.0
>         Environment: Hadoop 2.8.0-rc1
>            Reporter: Karthik Palaniappan
>
> Hadoop 2.8 added a mechanism to gracefully decommission Node Managers in 
> YARN: https://issues.apache.org/jira/browse/YARN-914
> Essentially you can mark nodes to be decommissioned, and let them a) finish 
> work in progress and b) finish serving shuffle data. But no new work will be 
> scheduled on the node.
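> For reference, the operator-side flow is roughly the following (the exclude-file 
> path is cluster-specific, and the exact flag syntax should be checked against 
> the 2.8 docs):
> {code}
> # Add the host to the RM's exclude file, then trigger a *graceful*
> # decommission with a timeout, rather than an immediate one.
> echo "nm-host-1.example.com" >> /etc/hadoop/conf/yarn.exclude
> yarn rmadmin -refreshNodes -g 3600
> {code}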
> Spark should respect when NMs are set to decommissioned, and similarly 
> decommission executors on those nodes by not scheduling any more tasks on 
> them.
> It looks like in the future YARN may inform the app master when containers 
> will be killed: https://issues.apache.org/jira/browse/YARN-3784. However, I 
> don't think Spark should schedule based on a timeout. We should gracefully 
> decommission the executor as fast as possible (which is the spirit of 
> YARN-914). The app master can query the RM for NM statuses (if it doesn't 
> already have them) and stop scheduling on executors on NMs that are 
> decommissioning.
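> As a sketch of how that could look in the AM, assuming the RM reports the 
> transition on the allocate heartbeat (getUpdatedNodes and NodeReport are 
> existing YARN APIs; NodeState.DECOMMISSIONING needs Hadoop 2.8):
> {code:scala}
> import scala.collection.JavaConverters._
> import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
> import org.apache.hadoop.yarn.api.records.NodeState
>
> // Sketch: pull decommissioning hosts out of each allocate response so the
> // driver can stop scheduling tasks on executors on those hosts.
> def decommissioningHosts(response: AllocateResponse): Seq[String] =
>   response.getUpdatedNodes.asScala
>     .filter(_.getNodeState == NodeState.DECOMMISSIONING)
>     .map(_.getNodeId.getHost)
>     .toSeq
> {code}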
> Stretch feature: The timeout may be useful in determining whether running 
> further tasks on the executor is even helpful. Spark may be able to tell that 
> shuffle data will not be consumed by the time the node is decommissioned, so 
> it is not worth computing. The executor can be killed immediately.


