[ 
https://issues.apache.org/jira/browse/SPARK-21829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144889#comment-16144889
 ] 

Saisai Shao commented on SPARK-21829:
-------------------------------------

I understand you concern, since it is much easier for you to change Spark 
rather than YARN which is shared across the teams. But my thinking is more from 
the feature itself, looks like such kind of thing should be handled by cluster 
manager, not Spark itself. At least it should be done by {{yarn#client}} and 
{{ApplicationMaster}}, not blacklist.

I think [~irashid] and [~tgraves] may have other thoughts.

> Enable config to permanently blacklist a list of nodes
> ------------------------------------------------------
>
>                 Key: SPARK-21829
>                 URL: https://issues.apache.org/jira/browse/SPARK-21829
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler, Spark Core
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Luca Canali
>            Priority: Minor
>
> The idea for this proposal comes from a performance incident in a local 
> cluster where a job was found very slow because of a log tail of stragglers 
> due to 2 nodes in the cluster being slow to access a remote filesystem.
> The issue was limited to the 2 machines and was related to external 
> configurations: the 2 machines that performed badly when accessing the remote 
> file system were behaving normally for other jobs in the cluster (a shared 
> YARN cluster).
> With this new feature I propose to introduce a mechanism to allow users to 
> specify a list of nodes in the cluster where executors/tasks should not run 
> for a specific job.
> The proposed implementation that I tested (see PR) uses the Spark blacklist 
> mechanism. With the parameter spark.blacklist.alwaysBlacklistedNodes, a list 
> of user-specified nodes is added to the blacklist at the start of the Spark 
> Context and it is never expired. 
> I have tested this on a YARN cluster on a case taken from the original 
> production problem and I confirm a performance improvement of about 5x for 
> the specific test case I have. I imagine that there can be other cases where 
> Spark users may want to blacklist a set of nodes. This can be used for 
> troubleshooting, including cases where certain nodes/executors are slow for a 
> given workload and this is caused by external agents, so the anomaly is not 
> picked up by the cluster manager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to