[ 
https://issues.apache.org/jira/browse/SPARK-26688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey updated SPARK-26688:
---------------------------
    Comment: was deleted

(was: Hi There!

I'm very glad that the community paid attention to my question. Let me try to 
explain the use case.

There is a 1K-node cluster, and jobs suffer performance degradation because 
of a single node. It's rather hard to convince Cluster Ops to decommission a 
node because of "performance degradation". Imagine 10 dev teams chasing a 
single ops team, whether for a valid reason (the node has problems) or because 
the code has a bug, or the data is skewed, or there are spots on the sun. We 
can't just decommission a node because a random dev complains.

Simple solution:
 * Rerun the failed / delayed job with the "problematic" node blacklisted in 
advance.
 * Report the problem if the job then runs w/o anomalies.
 * Ops collect complaints about the node and start to decommission it when the 
"complaints threshold" is reached. It's rather unlikely that many loosely 
coupled teams with loosely coupled jobs would complain about the same node by 
coincidence.
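
The "complaints threshold" step could be sketched roughly as below. This is 
only an illustration: the threshold value, the function name, the team names 
and the node names are all hypothetical, and counting distinct teams (rather 
than raw reports) is my assumption about how ops would want to tally them.

```python
from collections import defaultdict

# Hypothetical value: how many distinct teams must report a node
# before ops start decommissioning it.
COMPLAINT_THRESHOLD = 3


def nodes_to_decommission(complaints, threshold=COMPLAINT_THRESHOLD):
    """Return the nodes reported by at least `threshold` distinct teams.

    `complaints` is an iterable of (team, node) pairs. Repeated reports
    from the same team about the same node are counted once.
    """
    teams_by_node = defaultdict(set)
    for team, node in complaints:
        teams_by_node[node].add(team)
    return sorted(
        node for node, teams in teams_by_node.items()
        if len(teams) >= threshold
    )


# Made-up reports: three independent teams point at the same node.
complaints = [
    ("team-a", "node-0417"),
    ("team-b", "node-0417"),
    ("team-a", "node-0417"),  # same team again, still counts once
    ("team-c", "node-0417"),
    ("team-b", "node-0093"),
]
flagged = nodes_to_decommission(complaints)
```

With these made-up reports only node-0417 reaches the threshold; a single 
complaint about node-0093 is ignored until other teams confirm it.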

Results:
 * Ops are not spammed with random requests from devs.
 * Devs are not blocked by a really bad node.
 * It's very cheap for everyone to "blacklist" a node during job submission 
w/o doing anything to the node. )

> Provide configuration of initially blacklisted YARN nodes
> ---------------------------------------------------------
>
>                 Key: SPARK-26688
>                 URL: https://issues.apache.org/jira/browse/SPARK-26688
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 3.0.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Introducing new config for initially blacklisted YARN nodes.
> This came up in the apache spark user mailing list: 
> [http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Yarn-is-it-possible-to-manually-blacklist-nodes-before-running-spark-job-td34395.html]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

