Hi again, 

Ok, I do not know of any way to fix the problem other than deleting the
"bad" machine from the config and restarting... And you will need admin
privileges on the cluster for that :(
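(On a standalone cluster, for example, that would mean removing the host
from the conf/slaves file and restarting the workers; on YARN or Mesos the
node would have to be excluded at the cluster-manager level instead. I'm
assuming standalone here.)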

However, before we give up on speculative execution: I suspect the task is
being run again and again on the same "faulty" machine because that is
where the data resides.
You could try to persist your RDD with MEMORY_ONLY_2 or MEMORY_AND_DISK_2,
as that will force the creation of a replica of the data on another node.
Then, with the data on two nodes, the scheduler may choose to execute the
speculative task on the second node (I'm not sure about this, as I am just
not familiar enough with Spark's scheduler priorities).
I'm not very hopeful, but it may be worth a try (if you have the disk/RAM
space to afford duplicating all the data, that is).
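
For example, something like this (just a sketch; "rdd" here stands for
whatever RDD you are computing over, and the persist() has to happen
before the action that triggers the flaky stage):

    import org.apache.spark.storage.StorageLevel

    // Keep two replicas of every partition. MEMORY_AND_DISK_2 spills to
    // disk when a replica does not fit in memory; MEMORY_ONLY_2 does not.
    val replicated = rdd.persist(StorageLevel.MEMORY_AND_DISK_2)

    // Run an action so the replicas are actually materialized before the
    // downstream stages (and their speculative copies) are scheduled.
    replicated.count()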

If not, I am afraid I am out of ideas ;) 

Regards and good luck, 
    Gylfi. 


