Hi Han,

You may be seeing the same issue I described here:
https://issues.apache.org/jira/browse/SPARK-22342?focusedCommentId=16411780&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16411780
Do you see "TASK_LOST" in your driver logs? I got past that issue by
updating my version of libmesos (see my second comment in the ticket).
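If it helps, a quick way to check is to scan the driver log for that
status string. A minimal sketch (the log path here is just a placeholder;
point it at wherever your dispatcher writes driver logs):

    // Sketch: print every TASK_LOST status update found in a driver log.
    // "driver.log" is a placeholder path, not a Spark default.
    import scala.io.Source

    object FindTaskLost {
      def main(args: Array[String]): Unit = {
        val logPath = if (args.nonEmpty) args(0) else "driver.log"
        val source = Source.fromFile(logPath)
        try {
          source.getLines().filter(_.contains("TASK_LOST")).foreach(println)
        } finally {
          source.close()
        }
      }
    }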

There's also this PR that is in progress:
https://github.com/apache/spark/pull/20640
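For context on why setting spark.blacklist.enabled=false doesn't help
here: in 2.3.0 the Mesos backend keeps its own per-agent failure counter
with a hardcoded limit (MAX_SLAVE_FAILURES = 2, if I remember the constant
right) and never consults the spark.blacklist.* settings. A toy model of
that behavior, not the actual Spark code:

    // Illustration only: mimics how MesosCoarseGrainedSchedulerBackend
    // stops accepting offers from an agent after a fixed number of
    // task failures. Names are made up for the sketch.
    object MesosBlacklistSketch {
      val MaxSlaveFailures = 2 // hardcoded in 2.3.0, not configurable

      private val taskFailures =
        scala.collection.mutable.Map.empty[String, Int]

      def recordFailure(agentId: String): Unit =
        taskFailures(agentId) = taskFailures.getOrElse(agentId, 0) + 1

      def canLaunchOn(agentId: String): Boolean =
        taskFailures.getOrElse(agentId, 0) < MaxSlaveFailures

      def main(args: Array[String]): Unit = {
        recordFailure("agent-1")
        recordFailure("agent-1")
        println(canLaunchOn("agent-1")) // false: excluded for the driver's lifetime
        println(canLaunchOn("agent-2")) // true
      }
    }

The counter lives in the driver process, so in principle a fresh driver
starts clean; a burst of TASK_LOST updates can exhaust that limit almost
immediately, which would look like instant blacklisting.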

Susan

On Sun, Apr 8, 2018 at 4:06 PM, hantuzun <m...@hantuzun.com> wrote:

> Hi all,
>
> Spark currently has blacklisting enabled on Mesos, no matter what:
> [SPARK-19755][Mesos] Blacklist is always active for
> MesosCoarseGrainedSchedulerBackend
>
> Blacklisting also prevents new drivers from running on our nodes where
> previous drivers had failed tasks.
>
> We've tried restarting the Spark dispatcher before submitting new jobs.
> Even creating new machines (with the same hostnames) does not help.
>
> Looking at TaskSetBlacklist
> <https://github.com/apache/spark/blob/e18d6f5326e0d9ea03d31de5ce04cb84d3b8ab37/core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala#L66>,
> I don't understand how a fresh Spark job submitted from a fresh Spark
> Dispatcher starts saying all the nodes are blacklisted right away. How does
> Spark know about previous task failures?
>
> This issue is severely disrupting us. How can we disable blacklisting on
> Spark 2.3.0? Creative ideas are welcome :)
>
> Best,
> Han
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>


-- 
Susan X. Huynh
Software engineer, Data Agility
xhu...@mesosphere.com
