Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20640#discussion_r196997885

    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
    @@ -648,14 +645,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
                 totalGpusAcquired -= gpus
                 gpusByTaskId -= taskId
               }
    -          // If it was a failure, mark the slave as failed for blacklisting purposes
               if (TaskState.isFailed(state)) {
    -            slave.taskFailures += 1
    -
    -            if (slave.taskFailures >= MAX_SLAVE_FAILURES) {
    -              logInfo(s"Blacklisting Mesos slave $slaveId due to too many failures; " +
    -                "is Spark installed on it?")
    -            }
    +            logError(s"Task $taskId failed on Mesos slave $slaveId.")
    --- End diff --

    @IgorBerman I'm not entirely sure what you mean. Yes, *eventually* I think Mesos should be doing something very similar to what's in that PR. You can't use that immediately, because for now the other PR is tied to YARN internals. But I don't think it would be too hard to refactor what's there just a little bit so that most of the logic could be reused. Still, I think everybody just wants to get this change in first and do that refactoring in a follow-up.