Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14710#discussion_r76830698

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
    @@ -478,19 +487,24 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
           numExecutors: Int,
           localityAwareTasks: Int,
           hostToLocalTaskCount: Map[String, Int]
    -    ): Boolean = synchronized {
    +    ): Boolean = {
         if (numExecutors < 0) {
           throw new IllegalArgumentException(
             "Attempted to request a negative number of executor(s) " +
               s"$numExecutors from the cluster manager. Please specify a positive number!")
         }

    -    this.localityAwareTasks = localityAwareTasks
    -    this.hostToLocalTaskCount = hostToLocalTaskCount
    +    val response = synchronized {
    +      this.localityAwareTasks = localityAwareTasks
    +      this.hostToLocalTaskCount = hostToLocalTaskCount
    +
    +      numPendingExecutors =
    --- End diff --

In this particular case, it's not that `ask` would be better; it's just that it would be no worse.

With the new RPC code, the only time `askWithRetry` will actually retry, barring bugs in the RPC handlers, is when a timeout occurs, since the RPC layer does not drop messages. So an `ask` with a longer timeout actually has a better chance of succeeding: with `askWithRetry`, the remote end will still receive and process the first message before the retries, even if the sender has already given up on it.

As for the bug you mention, yes, it exists, but it also existed before.
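To make the timeout argument concrete, here is a minimal, deterministic sketch. It does not use Spark's RPC classes: `Endpoint`, `ask`, and `askWithRetry` below are hypothetical stand-ins, and the latency and timeout values are invented for illustration. The only assumptions are the ones stated above: the transport never drops a message, and the handler processes every message it receives, whether or not the sender is still waiting.

```scala
object AskVsAskWithRetry {
  // Hypothetical stand-in for a remote RPC endpoint. Every delivered message is
  // processed (nothing is dropped), and handling takes `latencyMs` of simulated time.
  final class Endpoint(latencyMs: Long) {
    private var processed = 0
    def processedCount: Int = processed
    // Processes one message and returns how long the reply took (simulated).
    def handle(): Long = { processed += 1; latencyMs }
  }

  // Sends one message; succeeds only if the reply arrives within the timeout.
  // The endpoint still does the work even when the sender times out.
  def ask(endpoint: Endpoint, timeoutMs: Long): Boolean =
    endpoint.handle() <= timeoutMs

  // Re-sends the same message after each timeout, up to `maxAttempts` times.
  def askWithRetry(endpoint: Endpoint, timeoutMs: Long, maxAttempts: Int): Boolean = {
    var attempt = 0
    var succeeded = false
    while (!succeeded && attempt < maxAttempts) {
      attempt += 1
      succeeded = ask(endpoint, timeoutMs)
    }
    succeeded
  }

  def main(args: Array[String]): Unit = {
    // Same 300 ms total budget in both cases; the handler needs 250 ms per message.
    val retried = new Endpoint(latencyMs = 250)
    val retryOk = askWithRetry(retried, timeoutMs = 100, maxAttempts = 3)
    println(s"askWithRetry: success=$retryOk, messages processed=${retried.processedCount}")

    val single = new Endpoint(latencyMs = 250)
    val askOk = ask(single, timeoutMs = 300)
    println(s"ask: success=$askOk, messages processed=${single.processedCount}")
  }
}
```

In this model the retrying caller fails all three attempts while the endpoint handles the request three times, whereas the single `ask` with the longer timeout succeeds after one handled message. A real endpoint would also queue the retries behind the first message, which this sketch does not try to model.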