Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8945#discussion_r41775279

    --- Diff: core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala ---
    @@ -369,6 +369,38 @@ class StandaloneDynamicAllocationSuite
         assert(apps.head.getExecutorLimit === 1)
       }

    +  test("the pending replacement executors should not be lost (SPARK-10515)") {
    +    sc = new SparkContext(appConf)
    +    val appId = sc.applicationId
    +    eventually(timeout(10.seconds), interval(10.millis)) {
    +      val apps = getApplications()
    +      assert(apps.size === 1)
    +      assert(apps.head.id === appId)
    +      assert(apps.head.executors.size === 2)
    +      assert(apps.head.getExecutorLimit === Int.MaxValue)
    +    }
    +    // sync executors between the Master and the driver, needed because
    +    // the driver refuses to kill executors it does not know about
    +    syncExecutors(sc)
    +    val executors = getExecutorIds(sc)
    +    assert(executors.size === 2)
    +
    +    // kill executor 1, and replace it
    +    assert(sc.killAndReplaceExecutor(executors.head))
    +    var apps = getApplications()
    +    assert(apps.head.executors.size === 2)
    --- End diff --

    I understand that. But what I'm saying is, in the master's view of the world, what happens is that one executor goes away and a new executor comes up. So there may be a window in which the executor count, as seen from the master, is "1", temporarily, before the new executor starts. And that would cause your test to fail sporadically.
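
    A minimal, self-contained sketch of the window described above, and of why polling the assertion avoids the sporadic failure. Everything here is hypothetical and for illustration only: `FakeMaster` stands in for the master's view of the executor count, and `retryUntil` is a hand-rolled stand-in for the `eventually(timeout(...), interval(...))` block that the test already uses earlier. It is not code from the pull request.

```scala
import java.util.concurrent.atomic.AtomicInteger

object EventuallySketch {
  // Hypothetical stand-in for ScalaTest's `eventually`: re-run `check` until it
  // stops throwing AssertionError, or rethrow the last failure after the timeout.
  def retryUntil[T](timeoutMs: Long, intervalMs: Long)(check: => T): T = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    @annotation.tailrec
    def loop(): T = {
      val attempt =
        try Right(check)
        catch { case e: AssertionError => Left(e) }
      attempt match {
        case Right(value) => value
        case Left(e) if System.nanoTime() >= deadline => throw e
        case Left(_) =>
          Thread.sleep(intervalMs)
          loop()
      }
    }
    loop()
  }

  // Hypothetical model of the master's view: the count drops to 1 as soon as
  // an executor is killed and returns to 2 once the replacement registers.
  final class FakeMaster {
    private val count = new AtomicInteger(2)

    def executorCount: Int = count.get()

    def killAndReplace(replacementDelayMs: Long): Unit = {
      count.decrementAndGet() // the old executor goes away immediately
      val starter = new Thread(() => {
        Thread.sleep(replacementDelayMs)
        count.incrementAndGet() // the replacement comes up later
        ()
      })
      starter.setDaemon(true)
      starter.start()
    }
  }

  def main(args: Array[String]): Unit = {
    val master = new FakeMaster
    master.killAndReplace(replacementDelayMs = 50)
    // A bare `assert(master.executorCount == 2)` at this point could observe 1.
    retryUntil(timeoutMs = 10000, intervalMs = 10) {
      assert(master.executorCount == 2)
    }
    println("executor count settled at 2")
  }
}
```

    In the test under review, the analogous change would be to wrap the final `assert(apps.head.executors.size === 2)` in the same `eventually` block used at the top of the test, so a temporary count of 1 is retried rather than reported as a failure.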