[GitHub] spark issue #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers race again...
Github user jxiang commented on the issue: https://github.com/apache/spark/pull/17091

Oh, sorry, I closed it by mistake. I'd like to get it in. Thanks.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers rac...
GitHub user jxiang reopened a pull request: https://github.com/apache/spark/pull/17091

[SPARK-19757][CORE] DriverEndpoint#makeOffers race against CoarseGrainedSchedulerBackend#killExecutors

## What changes were proposed in this pull request?

While some executors are being killed due to idleness, if new tasks come in, the driver could assign them to executors that are being killed. These tasks will fail later when those executors are lost. This patch makes sure CoarseGrainedSchedulerBackend#killExecutors and DriverEndpoint#makeOffers are properly synchronized.

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jxiang/spark spark-19757

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17091.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17091

commit 9ade0fd49d023b8f3d12e15b47c5fdd513e3a7db
Author: Jimmy Xiang
Date: 2017-02-28T00:27:45Z

    [SPARK-19757][CORE] Executor with task scheduled could be killed due to idleness
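The synchronization the PR description proposes can be illustrated with a minimal, self-contained sketch. This is not Spark's actual code: `ToyBackend` and its fields are hypothetical names chosen for illustration. The point is that `killExecutors` and `makeOffers` take the same lock, so an offer can never be built from an executor that is concurrently being marked for removal.

```scala
import scala.collection.mutable

// Simplified model of the race fix: both operations synchronize on the
// backend instance, so marking an executor for removal and offering its
// resources to a new task cannot interleave.
class ToyBackend {
  private val executors = mutable.Set("exec-1", "exec-2")
  private val pendingToRemove = mutable.Set.empty[String]

  // Mark executors for removal; guarded by the backend lock.
  def killExecutors(ids: Seq[String]): Seq[String] = synchronized {
    val killable = ids.filter(executors.contains)
    pendingToRemove ++= killable
    killable
  }

  // Offer resources only from executors not pending removal, under the
  // same lock, so a task is never assigned to a dying executor.
  def makeOffers(): Seq[String] = synchronized {
    executors.toSeq.filterNot(pendingToRemove.contains).sorted
  }
}
```

Without the shared lock, `makeOffers` could snapshot an executor as alive, lose the CPU, and assign a task to it after `killExecutors` had already decided to remove it — exactly the failure mode described above.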
[GitHub] spark pull request #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers rac...
Github user jxiang closed the pull request at: https://github.com/apache/spark/pull/17091
[GitHub] spark pull request #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers rac...
Github user jxiang commented on a diff in the pull request: https://github.com/apache/spark/pull/17091#discussion_r104726794

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1646,7 +1646,7 @@ class SparkContext(config: SparkConf) extends Logging {
   def killExecutors(executorIds: Seq[String]): Boolean = {
     schedulerBackend match {
       case b: CoarseGrainedSchedulerBackend =>
-        b.killExecutors(executorIds, replace = false, force = true).nonEmpty
+        b.killExecutors(executorIds, replace = false, force = false).nonEmpty
--- End diff --

Just found out that SPARK-16554 already fixed the issue of killing busy executors. Reverted this change and updated the patch. Thanks.
[GitHub] spark pull request #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers rac...
Github user jxiang commented on a diff in the pull request: https://github.com/apache/spark/pull/17091#discussion_r104578056

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1646,7 +1646,7 @@ class SparkContext(config: SparkConf) extends Logging {
   def killExecutors(executorIds: Seq[String]): Boolean = {
     schedulerBackend match {
       case b: CoarseGrainedSchedulerBackend =>
-        b.killExecutors(executorIds, replace = false, force = true).nonEmpty
+        b.killExecutors(executorIds, replace = false, force = false).nonEmpty
--- End diff --

Good point. I thought about adding a new API, but it would need quite a few changes. The issue is that in ExecutorAllocationManager#removeExecutors, we need a way to tell client.killExecutors(executorIdsToBeRemoved) to skip those busy executors. We can either add a new API to SparkContext, or call CoarseGrainedSchedulerBackend instead of SparkContext in ExecutorAllocationManager. What do you suggest?
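The `force` flag being flipped in the diff above controls whether busy executors may be killed. A hedged sketch of that semantics (the `KillSketch` object and `runningTasks` map are illustrative assumptions, not Spark's API): with `force = false`, executors that still have running tasks are skipped rather than killed.

```scala
// Illustrative only: models "skip busy executors unless force is set".
object KillSketch {
  // Hypothetical bookkeeping: running-task count per executor id.
  private val runningTasks = Map("exec-1" -> 1, "exec-2" -> 0)

  // Return the ids that would actually be killed. With force = false,
  // an executor with running tasks is left alone.
  def killExecutors(ids: Seq[String], force: Boolean): Seq[String] =
    ids.filter(id => force || runningTasks.getOrElse(id, 0) == 0)
}
```

Under this model, `force = false` kills only the idle `exec-2`, while `force = true` kills both — which is why reverting to `force = true` was safe only once SPARK-16554 handled busy executors elsewhere.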
[GitHub] spark issue #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers race again...
Github user jxiang commented on the issue: https://github.com/apache/spark/pull/17091

@vanzin I updated the patch a little so as not to kill an executor if it is busy.
[GitHub] spark issue #17091: [SPARK-19757] DriverEndpoint#makeOffers race against Coa...
Github user jxiang commented on the issue: https://github.com/apache/spark/pull/17091

Sure. Thanks.
[GitHub] spark pull request #17091: DriverEndpoint#makeOffers race against CoarseGrai...
Github user jxiang commented on a diff in the pull request: https://github.com/apache/spark/pull/17091#discussion_r104214804

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -222,12 +222,17 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
     // Make fake resource offers on all executors
     private def makeOffers() {
-      // Filter out executors under killing
-      val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
-      val workOffers = activeExecutors.map { case (id, executorData) =>
-        new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
-      }.toIndexedSeq
-      launchTasks(scheduler.resourceOffers(workOffers))
+      val taskDescs = synchronized {
--- End diff --

Good question. Updated the description and the title, and added a comment to mention the race. Fixed the synchronized statement.
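The shape of the `makeOffers` change in the diff above — compute the task descriptions while holding the lock, but launch them after releasing it — can be sketched as follows. This is a minimal model, not Spark's code: `OfferPattern`, `executorFreeCores`, and the task-name strings are assumptions made for illustration.

```scala
// Pattern: keep the critical section to the offer/assignment decision,
// and do the (potentially slow) task launch outside the lock.
object OfferPattern {
  private val executorFreeCores =
    scala.collection.mutable.Map("exec-1" -> 2, "exec-2" -> 0)
  private val launched = scala.collection.mutable.Buffer.empty[String]

  def makeOffers(): Seq[String] = {
    // Critical section: snapshot executors with free cores and decide
    // assignments atomically with respect to killExecutors.
    val taskDescs = synchronized {
      executorFreeCores.collect {
        case (id, cores) if cores > 0 => s"task-on-$id"
      }.toSeq
    }
    // Launch outside the lock, mirroring launchTasks(taskDescs) in the patch.
    launched ++= taskDescs
    taskDescs
  }
}
```

Keeping `launchTasks` outside the synchronized block avoids holding the scheduler lock across RPC sends, while still guaranteeing that the assignment decision itself cannot race with an executor being killed.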
[GitHub] spark pull request #17091: [SPARK-19757][CORE] Executor with task scheduled ...
GitHub user jxiang opened a pull request: https://github.com/apache/spark/pull/17091

[SPARK-19757][CORE] Executor with task scheduled could be killed due to idleness

## What changes were proposed in this pull request?

In makeOffers, put the check that an executor is alive and the scheduling of a task to it in one synchronization block, so that the executor won't be killed in the middle.

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jxiang/spark spark-19757

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17091.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17091

commit 6b57d7b0d7ffb511e0348ef3bdcf6f1061225984
Author: Jimmy Xiang
Date: 2017-02-28T00:27:45Z

    [CORE] Executor with task scheduled could be killed due to idleness