[GitHub] spark issue #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers race again...

2017-03-07 Thread jxiang
Github user jxiang commented on the issue:

https://github.com/apache/spark/pull/17091
  
Oh, sorry, I closed it by mistake. I'd like to get it in. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers rac...

2017-03-07 Thread jxiang
GitHub user jxiang reopened a pull request:

https://github.com/apache/spark/pull/17091

[SPARK-19757][CORE] DriverEndpoint#makeOffers race against 
CoarseGrainedSchedulerBackend#killExecutors

## What changes were proposed in this pull request?
While some executors are being killed due to idleness, if new tasks 
come in, the driver could assign them to executors that are being killed. These 
tasks will fail later when the executors are lost. This patch makes sure 
CoarseGrainedSchedulerBackend#killExecutors and DriverEndpoint#makeOffers are 
properly synchronized.
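The race described above can be sketched as follows. This is a minimal illustration, not the actual Spark code: the names (`executorDataMap`, `executorIsAlive`, `executorsPendingToRemove`) mirror identifiers mentioned in this thread, but the types and bookkeeping are simplified stand-ins. Holding one lock across both the liveness check and the offer closes the window in which a concurrent kill could land between the two steps.

```scala
object MakeOffersRaceSketch {
  import scala.collection.mutable

  // Simplified stand-in for Spark's per-executor bookkeeping.
  final case class ExecutorData(host: String, freeCores: Int)

  private val lock = new Object
  val executorDataMap = mutable.Map[String, ExecutorData]()
  private val executorsPendingToRemove = mutable.Set[String]()

  private def executorIsAlive(id: String): Boolean =
    !executorsPendingToRemove.contains(id)

  // Without the lock, a kill could slip in between the liveness check
  // and the offer; taking it for both steps removes that window.
  def makeOffers(): Seq[(String, Int)] = lock.synchronized {
    executorDataMap.collect {
      case (id, data) if executorIsAlive(id) => (id, data.freeCores)
    }.toSeq
  }

  def killExecutor(id: String): Unit = lock.synchronized {
    executorsPendingToRemove += id
  }

  def main(args: Array[String]): Unit = {
    executorDataMap("exec-1") = ExecutorData("host-a", 4)
    executorDataMap("exec-2") = ExecutorData("host-b", 2)
    killExecutor("exec-2")    // exec-2 is now pending removal
    println(makeOffers())     // offers come only from exec-1
  }
}
```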

## How was this patch tested?
manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jxiang/spark spark-19757

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17091


commit 9ade0fd49d023b8f3d12e15b47c5fdd513e3a7db
Author: Jimmy Xiang 
Date:   2017-02-28T00:27:45Z

[SPARK-19757][CORE] Executor with task scheduled could be killed due to 
idleness







[GitHub] spark pull request #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers rac...

2017-03-07 Thread jxiang
Github user jxiang closed the pull request at:

https://github.com/apache/spark/pull/17091





[GitHub] spark pull request #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers rac...

2017-03-07 Thread jxiang
Github user jxiang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17091#discussion_r104726794
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1646,7 +1646,7 @@ class SparkContext(config: SparkConf) extends Logging {
   def killExecutors(executorIds: Seq[String]): Boolean = {
     schedulerBackend match {
       case b: CoarseGrainedSchedulerBackend =>
-        b.killExecutors(executorIds, replace = false, force = true).nonEmpty
+        b.killExecutors(executorIds, replace = false, force = false).nonEmpty
--- End diff --

Just found out that SPARK-16554 already fixed the issue of killing busy 
executors. Reverted this change and updated the patch. Thanks.





[GitHub] spark pull request #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers rac...

2017-03-06 Thread jxiang
Github user jxiang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17091#discussion_r104578056
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1646,7 +1646,7 @@ class SparkContext(config: SparkConf) extends Logging {
   def killExecutors(executorIds: Seq[String]): Boolean = {
     schedulerBackend match {
       case b: CoarseGrainedSchedulerBackend =>
-        b.killExecutors(executorIds, replace = false, force = true).nonEmpty
+        b.killExecutors(executorIds, replace = false, force = false).nonEmpty
--- End diff --

Good point. I thought about adding a new API, but it needs quite a few 
changes. The issue is that in ExecutorAllocationManager#removeExecutors, we 
need a way to tell client.killExecutors(executorIdsToBeRemoved) to skip busy 
executors. We can either add a new API to SparkContext, or call 
CoarseGrainedSchedulerBackend instead of SparkContext in 
ExecutorAllocationManager. What do you suggest?
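The force-flag behavior under discussion can be sketched like this. This is a hedged illustration under simplified assumptions, not Spark's actual implementation: `runningTasks` is hypothetical bookkeeping, and the real `killExecutors` takes more parameters and interacts with the cluster manager.

```scala
object KillExecutorsSketch {
  // Hypothetical bookkeeping: executor id -> number of running tasks.
  val runningTasks: Map[String, Int] = Map("exec-1" -> 2, "exec-2" -> 0)

  /** Returns the ids accepted for killing. With force = false, busy
    * executors (those with running tasks) are skipped; with force = true,
    * every requested executor is killed regardless of its task load. */
  def killExecutors(ids: Seq[String], force: Boolean): Seq[String] =
    if (force) ids
    else ids.filterNot(id => runningTasks.getOrElse(id, 0) > 0)

  def main(args: Array[String]): Unit = {
    // exec-1 is busy, so only exec-2 is accepted without force.
    println(killExecutors(Seq("exec-1", "exec-2"), force = false))
    println(killExecutors(Seq("exec-1", "exec-2"), force = true))
  }
}
```

This is why passing force = false from SparkContext#killExecutors, as in the diff above, makes the idle-executor path skip executors that picked up work in the meantime.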





[GitHub] spark issue #17091: [SPARK-19757][CORE] DriverEndpoint#makeOffers race again...

2017-03-06 Thread jxiang
Github user jxiang commented on the issue:

https://github.com/apache/spark/pull/17091
  
@vanzin I updated the patch a little so as not to kill an executor if it is 
busy.





[GitHub] spark issue #17091: [SPARK-19757] DriverEndpoint#makeOffers race against Coa...

2017-03-05 Thread jxiang
Github user jxiang commented on the issue:

https://github.com/apache/spark/pull/17091
  
Sure. Thanks.





[GitHub] spark pull request #17091: DriverEndpoint#makeOffers race against CoarseGrai...

2017-03-03 Thread jxiang
Github user jxiang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17091#discussion_r104214804
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -222,12 +222,17 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp

     // Make fake resource offers on all executors
     private def makeOffers() {
-      // Filter out executors under killing
-      val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
-      val workOffers = activeExecutors.map { case (id, executorData) =>
-        new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
-      }.toIndexedSeq
-      launchTasks(scheduler.resourceOffers(workOffers))
+      val taskDescs = synchronized {
--- End diff --

Good question. Updated the description and the title, and added a comment to 
mention the race.
Fixed the synchronized statement.
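The lock-scoping pattern in the diff above (compute the task descriptions under the lock, launch them outside it) can be sketched as follows. Names and types here are simplified stand-ins for Spark internals, assumed for illustration only.

```scala
object LockScopeSketch {
  import scala.collection.mutable

  final case class TaskDesc(taskId: Long, executorId: String)

  private val aliveExecutors = mutable.Set("exec-1", "exec-2")
  private val launched = mutable.Buffer[TaskDesc]()

  // Stand-in for scheduler.resourceOffers: one task per offered executor.
  private def resourceOffers(ids: Seq[String]): Seq[TaskDesc] =
    ids.sorted.zipWithIndex.map { case (id, i) => TaskDesc(i.toLong, id) }

  // Stand-in for the RPC sends that launch tasks on executors.
  private def launchTasks(tasks: Seq[TaskDesc]): Unit =
    launched ++= tasks

  def makeOffers(): Seq[TaskDesc] = {
    // Decide what to launch while holding the lock, so killExecutor
    // (which takes the same lock) cannot interleave with the liveness
    // filter and the scheduling decision.
    val taskDescs = synchronized {
      resourceOffers(aliveExecutors.toSeq)
    }
    // Launch outside the lock: sending tasks can be slow, and holding
    // the lock during sends would stall kill requests unnecessarily.
    launchTasks(taskDescs)
    taskDescs
  }

  def killExecutor(id: String): Unit = synchronized {
    aliveExecutors -= id
  }
}
```

The key design choice is the narrow critical section: only the scheduling decision is serialized against kills, while the actual task launches happen unlocked.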





[GitHub] spark pull request #17091: [SPARK-19757][CORE] Executor with task scheduled ...

2017-02-27 Thread jxiang
GitHub user jxiang opened a pull request:

https://github.com/apache/spark/pull/17091

[SPARK-19757][CORE] Executor with task scheduled could be killed due to 
idleness

## What changes were proposed in this pull request?
In makeOffers, check whether an executor is alive and schedule a task
to it within a single synchronized block, so that the executor won't be
killed in the middle.

## How was this patch tested?
manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jxiang/spark spark-19757

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17091


commit 6b57d7b0d7ffb511e0348ef3bdcf6f1061225984
Author: Jimmy Xiang 
Date:   2017-02-28T00:27:45Z

[CORE] Executor with task scheduled could be killed due to idleness



