[GitHub] spark pull request #18427: [SPARK-21219][Core] Task retry occurs on same exe...

2017-07-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18427


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18427: [SPARK-21219][Core] Task retry occurs on same exe...

2017-07-07 Thread ericvandenbergfb
Github user ericvandenbergfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18427#discussion_r126236401
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -1172,6 +1172,50 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
 assert(blacklistTracker.isNodeBlacklisted("host1"))
   }
 
+  test("update blacklist before adding pending task to avoid race condition") {
+// When a task fails, it should apply the blacklist policy prior to
+// retrying the task, otherwise there's a race condition where it may run on
+// the same executor that it was intended to be blacklisted from.
+val conf = new SparkConf().
+  set(config.BLACKLIST_ENABLED, true).
+  set(config.MAX_TASK_ATTEMPTS_PER_EXECUTOR, 1)
+
+// Create a task set with two executors.
+sc = new SparkContext("local", "test", conf)
+val exec = "executor1"
+val host = "host1"
+val exec2 = "executor2"
+val host2 = "host2"
+sched = new FakeTaskScheduler(sc, (exec, host), (exec2, host2))
+val taskSet = FakeTask.createTaskSet(1)
+
+val clock = new ManualClock
+val mockListenerBus = mock(classOf[LiveListenerBus])
+val blacklistTracker = new BlacklistTracker(mockListenerBus, conf, None, clock)
+val taskSetManager = new TaskSetManager(sched, taskSet, 1, Some(blacklistTracker))
+val taskSetManagerSpy = spy(taskSetManager)
+
+val taskDesc = taskSetManagerSpy.resourceOffer(exec, host, TaskLocality.ANY)
+
+// Assert the task has been blacklisted on the executor it was last executed on.
+when(taskSetManagerSpy.addPendingTask(anyInt())).thenAnswer(
+  new Answer[Unit] {
+override def answer(invocationOnMock: InvocationOnMock): Unit = {
+  val task = invocationOnMock.getArgumentAt(0, classOf[Int])
+  assert(taskSetManager.taskSetBlacklistHelperOpt.get.
+isExecutorBlacklistedForTask(exec, task))
+}
+  }
+)
+
+// Simulate an out of memory error
+val e = new OutOfMemoryError
+taskSetManagerSpy.handleFailedTask(
+  taskDesc.get.taskId, TaskState.FAILED, new ExceptionFailure(e, Seq()))
--- End diff --

Okay
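The ordering bug the diff above tests for can be sketched as a toy model. This is not Spark's API: the class and method names below are hypothetical, simplified stand-ins for the real `TaskSetManager.handleFailedTask`/`addPendingTask` interaction, showing why the blacklist entry must be recorded before the task is re-queued.

```python
class ToyScheduler:
    """Toy model of the ordering fix: record the (executor, task) blacklist
    entry BEFORE re-adding the failed task to the pending queue, so a
    concurrent resource offer from that executor cannot pick the retry up."""

    def __init__(self):
        self.pending = []          # task ids awaiting scheduling
        self.blacklist = set()     # (executor, task) pairs to avoid

    def handle_failed_task(self, executor, task):
        # Order matters: blacklist first...
        self.blacklist.add((executor, task))
        # ...then make the task schedulable again. Reversing these two
        # lines reintroduces the race this PR's test guards against.
        self.pending.append(task)

    def resource_offer(self, executor):
        # Offer the executor the first pending task it is not blacklisted for.
        for task in self.pending:
            if (executor, task) not in self.blacklist:
                self.pending.remove(task)
                return task
        return None

sched = ToyScheduler()
sched.handle_failed_task("executor1", 0)
# The retry is never offered back to the executor it just failed on...
assert sched.resource_offer("executor1") is None
# ...but a different executor can still pick it up.
assert sched.resource_offer("executor2") == 0
```

In the real patch the same ordering is enforced inside `handleFailedTask`, and the Mockito spy in the test intercepts `addPendingTask` to assert the blacklist is already populated at that point.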





[GitHub] spark pull request #18427: [SPARK-21219][Core] Task retry occurs on same exe...

2017-07-07 Thread ericvandenbergfb
Github user ericvandenbergfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18427#discussion_r126228623
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -1172,6 +1172,50 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
 assert(blacklistTracker.isNodeBlacklisted("host1"))
   }
 
+  test("update blacklist before adding pending task to avoid race condition") {
+// When a task fails, it should apply the blacklist policy prior to
+// retrying the task, otherwise there's a race condition where it may run on
+// the same executor that it was intended to be blacklisted from.
+val conf = new SparkConf().
+  set(config.BLACKLIST_ENABLED, true).
+  set(config.MAX_TASK_ATTEMPTS_PER_EXECUTOR, 1)
+
+// Create a task set with two executors.
+sc = new SparkContext("local", "test", conf)
+val exec = "executor1"
+val host = "host1"
+val exec2 = "executor2"
+val host2 = "host2"
+sched = new FakeTaskScheduler(sc, (exec, host), (exec2, host2))
+val taskSet = FakeTask.createTaskSet(1)
+
+val clock = new ManualClock
+val mockListenerBus = mock(classOf[LiveListenerBus])
+val blacklistTracker = new BlacklistTracker(mockListenerBus, conf, None, clock)
+val taskSetManager = new TaskSetManager(sched, taskSet, 1, Some(blacklistTracker))
--- End diff --

It seems all the tests in this file use ManualClock, so I was following that convention here. This test doesn't validate anything that specifically depends on the clock/time.





[GitHub] spark pull request #18427: [SPARK-21219][Core] Task retry occurs on same exe...

2017-07-07 Thread ericvandenbergfb
Github user ericvandenbergfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18427#discussion_r126228249
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -1172,6 +1172,50 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
 assert(blacklistTracker.isNodeBlacklisted("host1"))
   }
 
+  test("update blacklist before adding pending task to avoid race condition") {
+// When a task fails, it should apply the blacklist policy prior to
+// retrying the task, otherwise there's a race condition where it may run on
+// the same executor that it was intended to be blacklisted from.
+val conf = new SparkConf().
+  set(config.BLACKLIST_ENABLED, true).
+  set(config.MAX_TASK_ATTEMPTS_PER_EXECUTOR, 1)
--- End diff --

Yes, I added it to make the test's configuration inputs more explicit, but I can remove it if it's a default that is unlikely to change.





[GitHub] spark pull request #18427: [SPARK-21219][Core] Task retry occurs on same exe...

2017-07-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18427#discussion_r125941148
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -1172,6 +1172,50 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
 assert(blacklistTracker.isNodeBlacklisted("host1"))
   }
 
+  test("update blacklist before adding pending task to avoid race condition") {
+// When a task fails, it should apply the blacklist policy prior to
+// retrying the task, otherwise there's a race condition where it may run on
+// the same executor that it was intended to be blacklisted from.
+val conf = new SparkConf().
+  set(config.BLACKLIST_ENABLED, true).
+  set(config.MAX_TASK_ATTEMPTS_PER_EXECUTOR, 1)
--- End diff --

The default value of `config.MAX_TASK_ATTEMPTS_PER_EXECUTOR` is 1, so we don't have to set it here.





[GitHub] spark pull request #18427: [SPARK-21219][Core] Task retry occurs on same exe...

2017-07-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18427#discussion_r125952506
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -1172,6 +1172,50 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
 assert(blacklistTracker.isNodeBlacklisted("host1"))
   }
 
+  test("update blacklist before adding pending task to avoid race condition") {
+// When a task fails, it should apply the blacklist policy prior to
+// retrying the task, otherwise there's a race condition where it may run on
+// the same executor that it was intended to be blacklisted from.
+val conf = new SparkConf().
+  set(config.BLACKLIST_ENABLED, true).
+  set(config.MAX_TASK_ATTEMPTS_PER_EXECUTOR, 1)
+
+// Create a task set with two executors.
+sc = new SparkContext("local", "test", conf)
+val exec = "executor1"
+val host = "host1"
+val exec2 = "executor2"
+val host2 = "host2"
+sched = new FakeTaskScheduler(sc, (exec, host), (exec2, host2))
+val taskSet = FakeTask.createTaskSet(1)
+
+val clock = new ManualClock
+val mockListenerBus = mock(classOf[LiveListenerBus])
+val blacklistTracker = new BlacklistTracker(mockListenerBus, conf, None, clock)
+val taskSetManager = new TaskSetManager(sched, taskSet, 1, Some(blacklistTracker))
+val taskSetManagerSpy = spy(taskSetManager)
+
+val taskDesc = taskSetManagerSpy.resourceOffer(exec, host, TaskLocality.ANY)
+
+// Assert the task has been blacklisted on the executor it was last executed on.
+when(taskSetManagerSpy.addPendingTask(anyInt())).thenAnswer(
+  new Answer[Unit] {
+override def answer(invocationOnMock: InvocationOnMock): Unit = {
+  val task = invocationOnMock.getArgumentAt(0, classOf[Int])
+  assert(taskSetManager.taskSetBlacklistHelperOpt.get.
+isExecutorBlacklistedForTask(exec, task))
+}
+  }
+)
+
+// Simulate an out of memory error
+val e = new OutOfMemoryError
+taskSetManagerSpy.handleFailedTask(
+  taskDesc.get.taskId, TaskState.FAILED, new ExceptionFailure(e, Seq()))
--- End diff --

nit: `ExceptionFailure` is a case class, so you may construct it directly:
```
val endReason = ExceptionFailure("a", "b", Array(), "c", None)
taskSetManagerSpy.handleFailedTask(taskDesc.get.taskId, TaskState.FAILED, endReason)
```





[GitHub] spark pull request #18427: [SPARK-21219][Core] Task retry occurs on same exe...

2017-07-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18427#discussion_r125950927
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -1172,6 +1172,50 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 assert(blacklistTracker.isNodeBlacklisted("host1"))
   }
 
+  test("update blacklist before adding pending task to avoid race 
condition") {
+// When a task fails, it should apply the blacklist policy prior to
+// retrying the task otherwise there's a race condition where run on
+// the same executor that it was intended to be black listed from.
+val conf = new SparkConf().
+  set(config.BLACKLIST_ENABLED, true).
+  set(config.MAX_TASK_ATTEMPTS_PER_EXECUTOR, 1)
+
+// Create a task with two executors.
+sc = new SparkContext("local", "test", conf)
+val exec = "executor1"
+val host = "host1"
+val exec2 = "executor2"
+val host2 = "host2"
+sched = new FakeTaskScheduler(sc, (exec, host), (exec2, host2))
+val taskSet = FakeTask.createTaskSet(1)
+
+val clock = new ManualClock
+val mockListenerBus = mock(classOf[LiveListenerBus])
+val blacklistTracker = new BlacklistTracker(mockListenerBus, conf, 
None, clock)
+val taskSetManager = new TaskSetManager(sched, taskSet, 1, 
Some(blacklistTracker))
--- End diff --

Why are we using `SystemClock` for `taskSetManager`?

