[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-26 Thread GitBox
URL: https://github.com/apache/spark/pull/26924#discussion_r361442797
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
 }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify the interrupting listener to start working
+      interruptingListener.sleep = false
 
 Review comment:
   I believe this would work. In fact, `AsyncEventQueue` also has a `stopped` flag that we could check.
   But associating a listener with its `AsyncEventQueue` would be another problem to resolve; currently, that association is encapsulated by `bus.addToXXXQueue` inside the bus code.
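For context, the deadlock in question follows a classic wait cycle: the stopping thread holds the bus lock while waiting for the dispatcher, and the dispatcher needs that same lock to remove the failing listener. Below is a minimal, self-contained JVM sketch of that cycle (all names are hypothetical, not Spark's actual classes; Spark's real code uses intrinsic monitors, while this sketch uses a `ReentrantLock` so the cycle can be detected with a timeout instead of hanging):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    // Returns true once the "dispatcher" fails to take the bus lock while the
    // "stop" path holds it and waits for the dispatcher -- the wait cycle.
    public static boolean cycleDetected() {
        final ReentrantLock busLock = new ReentrantLock();
        final CountDownLatch dispatcherDone = new CountDownLatch(1);
        final boolean[] lockDenied = {false};

        Thread dispatcher = new Thread(() -> {
            try {
                // Simulates removeListenerOnError: it needs the bus lock.
                lockDenied[0] = !busLock.tryLock(200, TimeUnit.MILLISECONDS);
                if (!lockDenied[0]) {
                    busLock.unlock();
                }
            } catch (InterruptedException ignored) {
            } finally {
                dispatcherDone.countDown();
            }
        });

        busLock.lock(); // the stop() path takes the bus lock...
        try {
            dispatcher.start();
            // ...and waits for the dispatcher, which wants that same lock.
            dispatcherDone.await();
        } catch (InterruptedException ignored) {
        } finally {
            busLock.unlock();
        }
        return lockDenied[0];
    }

    public static void main(String[] args) {
        System.out.println("deadlock cycle detected: " + cycleDetected());
    }
}
```

With intrinsic monitors and a plain `join()` instead of `tryLock`/latches, the same interleaving would block both threads forever, which is the bug SPARK-30285 fixes.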
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-26 Thread GitBox
URL: https://github.com/apache/spark/pull/26924#discussion_r361442797
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
 }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify the interrupting listener to start working
+      interruptingListener.sleep = false
 
 Review comment:
   I believe this would work. In fact, `AsyncEventQueue` also has a `stopped` flag that we could check.
   It sounds fine to me. cc @vanzin
   





[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-26 Thread GitBox
URL: https://github.com/apache/spark/pull/26924#discussion_r361422721
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
 }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify the interrupting listener to start working
+      interruptingListener.sleep = false
 
 Review comment:
   Got your point.
   
   There are two things here:
   
   1. Without the fix, how the test would behave.
   2. With the fix, how to make sure that there is no deadlock when a listener is interrupted after `bus.stop` is called.
   
   For (1), we can't avoid the race without changing the `bus.stop` code (e.g. adding a callback).
   For (2), we would at least have to expose the internal `stopped` status of the bus, which is probably not recommended.
   
   So WDYT?
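The race in (1) can be made concrete with a small, self-contained JVM sketch (class and method names here are hypothetical, not Spark's): even after a stopper thread has flipped an atomic `stopped` flag, another thread can still acquire the lock that the stopper has not yet entered, so observing `stopped == true` says nothing about the lock being held.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class StopRaceSketch {
    // Demonstrates the window between stopped.compareAndSet(false, true)
    // and actually entering the synchronized block inside a stop() method.
    public static boolean flagVisibleBeforeLockHeld() {
        final AtomicBoolean stopped = new AtomicBoolean(false);
        final Object lock = new Object();
        final CountDownLatch flagSet = new CountDownLatch(1);
        final CountDownLatch observed = new CountDownLatch(1);

        Thread stopper = new Thread(() -> {
            stopped.compareAndSet(false, true); // the flag becomes visible here...
            flagSet.countDown();
            try {
                observed.await(); // latches only pin down the interleaving
            } catch (InterruptedException e) {
                return;
            }
            synchronized (lock) { /* ...but the lock is taken only now */ }
        });
        stopper.start();

        try {
            flagSet.await();
            boolean sawFlagWhileLockFree;
            synchronized (lock) { // an observer can still take the lock itself
                sawFlagWhileLockFree = stopped.get();
            }
            observed.countDown();
            stopper.join();
            return sawFlagWhileLockFree; // true: flag set, lock not yet held
        } catch (InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(flagVisibleBeforeLockHeld()); // prints true
    }
}
```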







[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-26 Thread GitBox
URL: https://github.com/apache/spark/pull/26924#discussion_r361417566
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
 }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify the interrupting listener to start working
+      interruptingListener.sleep = false
 
 Review comment:
   To make sure that there is no deadlock when a listener is interrupted after `bus.stop` is called, while at the same time not changing the API or exposing the bus's internal status (e.g. the `stopped` flag).
   
   Any better idea?

   





[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-26 Thread GitBox
URL: https://github.com/apache/spark/pull/26924#discussion_r361402976
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
 }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify the interrupting listener to start working
+      interruptingListener.sleep = false
 
 Review comment:
   Unfortunately, checking the `stopped` status can't guarantee this. It's likely that the bus has already set `stopped` to true but has not acquired the synchronized lock yet.
   
   To avoid the race, we could pass a callback into `bus.stop` and notify the interrupting listener from the callback:
   ```scala
   def stop(callback: Option[() => Unit]): Unit = {
     if (!started.get()) {
       throw new IllegalStateException(s"Attempted to stop bus that has not yet started!")
     }
   
     if (!stopped.compareAndSet(false, true)) {
       return
     }
   
     synchronized {
       callback.foreach { c => c() }
       queues.asScala.foreach(_.stop())
       queues.clear()
     }
   }
   
   ...
   // in the test
   bus.stop(Some(() => interruptingListener.sleep = false))
   ```
   But that's weird:
   1. It changes the `bus.stop` API.
   2. The `synchronized` block in `bus.stop` has already been removed; it's not desirable to keep it just for testing an old bug.








[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-26 Thread GitBox
URL: https://github.com/apache/spark/pull/26924#discussion_r361396535
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
 }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify the interrupting listener to start working
+      interruptingListener.sleep = false
 
 Review comment:
   I don't think so. `bus.stop` will block until all the underlying queues and listeners have been cleaned up and removed.
   
   > I don't think there's a way to write a proper test here without changing a bunch of things in the bus and queue code to expose internal hooks... and I don't think that's desirable.
   
   The current test is a trade-off; that said, it doesn't intrude into the bus code.
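The blocking behavior described above can be illustrated with a toy single-queue bus (a hypothetical sketch, not Spark's `LiveListenerBus`): `stop()` joins the dispatcher thread, so it returns only after every already-posted event has been delivered, which is exactly why the test must call `bus.stop()` from a separate thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class ToyBus {
    private static final int POISON = Integer.MIN_VALUE;
    private final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger delivered = new AtomicInteger();
    private final Thread dispatcher = new Thread(() -> {
        try {
            int event;
            while ((event = queue.take()) != POISON) {
                delivered.incrementAndGet(); // "deliver" the event to listeners
            }
        } catch (InterruptedException ignored) {
        }
    });

    public void start() { dispatcher.start(); }

    public void post(int event) { queue.offer(event); }

    // stop() returns only after the dispatcher has drained every event
    // posted before the poison pill and exited.
    public void stop() {
        queue.offer(POISON);
        try {
            dispatcher.join();
        } catch (InterruptedException ignored) {
        }
    }

    public int deliveredCount() { return delivered.get(); }

    public static void main(String[] args) {
        ToyBus bus = new ToyBus();
        bus.start();
        for (int i = 0; i < 5; i++) bus.post(i);
        bus.stop(); // blocks until all five events are delivered
        System.out.println(bus.deliveredCount()); // prints 5
    }
}
```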
   





[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-25 Thread GitBox
wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
URL: https://github.com/apache/spark/pull/26924#discussion_r361388539
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
     }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we would block here until the bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify the interrupting listener to start working
+      interruptingListener.sleep = false
 
 Review comment:
   As described in the PR description, to reproduce the original issue, we have to make sure that:
   
   1. The stopping thread holds the synchronized lock of `bus`.
   2. The interrupting listener thread is trying to acquire the synchronized lock of `bus`.
   
   But signaling the listener to start interrupting just before `bus.stop` via a `CountDownLatch` can't guarantee this 100%, right?
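   To make the required interleaving concrete, the two conditions above can be sketched with a plain JVM monitor. This is Java rather than Scala only so the snippet stands alone; `busLock` and both thread bodies are hypothetical stand-ins for illustration, not Spark's actual classes:

```java
import java.util.concurrent.CountDownLatch;

public class LockOrderSketch {
    // Hypothetical stand-in for the LiveListenerBus monitor, not Spark's API.
    static final Object busLock = new Object();

    public static void main(String[] args) throws Exception {
        CountDownLatch started = new CountDownLatch(1);
        // Stand-in for the dispatch thread calling removeListener,
        // which needs the bus monitor.
        Thread dispatcher = new Thread(() -> {
            started.countDown();
            synchronized (busLock) { }
        });
        synchronized (busLock) {      // the stopping thread holds the monitor...
            dispatcher.start();
            started.await();
            Thread.sleep(200);        // ...let the dispatcher reach the monitor
            // If stop() now joined the dispatcher while still holding the
            // monitor, neither thread could proceed: that is the deadlock.
            System.out.println("dispatcher state: " + dispatcher.getState());
        }
        dispatcher.join();            // monitor released, so this completes
        System.out.println("no deadlock once the monitor is released");
    }
}
```

   A latch only sequences "listener woke up" before "stop was called"; it does not force the dispatcher to be blocked on the monitor at the moment `stop` takes it, which is why the interleaving cannot be guaranteed 100%.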
   





[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-20 Thread GitBox
wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
URL: https://github.com/apache/spark/pull/26924#discussion_r360633022
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
 ##
 @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match
     }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+    val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted"
+    test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") {
+      val conf = new SparkConf(false)
+        .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5)
+      val bus = new LiveListenerBus(conf)
+      val counter1 = new BasicJobCounter()
+      val counter2 = new BasicJobCounter()
+      val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3)
+      bus.addToSharedQueue(counter1)
+      bus.addToSharedQueue(interruptingListener)
+      bus.addToEventLogQueue(counter2)
+      assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+      assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+      assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1)
+
+      bus.start(mockSparkContext, mockMetricsSystem)
+
+      (0 until 5).foreach { jobId =>
+        bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+      }
+
+      // Call bus.stop in a separate thread, otherwise we would block here until the bus is stopped
+      val stoppingThread = new Thread(() => {
+        bus.stop()
+      })
+      stoppingThread.start()
+      // Notify the interrupting listener to start working
+      interruptingListener.sleep = false
 
 Review comment:
   Maybe we could check the `stopped` status of `bus` in the listener.
   This would be better than using a `CountDownLatch`; however, it can't get rid of the race completely. WDYT?
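   A minimal sketch of that polling idea (Java so it is self-contained; the `stopping` flag is a hypothetical stand-in for the bus's stopped state, not Spark's real field): the listener spins until the stopping thread has actually entered stop, instead of being released earlier by a latch.

```java
public class PollStoppedSketch {
    // Hypothetical stand-ins for the bus's stopped state and the
    // listener's "has interrupted" marker; not Spark's classes.
    static volatile boolean stopping = false;
    static volatile boolean interrupted = false;

    public static void main(String[] args) throws Exception {
        Thread listener = new Thread(() -> {
            while (!stopping) {       // poll the flag instead of awaiting a latch
                Thread.onSpinWait();
            }
            interrupted = true;       // simulate throwing the interrupt here
        });
        listener.start();
        stopping = true;              // set by the stopping thread inside stop()
        listener.join();
        System.out.println("interrupt happened after stop started: " + interrupted);
    }
}
```

   The ordering "interrupt after stop has started" is now enforced by the flag, but as noted above, the dispatcher may still not be blocked on the bus monitor at exactly the right instant, so the race is narrowed rather than eliminated.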





[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2019-12-17 Thread GitBox
wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
URL: https://github.com/apache/spark/pull/26924#discussion_r359193483
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
 ##
 @@ -201,10 +201,24 @@ private class AsyncEventQueue(
     true
   }
 
+  override def doPostEvent(listener: SparkListenerInterface, event: SparkListenerEvent): Unit = {
+    // If the listener is dead, we don't post any event to it.
+    if (!listener.dead) {
+      super.doPostEvent(listener, event)
+    }
+  }
+
   override def removeListenerOnError(listener: SparkListenerInterface): Unit = {
-    // the listener failed in an unrecoverably way, we want to remove it from the entire
-    // LiveListenerBus (potentially stopping a queue if it is empty)
-    bus.removeListener(listener)
+    if (bus.isInStop) {
+      // If the bus is in the process of stopping, we just mark the listener as dead instead of
+      // removing it via `bus.removeListener`, to avoid a race condition.
+      // Dead listeners will be removed eventually in `bus.stop`.
 
 Review comment:
   done
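   For readers of the archive, the "mark dead, remove later" scheme in the hunk above can be sketched as follows (Java; `MiniBus`, `Listener`, and all field names are simplified stand-ins for illustration, not Spark's real API):

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

public class DeadFlagSketch {
    static class Listener {
        volatile boolean dead = false;           // set without taking the bus lock
        final AtomicInteger events = new AtomicInteger();
        void onEvent() { events.incrementAndGet(); }
    }

    static class MiniBus {
        final CopyOnWriteArrayList<Listener> listeners = new CopyOnWriteArrayList<>();
        volatile boolean inStop = false;

        void post() {
            for (Listener l : listeners) {
                if (!l.dead) l.onEvent();        // dead listeners are skipped
            }
        }

        void removeListenerOnError(Listener l) {
            if (inStop) {
                l.dead = true;                   // no bus lock needed: avoids the deadlock
            } else {
                synchronized (this) { listeners.remove(l); }
            }
        }

        synchronized void stop() {
            listeners.removeIf(l -> l.dead);     // dead listeners removed eventually here
        }
    }

    public static void main(String[] args) {
        MiniBus bus = new MiniBus();
        Listener good = new Listener(), bad = new Listener();
        bus.listeners.add(good);
        bus.listeners.add(bad);
        bus.post();                              // both listeners see the event
        bus.inStop = true;                       // bus has entered stop()
        bus.removeListenerOnError(bad);          // marked dead, not removed
        bus.post();                              // bad is skipped now
        bus.stop();                              // bad is cleaned up here
        System.out.println(good.events.get() + " " + bad.events.get() + " " + bus.listeners.size());
    }
}
```

   The key design point is that the error path only writes a volatile flag, so it never contends for the bus monitor while the stopping thread holds it.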

