[ https://issues.apache.org/jira/browse/SPARK-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das resolved SPARK-10125.
-----------------------------------
       Resolution: Fixed
         Assignee: Shixiong Zhu
    Fix Version/s: 1.5.0

> Fix a potential deadlock in JobGenerator.stop
> ---------------------------------------------
>
>                 Key: SPARK-10125
>                 URL: https://issues.apache.org/jira/browse/SPARK-10125
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>             Fix For: 1.5.0
>
>
> Because `lazy val` uses `this` lock, if JobGenerator.stop and
> JobGenerator.doCheckpoint (JobGenerator.shouldCheckpoint has not yet been
> initialized) run at the same time, it may hang.
> Here are the stack traces for the deadlock:
> {code}
> "pool-1-thread-1-ScalaTest-running-StreamingListenerSuite" #11 prio=5 os_prio=31 tid=0x00007fd35d094800 nid=0x5703 in Object.wait() [0x000000012ecaf000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Thread.join(Thread.java:1245)
>         - locked <0x00000007b5d8d7f8> (a org.apache.spark.util.EventLoop$$anon$1)
>         at java.lang.Thread.join(Thread.java:1319)
>         at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81)
>         at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:155)
>         - locked <0x00000007b5d8cea0> (a org.apache.spark.streaming.scheduler.JobGenerator)
>         at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:95)
>         - locked <0x00000007b5d8ced8> (a org.apache.spark.streaming.scheduler.JobScheduler)
>         at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:687)
>
> "JobGenerator" #67 daemon prio=5 os_prio=31 tid=0x00007fd35c3b9800 nid=0x9f03 waiting for monitor entry [0x0000000139e4a000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.spark.streaming.scheduler.JobGenerator.shouldCheckpoint$lzycompute(JobGenerator.scala:63)
>         - waiting to lock <0x00000007b5d8cea0> (a org.apache.spark.streaming.scheduler.JobGenerator)
>         at org.apache.spark.streaming.scheduler.JobGenerator.shouldCheckpoint(JobGenerator.scala:63)
>         at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:290)
>         at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:182)
>         at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:83)
>         at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:82)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> I can use this patch to produce this deadlock:
> https://github.com/zsxwing/spark/commit/8a88f28d1331003a65fabef48ae3d22a7c21f05f
> And a timeout build in Jenkins due to this deadlock:
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1654/
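
The lock interaction described in the issue can be sketched in isolation. The following is a minimal illustration, not Spark code and not necessarily what the patch does: `Generator`, `eagerInit`, `worker`, and `stopHoldsLock` are hypothetical names, and the sketch assumes the Scala 2 behaviour where the first access to a `lazy val` synchronizes on `this`. The `stop` method holds the monitor of `this` while joining a worker thread that reads the `lazy val`. The join is bounded by a timeout so that the demo reports the hazard instead of hanging. Forcing the `lazy val` before the worker starts is one way to keep the worker from ever needing that monitor.

```scala
import java.util.concurrent.CountDownLatch

// Hypothetical stand-in for JobGenerator; none of these names come from Spark.
class Generator(eagerInit: Boolean) {
  // Assumption: on Scala 2, the first access to a lazy val locks on `this`.
  lazy val shouldCheckpoint: Boolean = true

  @volatile private var observed = false
  private val stopHoldsLock = new CountDownLatch(1)

  // Plays the role of the event-loop thread that ends up in doCheckpoint.
  private val worker = new Thread(new Runnable {
    def run(): Unit = {
      stopHoldsLock.await()        // wait until stop() owns the monitor of `this`
      observed = shouldCheckpoint  // needs that monitor only if still uninitialized
    }
  }, "worker")

  def start(): Unit = synchronized {
    // Force initialization up front so the worker never takes the lazy-val lock.
    if (eagerInit) { val _ = shouldCheckpoint }
    worker.setDaemon(true)
    worker.start()
  }

  // Returns true if the worker finished within timeoutMs.
  def stop(timeoutMs: Long): Boolean = synchronized {
    stopHoldsLock.countDown()
    worker.join(timeoutMs)         // bounded join: the sketch must not hang
    !worker.isAlive
  }
}

object LazyValDeadlockDemo {
  def main(args: Array[String]): Unit = {
    val safe = new Generator(eagerInit = true)
    safe.start()
    println(s"eager init, worker joined: ${safe.stop(1000L)}")
  }
}
```

Constructing the generator with `eagerInit = false` on Scala 2 should make `stop` return `false` after the timeout, because the worker is blocked in the lazy-val initializer waiting for the monitor that `stop` holds. With an unbounded `join`, as in the stack traces above, that wait would never end.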