[ https://issues.apache.org/jira/browse/SPARK-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152774#comment-14152774 ]
Reynold Xin commented on SPARK-3709:
------------------------------------

Hanging driver stack trace
{code}
"pool-1-thread-1-ScalaTest-running-BroadcastSuite" prio=10 tid=0x00007f2114812000 nid=0xc8c in Object.wait() [0x00007f20bb8fd000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
        - locked <0x00000007a2ff4bb8> (a org.apache.spark.scheduler.JobWaiter)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:512)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1087)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1104)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1118)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1132)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:775)
        at org.apache.spark.broadcast.BroadcastSuite.testUnpersistBroadcast(BroadcastSuite.scala:291)
        at org.apache.spark.broadcast.BroadcastSuite.org$apache$spark$broadcast$BroadcastSuite$$testUnpersistTorrentBroadcast(BroadcastSuite.scala:232)
        at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply$mcV$sp(BroadcastSuite.scala:112)
        at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply(BroadcastSuite.scala:112)
        at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply(BroadcastSuite.scala:112)
        at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
        at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
        at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
        at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
        at org.scalatest.Transformer.apply(Transformer.scala:22)
        at org.scalatest.Transformer.apply(Transformer.scala:20)
        at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158)
        at org.scalatest.Suite$class.withFixture(Suite.scala:1121)
        at org.scalatest.FunSuite.withFixture(FunSuite.scala:1559)
        at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155)
        at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167)
        at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167)
        at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
        at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167)
        at org.apache.spark.broadcast.BroadcastSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(BroadcastSuite.scala:26)
        at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
        at org.apache.spark.broadcast.BroadcastSuite.runTest(BroadcastSuite.scala:26)
        ...
{code}

Executor log
{code}
14/09/29 20:35:57.254 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
14/09/29 20:35:57.502 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/09/29 20:35:57.716 INFO SecurityManager: Changing view acls to: root
14/09/29 20:35:57.717 INFO SecurityManager: Changing modify acls to: root
14/09/29 20:35:57.717 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/09/29 20:35:58.096 INFO Slf4jLogger: Slf4jLogger started
14/09/29 20:35:58.136 INFO Remoting: Starting remoting
14/09/29 20:35:58.279 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@localhost:42339]
14/09/29 20:35:58.280 INFO Remoting: Remoting now listens on addresses: [akka.tcp://driverPropsFetcher@localhost:42339]
14/09/29 20:35:58.287 INFO Utils: Successfully started service 'driverPropsFetcher' on port 42339.
14/09/29 20:35:58.461 INFO SecurityManager: Changing view acls to: root
14/09/29 20:35:58.461 INFO SecurityManager: Changing modify acls to: root
14/09/29 20:35:58.462 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/09/29 20:35:58.466 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/09/29 20:35:58.467 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/09/29 20:35:58.493 INFO Slf4jLogger: Slf4jLogger started
14/09/29 20:35:58.499 INFO Remoting: Starting remoting
14/09/29 20:35:58.502 INFO Remoting: Remoting shut down
14/09/29 20:35:58.503 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
14/09/29 20:35:58.540 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@localhost:39122]
14/09/29 20:35:58.540 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@localhost:39122]
14/09/29 20:35:58.541 INFO Utils: Successfully started service 'sparkExecutor' on port 39122.
14/09/29 20:35:58.545 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@localhost:59125/user/CoarseGrainedScheduler
14/09/29 20:35:58.546 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker2@localhost:56210/user/Worker
14/09/29 20:35:58.557 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker2@localhost:56210/user/Worker
14/09/29 20:35:58.562 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
14/09/29 20:35:58.570 INFO SecurityManager: Changing view acls to: root
14/09/29 20:35:58.571 INFO SecurityManager: Changing modify acls to: root
14/09/29 20:35:58.571 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/09/29 20:35:58.592 INFO Slf4jLogger: Slf4jLogger started
14/09/29 20:35:58.596 INFO Remoting: Starting remoting
14/09/29 20:35:58.610 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@localhost:57639]
14/09/29 20:35:58.610 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@localhost:57639]
14/09/29 20:35:58.611 INFO Utils: Successfully started service 'sparkExecutor' on port 57639.
14/09/29 20:35:58.617 INFO AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver@localhost:59125/user/MapOutputTracker
14/09/29 20:35:58.643 INFO AkkaUtils: Connecting to BlockManagerMaster: akka.tcp://sparkDriver@localhost:59125/user/BlockManagerMaster
14/09/29 20:35:58.672 INFO Utils: Successfully started service 'Connection manager for block manager' on port 50721.
14/09/29 20:35:58.673 INFO ConnectionManager: Bound socket to port 50721 with id = ConnectionManagerId(localhost,50721)
14/09/29 20:35:58.678 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140929203558-9c60
14/09/29 20:35:58.684 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
14/09/29 20:35:58.695 INFO BlockManagerMaster: Trying to register BlockManager
14/09/29 20:35:58.704 INFO BlockManagerMaster: Registered BlockManager
14/09/29 20:35:58.828 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@localhost:59125/user/HeartbeatReceiver
14/09/29 20:35:58.837 INFO CoarseGrainedExecutorBackend: Got assigned task 0
14/09/29 20:35:58.839 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/09/29 20:35:58.926 INFO TorrentBroadcast: Started reading broadcast variable 1
14/09/29 20:35:58.969 INFO SendingConnection: Initiating connection to [localhost/127.0.0.1:37826]
14/09/29 20:35:58.971 INFO SendingConnection: Connected to [localhost/127.0.0.1:37826], 1 messages pending
14/09/29 20:35:58.981 INFO ConnectionManager: Accepted connection from [localhost/127.0.0.1:48675]
14/09/29 20:35:59.000 INFO MemoryStore: ensureFreeSpace(1576) called with curMem=0, maxMem=278302556
14/09/29 20:35:59.003 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1576.0 B, free 265.4 MB)
14/09/29 20:35:59.011 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
14/09/29 20:35:59.013 INFO TorrentBroadcast: Reading broadcast variable 1 took 0.085507632 s
14/09/29 20:35:59.104 INFO MemoryStore: ensureFreeSpace(2232) called with curMem=1576, maxMem=278302556
14/09/29 20:35:59.105 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.2 KB, free 265.4 MB)
14/09/29 20:35:59.136 INFO TorrentBroadcast: Started reading broadcast variable 0
14/09/29 20:35:59.141 INFO SendingConnection: Initiating connection to [localhost/127.0.0.1:55386]
14/09/29 20:35:59.141 INFO SendingConnection: Connected to [localhost/127.0.0.1:55386], 1 messages pending
14/09/29 20:35:59.149 INFO ConnectionManager: Accepted connection from [localhost/127.0.0.1:48681]
14/09/29 20:35:59.152 INFO MemoryStore: ensureFreeSpace(278) called with curMem=3808, maxMem=278302556
14/09/29 20:35:59.152 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 278.0 B, free 265.4 MB)
14/09/29 20:35:59.156 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
14/09/29 20:35:59.156 INFO TorrentBroadcast: Reading broadcast variable 0 took 0.019877279 s
14/09/29 20:35:59.158 INFO MemoryStore: ensureFreeSpace(232) called with curMem=4086, maxMem=278302556
14/09/29 20:35:59.158 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 232.0 B, free 265.4 MB)
14/09/29 20:35:59.168 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 715 bytes result sent to driver
14/09/29 20:35:59.173 INFO CoarseGrainedExecutorBackend: Got assigned task 3
14/09/29 20:35:59.173 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
14/09/29 20:35:59.212 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 715 bytes result sent to driver
14/09/29 20:35:59.217 INFO CoarseGrainedExecutorBackend: Got assigned task 5
14/09/29 20:35:59.218 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
14/09/29 20:35:59.247 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 715 bytes result sent to driver
14/09/29 20:35:59.252 INFO CoarseGrainedExecutorBackend: Got assigned task 7
14/09/29 20:35:59.252 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
14/09/29 20:35:59.281 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 715 bytes result sent to driver
14/09/29 20:35:59.286 INFO CoarseGrainedExecutorBackend: Got assigned task 9
14/09/29 20:35:59.287 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
14/09/29 20:35:59.313 INFO Executor: Finished task 9.0 in stage 0.0 (TID 9). 715 bytes result sent to driver
14/09/29 20:35:59.332 INFO BlockManager: Removing broadcast 0
14/09/29 20:35:59.334 INFO BlockManager: Removing block broadcast_0
14/09/29 20:35:59.335 INFO MemoryStore: Block broadcast_0 of size 232 dropped from memory (free 278298470)
14/09/29 20:35:59.335 INFO BlockManager: Removing block broadcast_0_piece0
14/09/29 20:35:59.336 INFO MemoryStore: Block broadcast_0_piece0 of size 278 dropped from memory (free 278298748)
14/09/29 20:35:59.351 INFO CoarseGrainedExecutorBackend: Got assigned task 11
14/09/29 20:35:59.351 INFO Executor: Running task 1.0 in stage 1.0 (TID 11)
14/09/29 20:35:59.370 INFO TorrentBroadcast: Started reading broadcast variable 2
14/09/29 20:35:59.376 INFO MemoryStore: ensureFreeSpace(1577) called with curMem=3808, maxMem=278302556
14/09/29 20:35:59.377 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1577.0 B, free 265.4 MB)
14/09/29 20:35:59.379 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
14/09/29 20:35:59.380 INFO TorrentBroadcast: Reading broadcast variable 2 took 0.009623772 s
14/09/29 20:35:59.380 INFO MemoryStore: ensureFreeSpace(2232) called with curMem=5385, maxMem=278302556
14/09/29 20:35:59.381 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.2 KB, free 265.4 MB)
14/09/29 20:35:59.384 INFO TorrentBroadcast: Started reading broadcast variable 0
14/09/29 20:35:59.387 INFO SendingConnection: Initiating connection to [localhost/127.0.0.1:50721]
14/09/29 20:35:59.388 INFO ConnectionManager: Accepted connection from [localhost/127.0.0.1:48684]
14/09/29 20:35:59.388 INFO SendingConnection: Connected to [localhost/127.0.0.1:50721], 1 messages pending
14/09/29 20:35:59.390 ERROR NioBlockTransferService: block broadcast_0_piece0 cannot be found !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
14/09/29 20:35:59.390 ERROR NioBlockTransferService: block broadcast_0_piece0 cannot be found !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
14/09/29 20:57:21.717 INFO BlockManager: Removing broadcast 1
14/09/29 20:57:21.718 INFO BlockManager: Removing block broadcast_1_piece0
14/09/29 20:57:21.718 INFO MemoryStore: Block broadcast_1_piece0 of size 1576 dropped from memory (free 278296515)
14/09/29 20:57:21.721 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
14/09/29 20:57:21.722 INFO BlockManager: Removing block broadcast_1
14/09/29 20:57:21.722 INFO MemoryStore: Block broadcast_1 of size 2232 dropped from memory (free 278298747)
{code}

> BroadcastSuite.Unpersisting org.apache.spark.broadcast.BroadcastSuite.Unpersisting TorrentBroadcast is flaky
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3709
>                 URL: https://issues.apache.org/jira/browse/SPARK-3709
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Reynold Xin
>            Priority: Blocker