[ 
https://issues.apache.org/jira/browse/SPARK-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152774#comment-14152774
 ] 

Reynold Xin commented on SPARK-3709:
------------------------------------

Hanging driver stack trace
{code}
"pool-1-thread-1-ScalaTest-running-BroadcastSuite" prio=10 tid=0x00007f2114812000 nid=0xc8c in Object.wait() [0x00007f20bb8fd000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
        - locked <0x00000007a2ff4bb8> (a org.apache.spark.scheduler.JobWaiter)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:512)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1087)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1104)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1118)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1132)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:775)
        at org.apache.spark.broadcast.BroadcastSuite.testUnpersistBroadcast(BroadcastSuite.scala:291)
        at org.apache.spark.broadcast.BroadcastSuite.org$apache$spark$broadcast$BroadcastSuite$$testUnpersistTorrentBroadcast(BroadcastSuite.scala:232)
        at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply$mcV$sp(BroadcastSuite.scala:112)
        at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply(BroadcastSuite.scala:112)
        at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply(BroadcastSuite.scala:112)
        at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
        at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
        at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
        at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
        at org.scalatest.Transformer.apply(Transformer.scala:22)
        at org.scalatest.Transformer.apply(Transformer.scala:20)
        at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158)
        at org.scalatest.Suite$class.withFixture(Suite.scala:1121)
        at org.scalatest.FunSuite.withFixture(FunSuite.scala:1559)
        at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155)
        at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167)
        at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167)
        at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
        at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167)
        at org.apache.spark.broadcast.BroadcastSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(BroadcastSuite.scala:26)
        at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
        at org.apache.spark.broadcast.BroadcastSuite.runTest(BroadcastSuite.scala:26)
   ...
{code}

Executor log
{code}
14/09/29 20:35:57.254 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
14/09/29 20:35:57.502 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/09/29 20:35:57.716 INFO SecurityManager: Changing view acls to: root
14/09/29 20:35:57.717 INFO SecurityManager: Changing modify acls to: root
14/09/29 20:35:57.717 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/09/29 20:35:58.096 INFO Slf4jLogger: Slf4jLogger started
14/09/29 20:35:58.136 INFO Remoting: Starting remoting
14/09/29 20:35:58.279 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@localhost:42339]
14/09/29 20:35:58.280 INFO Remoting: Remoting now listens on addresses: [akka.tcp://driverPropsFetcher@localhost:42339]
14/09/29 20:35:58.287 INFO Utils: Successfully started service 'driverPropsFetcher' on port 42339.
14/09/29 20:35:58.461 INFO SecurityManager: Changing view acls to: root
14/09/29 20:35:58.461 INFO SecurityManager: Changing modify acls to: root
14/09/29 20:35:58.462 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/09/29 20:35:58.466 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/09/29 20:35:58.467 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/09/29 20:35:58.493 INFO Slf4jLogger: Slf4jLogger started
14/09/29 20:35:58.499 INFO Remoting: Starting remoting
14/09/29 20:35:58.502 INFO Remoting: Remoting shut down
14/09/29 20:35:58.503 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
14/09/29 20:35:58.540 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@localhost:39122]
14/09/29 20:35:58.540 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@localhost:39122]
14/09/29 20:35:58.541 INFO Utils: Successfully started service 'sparkExecutor' on port 39122.
14/09/29 20:35:58.545 INFO CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@localhost:59125/user/CoarseGrainedScheduler
14/09/29 20:35:58.546 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkWorker2@localhost:56210/user/Worker
14/09/29 20:35:58.557 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkWorker2@localhost:56210/user/Worker
14/09/29 20:35:58.562 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
14/09/29 20:35:58.570 INFO SecurityManager: Changing view acls to: root
14/09/29 20:35:58.571 INFO SecurityManager: Changing modify acls to: root
14/09/29 20:35:58.571 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/09/29 20:35:58.592 INFO Slf4jLogger: Slf4jLogger started
14/09/29 20:35:58.596 INFO Remoting: Starting remoting
14/09/29 20:35:58.610 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@localhost:57639]
14/09/29 20:35:58.610 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@localhost:57639]
14/09/29 20:35:58.611 INFO Utils: Successfully started service 'sparkExecutor' on port 57639.
14/09/29 20:35:58.617 INFO AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver@localhost:59125/user/MapOutputTracker
14/09/29 20:35:58.643 INFO AkkaUtils: Connecting to BlockManagerMaster: akka.tcp://sparkDriver@localhost:59125/user/BlockManagerMaster
14/09/29 20:35:58.672 INFO Utils: Successfully started service 'Connection manager for block manager' on port 50721.
14/09/29 20:35:58.673 INFO ConnectionManager: Bound socket to port 50721 with id = ConnectionManagerId(localhost,50721)
14/09/29 20:35:58.678 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140929203558-9c60
14/09/29 20:35:58.684 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
14/09/29 20:35:58.695 INFO BlockManagerMaster: Trying to register BlockManager
14/09/29 20:35:58.704 INFO BlockManagerMaster: Registered BlockManager
14/09/29 20:35:58.828 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@localhost:59125/user/HeartbeatReceiver
14/09/29 20:35:58.837 INFO CoarseGrainedExecutorBackend: Got assigned task 0
14/09/29 20:35:58.839 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/09/29 20:35:58.926 INFO TorrentBroadcast: Started reading broadcast variable 1
14/09/29 20:35:58.969 INFO SendingConnection: Initiating connection to [localhost/127.0.0.1:37826]
14/09/29 20:35:58.971 INFO SendingConnection: Connected to [localhost/127.0.0.1:37826], 1 messages pending
14/09/29 20:35:58.981 INFO ConnectionManager: Accepted connection from [localhost/127.0.0.1:48675]
14/09/29 20:35:59.000 INFO MemoryStore: ensureFreeSpace(1576) called with curMem=0, maxMem=278302556
14/09/29 20:35:59.003 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1576.0 B, free 265.4 MB)
14/09/29 20:35:59.011 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
14/09/29 20:35:59.013 INFO TorrentBroadcast: Reading broadcast variable 1 took 0.085507632 s
14/09/29 20:35:59.104 INFO MemoryStore: ensureFreeSpace(2232) called with curMem=1576, maxMem=278302556
14/09/29 20:35:59.105 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.2 KB, free 265.4 MB)
14/09/29 20:35:59.136 INFO TorrentBroadcast: Started reading broadcast variable 0
14/09/29 20:35:59.141 INFO SendingConnection: Initiating connection to [localhost/127.0.0.1:55386]
14/09/29 20:35:59.141 INFO SendingConnection: Connected to [localhost/127.0.0.1:55386], 1 messages pending
14/09/29 20:35:59.149 INFO ConnectionManager: Accepted connection from [localhost/127.0.0.1:48681]
14/09/29 20:35:59.152 INFO MemoryStore: ensureFreeSpace(278) called with curMem=3808, maxMem=278302556
14/09/29 20:35:59.152 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 278.0 B, free 265.4 MB)
14/09/29 20:35:59.156 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
14/09/29 20:35:59.156 INFO TorrentBroadcast: Reading broadcast variable 0 took 0.019877279 s
14/09/29 20:35:59.158 INFO MemoryStore: ensureFreeSpace(232) called with curMem=4086, maxMem=278302556
14/09/29 20:35:59.158 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 232.0 B, free 265.4 MB)
14/09/29 20:35:59.168 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 715 bytes result sent to driver
14/09/29 20:35:59.173 INFO CoarseGrainedExecutorBackend: Got assigned task 3
14/09/29 20:35:59.173 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
14/09/29 20:35:59.212 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 715 bytes result sent to driver
14/09/29 20:35:59.217 INFO CoarseGrainedExecutorBackend: Got assigned task 5
14/09/29 20:35:59.218 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
14/09/29 20:35:59.247 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 715 bytes result sent to driver
14/09/29 20:35:59.252 INFO CoarseGrainedExecutorBackend: Got assigned task 7
14/09/29 20:35:59.252 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
14/09/29 20:35:59.281 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 715 bytes result sent to driver
14/09/29 20:35:59.286 INFO CoarseGrainedExecutorBackend: Got assigned task 9
14/09/29 20:35:59.287 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
14/09/29 20:35:59.313 INFO Executor: Finished task 9.0 in stage 0.0 (TID 9). 715 bytes result sent to driver
14/09/29 20:35:59.332 INFO BlockManager: Removing broadcast 0
14/09/29 20:35:59.334 INFO BlockManager: Removing block broadcast_0
14/09/29 20:35:59.335 INFO MemoryStore: Block broadcast_0 of size 232 dropped from memory (free 278298470)
14/09/29 20:35:59.335 INFO BlockManager: Removing block broadcast_0_piece0
14/09/29 20:35:59.336 INFO MemoryStore: Block broadcast_0_piece0 of size 278 dropped from memory (free 278298748)
14/09/29 20:35:59.351 INFO CoarseGrainedExecutorBackend: Got assigned task 11
14/09/29 20:35:59.351 INFO Executor: Running task 1.0 in stage 1.0 (TID 11)
14/09/29 20:35:59.370 INFO TorrentBroadcast: Started reading broadcast variable 2
14/09/29 20:35:59.376 INFO MemoryStore: ensureFreeSpace(1577) called with curMem=3808, maxMem=278302556
14/09/29 20:35:59.377 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1577.0 B, free 265.4 MB)
14/09/29 20:35:59.379 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
14/09/29 20:35:59.380 INFO TorrentBroadcast: Reading broadcast variable 2 took 0.009623772 s
14/09/29 20:35:59.380 INFO MemoryStore: ensureFreeSpace(2232) called with curMem=5385, maxMem=278302556
14/09/29 20:35:59.381 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.2 KB, free 265.4 MB)
14/09/29 20:35:59.384 INFO TorrentBroadcast: Started reading broadcast variable 0
14/09/29 20:35:59.387 INFO SendingConnection: Initiating connection to [localhost/127.0.0.1:50721]
14/09/29 20:35:59.388 INFO ConnectionManager: Accepted connection from [localhost/127.0.0.1:48684]
14/09/29 20:35:59.388 INFO SendingConnection: Connected to [localhost/127.0.0.1:50721], 1 messages pending
14/09/29 20:35:59.390 ERROR NioBlockTransferService: block broadcast_0_piece0 cannot be found !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
14/09/29 20:35:59.390 ERROR NioBlockTransferService: block broadcast_0_piece0 cannot be found !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
14/09/29 20:57:21.717 INFO BlockManager: Removing broadcast 1
14/09/29 20:57:21.718 INFO BlockManager: Removing block broadcast_1_piece0
14/09/29 20:57:21.718 INFO MemoryStore: Block broadcast_1_piece0 of size 1576 dropped from memory (free 278296515)
14/09/29 20:57:21.721 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
14/09/29 20:57:21.722 INFO BlockManager: Removing block broadcast_1
14/09/29 20:57:21.722 INFO MemoryStore: Block broadcast_1 of size 2232 dropped from memory (free 278298747)
{code}


> BroadcastSuite.Unpersisting org.apache.spark.broadcast.BroadcastSuite.Unpersisting TorrentBroadcast is flaky
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3709
>                 URL: https://issues.apache.org/jira/browse/SPARK-3709
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Reynold Xin
>            Priority: Blocker
>



