[jira] [Commented] (SPARK-3709) BroadcastSuite.Unpersisting rg.apache.spark.broadcast.BroadcastSuite.Unpersisting TorrentBroadcast is flaky
[ https://issues.apache.org/jira/browse/SPARK-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152238#comment-14152238 ]

Reynold Xin commented on SPARK-3709:

Adding stack trace

{code}
[info] - Unpersisting TorrentBroadcast on executors only in distributed mode *** FAILED ***
[info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 17, localhost): java.io.IOException: sendMessageReliably failed with ACK that signalled a remote error
[info] org.apache.spark.network.nio.ConnectionManager$$anonfun$14.apply(ConnectionManager.scala:864)
[info] org.apache.spark.network.nio.ConnectionManager$$anonfun$14.apply(ConnectionManager.scala:856)
[info] org.apache.spark.network.nio.ConnectionManager$MessageStatus.markDone(ConnectionManager.scala:61)
[info] org.apache.spark.network.nio.ConnectionManager.org$apache$spark$network$nio$ConnectionManager$$handleMessage(ConnectionManager.scala:655)
[info] org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:515)
[info] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[info] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[info] java.lang.Thread.run(Thread.java:745)
[info] Driver stacktrace:
[info] at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1192)
[info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1181)
[info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1180)
[info] at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
[info] at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1180)
[info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:695)
[info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:695)
[info] at scala.Option.foreach(Option.scala:236)
[info] at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:695)
[info] ...
{code}

BroadcastSuite.Unpersisting rg.apache.spark.broadcast.BroadcastSuite.Unpersisting TorrentBroadcast is flaky

Key: SPARK-3709
URL: https://issues.apache.org/jira/browse/SPARK-3709
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: Patrick Wendell
Assignee: Cheng Lian
Priority: Blocker

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3709) BroadcastSuite.Unpersisting rg.apache.spark.broadcast.BroadcastSuite.Unpersisting TorrentBroadcast is flaky
[ https://issues.apache.org/jira/browse/SPARK-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152377#comment-14152377 ]

Apache Spark commented on SPARK-3709:

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2585

BroadcastSuite.Unpersisting rg.apache.spark.broadcast.BroadcastSuite.Unpersisting TorrentBroadcast is flaky

Key: SPARK-3709
URL: https://issues.apache.org/jira/browse/SPARK-3709
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: Patrick Wendell
Assignee: Reynold Xin
Priority: Blocker
[jira] [Commented] (SPARK-3709) BroadcastSuite.Unpersisting rg.apache.spark.broadcast.BroadcastSuite.Unpersisting TorrentBroadcast is flaky
[ https://issues.apache.org/jira/browse/SPARK-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152774#comment-14152774 ]

Reynold Xin commented on SPARK-3709:

Hanging driver stack trace

{code}
pool-1-thread-1-ScalaTest-running-BroadcastSuite prio=10 tid=0x7f2114812000 nid=0xc8c in Object.wait() [0x7f20bb8fd000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
    - locked 0x0007a2ff4bb8 (a org.apache.spark.scheduler.JobWaiter)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:512)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1087)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1104)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1118)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1132)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:775)
    at org.apache.spark.broadcast.BroadcastSuite.testUnpersistBroadcast(BroadcastSuite.scala:291)
    at org.apache.spark.broadcast.BroadcastSuite.org$apache$spark$broadcast$BroadcastSuite$$testUnpersistTorrentBroadcast(BroadcastSuite.scala:232)
    at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply$mcV$sp(BroadcastSuite.scala:112)
    at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply(BroadcastSuite.scala:112)
    at org.apache.spark.broadcast.BroadcastSuite$$anonfun$13.apply(BroadcastSuite.scala:112)
    at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
    at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158)
    at org.scalatest.Suite$class.withFixture(Suite.scala:1121)
    at org.scalatest.FunSuite.withFixture(FunSuite.scala:1559)
    at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155)
    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167)
    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167)
    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167)
    at org.apache.spark.broadcast.BroadcastSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(BroadcastSuite.scala:26)
    at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
    at org.apache.spark.broadcast.BroadcastSuite.runTest(BroadcastSuite.scala:26)
    ...
{code}

Executor log

{code}
14/09/29 20:35:57.254 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
14/09/29 20:35:57.502 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/09/29 20:35:57.716 INFO SecurityManager: Changing view acls to: root
14/09/29 20:35:57.717 INFO SecurityManager: Changing modify acls to: root
14/09/29 20:35:57.717 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/09/29 20:35:58.096 INFO Slf4jLogger: Slf4jLogger started
14/09/29 20:35:58.136 INFO Remoting: Starting remoting
14/09/29 20:35:58.279 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@localhost:42339]
14/09/29 20:35:58.280 INFO Remoting: Remoting now listens on addresses: [akka.tcp://driverPropsFetcher@localhost:42339]
14/09/29 20:35:58.287 INFO Utils: Successfully started service 'driverPropsFetcher' on port 42339.
14/09/29 20:35:58.461 INFO SecurityManager: Changing view acls to: root
14/09/29 20:35:58.461 INFO SecurityManager: Changing modify acls to: root
14/09/29 20:35:58.462 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
14/09/29 20:35:58.466 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/09/29 20:35:58.467 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/09/29 20:35:58.493 INFO Slf4jLogger: Slf4jLogger started
14/09/29 20:35:58.499 INFO Remoting: Starting remoting
14/09/29 20:35:58.502 INFO Remoting: Remoting shut down
14/09/29 20:35:58.503 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
14/09/29