Setting the following while creating the SparkContext will sort it out:

.set("spark.core.connection.ack.wait.timeout","600")
.set("spark.akka.frameSize","50")

On 27 Oct 2014 21:15, "shahab" <shahab.mok...@gmail.com> wrote:

> Hi,
>
> I have a stand alone Spark Cluster, where worker and master reside on the
> same machine. I submit a job to the cluster, the job is executed for a
> while and suddenly I get this exception with no additional trace.
>
> ConnectionManager: key already cancelled ?
> sun.nio.ch.SelectionKeyImpl@2490dce9
> java.nio.channels.CancelledKeyException at
> org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
> at
> org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
>
>
> Any idea where should I look for the cause?
>
> best,
> /shahab
>
> This following is the part of printout from "driver application" logs:
>
> 14/10/27 15:21:15 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
> ip-10-89-32-179.eu-west-1.compute.internal:40479 in memory (size: 3.4 KB,
> free: 1565.6 MB)
> 14/10/27 15:21:15 INFO ContextCleaner: Cleaned broadcast 1
> 14/10/27 15:21:15 INFO ShuffleBlockManager: Could not find files for
> shuffle 1 for deleting
> 14/10/27 15:21:15 INFO ContextCleaner: Cleaned shuffle 1
> 14/10/27 15:21:15 INFO ShuffleBlockManager: Could not find files for
> shuffle 0 for deleting
> 14/10/27 15:21:15 INFO ContextCleaner: Cleaned shuffle 0
> 14/10/27 15:21:15 INFO BlockManagerInfo: Removed taskresult_9 on
> ip-10-zz.xx-yy:40479 in memory (size: 24.1 MB, free: 1589.8 MB)
> 14/10/27 15:21:16 INFO DAGScheduler: Stage 7 (collect at
> TimeBenchmarking_SimpleModel.scala:55) finished in 3.209 s
> 14/10/27 15:21:16 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID
> 9) in 2640 ms on ip-10-zz.xx-yy (1/1)
> 14/10/27 15:21:16 INFO SparkContext: Job finished: collect at
> TimeBenchmarking_SimpleModel.scala:55, took 102.661420511 s
> 14/10/27 15:21:16 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks
> have all completed, from pool
> 14/10/27 15:21:16 INFO SparkUI: Stopped Spark web UI at
> http://ip-10-zz.xx-yy:4040
> 14/10/27 15:21:16 INFO DAGScheduler: Stopping DAGScheduler
> 14/10/27 15:21:16 INFO SparkDeploySchedulerBackend: Shutting down all
> executors
> 14/10/27 15:21:16 INFO SparkDeploySchedulerBackend: Asking each executor
> to shut down
> 14/10/27 15:21:16 INFO ConnectionManager: Removing ReceivingConnection to
> ConnectionManagerId(ip-10-zz.xx-yy, 40479)
> 14/10/27 15:21:16 INFO ConnectionManager: Removing SendingConnection to
> ConnectionManagerId(ip-10-zz.xx-yy,40479)
> 14/10/27 15:21:16 INFO ConnectionManager: Removing SendingConnection to
> ConnectionManagerId(ip-10-zz.xx-yy,40479)
> 14/10/27 15:21:16 INFO ConnectionManager: Key not valid ?
> sun.nio.ch.SelectionKeyImpl@2490dce9
> 14/10/27 15:21:16 INFO ConnectionManager: key already cancelled ?
> sun.nio.ch.SelectionKeyImpl@2490dce9
> java.nio.channels.CancelledKeyException
> at
> org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
> at
> org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
> 14/10/27 15:21:17 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor
> stopped!
> 14/10/27 15:21:17 INFO ConnectionManager: Selector thread was interrupted!
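
For completeness, here is roughly how the two settings at the top of this reply could be wired into a driver program. This is only a minimal sketch, assuming a Spark 1.x standalone setup like the one described; the object name, app name and master URL are placeholders I made up, not taken from the original job. As far as I recall from the Spark 1.x configuration docs, spark.core.connection.ack.wait.timeout is given in seconds and spark.akka.frameSize in MB, so check the docs for the version you are running.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver skeleton; only the two .set(...) calls come from this thread.
object ConnectionSettingsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ConnectionSettingsSketch")   // placeholder app name
      .setMaster("spark://master-host:7077")    // placeholder standalone master URL
      // Wait longer for connection acks before giving up (assumed unit: seconds).
      .set("spark.core.connection.ack.wait.timeout", "600")
      // Allow larger control-plane messages (assumed unit: MB).
      .set("spark.akka.frameSize", "50")

    val sc = new SparkContext(conf)
    try {
      // Job code goes here; a trivial action just to exercise the context.
      val total = sc.parallelize(1 to 100).reduce(_ + _)
      println(s"sum = $total")
    } finally {
      // Stop the context explicitly so shutdown happens at a known point.
      sc.stop()
    }
  }
}
```

The settings have to be on the SparkConf before the SparkContext is constructed; changing the conf afterwards has no effect on a running context.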