What this exception means? ConnectionManager: key already cancelled ?

2014-10-27 Thread shahab
Hi,

I have a standalone Spark cluster, where the worker and master reside on the
same machine. I submit a job to the cluster, the job runs for a
while, and then suddenly I get this exception with no additional trace.

ConnectionManager: key already cancelled ?
sun.nio.ch.SelectionKeyImpl@2490dce9
java.nio.channels.CancelledKeyException
 at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
 at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)


Any idea where I should look for the cause?

best,
/shahab

The following is part of the printout from the driver application logs:

14/10/27 15:21:15 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
ip-10-89-32-179.eu-west-1.compute.internal:40479 in memory (size: 3.4 KB,
free: 1565.6 MB)
14/10/27 15:21:15 INFO ContextCleaner: Cleaned broadcast 1
14/10/27 15:21:15 INFO ShuffleBlockManager: Could not find files for
shuffle 1 for deleting
14/10/27 15:21:15 INFO ContextCleaner: Cleaned shuffle 1
14/10/27 15:21:15 INFO ShuffleBlockManager: Could not find files for
shuffle 0 for deleting
14/10/27 15:21:15 INFO ContextCleaner: Cleaned shuffle 0
14/10/27 15:21:15 INFO BlockManagerInfo: Removed taskresult_9 on
ip-10-zz.xx-yy:40479 in memory (size: 24.1 MB, free: 1589.8 MB)
14/10/27 15:21:16 INFO DAGScheduler: Stage 7 (collect at
TimeBenchmarking_SimpleModel.scala:55) finished in 3.209 s
14/10/27 15:21:16 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID
9) in 2640 ms on ip-10-zz.xx-yy (1/1)
14/10/27 15:21:16 INFO SparkContext: Job finished: collect at
TimeBenchmarking_SimpleModel.scala:55, took 102.661420511 s
14/10/27 15:21:16 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks
have all completed, from pool
14/10/27 15:21:16 INFO SparkUI: Stopped Spark web UI at
http://ip-10-zz.xx-yy:4040
14/10/27 15:21:16 INFO DAGScheduler: Stopping DAGScheduler
14/10/27 15:21:16 INFO SparkDeploySchedulerBackend: Shutting down all
executors
14/10/27 15:21:16 INFO SparkDeploySchedulerBackend: Asking each executor to
shut down
14/10/27 15:21:16 INFO ConnectionManager: Removing ReceivingConnection to
ConnectionManagerId(ip-10-zz.xx-yy, 40479)
14/10/27 15:21:16 INFO ConnectionManager: Removing SendingConnection to
ConnectionManagerId(ip-10-zz.xx-yy,40479)
14/10/27 15:21:16 INFO ConnectionManager: Removing SendingConnection to
ConnectionManagerId(ip-10-zz.xx-yy,40479)
14/10/27 15:21:16 INFO ConnectionManager: Key not valid ?
sun.nio.ch.SelectionKeyImpl@2490dce9
14/10/27 15:21:16 INFO ConnectionManager: key already cancelled ?
sun.nio.ch.SelectionKeyImpl@2490dce9
java.nio.channels.CancelledKeyException
 at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
 at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/10/27 15:21:17 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor
stopped!
14/10/27 15:21:17 INFO ConnectionManager: Selector thread was interrupted!
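
In the log above, the message appears after "Job finished" and "Stopping DAGScheduler", while connections are being removed. As background on the exception type itself, here is a minimal sketch of how java.nio raises CancelledKeyException when a selection key is used after it has been cancelled. This is plain JDK code, not Spark's ConnectionManager; the object and method names are hypothetical, and it is not a claim about what Spark does internally.

```scala
import java.nio.channels.{CancelledKeyException, SelectionKey, Selector, ServerSocketChannel}

// Hypothetical demo object: not part of Spark.
object CancelledKeyDemo {

  /** Returns true if using a cancelled key raised CancelledKeyException. */
  def accessAfterCancelThrows(): Boolean = {
    val selector = Selector.open()
    val channel = ServerSocketChannel.open()
    try {
      // Channels must be non-blocking before they can be registered.
      channel.configureBlocking(false)
      val key = channel.register(selector, SelectionKey.OP_ACCEPT)

      // Cancelling invalidates the key, much like tearing down a connection.
      key.cancel()

      try {
        // Any later query of the key's interest set is rejected.
        key.interestOps()
        false
      } catch {
        case _: CancelledKeyException => true
      }
    } finally {
      channel.close()
      selector.close()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"CancelledKeyException raised: ${accessAfterCancelThrows()}")
}
```

A selector loop that is still iterating over keys while another thread cancels them can hit the same exception, which is one way such a message can show up during shutdown.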


Re: What this exception means? ConnectionManager: key already cancelled ?

2014-10-27 Thread Akhil Das
Setting the following on the SparkConf while creating the SparkContext will sort it out.

.set("spark.core.connection.ack.wait.timeout", "600")

.set("spark.akka.frameSize", "50")
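
For completeness, a minimal sketch of where those two calls would go, assuming a Spark 1.x application that builds its own SparkConf. The application name and master URL are hypothetical placeholders; only the two property names and values come from the suggestion above. As far as I know, in Spark 1.x the ack wait timeout is given in seconds and the Akka frame size in MB.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application object, for illustration only.
object TimeoutConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TimeoutConfigSketch")        // hypothetical name
      .setMaster("spark://master-host:7077")    // hypothetical standalone master URL
      // Wait longer for connection acks before giving up (seconds, Spark 1.x).
      .set("spark.core.connection.ack.wait.timeout", "600")
      // Allow larger control-plane messages such as task results (MB, Spark 1.x).
      .set("spark.akka.frameSize", "50")

    val sc = new SparkContext(conf)
    try {
      // ... run the job here ...
    } finally {
      sc.stop()
    }
  }
}
```

SparkConf.set takes both the key and the value as strings, so the numbers have to be quoted.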
