Setting the following on the SparkConf when creating the SparkContext should sort it out:

    .set("spark.core.connection.ack.wait.timeout", "600")
    .set("spark.akka.frameSize", "50")
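
For context, here is a minimal sketch of where those two calls would sit in a driver program, assuming a Spark 1.x application written in Scala. The object name, application name and master URL are placeholders I made up, not taken from your setup. As far as I know the ack timeout is given in seconds and the frame size in MB, but please check the configuration page for your Spark version.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver object; only the two .set(...) calls come from the advice above.
object TimeoutSettingsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("timeout-settings-example")   // placeholder application name
      .setMaster("spark://master-host:7077")    // placeholder standalone master URL
      // Wait longer for connection acks before giving up (assumed unit: seconds).
      .set("spark.core.connection.ack.wait.timeout", "600")
      // Allow larger messages between driver and executors (assumed unit: MB).
      .set("spark.akka.frameSize", "50")

    val sc = new SparkContext(conf)
    try {
      // Trivial job, just to show the context is usable with these settings.
      val total = sc.parallelize(1 to 100).reduce(_ + _)
      println(s"sum = $total")
    } finally {
      sc.stop()
    }
  }
}
```

The settings have to be on the SparkConf before the SparkContext is constructed; changing the conf afterwards does not affect a context that is already running.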

On 27 Oct 2014 21:15, "shahab" <shahab.mok...@gmail.com> wrote:

> Hi,
>
> I have a standalone Spark cluster, where worker and master reside on the
> same machine. I submit a job to the cluster, the job is executed for a
> while and suddenly I get this exception with no additional trace.
>
> ConnectionManager: key already cancelled ?
> sun.nio.ch.SelectionKeyImpl@2490dce9
> java.nio.channels.CancelledKeyException at
> org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
>  at
> org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
>
>
> Any idea where I should look for the cause?
>
> best,
> /shahab
>
> The following is part of the printout from the "driver application" logs:
>
> 14/10/27 15:21:15 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
> ip-10-89-32-179.eu-west-1.compute.internal:40479 in memory (size: 3.4 KB,
> free: 1565.6 MB)
> 14/10/27 15:21:15 INFO ContextCleaner: Cleaned broadcast 1
> 14/10/27 15:21:15 INFO ShuffleBlockManager: Could not find files for
> shuffle 1 for deleting
> 14/10/27 15:21:15 INFO ContextCleaner: Cleaned shuffle 1
> 14/10/27 15:21:15 INFO ShuffleBlockManager: Could not find files for
> shuffle 0 for deleting
> 14/10/27 15:21:15 INFO ContextCleaner: Cleaned shuffle 0
> 14/10/27 15:21:15 INFO BlockManagerInfo: Removed taskresult_9 on
> ip-10-zz.xx-yy:40479 in memory (size: 24.1 MB, free: 1589.8 MB)
> 14/10/27 15:21:16 INFO DAGScheduler: Stage 7 (collect at
> TimeBenchmarking_SimpleModel.scala:55) finished in 3.209 s
> 14/10/27 15:21:16 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID
> 9) in 2640 ms on ip-10-zz.xx-yy (1/1)
> 14/10/27 15:21:16 INFO SparkContext: Job finished: collect at
> TimeBenchmarking_SimpleModel.scala:55, took 102.661420511 s
> 14/10/27 15:21:16 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks
> have all completed, from pool
> 14/10/27 15:21:16 INFO SparkUI: Stopped Spark web UI at
> http://ip-10-zz.xx-yy:4040
> 14/10/27 15:21:16 INFO DAGScheduler: Stopping DAGScheduler
> 14/10/27 15:21:16 INFO SparkDeploySchedulerBackend: Shutting down all
> executors
> 14/10/27 15:21:16 INFO SparkDeploySchedulerBackend: Asking each executor
> to shut down
> 14/10/27 15:21:16 INFO ConnectionManager: Removing ReceivingConnection to
> ConnectionManagerId(ip-10-zz.xx-yy, 40479)
> 14/10/27 15:21:16 INFO ConnectionManager: Removing SendingConnection to
> ConnectionManagerId(ip-10-zz.xx-yy,40479)
> 14/10/27 15:21:16 INFO ConnectionManager: Removing SendingConnection to
> ConnectionManagerId(ip-10-zz.xx-yy,40479)
> 14/10/27 15:21:16 INFO ConnectionManager: Key not valid ?
> sun.nio.ch.SelectionKeyImpl@2490dce9
> 14/10/27 15:21:16 INFO ConnectionManager: key already cancelled ?
> sun.nio.ch.SelectionKeyImpl@2490dce9
> java.nio.channels.CancelledKeyException
>         at
> org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
>         at
> org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
> 14/10/27 15:21:17 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor
> stopped!
> 14/10/27 15:21:17 INFO ConnectionManager: Selector thread was interrupted!
>
