Hi,

We are seeing this issue with Spark 1.6.1. The executor is exiting when one
of the running tasks is cancelled.

The executor log shows the error below, and the executor then crashes.

16/04/27 16:34:13 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker-2,5,main]
> java.lang.Error: java.nio.channels.ClosedByInterruptException
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedByInterruptException
> at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460)
> at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:49)
> at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:47)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1219)
> at org.apache.spark.util.SerializableBuffer.writeObject(SerializableBuffer.scala:47)
> at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
> at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
> at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:252)
> at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:195)
> at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
> at org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:132)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:288)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ... 2 more


I have attached the full logs at this gist:
https://gist.github.com/kiranchitturi/3bd3a083a7c956cff73040c1a140c88f

On the driver side, the following info is logged (same gist:
https://gist.github.com/kiranchitturi/3bd3a083a7c956cff73040c1a140c88f).

The following lines show that the executor exited because of one of the running tasks:

2016-04-27T16:34:13,723 - WARN [dispatcher-event-loop-1:Logging$class@70] -
> Lost task 0.0 in stage 89.0 (TID 173, 10.0.0.42): ExecutorLostFailure
> (executor 2 exited caused by one of the running tasks) Reason: Remote RPC
> client disassociated. Likely due to containers exceeding thresholds, or
> network issues. Check driver logs for WARN messages.
> 2016-04-27T16:34:13,723 - WARN [dispatcher-event-loop-1:Logging$class@70] -
> Lost task 1.0 in stage 89.0 (TID 174, 10.0.0.42): ExecutorLostFailure
> (executor 2 exited caused by one of the running tasks) Reason: Remote RPC
> client disassociated.


Is it possible for an executor to die when the jobs in the SparkContext are
cancelled? Apart from https://issues.apache.org/jira/browse/SPARK-14234, I
could not find any JIRAs that report this error.
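
For illustration, this is the kind of cancellation involved. The sketch below
is not our actual code; it assumes the standard SparkContext job-group API
with interruptOnCancel = true, and the group name and the job itself are
placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object CancelSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cancel-sketch"))

    // Tag the jobs we want to be able to cancel with a job group.
    // interruptOnCancel = true interrupts the running task threads, which is
    // how a ClosedByInterruptException can surface inside a task.
    sc.setJobGroup("cancel-sketch-group", "long running job", interruptOnCancel = true)

    // Cancel the group from another thread while its tasks are still running.
    new Thread(new Runnable {
      override def run(): Unit = {
        Thread.sleep(5000)
        sc.cancelJobGroup("cancel-sketch-group")
      }
    }).start()

    // Placeholder long-running job; in our real case this is an actual query.
    try {
      sc.parallelize(1 to 1000000, 4).map { i => Thread.sleep(1); i }.count()
    } catch {
      case e: Exception => println("Job cancelled: " + e.getMessage)
    } finally {
      sc.stop()
    }
  }
}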

Sometimes, we notice a scenario where the executor dies and the driver does
not request a new one. This causes the jobs to hang indefinitely. We are
using dynamic allocation for our jobs.
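
For reference, a minimal sketch (spark-shell style) of the dynamic allocation
settings involved; the values here are illustrative, not our exact production
configuration:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("dynamic-allocation-job")
  // Let the driver request and release executors on demand.
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
  // The external shuffle service is required with dynamic allocation so that
  // shuffle files outlive removed executors.
  .set("spark.shuffle.service.enabled", "true")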

Thanks,


Kiran Chitturi
