Re: Lost task - connection closed
I am getting the same error. Is there any resolution to this issue? Thank you
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361p24082.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Lost task - connection closed
Hi,

Thanks for the response. I discovered that my problem was that some of the executors ran out of memory (OOM); tracing through the executor logs is what revealed it. The driver log usually does not show the OOM error, which causes confusion among users. This is just what I found on my side; I am not sure whether the OP had the same problem.

On Wed, Feb 11, 2015 at 12:03 AM, Arush Kharbanda ar...@sigmoidanalytics.com wrote:
Hi, can you share the code you are trying to run? Thanks, Arush

On Wed, Feb 11, 2015 at 9:12 AM, Tianshuo Deng td...@twitter.com.invalid wrote:
I have seen the same problem. It causes some tasks to fail, but not the whole job. I hope someone can shed some light on what the cause might be.

On Mon, Jan 26, 2015 at 9:49 AM, Aaron Davidson ilike...@gmail.com wrote:
It looks like something weird is going on with your object serialization: perhaps a funny form of self-reference that is not detected by ObjectOutputStream's usual loop avoidance. That, or you have a data structure like a linked list with a parent pointer and many thousands of elements. Assuming the stack trace is coming from an executor, the problem is probably with the objects you're sending back as results, so I would carefully examine these and maybe try serializing some of them manually with ObjectOutputStream. If your program looks like

    foo.map { row => doComplexOperation(row) }.take(10)

you can also try changing it to

    foo.map { row => doComplexOperation(row); 1 }.take(10)

to avoid serializing the result of that complex operation, which should help narrow down exactly where the problematic objects are coming from.
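The practical takeaway from the reply above is to read the executor logs, not just the driver log, because an executor that dies from an OOM reports it only in its own log. A minimal sketch of that search, assuming the executor log has been copied to a local text file; the class name `ExecutorLogScan` and the list of error markers are hypothetical choices, not part of Spark:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

public class ExecutorLogScan {
    // Hypothetical markers: JVM errors that commonly kill an executor and close its connections.
    private static final List<String> FATAL_MARKERS = List.of(
            "java.lang.OutOfMemoryError",
            "java.lang.StackOverflowError");

    /** Returns the first log line that mentions one of the fatal markers, if any. */
    static Optional<String> firstFatalLine(List<String> lines) {
        for (String line : lines) {
            for (String marker : FATAL_MARKERS) {
                if (line.contains(marker)) {
                    return Optional.of(line);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a local copy of the executor's log file (assumed UTF-8 text).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(firstFatalLine(lines).orElse("no fatal JVM error found"));
    }
}
```

Compiled and run with the log path as its only argument, it prints the first matching line. An empty result does not rule out memory trouble: a JVM killed by the operating system (for example by the Linux OOM killer) may not log anything at all.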
Re: Lost task - connection closed
Here is the first error I get at the executors:

15/01/26 17:27:04 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[handle-message-executor-16,5,main]
java.lang.StackOverflowError
    at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
    at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1840)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    ... (the last four frames repeat for the remainder of the trace)

If you have any pointers on how to debug this, they would be very useful. I tried running with both Spark 1.2.0 and 1.1.1 and got the same error.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361p21371.html
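The repeating `writeObject0 → writeOrdinaryObject → writeSerialData → defaultWriteFields` cycle in this trace is what Java's default serialization looks like when it recurses through a deeply nested object graph: every nested reference costs another round of those frames. A minimal reproduction, assuming a hypothetical singly linked `Node` class (not taken from the original program), serializes a short chain and a very long one:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class DeepGraphOverflow {
    // Hypothetical node type: default serialization writes `next` by recursing into it.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    /** Builds a chain of the given length iteratively, so construction itself never recurses. */
    static Node chain(int length) {
        Node head = null;
        for (int i = length - 1; i >= 0; i--) {
            Node n = new Node(i);
            n.next = head;
            head = n;
        }
        return head;
    }

    /** Serializes the value; returns the StackOverflowError if one occurs, otherwise null. */
    static StackOverflowError serialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return null;
        } catch (StackOverflowError e) {
            return e;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("10 nodes: " + (serialize(chain(10)) == null ? "ok" : "overflow"));
        StackOverflowError e = serialize(chain(1_000_000));
        if (e == null) {
            System.out.println("1,000,000 nodes: ok");
        } else {
            long writeObject0Frames = 0;
            for (StackTraceElement frame : e.getStackTrace()) {
                if (frame.getMethodName().equals("writeObject0")) {
                    writeObject0Frames++;
                }
            }
            // The JVM may truncate recorded stack traces, so this count is a lower bound.
            System.out.println("1,000,000 nodes: StackOverflowError, writeObject0 frames recorded: "
                    + writeObject0Frames);
        }
    }
}
```

The short chain serializes normally while the long one overflows; the exact depth at which it fails depends on the thread stack size (`-Xss`), which is why a job can succeed on small inputs and fail once a structure grows.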
Re: Lost task - connection closed
It looks like something weird is going on with your object serialization: perhaps a funny form of self-reference that is not detected by ObjectOutputStream's usual loop avoidance. That, or you have a data structure like a linked list with a parent pointer and many thousands of elements. Assuming the stack trace is coming from an executor, the problem is probably with the objects you're sending back as results, so I would carefully examine these and maybe try serializing some of them manually with ObjectOutputStream. If your program looks like

    foo.map { row => doComplexOperation(row) }.take(10)

you can also try changing it to

    foo.map { row => doComplexOperation(row); 1 }.take(10)

to avoid serializing the result of that complex operation, which should help narrow down exactly where the problematic objects are coming from.
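The suggestion to serialize suspect result objects by hand with ObjectOutputStream can be sketched as a small helper that reports success, a non-serializable class, or a stack overflow. For the long-linked-list case, one common remedy (the approach `java.util.LinkedList` itself takes) is a custom `writeObject`/`readObject` pair that writes elements in a loop instead of relying on recursive default serialization. `SerializationCheck` and `FlatChain` are hypothetical illustrations, not part of Spark or of the original program:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCheck {
    /** Tries Java serialization on a value and describes the outcome instead of throwing. */
    static String check(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (StackOverflowError e) {
            return "StackOverflowError: object graph too deep";
        } catch (NotSerializableException e) {
            return "not serializable: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e;
        }
        return "ok (" + bytes.size() + " bytes)";
    }

    /** Hypothetical list whose nodes are written in a loop, so stack depth does not grow with length. */
    static final class FlatChain implements Serializable {
        private static final long serialVersionUID = 1L;

        private static final class Node {
            final int value;
            Node next;

            Node(int value) {
                this.value = value;
            }
        }

        // transient: default serialization skips the nodes; writeObject handles them instead.
        private transient Node head;
        private transient int size;

        void addFirst(int value) {
            Node n = new Node(value);
            n.next = head;
            head = n;
            size++;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(size);
            for (Node n = head; n != null; n = n.next) {
                out.writeInt(n.value); // flat write: no recursion per element
            }
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            int count = in.readInt();
            Node tail = null;
            for (int i = 0; i < count; i++) {
                Node n = new Node(in.readInt());
                if (tail == null) {
                    head = n;
                } else {
                    tail.next = n;
                }
                tail = n;
            }
            size = count;
        }
    }

    public static void main(String[] args) {
        FlatChain list = new FlatChain();
        for (int i = 0; i < 1_000_000; i++) {
            list.addFirst(i);
        }
        System.out.println(check(list));         // a million elements, constant stack depth
        System.out.println(check(new Object())); // java.lang.Object does not implement Serializable
    }
}
```

Feeding the actual result type of `doComplexOperation` into a helper like `check` (after the `; 1` experiment has shown that the results are the culprit) tells you whether the problem is depth, a non-serializable field, or something else.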
Re: Lost task - connection closed
Please take a look at the executor logs (on both sides of the IOException) to see if there are other exceptions (e.g., OOM) which precede this one. Generally, the connections should not fail spontaneously.

On Sun, Jan 25, 2015 at 10:35 PM, octavian.ganea octavian.ga...@inf.ethz.ch wrote:
Hi, I am running a program that executes map-reduce jobs in a loop. The first time the loop runs, everything is OK. After that, it starts giving the following error, first for one task, then for more tasks, and eventually the entire program fails:

15/01/26 01:41:25 WARN TaskSetManager: Lost task 10.0 in stage 15.0 (TID 1063, hostnameXX): java.io.IOException: Connection from hostnameXX/172.31.109.50:50808 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:98)
    at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:81)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
    at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:738)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$6.run(AbstractChannel.java:606)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at java.lang.Thread.run(Thread.java:745)

Can someone help me debug this? Thank you!
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361.html