Re: Lost task - connection closed

2015-07-30 Thread firemonk9
I am getting same error. Any resolution on this issue ?

Thank you



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361p24082.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Lost task - connection closed

2015-02-17 Thread Tianshuo Deng
Hi, Thanks for the reponse.
I discovered my problem was that some of the executors got OOM, tracing
down the logs of executors helps discovering the problem. Usually the log
from the driver do not reflect the OOM error and therefore causes
confusions among users.

This is just the discoveries on my side, not sure if OP was having the same
problem though

On Wed, Feb 11, 2015 at 12:03 AM, Arush Kharbanda 
ar...@sigmoidanalytics.com wrote:

 Hi

 Can you share the code you are trying to run.

 Thanks
 Arush

 On Wed, Feb 11, 2015 at 9:12 AM, Tianshuo Deng td...@twitter.com.invalid
 wrote:

 I have seen the same problem, It causes some tasks to fail, but not the
 whole job to fail.
 Hope someone could shed some light on what could be the cause of this.

 On Mon, Jan 26, 2015 at 9:49 AM, Aaron Davidson ilike...@gmail.com
 wrote:

 It looks like something weird is going on with your object
 serialization, perhaps a funny form of self-reference which is not detected
 by ObjectOutputStream's typical loop avoidance. That, or you have some data
 structure like a linked list with a parent pointer and you have many
 thousand elements.

 Assuming the stack trace is coming from an executor, it is probably a
 problem with the objects you're sending back as results, so I would
 carefully examine these and maybe try serializing some using
 ObjectOutputStream manually.

 If your program looks like
 foo.map { row = doComplexOperation(row) }.take(10)

 you can also try changing it to
 foo.map { row = doComplexOperation(row); 1 }.take(10)

 to avoid serializing the result of that complex operation, which should
 help narrow down where exactly the problematic objects are coming from.

 On Mon, Jan 26, 2015 at 8:31 AM, octavian.ganea 
 octavian.ga...@inf.ethz.ch wrote:

 Here is the first error I get at the executors:

 15/01/26 17:27:04 ERROR ExecutorUncaughtExceptionHandler: Uncaught
 exception
 in thread Thread[handle-message-executor-16,5,main]
 java.lang.StackOverflowError
 at

 java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
 at

 java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1840)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 

Re: Lost task - connection closed

2015-02-17 Thread Tianshuo Deng
Hi, Thanks for the reponse.
I discovered my problem was that some of the executors got OOM, tracing
down the logs of executors helps discovering the problem. Usually the log
from the driver do not reflect the OOM error and therefore causes
confusions among users.

This is just the discoveries on my side, not sure if OP was having the same
problem though

On Wed, Feb 11, 2015 at 12:03 AM, Arush Kharbanda 
ar...@sigmoidanalytics.com wrote:

 Hi

 Can you share the code you are trying to run.

 Thanks
 Arush

 On Wed, Feb 11, 2015 at 9:12 AM, Tianshuo Deng td...@twitter.com.invalid
 wrote:

 I have seen the same problem, It causes some tasks to fail, but not the
 whole job to fail.
 Hope someone could shed some light on what could be the cause of this.

 On Mon, Jan 26, 2015 at 9:49 AM, Aaron Davidson ilike...@gmail.com
 wrote:

 It looks like something weird is going on with your object
 serialization, perhaps a funny form of self-reference which is not detected
 by ObjectOutputStream's typical loop avoidance. That, or you have some data
 structure like a linked list with a parent pointer and you have many
 thousand elements.

 Assuming the stack trace is coming from an executor, it is probably a
 problem with the objects you're sending back as results, so I would
 carefully examine these and maybe try serializing some using
 ObjectOutputStream manually.

 If your program looks like
 foo.map { row = doComplexOperation(row) }.take(10)

 you can also try changing it to
 foo.map { row = doComplexOperation(row); 1 }.take(10)

 to avoid serializing the result of that complex operation, which should
 help narrow down where exactly the problematic objects are coming from.

 On Mon, Jan 26, 2015 at 8:31 AM, octavian.ganea 
 octavian.ga...@inf.ethz.ch wrote:

 Here is the first error I get at the executors:

 15/01/26 17:27:04 ERROR ExecutorUncaughtExceptionHandler: Uncaught
 exception
 in thread Thread[handle-message-executor-16,5,main]
 java.lang.StackOverflowError
 at

 java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
 at

 java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1840)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 

Re: Lost task - connection closed

2015-01-26 Thread octavian.ganea
Here is the first error I get at the executors: 

15/01/26 17:27:04 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception
in thread Thread[handle-message-executor-16,5,main]
java.lang.StackOverflowError
at
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
at
java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1840)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)

If you have any pointers for me on how to debug this, that would be very
useful. I tried running with both spark 1.2.0 and 1.1.1, getting the same
error.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361p21371.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Lost task - connection closed

2015-01-26 Thread Aaron Davidson
It looks like something weird is going on with your object serialization,
perhaps a funny form of self-reference which is not detected by
ObjectOutputStream's typical loop avoidance. That, or you have some data
structure like a linked list with a parent pointer and you have many
thousand elements.

Assuming the stack trace is coming from an executor, it is probably a
problem with the objects you're sending back as results, so I would
carefully examine these and maybe try serializing some using
ObjectOutputStream manually.

If your program looks like
foo.map { row = doComplexOperation(row) }.take(10)

you can also try changing it to
foo.map { row = doComplexOperation(row); 1 }.take(10)

to avoid serializing the result of that complex operation, which should
help narrow down where exactly the problematic objects are coming from.

On Mon, Jan 26, 2015 at 8:31 AM, octavian.ganea octavian.ga...@inf.ethz.ch
wrote:

 Here is the first error I get at the executors:

 15/01/26 17:27:04 ERROR ExecutorUncaughtExceptionHandler: Uncaught
 exception
 in thread Thread[handle-message-executor-16,5,main]
 java.lang.StackOverflowError
 at

 java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
 at

 java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1840)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)

 If you have any pointers for me on how to debug this, that would be very
 useful. I tried running with both spark 1.2.0 and 1.1.1, getting the same
 error.




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361p21371.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 

Re: Lost task - connection closed

2015-01-25 Thread Aaron Davidson
Please take a look at the executor logs (on both sides of the IOException)
to see if there are other exceptions (e.g., OOM) which precede this one.
Generally, the connections should not fail spontaneously.

On Sun, Jan 25, 2015 at 10:35 PM, octavian.ganea octavian.ga...@inf.ethz.ch
 wrote:

 Hi,

 I am running a program that executes map-reduce jobs in a loop. The first
 time the loop runs, everything is ok. After that, it starts giving the
 following error, first it gives it for one task, then for more tasks and
 eventually the entire program fails:

 15/01/26 01:41:25 WARN TaskSetManager: Lost task 10.0 in stage 15.0 (TID
 1063, hostnameXX): java.io.IOException: Connection from
 hostnameXX/172.31.109.50:50808 closed
 at

 org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:98)
 at

 org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:81)
 at

 io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
 at

 io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
 at

 io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
 at

 io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
 at

 io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
 at

 io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
 at

 io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
 at

 io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
 at

 io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:738)
 at

 io.netty.channel.AbstractChannel$AbstractUnsafe$6.run(AbstractChannel.java:606)
 at

 io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
 at

 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
 at java.lang.Thread.run(Thread.java:745)

 Can someone help me with debugging this ?

 Thank you!



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org