Antony:
Please check the HBase master log to see if there was anything noticeable in that 
period of time. 
If the HBase cluster is not big, check the region server logs as well. 

Cheers



> On Jan 16, 2015, at 10:00 AM, Antony Mayi <antonym...@yahoo.com.INVALID> 
> wrote:
> 
> Hi,
> 
> I believe this is some kind of timeout problem but can't figure out how to 
> increase it.
> 
> I am running Spark 1.2.0 on YARN (all from CDH 5.3.0). I submit a Python task 
> which first loads a big RDD from HBase - I can see in the screen output that all 
> executors fire up, then there is no more logging output for the next two minutes, 
> after which I get plenty of
> 
> 15/01/16 17:35:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 7 
> on node01: remote Akka client disassociated
> 15/01/16 17:35:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 7 from 
> TaskSet 1.0
> 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 32.0 in stage 1.0 
> (TID 17, node01): ExecutorLostFailure (executor 7 lost)
> 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 1.0 
> (TID 25, node01): ExecutorLostFailure (executor 7 lost)
> 
> This points to some ~120 sec timeout being hit while the nodes are loading the big RDD? 
> Any ideas how to get around it?
> 
> FYI, I already use the following options without any success (how I set them is 
> sketched below):
> 
>     spark.core.connection.ack.wait.timeout: 600
>     spark.akka.timeout: 1000
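> 
> In case it matters, this is roughly how I pass them - a minimal sketch of the 
> relevant part of my PySpark script (the app name is just a placeholder, the rest 
> of the setup is omitted):
> 
>     from pyspark import SparkConf, SparkContext
> 
>     # timeouts set on the SparkConf before the SparkContext is created
>     conf = (SparkConf()
>             .setAppName("hbase-load")  # placeholder name
>             .set("spark.core.connection.ack.wait.timeout", "600")
>             .set("spark.akka.timeout", "1000"))
>     sc = SparkContext(conf=conf)
> 
> (the same values can also be passed on the command line via spark-submit 
> --conf key=value, or put into spark-defaults.conf)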
> 
> 
> thanks,
> Antony.
> 
> 
