Sorry, to clarify: Spark *does* effectively turn Akka's failure detector
off.
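
For reference, the way this works (in the Akka-based 0.9/1.x line; the
property names below come from that generation's configuration docs, so
double-check them against your version) is that Spark feeds very generous
values into Akka's phi-accrual failure detector, so a long GC pause alone
shouldn't trigger a disassociation. A minimal Scala sketch of relaxing
those knobs even further:

    import org.apache.spark.{SparkConf, SparkContext}

    // A sketch only -- these spark.akka.* properties existed in the
    // 0.9/1.x line and their defaults varied by release.
    val conf = new SparkConf()
      .setAppName("wordcount")
      .set("spark.akka.timeout", "300")                      // seconds; Akka communication timeout
      .set("spark.akka.heartbeat.interval", "1000")          // seconds; keep this large
      .set("spark.akka.heartbeat.pauses", "6000")            // seconds; acceptable heartbeat pause
      .set("spark.akka.failure-detector.threshold", "300.0") // phi-accrual threshold
    val sc = new SparkContext(conf)

Raising these only helps if the failure detector is actually what's
firing; a dropped TCP connection will disassociate regardless.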


On Tue, May 27, 2014 at 10:47 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> Spark should effectively turn Akka's failure detector off, because we
> historically had problems with GCs and other issues causing
> disassociations. The only thing that should cause these messages nowadays
> is if the TCP connection (which Akka sustains between Actor Systems on
> different machines) actually drops. TCP connections are pretty resilient,
> so one common cause of this is actual Executor failure -- recently, I have
> experienced a similar-sounding problem due to my machine's OOM killer
> terminating my Executors, such that they didn't produce any error output.
>
>
> On Thu, May 22, 2014 at 9:19 AM, Chanwit Kaewkasi <chan...@gmail.com>wrote:
>
>> Hi all,
>>
>> On an ARM cluster, I have been testing a wordcount program with JRE 7,
>> and everything is OK. But when I switch to the embedded version of
>> Java SE (Oracle's eJRE), the same program cannot complete all of its
>> computing stages.
>>
>> It fails with many Akka disassociation errors.
>>
>> - I've been trying to increase Akka's timeout, but I'm still stuck. I'm
>> not sure what the right way to do so is. (I suspect that stop-the-world
>> GC pauses are causing this.)
>>
>> - Another question: how can I properly turn on Akka's logging to see
>> the root cause of this disassociation problem? (In case my guess about
>> GC is wrong.)
>>
>> Best regards,
>>
>> -chanwit
>>
>> --
>> Chanwit Kaewkasi
>> linkedin.com/in/chanwit
>>
>
>
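
On the two questions above: the timeout knobs are the spark.akka.*
properties sketched at the top of this message. For the logging question,
Spark (again, the Akka-based 1.x line) exposed switches that make Akka
report remote lifecycle events, including the reason for a disassociation,
instead of staying quiet. A hedged sketch, same caveats about
version-specific property names:

    // Sketch; both properties are from the 1.x-era AkkaUtils and may
    // not exist under the same names in other versions.
    val conf = new SparkConf()
      .set("spark.akka.logLifecycleEvents", "true") // log remote association/disassociation events
      .set("spark.akka.logAkkaConfig", "true")      // print the effective Akka config at startup

If the OOM-killer theory fits your case, the kernel log (dmesg) on the
worker machines will show which processes it killed; that failure mode
produces no error output from the Executor itself, exactly as described
above.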
