Hi Deb,

The current state of the art is to increase
spark.yarn.executor.memoryOverhead until the job stops failing.  We do have
plans to try to automatically scale this based on the amount of memory
requested, but it will still just be a heuristic.
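To make the "just a heuristic" point concrete, here is a minimal Python sketch of the arithmetic, under stated assumptions. The idea is that the memory YARN enforces for an executor is roughly the heap plus spark.yarn.executor.memoryOverhead, so raising the overhead raises the limit the container is checked against. The proportional default shown (7% of executor memory with a 384 MB floor) and all function names are hypothetical, chosen only to illustrate a scale-with-requested-memory heuristic. This is not Spark's actual code. The sketch also ignores YARN rounding requests up to its allocation increment.

```python
# Hypothetical sketch of sizing a YARN executor container request.
# ASSUMPTIONS (illustrative only, not Spark's implementation):
#   - overhead scales as a fixed fraction of executor memory, with a floor;
#   - 0.07 and 384 MB are example values;
#   - YARN's rounding of requests up to its allocation increment is ignored.

OVERHEAD_FRACTION = 0.07  # assumed heuristic fraction of executor heap
OVERHEAD_MIN_MB = 384     # assumed minimum overhead, in MB


def memory_overhead_mb(executor_memory_mb, override_mb=None):
    """Off-heap overhead (MB) added on top of the executor heap.

    An explicit override plays the role of spark.yarn.executor.memoryOverhead.
    """
    if override_mb is not None:
        return override_mb
    return max(OVERHEAD_MIN_MB, int(OVERHEAD_FRACTION * executor_memory_mb))


def container_request_mb(executor_memory_mb, override_mb=None):
    """Total memory requested from YARN: heap plus overhead."""
    return executor_memory_mb + memory_overhead_mb(executor_memory_mb, override_mb)


def would_be_killed(used_mb, executor_memory_mb, override_mb=None):
    """Simplified model of YARN killing a container that uses more than it requested."""
    return used_mb > container_request_mb(executor_memory_mb, override_mb)


if __name__ == "__main__":
    # An 8 GB executor whose process actually uses ~9000 MB.
    print(container_request_mb(8192))                   # heap + heuristic overhead
    print(would_be_killed(9000, 8192))                  # exceeds the request
    print(would_be_killed(9000, 8192, override_mb=2048))  # larger overhead fits
```

In this model, "increase memoryOverhead until the job stops failing" amounts to raising override_mb until the executor's real footprint fits under the request. Any automatic scaling only changes how the default overhead is guessed.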

-Sandy

On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das <debasish.da...@gmail.com>
wrote:

> Hi Sandy,
>
> Any resolution for the YARN failures? It's a blocker for running Spark on
> top of YARN.
>
> Thanks.
> Deb
>
> On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Hi Deb,
>>
>> I think this may be the same issue as described in
>> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
>> container got killed by YARN because it used much more memory than it
>> requested. But we haven't figured out the root cause yet.
>>
>> +Sandy
>>
>> Best,
>> Xiangrui
>>
>> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das <debasish.da...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > During the 4th ALS iteration, I am noticing that one of the executors
>> > gets disconnected:
>> >
>> > 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
>> > SendingConnectionManagerId not found
>> >
>> > 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
>> > disconnected, so removing it
>> >
>> > 14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost executor 5
>> > on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client disassociated
>> >
>> > 14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 12)
>> >
>> > Any idea if this is a bug related to Akka on YARN?
>> >
>> > I am using the master branch.
>> >
>> > Thanks.
>> > Deb
>>
>
>
