Re: Spark on Yarn vs Standalone

Alexander Pivovarov Mon, 07 Sep 2015 11:23:29 -0700

Hi Sandy,

Thank you for your reply.
Currently we use r3.2xlarge boxes (8 vCPUs, 61 GiB of memory)
with the EMR Spark setting "maximizeResourceAllocation": "true".


EMR automatically converts it into these Spark settings:
spark.executor.memory            47924M
spark.yarn.executor.memoryOverhead 5324
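The two settings add up to the size of the YARN container requested for each executor. A quick arithmetic check, assuming one executor per node (the thread does not state spark.executor.instances or spark.executor.cores), shows the sum is exactly 52 GiB, which matches the 52GB figure in the comparison table, and that the overhead is about 10% of the whole container. Reading this as "EMR sized the overhead as 10% of the container" is an inference from the numbers, not documented EMR behavior.

```python
# Settings reported above for r3.2xlarge with maximizeResourceAllocation=true.
EXECUTOR_MEMORY_MB = 47924  # spark.executor.memory (executor JVM heap)
MEMORY_OVERHEAD_MB = 5324   # spark.yarn.executor.memoryOverhead (off-heap, in MB)


def container_size_mb(executor_mb: int, overhead_mb: int) -> int:
    """Memory YARN must grant for one executor container: heap plus overhead."""
    return executor_mb + overhead_mb


total_mb = container_size_mb(EXECUTOR_MEMORY_MB, MEMORY_OVERHEAD_MB)
overhead_share = MEMORY_OVERHEAD_MB / total_mb

print(total_mb, total_mb / 1024)  # 53248 52.0
print(round(overhead_share, 3))   # 0.1
```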

We also set spark.default.parallelism = slave_count * 16.
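With 8 vCPUs per node, slave_count * 16 works out to two partitions per vCPU, which falls inside the 2-3 tasks per CPU core that the Spark tuning guide recommends. A minimal arithmetic sketch, assuming each vCPU runs one task at a time (i.e. executor cores match the node's vCPUs, which the thread does not confirm):

```python
R3_2XLARGE_VCPUS = 8  # vCPUs per r3.2xlarge node, as stated in the thread


def default_parallelism(slave_count: int, partitions_per_slave: int = 16) -> int:
    """The spark.default.parallelism rule used in the thread: slave_count * 16."""
    return slave_count * partitions_per_slave


def tasks_per_vcpu(partitions_per_slave: int = 16,
                   vcpus: int = R3_2XLARGE_VCPUS) -> float:
    """Partitions per vCPU, assuming every vCPU on a node runs one task at a time."""
    return partitions_per_slave / vcpus


# The cluster sizes mentioned in the original question.
for nodes in (100, 150):
    print(nodes, default_parallelism(nodes))
# 100 1600
# 150 2400
print(tasks_per_vcpu())  # 2.0
```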

Does this look reasonable to you? (We run a single heavy job on the cluster.)

Alex

On Mon, Sep 7, 2015 at 11:03 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> Hi Alex,
>
> If they're both configured correctly, there's no reason Spark
> Standalone should provide a performance or memory improvement over Spark on
> YARN.
>
> -Sandy
>
> On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> We are trying the latest AWS emr-4.0.0 release with Spark, and my question
>> is about YARN vs. Standalone mode.
>> Our use case is:
>> - start a 100-150 node cluster every week
>> - run one heavy Spark job (5-6 hours)
>> - save the data to S3
>> - stop the cluster
>>
>> Officially, AWS emr-4.0.0 comes with Spark on YARN.
>> It's probably possible to hack EMR with a bootstrap script that stops
>> YARN and starts the Spark master and slaves on each machine (to run Spark
>> in Standalone mode).
>>
>> My questions are:
>> - Does Spark Standalone provide a significant performance / memory
>> improvement compared to YARN mode?
>> - Is it worth hacking the official EMR Spark on YARN setup and switching
>> Spark to Standalone mode?
>>
>>
>> I have already created a comparison table and would like you to check
>> whether my understanding is correct.
>>
>> Let's say an r3.2xlarge machine has 52GB of RAM available for Spark
>> executor JVMs.
>>
>>                    Standalone vs. YARN comparison
>>
>>                                                     Standalone | YARN
>> Can an executor allocate up to 52GB of RAM?         yes        | yes
>> Will an executor become unresponsive because of GC
>>   after using all 52GB of RAM?                      yes        | yes
>> Additional JVMs on each slave besides the executor  Worker     | NodeManager
>> Are the additional JVMs lightweight?                yes        | yes
>>
>>
>> Thank you
>>
>> Alex
>>
>
>
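The weekly workflow from the original question (start a 100-150 node cluster, run one heavy Spark job, save the output to S3, stop the cluster) maps onto a single transient EMR cluster that terminates itself after its last step. A minimal sketch, assuming the cluster is launched with boto3: the function only builds the keyword arguments for the EMR client's run_job_flow call and makes no AWS requests. The cluster name, S3 script path, node counts, master instance type, the use of command-runner.jar with spark-submit, and the default IAM role names are illustrative assumptions, not settings from the thread. It also folds in the maximizeResourceAllocation and slave_count * 16 settings discussed above.

```python
def build_emr_request(core_nodes: int, script_s3_uri: str) -> dict:
    """Keyword arguments for a transient EMR cluster that runs one Spark job."""
    return {
        "Name": "weekly-heavy-spark-job",  # hypothetical cluster name
        "ReleaseLabel": "emr-4.0.0",
        "Applications": [{"Name": "Spark"}],
        "Configurations": [
            # Let EMR size executor memory/overhead to the node, as in the thread.
            {"Classification": "spark",
             "Properties": {"maximizeResourceAllocation": "true"}},
            # The thread's rule: slave_count * 16 partitions.
            {"Classification": "spark-defaults",
             "Properties": {"spark.default.parallelism": str(core_nodes * 16)}},
        ],
        "Instances": {
            "InstanceGroups": [
                # Assumed: the master uses the same instance type as the slaves.
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "r3.2xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "r3.2xlarge", "InstanceCount": core_nodes},
            ],
            # Shut the cluster down once the last step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "heavy-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Hypothetical job script; the job itself writes its output to S3.
                "Args": ["spark-submit", script_s3_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }
```

With AWS credentials configured, the request could then be submitted as boto3.client("emr").run_job_flow(**build_emr_request(120, "s3://example-bucket/jobs/heavy_job.py")), where the bucket and script path are placeholders. Keeping the request as a plain dict makes it easy to review or diff before launching a 100+ node cluster.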
