Hi Sandy,

Thank you for your reply.

Currently we use r3.2xlarge boxes (vCPU: 8, Mem: 61 GiB) with the EMR setting for Spark "maximizeResourceAllocation": "true".
It is automatically converted to the Spark settings

spark.executor.memory 47924M
spark.yarn.executor.memoryOverhead 5324

We also set spark.default.parallelism = slave_count * 16.

Does this look good to you? (We run a single heavy job on the cluster.)

Alex

On Mon, Sep 7, 2015 at 11:03 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> Hi Alex,
>
> If they're both configured correctly, there's no reason that Spark
> Standalone should provide performance or memory improvement over Spark on
> YARN.
>
> -Sandy
>
> On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> Hi Everyone
>>
>> We are trying the latest aws emr-4.0.0 and Spark and my question is about
>> YARN vs Standalone mode.
>> Our use case is
>> - start a 100-150 node cluster every week,
>> - run one heavy spark job (5-6 hours)
>> - save data to s3
>> - stop cluster
>>
>> Officially aws emr-4.0.0 comes with Spark on Yarn.
>> It's probably possible to hack emr by creating a bootstrap script which
>> stops yarn and starts master and slaves on each computer (to start Spark
>> in standalone mode)
>>
>> My questions are
>> - Does Spark standalone provide a significant performance / memory
>> improvement in comparison to YARN mode?
>> - Is it worth hacking the official emr Spark on Yarn and switching Spark to
>> Standalone mode?
>>
>>
>> I already created a comparison table and want you to check if my
>> understanding is correct
>>
>> Let's say an r3.2xlarge computer has 52GB ram available for Spark Executor
>> JVMs
>>
>> standalone to yarn comparison
>>
>>                                                    | STDLN  | YARN
>> can executor allocate up to 52GB ram               | yes    | yes
>> will executor be unresponsive after using all      |        |
>>   52GB ram because of GC                           | yes    | yes
>> additional JVMs on slave besides spark executor    | worker | node manager
>> are additional JVMs lightweight                    | yes    | yes
>>
>>
>> Thank you
>>
>> Alex
>>
>
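
A quick sanity check of the numbers in the reply at the top of this thread, written as a minimal Python sketch. It assumes one executor per node (which is what maximizeResourceAllocation is meant to give) and that YARN sizes the executor container as spark.executor.memory plus spark.yarn.executor.memoryOverhead. The function names and the node counts 100 and 150 are only illustrative, taken from the "100-150 nodes" mentioned in the original message.

```python
# Values quoted in the reply (as set by EMR with maximizeResourceAllocation).
EXECUTOR_MEMORY_MB = 47924   # spark.executor.memory
MEMORY_OVERHEAD_MB = 5324    # spark.yarn.executor.memoryOverhead
VCPUS_PER_NODE = 8           # r3.2xlarge
PARTITIONS_PER_NODE = 16     # from spark.default.parallelism = slave_count * 16


def container_memory_mb(executor_mb: int, overhead_mb: int) -> int:
    """Memory requested from YARN for one executor: heap plus overhead."""
    return executor_mb + overhead_mb


def default_parallelism(slave_count: int, per_node: int = PARTITIONS_PER_NODE) -> int:
    """spark.default.parallelism as described in the reply (hypothetical helper)."""
    return slave_count * per_node


container_mb = container_memory_mb(EXECUTOR_MEMORY_MB, MEMORY_OVERHEAD_MB)
container_gib = container_mb / 1024
partitions_per_vcpu = PARTITIONS_PER_NODE / VCPUS_PER_NODE

if __name__ == "__main__":
    print(f"container per executor: {container_mb} MB = {container_gib} GiB")
    print(f"partitions per vCPU: {partitions_per_vcpu}")
    for nodes in (100, 150):  # illustrative cluster sizes only
        print(f"{nodes} slaves -> parallelism {default_parallelism(nodes)}")
```

Under these assumptions the container comes to 53248 MB, i.e. the 52 GB per node assumed in the comparison table, and the parallelism setting works out to 2 partitions per vCPU.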