Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Vipul Pandey
And I thought I had sent it to the right list! Here it is again; question below:

On May 14, 2014, at 3:06 PM, Vipul Pandey vipan...@gmail.com wrote:

 So here's a follow-up question: what's the preferred mode? 
 We have a new cluster coming up with petabytes of data, and we intend to take 
 Spark to production. We are trying to figure out which mode would be safe and 
 stable for a production-like environment. 
 Pros and cons, anyone? 
 
 Any reasons why one would choose Standalone over YARN?
 
 Thanks,
 Vipul





 
 On May 4, 2014, at 5:56 PM, Liu, Raymond raymond@intel.com wrote:
 
 At the core, they are not very different.
 In standalone mode, the Spark master and Spark workers allocate the 
 driver and executors for your Spark app.
 In YARN mode, the YARN ResourceManager and NodeManagers do this work.
 Once the driver and executors have been launched, the rest of the resource 
 scheduling goes through the same process, e.g. between the driver and the 
 executors through Akka actors.
 
 Best Regards,
 Raymond Liu
 
 
 -Original Message-
 From: Sophia [mailto:sln-1...@163.com] 
 
 Hey you guys,
 What is the difference between Spark on YARN mode and standalone mode with 
 regard to resource scheduling?
 Wishing you happiness every day.
 
 
 
 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/different-in-spark-on-yarn-mode-and-standalone-mode-tp5300.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.
 



Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Vipul Pandey
Thanks for responding, Sandy. 

YARN is certainly a more mature way of working with shared resources. I was not 
sure how stable Spark on YARN is, or whether anyone is using it in production. 
I have been using standalone mode on our dev cluster, but in terms of 
multi-tenancy and resource allocation it's difficult to call it production-ready 
yet. (I'm not sure whether 1.0 has significant changes, as I haven't kept up 
lately.)

What I take from your response below is that for a production-like environment 
YARN will be the better choice since, in our case, we don't care much about 
saving a few seconds of startup time. Stability will definitely be a concern, 
but I'm assuming that Spark on YARN is not terrible either and will mature over 
time, in which case we don't have to compromise on other important factors 
(like resource sharing and prioritization).

By the way, can I see information on which RDDs are cached, their sizes, etc. 
on YARN, as I can in the standalone mode UI?


~Vipul

On May 15, 2014, at 5:24 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Hi Vipul,
 
 Some advantages of using YARN:
 * YARN allows you to dynamically share and centrally configure the same pool 
 of cluster resources between all frameworks that run on YARN.  You can throw 
 your entire cluster at a MapReduce job, then use some of it on an Impala 
 query and the rest on a Spark application, without any changes in configuration.
 * You can take advantage of all the features of YARN schedulers for 
 categorizing, isolating, and prioritizing workloads.
 * YARN provides CPU-isolation between processes with CGroups. Spark 
 standalone mode requires each application to run an executor on every node in 
 the cluster - with YARN, you choose the number of executors to use.
 * YARN is the only cluster manager for Spark that supports security and 
 Kerberized clusters.
 
 Some advantages of using standalone:
 * It has been around for longer, so it is likely a little more stable.
 * Many report faster startup times for apps.
 
 -Sandy
 
 
 



Re: different in spark on yarn mode and standalone mode

2014-05-16 Thread Sandy Ryza
Hi Vipul,

Some advantages of using YARN:
* YARN allows you to dynamically share and centrally configure the same
pool of cluster resources between all frameworks that run on YARN.  You can
throw your entire cluster at a MapReduce job, then use some of it on an
Impala query and the rest on a Spark application, without any changes in
configuration.
* You can take advantage of all the features of YARN schedulers for
categorizing, isolating, and prioritizing workloads.
* YARN provides CPU-isolation between processes with CGroups. Spark
standalone mode requires each application to run an executor on every node
in the cluster - with YARN, you choose the number of executors to use.
* YARN is the only cluster manager for Spark that supports security and
Kerberized clusters.

Some advantages of using standalone:
* It has been around for longer, so it is likely a little more stable.
* Many report faster startup times for apps.

-Sandy
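
To make the point about choosing the number of executors concrete, here is a rough Python sketch that assembles the resource-related spark-submit flags for each cluster manager. The helper `resource_flags`, the queue name `analytics`, and the numbers are made up for illustration. The flags themselves (`--executor-memory`, `--num-executors`, `--queue`, `--total-executor-cores`) are standard spark-submit options, but which ones are honoured depends on the cluster manager and the Spark version, so treat this as an assumption to check against your release.

```python
def resource_flags(manager, memory="2g", executors=None, total_cores=None, queue=None):
    """Return spark-submit resource flags for a hypothetical job.

    manager is "yarn" or "standalone" (labels used only by this sketch).
    """
    flags = ["--executor-memory", memory]
    if manager == "yarn":
        # On YARN the application asks for a specific number of executors
        # and can be routed to a scheduler queue.
        if executors is not None:
            flags += ["--num-executors", str(executors)]
        if queue is not None:
            flags += ["--queue", queue]
    elif manager == "standalone":
        # In standalone mode the usual knob is a cap on total cores.
        if total_cores is not None:
            flags += ["--total-executor-cores", str(total_cores)]
    else:
        raise ValueError(f"unknown cluster manager: {manager}")
    return flags


yarn_flags = resource_flags("yarn", executors=10, queue="analytics")
standalone_flags = resource_flags("standalone", total_cores=40)

print(" ".join(yarn_flags))
print(" ".join(standalone_flags))
```

The YARN variant sizes the job by executor count and leaves placement and prioritization to the YARN scheduler, while the standalone variant only bounds how many cores the application may take.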






RE: different in spark on yarn mode and standalone mode

2014-05-04 Thread Liu, Raymond
At the core, they are not very different.
In standalone mode, the Spark master and Spark workers allocate the driver 
and executors for your Spark app.
In YARN mode, the YARN ResourceManager and NodeManagers do this work.
Once the driver and executors have been launched, the rest of the resource 
scheduling goes through the same process, e.g. between the driver and the 
executors through Akka actors.

Best Regards,
Raymond Liu
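
As a small illustration of the point above, the following Python sketch builds a spark-submit command line for each mode and shows that only the `--master` value changes. The helper `spark_submit_args`, the host `master-host`, the jar `app.jar`, and the class `com.example.App` are placeholders, not names from this thread. The master value `yarn` is what recent Spark releases accept; around Spark 1.0 the values were `yarn-client` and `yarn-cluster`.

```python
def spark_submit_args(master_url, app_jar, main_class):
    """Build a spark-submit command; --master selects the cluster manager."""
    return [
        "spark-submit",
        "--master", master_url,
        "--class", main_class,
        app_jar,
    ]


# Hypothetical application submitted to each cluster manager.
standalone = spark_submit_args("spark://master-host:7077", "app.jar", "com.example.App")
on_yarn = spark_submit_args("yarn", "app.jar", "com.example.App")

# Collect the positions where the two command lines differ.
differing = [(a, b) for a, b in zip(standalone, on_yarn) if a != b]
print(differing)
```

The application code and the rest of the command stay the same; the cluster manager named by `--master` decides who launches the driver and executors.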


-Original Message-
From: Sophia [mailto:sln-1...@163.com] 

Hey you guys,
What is the difference between Spark on YARN mode and standalone mode with 
regard to resource scheduling?
Wishing you happiness every day.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/different-in-spark-on-yarn-mode-and-standalone-mode-tp5300.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.