Re: Questions about Spark standalone resource scheduler

2015-02-02 Thread Patrick Wendell
Hey Jerry,

I think standalone mode will still add more features over time, but
the goal isn't really for it to become equivalent to what Mesos/YARN
are today. Or at least, I doubt Spark Standalone will ever attempt to
manage _other_ frameworks outside of Spark and become a general
purpose resource manager.

In terms of better support for multi-tenancy, meaning multiple
*Spark* instances, this is something I think could be in scope in the
future. For instance, we added H/A to the standalone scheduler a while
back, because it let us support H/A streaming apps in a totally native
way. It's a trade-off between adding new features and keeping the
scheduler very simple and easy to use. We've tended to favor simplicity
as the main goal, since this is something we want to be really easy
out of the box.

One thing to point out: a lot of people use standalone mode underneath
some coarser-grained scheduler, such as when running in a cloud service.
In that case they really just want a simple inner cluster manager. This
may even be the majority of all Spark installations. This is slightly
different from Hadoop environments, where they might just want nice
integration into the existing Hadoop stack via something like YARN.

- Patrick

On Mon, Feb 2, 2015 at 12:24 AM, Shao, Saisai saisai.s...@intel.com wrote:
 Hi all,

 I have some questions about the future development of Spark's standalone
 resource scheduler. We've heard that some users need multi-tenant support
 in standalone mode, such as multi-user management, resource management and
 isolation, and user whitelists. Spark standalone currently does not seem to
 support this kind of functionality, while resource schedulers like YARN
 offer such advanced management. I'm not sure what the future target of the
 standalone resource scheduler is: will it only target a simple
 implementation, with advanced usage shifting to YARN? Or are there plans to
 add some simple multi-tenant functionality?

 Thanks a lot for your comments.

 BR
 Jerry

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: Questions about Spark standalone resource scheduler

2015-02-02 Thread Shao, Saisai
Hi Patrick,

Thanks a lot for your detailed explanation. For now we have the following
requirements in Spark standalone mode: whitelisting application submitters,
per-user resource (CPU, memory) quotas, and resource allocation. These are
quite specific production requirements; in general, the question becomes
whether we need to offer a more advanced resource scheduler than the current
simple FIFO one. Our aim is not to provide a general resource scheduler like
Mesos/YARN; we would only support Spark, but we hope to add some
Mesos/YARN-style functionality to make better use of Spark standalone mode.

I admit that a resource scheduler may overlap with a cloud manager; whether
to offer a more powerful scheduler or rely on the cloud manager is really a
dilemma.

I think we can break this down into small features to improve standalone
mode. What's your opinion?

Thanks
Jerry
