You may want to take a look at https://issues.apache.org/jira/browse/SPARK-3174.
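
In short, that's dynamic executor allocation for YARN. Once it's in, a minimal
sketch of what the configuration could look like (the property names and values
below are my assumptions based on the proposal, not a final API):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumed settings: executors are requested and released based on load.
    // Dynamic allocation also needs the external shuffle service running on
    // the YARN NodeManagers so shuffle data survives executor removal.
    val conf = new SparkConf()
      .setAppName("shared-cluster-app")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "50")
      .set("spark.shuffle.service.enabled", "true")
    val sc = new SparkContext(conf)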

On Thu, Oct 23, 2014 at 2:56 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
> Upvote for the multi-tenancy requirement.
>
> I'm also building a data analytics platform, and there will be multiple users
> running queries and computations simultaneously. One of the pain points is
> controlling resource size: users don't really know how many nodes they need,
> so they always request as much as possible... The result is a lot of wasted
> resources in our YARN cluster.
>
> A way to 1) allow multiple Spark contexts to share the same resources, or 2)
> add dynamic resource management for YARN mode, would be very much appreciated.
>
> Jianshi
>
> On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
>> <ashwinshanka...@gmail.com> wrote:
>> >> That's not something you'd usually want to do. In general, a
>> >> SparkContext maps to a user application
>> >
>> > My question was basically this: in this page of the official docs, under the
>> > "Scheduling within an application" section, it talks about multi-user and
>> > fair sharing within an app. How does multi-user within an application
>> > work (how do users connect to an app and run their jobs)? When would I
>> > want to use this?
>>
>> I see. The way I read that page is that Spark supports all those
>> scheduling options, but it doesn't give you the means to submit jobs
>> from different users to a running SparkContext hosted in a different
>> process. For that, you'd need something like the job server I
>> referenced before, or you'd have to write your own framework to
>> support it.
>>
>> Personally, I'd use the information on that page when dealing with
>> concurrent jobs in the same SparkContext, but still restricted to the
>> same user. I'd avoid building any application where a single
>> SparkContext is shared by multiple users in any way.
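>>
>> If you do go the single-user, concurrent-jobs route, a minimal sketch (the
>> pool name "adhoc" below is just an example) would be to enable the FAIR
>> scheduler and tag jobs with a pool per thread:
>>
>>     // assumes spark.scheduler.mode=FAIR was set on the SparkConf
>>     sc.setLocalProperty("spark.scheduler.pool", "adhoc")
>>     val count = sc.textFile("hdfs:///data/events").count()  // runs in the "adhoc" pool
>>     sc.setLocalProperty("spark.scheduler.pool", null)       // revert to the default pool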
>>
>> >> As far as I understand, this will cause executors to be killed, which
>> >> means that Spark will start retrying tasks to rebuild the data that
>> >> was held by those executors when needed.
>> >
>> > I basically wanted to find out whether there were any "gotchas" related to
>> > preemption in Spark. For example, if half of an application's executors got
>> > preempted while doing a reduceByKey, would the application still make
>> > progress with the remaining resources/fair share?
>>
>> Jobs should still make progress as long as at least one executor is
>> available. The gotcha is the one I mentioned: Spark will fail your job
>> after "x" executor failures, which might be a common occurrence when
>> preemption is enabled. That said, it's a configurable option, so you
>> can set "x" to a very large value and your job should keep chugging
>> along.
>>
>> The options you'd want to take a look at are: spark.task.maxFailures
>> and spark.yarn.max.executor.failures
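>>
>> For example, a sketch of bumping both limits (the values are placeholders
>> you'd tune to how aggressively your queues preempt):
>>
>>     // raise failure tolerance so preemption-induced losses don't kill the app
>>     val conf = new SparkConf()
>>       .set("spark.task.maxFailures", "32")             // per-task retry limit (default is 4)
>>       .set("spark.yarn.max.executor.failures", "100")  // executor losses tolerated before failing the app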
>>
>> --
>> Marcelo
>>
>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/



-- 
Marcelo

