Thanks Marcelo, that was helpful! I have some follow-up questions:

> That's not something you'd usually want to do. In general, a
> SparkContext maps to a user application

My question was basically this: the "Scheduling Within an Application"
section of the job scheduling page in the official docs
<http://spark.apache.org/docs/latest/job-scheduling.html> talks about
multi-user fair sharing within an app. How does multi-user access within
a single application work (how do users connect to the app and run their
jobs)? When would I want to use this?
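
For context, here's my current (possibly wrong) mental model of fair
sharing within one app, pieced together from that doc page; the app
name, pool name, and paths below are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: enable the fair scheduler when building the context.
val conf = new SparkConf()
  .setAppName("shared-app") // made-up name
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml") // hypothetical path
val sc = new SparkContext(conf)

// Each thread then tags its jobs with a pool before submitting work.
sc.setLocalProperty("spark.scheduler.pool", "prod_pool") // made-up pool name
sc.textFile("hdfs:///some/input").count() // made-up path

Is the intended multi-user story just "multiple threads in one driver,
each assigned to its own pool", or can separate users somehow connect to
a running app?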

> As far as I understand, this will cause executors to be killed, which
> means that Spark will start retrying tasks to rebuild the data that
> was held by those executors when needed.

I basically wanted to find out whether there are any "gotchas" related to
preemption with Spark. For example, if half of an application's executors
get preempted in the middle of a reduceByKey, will the application still
make progress with its remaining resources/fair share?
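
Concretely, I mean a shuffle-heavy job along these lines (just a sketch;
the paths are made up):

// If executors holding map output get preempted mid-shuffle, does the
// stage simply recompute the lost partitions on the survivors?
val counts = sc.textFile("hdfs:///logs/2014-10-22") // made-up path
  .map(line => (line.split(" ")(0), 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs:///out/counts") // made-up path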

I'm new to Spark, sorry if I'm asking something very obvious :).

Thanks,
Ashwin

On Wed, Oct 22, 2014 at 12:07 PM, Marcelo Vanzin <van...@cloudera.com>
wrote:

> Hi Ashwin,
>
> Let me try to answer to the best of my knowledge.
>
> On Wed, Oct 22, 2014 at 11:47 AM, Ashwin Shankar
> <ashwinshanka...@gmail.com> wrote:
> > Here are my questions:
> > 1. Sharing spark context: How exactly can multiple users share the
> > cluster using the same spark context?
>
> That's not something you'd usually want to do. In general, a
> SparkContext maps to a user application, so each user would submit
> their own job which would create its own SparkContext.
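>
> In other words, each user's driver program does something like this
> (just a minimal sketch; the app name is made up):
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Each submitted application builds its own context; nothing is
> // shared between two drivers created this way.
> val conf = new SparkConf().setAppName("alice-app")
> val sc = new SparkContext(conf)
> try {
>   sc.parallelize(1 to 100).sum()
> } finally {
>   sc.stop() // the context goes away with the application
> }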
>
> If you want to go outside of Spark, there are projects that allow you
> to manage SparkContext instances outside of applications and
> potentially share them, such as
> https://github.com/spark-jobserver/spark-jobserver. But be sure you
> actually need it - since you haven't really explained the use case,
> it's not very clear.
>
> > 2. Different spark context in YARN: assuming I have a YARN cluster with
> > queues and preemption configured. Are there problems if
> > executors/containers of a spark app are preempted to allow a
> > high-priority spark app to execute?
>
> As far as I understand, this will cause executors to be killed, which
> means that Spark will start retrying tasks to rebuild the data that
> was held by those executors when needed. Yarn mode does have a
> configurable upper limit on the number of executor failures, so if
> your job keeps getting preempted it will eventually fail (unless you
> tweak the settings).
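>
> If I'm remembering the property name right, raising that limit looks
> roughly like this (the value is made up; the default scales with the
> number of executors):
>
> // Sketch: tolerate more executor failures before the app gives up.
> val conf = new SparkConf()
>   .set("spark.yarn.max.executor.failures", "100") // name from memory, value made up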
>
> I don't recall whether Yarn has an API to cleanly allow clients to
> stop executors when preempted, but even if it does, I don't think
> that's supported in Spark at the moment.
>
> > How are user names passed on from spark to yarn (say I'm
> > using the nested user queues feature in the fair scheduler)?
>
> Spark will try to run the job as the requesting user; if you're not
> using Kerberos, that means the processes themselves will run as
> whatever user runs the Yarn daemons, but the Spark app will be run
> inside a "UserGroupInformation.doAs()" call as the requesting user. So
> technically nested queues should work as expected.
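>
> Roughly, the mechanism looks like this (a sketch of the Hadoop API,
> not necessarily Spark's exact internals; the user name is made up):
>
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.security.UserGroupInformation
>
> // The work runs "as" the requesting user even though the JVM itself
> // belongs to whoever runs the Yarn daemons.
> val ugi = UserGroupInformation.createRemoteUser("ashwin")
> ugi.doAs(new PrivilegedExceptionAction[Unit] {
>   override def run(): Unit = {
>     // launch the user's Spark app here
>   }
> })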
>
> > 3. Sharing RDDs in 1 and 2 above?
>
> I'll assume you don't mean actually sharing RDDs in the same context,
> but between different SparkContext instances. You might (big might
> here) be able to checkpoint an RDD from one context and load it from
> another context; that's actually how some HA-like features for Spark
> drivers are being addressed.
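>
> The poor man's version of that today would be to write the RDD out
> from one context and read it back in another, e.g. (the paths and the
> record type are made up):
>
> // In application A:
> rddA.saveAsObjectFile("hdfs:///shared/rdds/users")
>
> // In application B, with its own SparkContext:
> val rddB = sc.objectFile[MyRecord]("hdfs:///shared/rdds/users")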
>
> The job server I mentioned before, which allows different apps to
> share the same Spark context, also has a feature to share RDDs by
> name, without having to resort to checkpointing.
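>
> Going from memory of that project's README (check it for the exact
> API), the named-RDD feature looks something like this:
>
> // Inside a job that mixes in the jobserver's NamedRddSupport trait:
> namedRdds.update("users", usersRdd)             // publish an RDD under a name
> val shared = namedRdds.get[UserRecord]("users") // look it up from another job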
>
> Hope this helps!
>
> --
> Marcelo
>



-- 
Thanks,
Ashwin
