Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nicholas Chammas
Ah, that's why all the stuff about scheduler pools is under the section
"Scheduling Within an Application"
<https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application>.
I am so used to talking to my coworkers about jobs in the sense of
applications that I forgot your typical Spark application submits multiple
"jobs", each of which has multiple stages, etc.

So in my case I need to read up more closely on YARN queues
<https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>
since I want to share resources *across* applications. Thanks Mark!



Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Mark Hamstra
`spark-submit` creates a new Application that will need to get resources
from YARN. Spark's scheduler pools will determine how those resources are
allocated among whatever Jobs run within the new Application.

Spark's scheduler pools are only relevant when you are submitting multiple
Jobs within a single Application (i.e., you are using the same SparkContext
to launch multiple Jobs) and you have used SparkContext#setLocalProperty to
set "spark.scheduler.pool" to something other than the default pool before
a particular Job intended to use that pool is started via that SparkContext.
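
For illustration, here is a minimal Python sketch of the pattern described
above: set "spark.scheduler.pool" as a local property before launching a Job,
then clear it afterwards. The helper name `scheduler_pool` and the pool name
"reports" are hypothetical; only `setLocalProperty` and the property key come
from this thread. The helper accepts any object exposing `setLocalProperty`,
which is an assumption made so the sketch can be tried without a cluster.

```python
from contextlib import contextmanager

POOL_PROPERTY = "spark.scheduler.pool"


@contextmanager
def scheduler_pool(sc, pool_name):
    """Route Jobs submitted by the current thread to `pool_name`.

    `sc` is expected to be a SparkContext; anything that exposes
    setLocalProperty(key, value) works for this sketch.
    """
    # Must happen before the Job is started via this SparkContext.
    sc.setLocalProperty(POOL_PROPERTY, pool_name)
    try:
        yield sc
    finally:
        # Passing None clears the property, so later Jobs from this
        # thread go back to the default pool.
        sc.setLocalProperty(POOL_PROPERTY, None)


# Hypothetical usage with a real SparkContext running in FAIR mode:
#   with scheduler_pool(sc, "reports"):
#       sc.parallelize(range(1000)).count()
```

The property is local to the submitting thread, so this only matters when
several threads share one SparkContext. How local properties map to threads in
PySpark depends on the Spark version, so check the documentation for yours.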



Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nicholas Chammas
Hmm, so when I submit an application with `spark-submit`, I need to
guarantee it resources using YARN queues and not Spark's scheduler pools.
Is that correct?

When are Spark's scheduler pools relevant/useful in this context?



Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Mark Hamstra
grrr... s/your/you're/

On Wed, Apr 5, 2017 at 12:54 PM, Mark Hamstra <m...@clearstorydata.com>
wrote:

> Your mixing up different levels of scheduling. Spark's fair scheduler
> pools are about scheduling Jobs, not Applications; whereas YARN queues with
> Spark are about scheduling Applications, not Jobs.


Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Mark Hamstra
Your mixing up different levels of scheduling. Spark's fair scheduler pools
are about scheduling Jobs, not Applications; whereas YARN queues with Spark
are about scheduling Applications, not Jobs.
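
A rough sketch of where each level is configured, assuming a YARN deployment.
The helper `build_spark_submit`, the script name `etl_job.py`, and the queue
name "analytics" are made up for illustration. `--queue` picks the YARN queue
for the whole Application, while `spark.scheduler.mode` only affects how Jobs
inside that Application are scheduled.

```python
def build_spark_submit(app, queue, fair_jobs=False):
    """Return a spark-submit argument list for a YARN deployment."""
    # Application level: YARN decides what resources this queue gets.
    args = ["spark-submit", "--master", "yarn", "--queue", queue]
    if fair_jobs:
        # Job level: how Jobs inside this one Application share
        # whatever resources YARN granted to it.
        args += ["--conf", "spark.scheduler.mode=FAIR"]
    args.append(app)
    return args


cmd = build_spark_submit("etl_job.py", queue="analytics", fair_jobs=True)
print(" ".join(cmd))
```

Neither setting overrides the other: the queue governs what the Application
receives, and the scheduler mode governs how that is divided among its Jobs.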



Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nick Chammas
I'm having trouble understanding the difference between Spark fair
scheduler pools
<https://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools>
and YARN queues
<https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>.
Do they conflict? Does one override the other?

I posted a more detailed question about an issue I'm having with this on
Stack Overflow: http://stackoverflow.com/q/43239921/877069

Nick





[YARN] Questions about YARN's queues and Spark's FAIR scheduler

2016-06-16 Thread Jacek Laskowski
Hi,

I'm trying to get my head around the different parts of the Spark on YARN
architecture: YARN's schedulers and queues as well as Spark's own
scheduling modes, FAIR and FIFO.

I'd appreciate it if you could read how I see things and correct me where
I'm wrong. Thanks!

The default scheduler in YARN is the Capacity Scheduler [1]. It comes with
the notion of queues. When you spark-submit a Spark application with
--master yarn, you can specify --queue for the scheduling queue, and its
**only** purpose is to offer the right share of CPUs and memory to the
application. There could be more resources in the cluster, but that
particular queue has only that exact share of vcores and memory.

In other words, Spark does not know about any other resources but the
ones available in the queue.

Is this correct?

You can also spark-submit a Spark application using the FAIR scheduler
(the default is FIFO) with -c spark.scheduler.mode=FAIR.

In FAIR mode, there's also a notion of queue-like (Schedulable) pools.
They can also control the resource shares assigned to Spark
jobs/applications. You could use sc.setLocalProperty to control what pool
to use.

Is this correct?
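
For reference, the pools mentioned above are declared in an XML allocation
file that Spark reads when `spark.scheduler.allocation.file` points at it. The
following is only a sketch that generates such a file with the Python standard
library. The pool names "reports" and "adhoc" and their settings are invented
for illustration, and `build_allocations` is a hypothetical helper, not part
of Spark.

```python
import xml.etree.ElementTree as ET


def build_allocations(pools):
    """Build a fair scheduler allocation document as a string.

    `pools` maps a pool name to (scheduling_mode, weight, min_share).
    """
    root = ET.Element("allocations")
    for name, (mode, weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        # FAIR or FIFO ordering of the Jobs inside this pool.
        ET.SubElement(pool, "schedulingMode").text = mode
        # Share of the Application's resources relative to other pools.
        ET.SubElement(pool, "weight").text = str(weight)
        # Minimum number of cores the pool should get before others are served.
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


xml_text = build_allocations({
    "reports": ("FAIR", 2, 1),
    "adhoc": ("FIFO", 1, 0),
})
print(xml_text)
```

These pools divide up only what the Application already holds; they do not
change what the YARN queue hands to the Application in the first place.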

If both are yes, why would I want to go as far as using queues and
FAIR scheduling mode with pools? What are the benefits? Is this for
multi-tenant environments? Do you have any use cases that would fit
better with FAIR scheduling mode? What about YARN's queues with Spark
on YARN?

Please share as much as you can, since the topic bothers me so much (and
without your support I won't be able to recover from this painful
mental state :))

Thanks for reading so far! Appreciate any help.

[1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski




Queues

2014-11-09 Thread Deep Pradhan
Has anyone implemented Queues using RDDs?


Thank You