Hi,

I'm trying to get my head around the different parts of Spark on YARN
architecture with YARN's schedulers and queues as well as Spark's own
schedulers - FAIR and FIFO.

I'd appreciate if you could read how I see things and correct me where
I'm wrong. Thanks!

The default scheduler in YARN is the Capacity Scheduler [1]. It comes
with the notion of queues. When you spark-submit a Spark application
with --master yarn, you can use --queue to specify the scheduling queue,
and its **only** purpose is to offer the right share of CPUs (vcores)
and memory to the application. There could be more resources in the
cluster, but that particular queue has only that exact share of vcores
and memory.

In other words, Spark does not know about any resources other than the
ones available in the queue.

Is this correct?
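To make "that exact share" concrete, here's a toy calculation, assuming
a hypothetical cluster of 100 vcores and 400 GB of memory and a
hypothetical leaf queue root.analytics.adhoc (50% of root, then 40% of
analytics). It only looks at the per-level capacity percentages and
deliberately ignores maximum-capacity (elasticity), user limits and the
ApplicationMaster resource limit, so it is a simplification rather than
what YARN actually computes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcores: int
    memory_mb: int


def absolute_capacity(path_percentages):
    """Multiply the per-level capacity percentages along a queue path
    (e.g. [50.0, 40.0] for root.analytics -> root.analytics.adhoc)
    into the fraction of the whole cluster the leaf queue is configured for."""
    fraction = 1.0
    for pct in path_percentages:
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"capacity must be within [0, 100], got {pct}")
        fraction *= pct / 100.0
    return fraction


def configured_share(cluster, path_percentages):
    """Toy estimate of the vcores and memory a leaf queue is configured for.
    Deliberately ignores maximum-capacity (elasticity), user limits and
    the ApplicationMaster resource limit."""
    fraction = absolute_capacity(path_percentages)
    return Resources(
        vcores=round(cluster.vcores * fraction),
        memory_mb=round(cluster.memory_mb * fraction),
    )


# Hypothetical cluster: 100 vcores and 400 GB (409600 MB) of memory.
cluster = Resources(vcores=100, memory_mb=409_600)
# Hypothetical queue root.analytics.adhoc: 50% of root, then 40% of analytics.
print(configured_share(cluster, [50.0, 40.0]))
```

With these made-up numbers it prints Resources(vcores=20, memory_mb=81920),
i.e. the queue is configured for 20% of the cluster.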

You can also spark-submit a Spark application with the FAIR scheduler
(the default is FIFO) by passing -c spark.scheduler.mode=FAIR.
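A toy simulation of the difference between the two modes for jobs
inside one Spark application (which is what spark.scheduler.mode
governs), assuming unit-length tasks, a fixed number of task slots and
all jobs submitted at once. Job names and sizes are hypothetical, and
Spark's real scheduler works per stage and task set, so this is only a
sketch of the idea: under FIFO the first job gets priority on all free
slots, under FAIR free slots are handed out round-robin across jobs.

```python
def simulate(jobs, slots, mode):
    """Toy model of scheduling jobs inside one Spark application.

    jobs  -- list of (name, num_tasks) in submission order, all submitted at step 0
    slots -- how many tasks can run at the same time (e.g. total executor cores)
    mode  -- "FIFO": every free slot goes to the earliest job with pending tasks
             "FAIR": free slots are handed out round-robin across jobs
    Every task takes exactly one step. Returns {job name: step its last task ran}.
    """
    if mode not in ("FIFO", "FAIR"):
        raise ValueError(f"unknown mode: {mode}")
    if slots < 1:
        raise ValueError("need at least one slot")
    order = [name for name, _ in jobs]
    pending = dict(jobs)
    finished_at = {}
    step = 0
    while any(pending.values()):
        step += 1
        launched = {name: 0 for name in order}
        free = slots
        if mode == "FIFO":
            # Earlier jobs grab as many free slots as they can use.
            for name in order:
                take = min(free, pending[name])
                launched[name] = take
                free -= take
        else:
            # Hand out one slot at a time, cycling over jobs with work left.
            active = [name for name in order if pending[name] > 0]
            i = 0
            while free > 0 and any(pending[n] > launched[n] for n in active):
                name = active[i % len(active)]
                if pending[name] > launched[name]:
                    launched[name] += 1
                    free -= 1
                i += 1
        for name in order:
            if launched[name]:
                pending[name] -= launched[name]
                if pending[name] == 0:
                    finished_at[name] = step
    return finished_at


# Hypothetical workload: a long job submitted just before a short one, 2 slots.
jobs = [("long", 8), ("short", 2)]
print(simulate(jobs, slots=2, mode="FIFO"))  # short waits until long is done
print(simulate(jobs, slots=2, mode="FAIR"))  # short starts right away
```

In this toy run the short job finishes at step 5 under FIFO but at
step 2 under FAIR, while the long job only moves from step 4 to step 5.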

In FAIR mode, there's also a notion of queue-like (Schedulable) pools.
They can also control the resource shares assigned to Spark
jobs/applications. You can use sc.setLocalProperty to control which
pool to use.

Is this correct?
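For reference, the Spark job scheduling docs have a job pick its pool
with sc.setLocalProperty("spark.scheduler.pool", "<name>"), and pools
are defined in an XML allocation file (spark.scheduler.allocation.file)
with weight, minShare and schedulingMode. Below is a toy model of how
weight and minShare are meant to divide an application's cores between
active pools: minimum shares first, the rest by weight. Pool names and
numbers are hypothetical, and Spark itself does not precompute shares
like this; it decides which pool gets the next task offer by comparing
each pool's running tasks against its minShare and weight.

```python
def pool_shares(total_cores, pools):
    """Toy split of `total_cores` among active pools.

    pools -- {pool name: (weight, min_share)}
    Step 1: every pool first gets its min_share.
    Step 2: the remaining cores are split in proportion to the weights.
    Assumes the min_shares fit into total_cores.
    """
    min_total = sum(min_share for _, min_share in pools.values())
    if min_total > total_cores:
        raise ValueError("toy model assumes the min_shares fit into the cluster")
    weight_total = sum(weight for weight, _ in pools.values())
    if weight_total <= 0:
        raise ValueError("at least one pool needs a positive weight")
    remaining = total_cores - min_total
    return {
        name: min_share + remaining * weight / weight_total
        for name, (weight, min_share) in pools.items()
    }


# Hypothetical pools: "production" (weight 2, minShare 4), "adhoc" (weight 1, minShare 1).
print(pool_shares(20, {"production": (2, 4), "adhoc": (1, 1)}))
```

With 20 cores, both pools first get their minimum (4 and 1), and the
remaining 15 are split 2:1, giving 14.0 and 6.0.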

If both answers are yes, why would I want to go as far as using queues
and FAIR scheduling mode with pools? What are the benefits? Is this for
multi-tenant environments? Do you have any use cases that would fit
FAIR scheduling mode better? What about YARN's queues with Spark on
YARN?

Share as much as you can, since the topic bothers me so much (and
without your support I won't be able to recover from this painful
mental state :))

Thanks for reading so far! Appreciate any help.

[1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
