It's option 2 -- and it's pretty hard to point to a single line of code, a
method, or even a class, since the scheduling of Tasks involves a fairly
complex interaction of several Spark components -- mostly the DAGScheduler,
TaskScheduler/TaskSchedulerImpl, TaskSetManager, Schedulable and Pool, as
well as the SchedulerBackend (CoarseGrainedSchedulerBackend in this case).
 The key thing to understand, though, is the comment at the top of
SchedulerBackend.scala: "A backend interface for scheduling systems that
allows plugging in different ones under TaskSchedulerImpl. We assume a
Mesos-like model where the application gets resource offers as machines
become available and can launch tasks on them."  In other words, the whole
scheduling system is built on a model that starts with offers made by
workers when resources are available to run Tasks.  Other than the big
hammer of canceling a Job while interruptOnCancel is true, there isn't
really any facility for stopping or rescheduling Tasks that are already
started, so that rules out your option 1.  Similarly, option 3 is out
because the scheduler doesn't know when Tasks will complete; it just knows
when a new offer comes in and it is time to send more Tasks to be run on
the machine making the offer.
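
Just to illustrate that big hammer from the application side, a minimal
sketch (the group id and app name here are made up, and the local master is
only there to make the snippet self-contained):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("cancel-example").setMaster("local[*]"))

// interruptOnCancel = true asks Spark to interrupt the running task threads
// rather than just stop scheduling new Tasks from the canceled Job.
sc.setJobGroup("greedy-batch", "low-priority batch work", interruptOnCancel = true)
// ... actions submitted from this thread now belong to that group ...
sc.cancelJobGroup("greedy-batch")  // the only way to take cores back from running Tasks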

What actually happens is that the Pool with which a Job is associated
maintains a queue of TaskSets needing to be scheduled.  When
TaskSchedulerImpl needs sortedTaskSets inside resourceOffers, the Pool
supplies them from its scheduling queue, first sorted according to the
Pool's taskSetSchedulingAlgorithm.  In other words, what Spark's
fair scheduling does in essence is, in response to worker resource offers,
to send new Tasks to be run; those Tasks are taken in sets from the queue
of waiting TaskSets, sorted according to a scheduling algorithm.  There is
no pre-emption or rescheduling of Tasks that the scheduler has already sent
to the workers, nor is there any attempt to anticipate when already running
Tasks will complete.
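
To make "sorted according to a scheduling algorithm" a bit more concrete,
here is a rough sketch of the kind of ordering FAIR mode applies to
Schedulables (pools and TaskSetManagers).  It's a paraphrase of the idea
behind FairSchedulingAlgorithm in Pool.scala, not the actual source, and the
names below are mine:

// Simplified stand-in for a Schedulable: just the fields the ordering needs.
case class SchedulableInfo(name: String, runningTasks: Int, minShare: Int, weight: Int)

// Roughly: anything running fewer tasks than its minShare goes first; among
// the needy, the one furthest below its minShare wins; otherwise order by
// the runningTasks-to-weight ratio.
def fairBefore(s1: SchedulableInfo, s2: SchedulableInfo): Boolean = {
  val s1Needy = s1.runningTasks < s1.minShare
  val s2Needy = s2.runningTasks < s2.minShare
  val minShareRatio1 = s1.runningTasks.toDouble / math.max(s1.minShare, 1)
  val minShareRatio2 = s2.runningTasks.toDouble / math.max(s2.minShare, 1)
  val taskToWeightRatio1 = s1.runningTasks.toDouble / s1.weight
  val taskToWeightRatio2 = s2.runningTasks.toDouble / s2.weight

  if (s1Needy && !s2Needy) true
  else if (!s1Needy && s2Needy) false
  else if (s1Needy && s2Needy) minShareRatio1 < minShareRatio2
  else taskToWeightRatio1 < taskToWeightRatio2
}

// Something like taskSets.sortWith(fairBefore) gives the order in which the
// freshly offered cores are handed out; nothing already running is touched.

Spark applies this kind of comparison recursively -- the root pool orders its
child pools, and each pool orders its own TaskSetManagers -- but in every case
the only decision being made is which waiting TaskSet gets the newly offered
cores.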


On Sat, Feb 20, 2016 at 4:14 PM, Eugene Morozov <evgeny.a.moro...@gmail.com>
wrote:

> Hi,
>
> I'm trying to understand how this thing works underneath. Let's say I have
> two types of jobs: high-importance ones, which might use a small number of
> cores and have to run pretty fast, and less important but greedy ones,
> which use as many cores as are available. So the idea is to use two
> corresponding pools.
>
> The thing I'm trying to understand is the following.
> I use standalone spark deployment (no YARN, no Mesos).
> Let's say the less important job took all the cores and then someone runs a
> high-importance job. Then I see three possibilities:
> 1. Spark kills some executors that are currently running the less important
> job's partitions and assigns them to the high-importance job.
> 2. Spark waits until some partitions of the less important job have been
> completely processed, and the first executors that become free are assigned
> to the high-importance job.
> 3. Spark figures out the specific time when particular stages/partitions of
> the less important job will be done and, instead of continuing with that
> job, reassigns those executors to the high-importance one.
>
> Which one is it? Could you please point me to a class / method / line of
> code?
> --
> Be well!
> Jean Morozov
>
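
For what it's worth, the two-pool setup you describe would look roughly like
this on the application side (the pool names, app name, master, and the
allocation file path are just examples; the pools themselves, with their
weight and minShare, live in the fairscheduler.xml that
spark.scheduler.allocation.file points to):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("two-pool-example")
  .setMaster("local[*]")  // master only to keep the sketch self-contained
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val sc = new SparkContext(conf)

// Threads submitting the greedy, less important work:
sc.setLocalProperty("spark.scheduler.pool", "greedy")
// ... run the big job from this thread ...

// Threads submitting the small, latency-sensitive work:
sc.setLocalProperty("spark.scheduler.pool", "urgent")
// ... run the important job from this thread ...

Even with that in place, the "urgent" pool only wins when new offers come in:
it gets the next free cores sooner, but it never takes cores away from Tasks
the "greedy" pool is already running.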
