Spark's FairSchedulingAlgorithm is not round robin:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/SchedulingAlgorithm.scala#L43

When fair scheduling Jobs within a single Pool, the Schedulable entities
being compared (s1 and s2) are TaskSetManagers, which are at the granularity
of Stages, not Jobs.  Since weight is 1 and minShare is 0 for TaskSetManagers,
the FairSchedulingAlgorithm for TaskSetManagers just boils down to
prioritizing the TaskSets (i.e. Stages) with the fewest running tasks.
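
Stripped down to the TaskSetManager case, the comparison in the linked file
amounts to roughly the following (a paraphrase, not the actual source; the
real Schedulable trait is reduced here to the two fields that matter):

  // Simplified FairSchedulingAlgorithm.comparator for the case where s1 and
  // s2 are TaskSetManagers (weight = 1, minShare = 0).
  case class Schedulable(name: String, runningTasks: Int)

  // Returns true if s1 should be scheduled ahead of s2.
  def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    // With minShare = 0 neither side is ever "needy" (runningTasks < minShare),
    // and with weight = 1 the task-to-weight ratio is just runningTasks, so:
    if (s1.runningTasks != s2.runningTasks) {
      s1.runningTasks < s2.runningTasks  // fewer running tasks goes first
    } else {
      s1.name < s2.name                  // ties broken by name
    }
  }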

On Thu, Sep 1, 2016 at 11:23 AM, enrico d'urso <e.du...@live.com> wrote:

> I tried it before, but I am still not able to see proper round-robin
> scheduling across the jobs I submit.
> Given this:
>
> <pool name="production">
>     <schedulingMode>FAIR</schedulingMode>
>     <weight>1</weight>
>     <minShare>2</minShare>
> </pool>
>
> Each job inside the production pool should be scheduled in a round-robin
> way, am I right?
>
> ------------------------------
> *From:* Mark Hamstra <m...@clearstorydata.com>
> *Sent:* Thursday, September 1, 2016 8:19:44 PM
> *To:* enrico d'urso
> *Cc:* user@spark.apache.org
> *Subject:* Re: Spark scheduling mode
>
> The default pool (`<pool name="default">`) can be configured like any other
> pool:
> https://spark.apache.org/docs/latest/job-scheduling.html#configuring-pool-properties
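>
> For example (untested, but following the format from that page), adding a
> pool named "default" to your fairscheduler.xml should override its defaults:
>
>   <?xml version="1.0"?>
>   <allocations>
>     <pool name="default">
>       <schedulingMode>FAIR</schedulingMode>
>       <weight>1</weight>
>       <minShare>0</minShare>
>     </pool>
>   </allocations>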
>
> On Thu, Sep 1, 2016 at 11:11 AM, enrico d'urso <e.du...@live.com> wrote:
>
>> Is there a way to force scheduling to be fair *inside* the default pool?
>> I mean, round robin for the jobs that belong to the default pool.
>>
>> Cheers,
>> ------------------------------
>> *From:* Mark Hamstra <m...@clearstorydata.com>
>> *Sent:* Thursday, September 1, 2016 7:24:54 PM
>> *To:* enrico d'urso
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Spark scheduling mode
>>
>> Just because you've flipped spark.scheduler.mode to FAIR doesn't mean that
>> Spark can magically configure and start multiple scheduling pools for you,
>> nor can it know to which pools you want jobs assigned.  Without setting up
>> any additional scheduling pools or assigning jobs to pools, you're just
>> dumping all of your jobs into the one available default pool (which is now
>> being fair scheduled with an empty set of other pools), and the scheduling
>> of jobs within that pool is still the default intra-pool scheduling, FIFO
>> -- i.e., you've effectively accomplished nothing by only flipping
>> spark.scheduler.mode to FAIR.
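>>
>> One way to set that up: point Spark at a pool definitions file and pick a
>> pool from each thread that submits jobs -- roughly something like this in
>> Scala (an untested sketch; the file path, app name, and job are made up):
>>
>>   import org.apache.spark.{SparkConf, SparkContext}
>>
>>   val conf = new SparkConf()
>>     .setAppName("pools-example")
>>     .setMaster("local[*]")  // for illustration only
>>     .set("spark.scheduler.mode", "FAIR")
>>     .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
>>   val sc = new SparkContext(conf)
>>
>>   // spark.scheduler.pool is a thread-local property, so each job-submitting
>>   // thread chooses the pool its jobs will run in.
>>   val t = new Thread(new Runnable {
>>     override def run(): Unit = {
>>       sc.setLocalProperty("spark.scheduler.pool", "production")
>>       sc.parallelize(1 to 1000).map(_ * 2).count()  // lands in "production"
>>     }
>>   })
>>   t.start()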
>>
>> On Thu, Sep 1, 2016 at 7:10 AM, enrico d'urso <e.du...@live.com> wrote:
>>
>>> I am building a Spark app in which I submit several jobs (pyspark). I am
>>> using threads to run them in parallel, and I am also setting:
>>> conf.set("spark.scheduler.mode", "FAIR"). Still, I see the jobs run
>>> serially, in FIFO order. Am I missing something?
>>>
>>> Cheers,
>>>
>>>
>>> Enrico
>>>
>>
>>
>
