Spark's FairSchedulingAlgorithm is not round robin: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/SchedulingAlgorithm.scala#L43
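Roughly, the comparator boils down to the following (a Python paraphrase of
the linked Scala, just to sketch the ordering logic -- the field and function
names here are mine, not Spark's API):

    from collections import namedtuple

    # Minimal stand-in for a Schedulable (a Pool or a TaskSetManager).
    Schedulable = namedtuple("Schedulable", "name running_tasks min_share weight")

    def schedules_before(s1, s2):
        """True if s1 should be offered resources before s2 under FAIR."""
        s1_needy = s1.running_tasks < s1.min_share
        s2_needy = s2.running_tasks < s2.min_share
        min_share_ratio1 = s1.running_tasks / max(s1.min_share, 1.0)
        min_share_ratio2 = s2.running_tasks / max(s2.min_share, 1.0)
        task_to_weight1 = s1.running_tasks / s1.weight
        task_to_weight2 = s2.running_tasks / s2.weight

        if s1_needy and not s2_needy:
            return True                     # below its minShare, goes first
        if s2_needy and not s1_needy:
            return False
        if s1_needy and s2_needy:
            compare = (min_share_ratio1 > min_share_ratio2) - (min_share_ratio1 < min_share_ratio2)
        else:
            compare = (task_to_weight1 > task_to_weight2) - (task_to_weight1 < task_to_weight2)
        if compare != 0:
            return compare < 0
        return s1.name < s2.name            # tie-break on name

Nothing in there rotates through Schedulables round-robin style.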
At the scope of fair scheduling Jobs within a single Pool, the Schedulable
entities being handled (s1 and s2) are TaskSetManagers, which are at the
granularity of Stages, not Jobs. Since weight is 1 and minShare is 0 for
TaskSetManagers, the FairSchedulingAlgorithm for TaskSetManagers just boils
down to prioritizing the TaskSets (i.e. Stages) with the fewest runningTasks.

On Thu, Sep 1, 2016 at 11:23 AM, enrico d'urso <e.du...@live.com> wrote:

> I tried it before, but still I am not able to see a proper round robin
> across the jobs I submit.
> Given this:
>
>     <pool name="production">
>         <schedulingMode>FAIR</schedulingMode>
>         <weight>1</weight>
>         <minShare>2</minShare>
>     </pool>
>
> Each job inside the production pool should be scheduled in a round-robin
> way, am I right?
>
> ------------------------------
> *From:* Mark Hamstra <m...@clearstorydata.com>
> *Sent:* Thursday, September 1, 2016 8:19:44 PM
> *To:* enrico d'urso
> *Cc:* user@spark.apache.org
> *Subject:* Re: Spark scheduling mode
>
> The default pool (`<pool name="default">`) can be configured like any
> other pool:
> https://spark.apache.org/docs/latest/job-scheduling.html#configuring-pool-properties
>
> On Thu, Sep 1, 2016 at 11:11 AM, enrico d'urso <e.du...@live.com> wrote:
>
>> Is there a way to force scheduling to be fair *inside* the default pool?
>> I mean, round robin for the jobs that belong to the default pool.
>>
>> Cheers,
>> ------------------------------
>> *From:* Mark Hamstra <m...@clearstorydata.com>
>> *Sent:* Thursday, September 1, 2016 7:24:54 PM
>> *To:* enrico d'urso
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Spark scheduling mode
>>
>> Just because you've flipped spark.scheduler.mode to FAIR, that doesn't
>> mean that Spark can magically configure and start multiple scheduling
>> pools for you, nor can it know to which pools you want jobs assigned.
>> Without doing any setup of additional scheduling pools or any assigning
>> of jobs to pools, you're just dumping all of your jobs into the one
>> available default pool (which is now being fair scheduled with an empty
>> set of other pools), and the scheduling of jobs within that pool is
>> still the default intra-pool scheduling, FIFO -- i.e., you've
>> effectively accomplished nothing by only flipping spark.scheduler.mode
>> to FAIR.
>>
>> On Thu, Sep 1, 2016 at 7:10 AM, enrico d'urso <e.du...@live.com> wrote:
>>
>>> I am building a Spark App in which I submit several jobs (pyspark).
>>> I am using threads to run them in parallel, and I am also setting:
>>>
>>>     conf.set("spark.scheduler.mode", "FAIR")
>>>
>>> Still, I see the jobs run serially, in FIFO order. Am I missing
>>> something?
>>>
>>> Cheers,
>>>
>>> Enrico
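For completeness, the piece missing from the original code is assigning each
job to a pool. Roughly, from PySpark that looks like the sketch below (the
allocation file path and the job itself are placeholders; only the
"production" pool name comes from the config quoted above):

    import threading
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("fair-pool-demo")
            .set("spark.scheduler.mode", "FAIR")
            # placeholder path: point this at the XML file defining "production"
            .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml"))
    sc = SparkContext(conf=conf)

    def submit_job():
        # The pool assignment is a thread-local property, so set it in the
        # thread that actually triggers the job.
        sc.setLocalProperty("spark.scheduler.pool", "production")
        sc.parallelize(range(1000000), 8).count()

    threads = [threading.Thread(target=submit_job) for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()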