Spark scheduling mode

2016-09-01 Thread enrico d'urso
I am building a Spark app in which I submit several jobs (pyspark). I am using 
threads to run them in parallel, and I am also setting 
conf.set("spark.scheduler.mode", "FAIR"). Still, I see the jobs run serially, 
in FIFO order. Am I missing something?

Cheers,


Enrico
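
For reference, a minimal pyspark sketch of the setup described above (the app 
name, job body, and thread count are made up for illustration):

import threading
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("multi-job-app")  # app name is illustrative
conf.set("spark.scheduler.mode", "FAIR")  # flip the top-level scheduler to FAIR
sc = SparkContext(conf=conf)

def run_job(n):
    # Each thread submits an independent job through the shared SparkContext.
    sc.parallelize(range(10000)).map(lambda x: x * n).count()

threads = [threading.Thread(target=run_job, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()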


Re: Spark scheduling mode

2016-09-01 Thread enrico d'urso
Is there a way to force scheduling to be fair inside the default pool?
I mean, round robin for the jobs that belong to the default pool.

Cheers,


From: Mark Hamstra 
Sent: Thursday, September 1, 2016 7:24:54 PM
To: enrico d'urso
Cc: user@spark.apache.org
Subject: Re: Spark scheduling mode

Just because you've flipped spark.scheduler.mode to FAIR doesn't mean that 
Spark can magically configure and start multiple scheduling pools for you, nor 
can it know to which pools you want jobs assigned. Without setting up 
additional scheduling pools or assigning jobs to pools, you're just dumping 
all of your jobs into the one available default pool (which is now being fair 
scheduled against an empty set of other pools), and the scheduling of jobs 
within that pool is still the default intra-pool scheduling: FIFO. In other 
words, you've effectively accomplished nothing by only flipping 
spark.scheduler.mode to FAIR.
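
What that setup looks like from pyspark, as a sketch (the pool name, 
allocation-file path, and job body below are assumptions; see the 
job-scheduling docs for the real knobs):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("fair-pools")
conf.set("spark.scheduler.mode", "FAIR")
# Pools are defined in an XML allocation file (this path is illustrative):
conf.set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
sc = SparkContext(conf=conf)

def run_in_pool(pool_name, n):
    # setLocalProperty is per-thread, so each thread can target its own pool.
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    sc.parallelize(range(10000)).map(lambda x: x * n).count()

Jobs submitted from a thread that never calls setLocalProperty land in the 
default pool.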


Re: Spark scheduling mode

2016-09-01 Thread enrico d'urso
I tried it before, but I am still not able to see a proper round robin across 
the jobs I submit.
Given this:

<pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
</pool>

Each job inside the production pool should be scheduled in a round-robin way, 
am I right?


From: Mark Hamstra 
Sent: Thursday, September 1, 2016 8:19:44 PM
To: enrico d'urso
Cc: user@spark.apache.org
Subject: Re: Spark scheduling mode

The default pool (`default`) can be configured like any other pool:
https://spark.apache.org/docs/latest/job-scheduling.html#configuring-pool-properties
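
For example, an allocation-file entry whose name is "default" overrides the 
built-in settings of the default pool (schedulingMode FIFO, weight 1, 
minShare 0); the values below are only illustrative:

<allocations>
    <pool name="default">
        <schedulingMode>FAIR</schedulingMode>
        <weight>1</weight>
        <minShare>0</minShare>
    </pool>
</allocations>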


Re: Spark scheduling mode

2016-09-02 Thread enrico d'urso
Thank you.

May I know when that comparator is called?
It looks like the Spark scheduler does not have any form of preemption, am I 
right?

Thank you


From: Mark Hamstra 
Sent: Thursday, September 1, 2016 8:44:10 PM
To: enrico d'urso
Cc: user@spark.apache.org
Subject: Re: Spark scheduling mode

Spark's FairSchedulingAlgorithm is not round robin: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/SchedulingAlgorithm.scala#L43

At the scope of fair scheduling Jobs within a single Pool, the Schedulable 
entities being compared (s1 and s2) are TaskSetManagers, which are at the 
granularity of Stages, not Jobs. Since weight is 1 and minShare is 0 for 
TaskSetManagers, the FairSchedulingAlgorithm for TaskSetManagers just boils 
down to prioritizing the TaskSets (i.e. Stages) with the fewest runningTasks.
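
For reference, a rough Python paraphrase of the linked comparator (a sketch 
for readability, not the actual Scala implementation; the attribute names are 
Python-ified stand-ins for runningTasks, minShare, weight, and name):

def runs_before(s1, s2):
    # Returns True if schedulable s1 should be offered resources before s2.
    s1_needy = s1.running_tasks < s1.min_share
    s2_needy = s2.running_tasks < s2.min_share
    if s1_needy != s2_needy:
        return s1_needy  # an entity below its minShare always wins
    if s1_needy:
        # Both below minShare: the smaller fraction of minShare used wins.
        r1 = s1.running_tasks / max(s1.min_share, 1.0)
        r2 = s2.running_tasks / max(s2.min_share, 1.0)
    else:
        # Neither needy: the smaller runningTasks-to-weight ratio wins.
        r1 = s1.running_tasks / s1.weight
        r2 = s2.running_tasks / s2.weight
    if r1 != r2:
        return r1 < r2
    return s1.name < s2.name  # tie-break on name, as in the Scala source

With weight fixed at 1 and minShare at 0 for TaskSetManagers, both ratio 
branches reduce to comparing runningTasks, which is why it boils down to 
"fewest running tasks first" as described above.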
