1. Yes, if two jobs depend on each other, they can't run in parallel.
2. Imagine something like a web application driver. You only get to have one 
SparkContext, but you want to run many concurrent jobs, for example one per 
incoming user request. Those jobs have nothing to do with each other, so there 
is no reason to keep them sequential.
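A minimal sketch of both points, assuming a single shared driver. It is plain Python, not Spark code: the hypothetical `run_job` function stands in for a Spark action (such as `count()`) fired from a request-handling thread, and a standard `ThreadPoolExecutor` plays the role of the web server's threads. It only shows that independent jobs submitted from separate threads overlap in time, while a job that needs another job's result has to wait for it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_demo():
    """Submit two independent jobs and one dependent job from a shared pool.

    Returns {job name: (start, end)} in seconds relative to the demo start.
    """
    lock = threading.Lock()
    t0 = time.monotonic()
    events = {}

    def run_job(name, seconds, depends_on=None):
        # Hypothetical stand-in for a Spark action such as rdd.count().
        if depends_on is not None:
            depends_on.result()  # block until the job we need has finished
        start = time.monotonic() - t0
        time.sleep(seconds)  # simulate the job's work
        with lock:
            events[name] = (start, time.monotonic() - t0)
        return name

    with ThreadPoolExecutor(max_workers=4) as pool:
        a = pool.submit(run_job, "report_a", 0.2)
        b = pool.submit(run_job, "report_b", 0.2)  # unrelated to report_a
        c = pool.submit(run_job, "summary_of_a", 0.1, a)  # needs report_a
        for f in (a, b, c):
            f.result()
    return events


if __name__ == "__main__":
    for name, (s, e) in sorted(run_demo().items(), key=lambda kv: kv[1][0]):
        print(f"{name}: start={s:.2f}s end={e:.2f}s")
```

The two reports finish after roughly 0.2 s instead of 0.4 s, and the summary starts only once `report_a` is done. In Spark the analogous setup is several threads sharing one SparkContext, each calling actions on its own RDDs.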

Hope this helps

-------- Original message --------
From: bit1...@163.com
Date: 06/01/2015 4:14 AM (GMT-05:00)
To: user <user@spark.apache.org>
Subject: Don't understand "schedule jobs within an Application"

Hi, sparks,

The following is copied from the Spark online documentation at 
http://spark.apache.org/docs/latest/job-scheduling.html. 

Basically, I have two questions about it:

1. If two jobs in an application have dependencies, that is, one job depends on 
the result of the other, then I think they will have to run sequentially.
2. Since job scheduling happens within one application, I don't think it 
benefits multiple users, as the last sentence says. In my opinion, multiple 
users can benefit only from scheduling across applications.

Maybe I haven't understood job scheduling well. Could someone elaborate on 
this? Thanks very much.


By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided 
into “stages” (e.g. map and reduce phases), and the first job gets priority on 
all available resources while its stages have tasks to launch, then the second 
job gets priority, etc. If the jobs at the head of the queue don’t need to use 
the whole cluster, later jobs can start to run right away, but if the jobs at 
the head of the queue are large, then later jobs may be delayed significantly.
Starting in Spark 0.8, it is also possible to configure fair sharing between 
jobs. Under fair sharing, Spark assigns tasks between jobs in a “round robin” 
fashion, so that all jobs get a roughly equal share of cluster resources. This 
means that short jobs submitted while a long job is running can start receiving 
resources right away and still get good response times, without waiting for the 
long job to finish. This mode is best for multi-user settings.
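To make the FIFO versus fair difference concrete, here is a toy discrete-time simulation. It is not Spark's scheduler: it assumes every task takes one time unit, the cluster has a fixed number of task slots, and "fair" means handing out free slots one task at a time, round robin, across jobs that have pending tasks, which is a simplification of the behavior quoted above. The `Job` class and `simulate` function are hypothetical names for illustration. In real Spark, fair sharing is switched on by setting `spark.scheduler.mode` to `FAIR`.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submit_time: int
    num_tasks: int  # assumed >= 1; every task takes one time unit


def simulate(jobs, slots, policy):
    """Return {job name: finish time} for unit-length tasks on `slots` slots.

    policy="fifo": the earliest-submitted job with pending tasks fills slots first.
    policy="fair": free slots are handed out one task at a time, round robin.
    """
    remaining = {j.name: j.num_tasks for j in jobs}
    order = sorted(jobs, key=lambda j: j.submit_time)  # submission order
    finish = {}
    t = 0
    while len(finish) < len(jobs):
        active = [j.name for j in order
                  if j.submit_time <= t and remaining[j.name] > 0]
        free = slots
        if policy == "fifo":
            for name in active:
                take = min(free, remaining[name])
                remaining[name] -= take
                free -= take
        elif policy == "fair":
            while free > 0 and any(remaining[n] > 0 for n in active):
                for name in active:
                    if free > 0 and remaining[name] > 0:
                        remaining[name] -= 1
                        free -= 1
        else:
            raise ValueError(f"unknown policy: {policy}")
        t += 1  # every task launched in this step finishes at t + 1
        for name in active:
            if remaining[name] == 0 and name not in finish:
                finish[name] = t
    return finish


if __name__ == "__main__":
    jobs = [Job("long", 0, 20), Job("short", 1, 2)]
    for policy in ("fifo", "fair"):
        print(policy, simulate(jobs, slots=4, policy=policy))
```

With 4 slots, a 20-task job submitted at t=0 and a 2-task job submitted at t=1, the short job finishes at t=6 under FIFO but at t=2 under round robin, while the long job only slips from t=5 to t=6. That is the "good response times for short jobs" trade-off the docs describe.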


bit1...@163.com
