Hi all,

I have been working with Dimuthu on a Meta-Scheduler for Airavata. So far I have sketched out a few things that I want to share with the broader community to get some feedback.
As you know, Apache Airavata currently does not have a scheduler that intelligently schedules jobs before submitting them to compute resources (clusters). In this context there are three main functionalities a Meta-Scheduler can provide: (1) throttling jobs before they are submitted to compute resources, (2) being aware of the load on the various clusters and intelligently dispatching a series of user jobs across multiple clusters, thereby increasing throughput, and (3) making it easy and fair for multiple users to share a single community account.

To achieve (1) and (3) I think we need two limits per user: one on the number of jobs a user can have sitting in the Airavata queue, and another on the number of jobs the Airavata Scheduler allows to go to execution (i.e. dispatched to a physical cluster). A rough sketch of this idea is in the P.S. at the end of this mail.

To achieve (2) I have read some of the literature on batch job scheduling for clusters. The following policies commonly guide scheduling decisions:

- FCFS (First Come First Serve)
- SJF (Shortest Job First)
- LJF (Longest Job First)
- Advance reservation -> can we do this, given that the physical machine will also have its own scheduler?
- Backfilling*
- Preemptive backfilling -> lets high-priority jobs take precedence over lower-priority ones

*Backfill: each job has a start time as well as a wall clock limit, so a finish time can be estimated for every job in the queue, along with the earliest time the resources will be free to start the high-priority jobs. Backfill uses this knowledge to give resources to jobs that can complete within that waiting window, making the most of the available resources. (A small feasibility check for this is sketched at the end of this mail.)

I think the Airavata Meta-Scheduler faces a unique challenge that schedulers for clusters do not: it has to piggyback on the scheduler of the physical cluster. We therefore have no control over the actual scheduling; the cluster applies its own policies, so we cannot accurately predict when our jobs will finish (or even when they will really start). So, how aggressively should we schedule? Is it worth being more aggressive?

I also anticipate the following four statuses a job can be in:

1 - Airavata Queue
2 - Cluster Queue
3 - Executing
4 - Stopped

To get started on the scheduler I have written a small cluster simulator. It takes in jobs, puts them in a queue, and later executes them on an FCFS basis. This infrastructure will later be used to develop the Meta-Scheduler for Airavata by plugging the Meta-Scheduler in between loadGenerator and clusterSimulator. Once the Meta-Scheduler is working well, we can then integrate it into the Airavata repository. The code has been sent as a pull request to the airavata-sandbox repo: https://github.com/apache/airavata-sandbox/pull/40

Looking forward to your feedback!

Bibrak Qamar
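P.S. To make the two per-user limits and the four statuses a bit more concrete, here is a rough Java sketch. This is only an illustration of the idea; the class, method names and counters are made up for this mail and are not code from the pull request.

    // Hypothetical sketch only -- names and structure are placeholders,
    // not code from the airavata-sandbox pull request.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class UserThrottle {

        // The four job statuses I anticipate.
        enum JobState { AIRAVATA_QUEUE, CLUSTER_QUEUE, EXECUTING, STOPPED }

        // Limit 1: jobs a user may hold in the Airavata queue.
        private final int maxQueuedPerUser;
        // Limit 2: jobs a user may have dispatched to a physical cluster
        // (i.e. in CLUSTER_QUEUE or EXECUTING).
        private final int maxDispatchedPerUser;

        private final Map<String, Integer> queued = new ConcurrentHashMap<>();
        private final Map<String, Integer> dispatched = new ConcurrentHashMap<>();

        public UserThrottle(int maxQueuedPerUser, int maxDispatchedPerUser) {
            this.maxQueuedPerUser = maxQueuedPerUser;
            this.maxDispatchedPerUser = maxDispatchedPerUser;
        }

        /** Can this user add another job to the Airavata queue? */
        public boolean canEnqueue(String userId) {
            return queued.getOrDefault(userId, 0) < maxQueuedPerUser;
        }

        /** Can a queued job of this user be dispatched to a cluster? */
        public boolean canDispatch(String userId) {
            return dispatched.getOrDefault(userId, 0) < maxDispatchedPerUser;
        }

        public void onEnqueued(String userId) {
            queued.merge(userId, 1, Integer::sum);       // enters AIRAVATA_QUEUE
        }

        public void onDispatched(String userId) {
            queued.merge(userId, -1, Integer::sum);      // leaves AIRAVATA_QUEUE
            dispatched.merge(userId, 1, Integer::sum);   // enters CLUSTER_QUEUE
        }

        public void onStopped(String userId) {
            dispatched.merge(userId, -1, Integer::sum);  // job reached STOPPED
        }
    }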

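A similarly rough sketch of the backfill test described above: a lower-priority job can be backfilled if it fits on the currently idle nodes and its wall clock limit guarantees it finishes before the reserved start time of the highest-priority waiting job. The single-resource (node count) model and all names here are again just assumptions for illustration, not the actual simulator code.

    // Hypothetical, simplified backfill feasibility check.
    public class BackfillCheck {

        static class Job {
            final String id;
            final int nodesRequired;
            final long wallClockLimitSec;   // user-declared upper bound on run time
            Job(String id, int nodesRequired, long wallClockLimitSec) {
                this.id = id;
                this.nodesRequired = nodesRequired;
                this.wallClockLimitSec = wallClockLimitSec;
            }
        }

        /**
         * A candidate job may be backfilled if it fits on the idle nodes now and
         * its wall clock limit guarantees it finishes before the reserved start
         * time of the highest-priority waiting job.
         */
        static boolean canBackfill(Job candidate, int idleNodes,
                                   long nowSec, long reservedStartSec) {
            boolean fitsOnIdleNodes = candidate.nodesRequired <= idleNodes;
            boolean finishesBeforeReservation =
                    nowSec + candidate.wallClockLimitSec <= reservedStartSec;
            return fitsOnIdleNodes && finishesBeforeReservation;
        }

        public static void main(String[] args) {
            long now = 0;
            long reservedStart = 3600;               // top-priority job starts in 1 hour
            int idleNodes = 8;
            Job small = new Job("small", 4, 1800);   // 30 min on 4 nodes
            Job big   = new Job("big",   4, 7200);   // 2 hours, would run past the reservation
            System.out.println(canBackfill(small, idleNodes, now, reservedStart)); // true
            System.out.println(canBackfill(big, idleNodes, now, reservedStart));   // false
        }
    }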