Hi Marlon,

Thanks for the clarification. Yes, I agree that we need to re-evaluate both
the use cases and the approach we take to implement this.

Hi All,

Considering the current requirements of gateways, we need a better job
throttling/scheduling proxy in front of the actual schedulers. I will simply
copy and paste the existing use cases mentioned in the wiki; please share
your views on whether these use cases are still valid for current gateways.
Once we have come up with a concrete set of requirements, we can think about
the technical side of the solution. I have added my point of view on each
use case in the comments section.

*Metascheduling Usecase 1: Users/Gateways submit a series of jobs to a
resource, while the resource enforces a per-user job limit within a queue to
ensure fair use of the cluster (example: Stampede allows 50 jobs per user in
the normal queue). Airavata will need to implement queues and throttle jobs,
respecting the max-jobs-per-queue limits of the underlying resource queue.*

Comments:
1. What will happen to the pending jobs in the Airavata queue? Will they
stay in the queue forever until the queues on the clusters become available,
or is there an expiry?
2. Are we assuming that Airavata is the only entry point for the queues on
the clusters, or can there be other gateways/independent users submitting
jobs to the same queue?
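
Just to make Usecase 1 concrete, here is a very rough sketch of the kind of
throttling proxy I have in mind. It is Java, but all class and method names
are hypothetical and not part of the current code base; it simply holds jobs
inside Airavata and only releases them while the per-user limit on the
resource queue is not exceeded:

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class PerUserQueueThrottler {

    // Jobs waiting inside Airavata, keyed by "resource:queue:user".
    private final Map<String, Queue<String>> pendingJobs = new HashMap<>();
    // Jobs already submitted to the cluster and not yet finished.
    private final Map<String, Integer> activeJobCounts = new HashMap<>();
    // e.g. 50 for Stampede's normal queue.
    private final int maxJobsPerUserPerQueue;

    public PerUserQueueThrottler(int maxJobsPerUserPerQueue) {
        this.maxJobsPerUserPerQueue = maxJobsPerUserPerQueue;
    }

    // Called when a gateway user submits a job through Airavata.
    public synchronized void enqueue(String resourceQueueUserKey, String jobId) {
        pendingJobs.computeIfAbsent(resourceQueueUserKey, k -> new ArrayDeque<>()).add(jobId);
        releaseIfPossible(resourceQueueUserKey);
    }

    // Called when the cluster reports a job as finished.
    public synchronized void jobCompleted(String resourceQueueUserKey) {
        activeJobCounts.merge(resourceQueueUserKey, -1, Integer::sum);
        releaseIfPossible(resourceQueueUserKey);
    }

    // Hand pending jobs to the real scheduler while we are under the limit.
    private void releaseIfPossible(String key) {
        Queue<String> pending = pendingJobs.getOrDefault(key, new ArrayDeque<>());
        while (!pending.isEmpty()
                && activeJobCounts.getOrDefault(key, 0) < maxJobsPerUserPerQueue) {
            String jobId = pending.poll();
            activeJobCounts.merge(key, 1, Integer::sum);
            submitToCluster(jobId); // placeholder for the existing submission path
        }
    }

    private void submitToCluster(String jobId) {
        // Placeholder: in reality this would invoke the existing Airavata job submission task.
        System.out.println("Submitting " + jobId);
    }
}

Whether that pending queue lives in memory or in a database ties back to my
question 1 above about expiry and durability.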

*Metascheduling Usecase 2: Users/Gateways delegate job scheduling across
available computational resources to Airavata. Airavata will need to
implement schedulers that are aware of the existing load on the clusters
and spread jobs efficiently. The scheduler should also have access to
heuristics on previous executions and current requirements, which include
job size (number of nodes/cores), memory requirements, wall-time estimates,
and so forth.*

Comments:
1. Even though the first part makes sense to me, the second part looks more
like a machine learning problem and a nice-to-have feature. Is this
something like: a user comes into the portal and launches an application
with a set of inputs, and Airavata decides on which machine/cluster the job
should run?
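
For the first (load-aware) part of Usecase 2, I am imagining something along
these lines. Again this is only a sketch with assumed types; the load
figures would have to come from polling or monitoring the resources:

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LoadAwareResourceSelector {

    // Minimal view of a cluster's current state; in practice this would come
    // from periodically polling the resource or from monitoring data.
    public static class ResourceLoad {
        final String resourceId;
        final int queuedJobs;     // jobs currently waiting in the cluster queue
        final int idleNodes;      // nodes reported as free
        final int requestedNodes; // what the incoming job asks for

        ResourceLoad(String resourceId, int queuedJobs, int idleNodes, int requestedNodes) {
            this.resourceId = resourceId;
            this.queuedJobs = queuedJobs;
            this.idleNodes = idleNodes;
            this.requestedNodes = requestedNodes;
        }

        // Crude score: prefer clusters that can start the job now, then shorter queues.
        double score() {
            boolean fitsNow = idleNodes >= requestedNodes;
            return (fitsNow ? 0 : 1000) + queuedJobs;
        }
    }

    // Pick the resource with the lowest score; empty if nothing is eligible.
    public Optional<String> pickResource(List<ResourceLoad> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(ResourceLoad::score))
                .map(r -> r.resourceId);
    }
}

The second (heuristics/ML) part could later replace that crude score()
method without changing the rest of the flow.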

*Metascheduling Usecase 3: Users within a gateway need to fairly use a
community account. Computational resources like XSEDE enforce fair-share
across users, but since gateway job submissions are funneled through a
single community account, different users within a gateway are impacted.
Airavata will need to implement fair-share scheduling among these users to
ensure fair use of allocations, respect allowable queue limits, and work
within resource policies.*

Comments:
1. Is there any existing work related to this use case? I remember that two
students created a UI to enforce these limits and store them in a database.
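
If we end up building this ourselves, a simple starting point could be a
least-used-first pick among the gateway users sharing the community account.
This is only a sketch with assumed names, and it is not the fair-share
algorithm the resource providers themselves use:

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class GatewayFairShare {

    // Accumulated usage (e.g. node-hours) per gateway user in the current window.
    private final Map<String, Double> usageByUser = new HashMap<>();

    // Record usage when a job charged to the community account completes.
    public void recordUsage(String userId, double nodeHours) {
        usageByUser.merge(userId, nodeHours, Double::sum);
    }

    // Among users who currently have pending jobs, serve the one with the
    // least accumulated usage so heavy users do not starve light ones.
    public Optional<String> nextUserToServe(Set<String> usersWithPendingJobs) {
        return usersWithPendingJobs.stream()
                .min(Comparator.comparingDouble((String u) -> usageByUser.getOrDefault(u, 0.0)));
    }
}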

Thanks
Dimuthu

On Mon, Dec 17, 2018 at 7:22 PM Pierce, Marlon <[email protected]> wrote:

> Hi Dimuthu,
>
>
>
> This is something we should re-evaluate. Mangirish Wangle looked at Mesos
> integration with Airavata back in 2016, but he ultimately ran into many
> difficulties, including getting MPI jobs to work, if I recall correctly.
>
>
>
> Marlon
>
>
>
>
>
> *From: *"[email protected]" <[email protected]>
> *Reply-To: *dev <[email protected]>
> *Date: *Sunday, December 16, 2018 at 7:30 AM
> *To: *dev <[email protected]>
> *Subject: *Metascheduler work
>
>
>
> Hi Folks,
>
>
>
> I found this [1] mail thread and the JIRA ticket [2], which discuss coming
> up with an Airavata-specific job scheduler. At the end of the discussion,
> it seems an approach based on Mesos was chosen to try out. Is there any
> other discussion/documentation on this topic? Has anyone worked on this
> and, if so, where are the code/design documents?
>
>
>
> [1]
> https://markmail.org/message/tdae5y3togyq4duv#query:+page:1+mid:tdae5y3togyq4duv+state:results
>
> [2] https://issues.apache.org/jira/browse/AIRAVATA-1436
>
>
>
> Thanks
>
> Dimuthu
>
