On 12/21/2012 09:53 PM, Jaroslav Kortus wrote:
Furthermore, the scheduler would be updated to work on a *cached* copy of the system status data. This is needed to avoid the current problem where there's a race condition with system status changes occurring during a scheduling pass, leading to recipes jumping the queue (I'm interested in hearing about relatively clean ways to do this with SQLAlchemy, though: http://stackoverflow.com/questions/13983067/cached-reads-immediate-writes-with-sql-alchemy)

Do you mean https://bugzilla.redhat.com/show_bug.cgi?id=872187 ?

I've worked around it by using one transaction for the whole scheduling loop (not just per recipe, as it was before). This gives a sort of "cached" view of the data, as the transaction's reads are consistent no matter what gets written during the transaction. I'm ordering the recipes by recipe set priority, recipeset.id and recipe.id (in that order), and then they get scheduled.

I haven't seen any deadlocks since then (with tasks having the same
priority).
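As a rough illustration, the ordering described above amounts to something like the following (a toy data model, not Beaker's actual schema or code):

```python
from collections import namedtuple

# Hypothetical minimal model of a queued recipe; field names are
# illustrative only, not Beaker's real columns.
Recipe = namedtuple("Recipe", ["id", "recipeset_id", "priority"])

def scheduling_order(recipes):
    """Order recipes by recipe set priority (higher first), then
    recipeset.id, then recipe.id, as described above."""
    return sorted(recipes, key=lambda r: (-r.priority, r.recipeset_id, r.id))

queue = [
    Recipe(id=7, recipeset_id=3, priority=10),
    Recipe(id=5, recipeset_id=2, priority=30),
    Recipe(id=6, recipeset_id=2, priority=30),
]
print([r.id for r in scheduling_order(queue)])  # [5, 6, 7]
```

With a deterministic total order like this, two scheduling passes that see the same consistent snapshot will always pick recipes in the same sequence.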

Yeah, but lumping everything into one giant transaction has its own problems (mainly to do with state consistency with external systems like RHEV-M and the filesystem).

What I realised over the Christmas break is that many of these problems can be resolved by moving towards a more event-based scheduling system, with two key scheduling events:

1. When a new recipe is submitted, attempt to assign it to a dynamic virtual system or to a system from the idle pool.

2. When a system completes its current task, attempt to assign it a recipe from the queue *before* placing it back in the idle pool (in the case of dynamic virt, see if any of the recipes that previously failed dynamic virt allocation can now be allocated a dynamic VM).
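To make the idea concrete, here's a minimal sketch of the two event handlers. Everything in it (names, data model, the trivial compatibility check) is hypothetical, not Beaker's actual code:

```python
# Toy state for the sketch: free systems, waiting recipes, assignments.
idle_pool = set()   # systems with no current task
queue = []          # recipes waiting for hardware
running = {}        # recipe -> system it was assigned to

def compatible(recipe, system):
    # Stand-in for Beaker's host filtering; here any system matches.
    return True

def on_recipe_submitted(recipe):
    """Event 1: on submission, try to place the recipe immediately."""
    for system in sorted(idle_pool):
        if compatible(recipe, system):
            idle_pool.discard(system)
            running[recipe] = system
            return
    queue.append(recipe)

def on_system_released(system):
    """Event 2: offer a freed system to the queue *before* idling it."""
    for recipe in queue:
        if compatible(recipe, system):
            queue.remove(recipe)
            running[recipe] = system
            return
    idle_pool.add(system)

on_recipe_submitted("recipe-1")   # no idle systems yet -> queued
on_system_released("system-a")    # freed system picks up recipe-1
print(running)                    # {'recipe-1': 'system-a'}
print(queue, idle_pool)           # [] set()
```

The key property is that a freed system never touches the idle pool while compatible work is queued, which closes the window the current polling loop leaves open.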

The current scheduling loop would then become a cleanup loop (e.g. looking for dead recipes that need to be aborted for various reasons)

With separate scheduling events, different prioritisation rules can be applied to the two kinds of scheduling:

New recipes would use the current recipe based scheduling: filter and order the available systems according to the preferences of the user submitting the job and the requirements expressed in the recipe.

Free systems would use system based scheduling: order the queued recipes according to the preferences of the system owner and the priorities of the queued recipes.
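The two orderings might look something like this side by side (again, all names and the preference fields are made up for illustration, not Beaker's real schema):

```python
def pick_system(recipe, systems, user_prefs):
    """Recipe-based scheduling: filter systems by the recipe's
    requirements, then prefer the submitter's preferred lab."""
    candidates = [s for s in systems if s["arch"] == recipe["arch"]]
    candidates.sort(key=lambda s: (s["lab"] != user_prefs.get("preferred_lab"),
                                   s["id"]))
    return candidates[0] if candidates else None

def pick_recipe(system, queued, owner_prefs):
    """System-based scheduling: filter queued recipes the system can
    run, prefer the owner's preferred group, then higher priority."""
    candidates = [r for r in queued if r["arch"] == system["arch"]]
    candidates.sort(key=lambda r: (r["group"] != owner_prefs.get("preferred_group"),
                                   -r["priority"],
                                   r["id"]))
    return candidates[0] if candidates else None

systems = [{"id": 1, "arch": "x86_64", "lab": "lab-1"},
           {"id": 2, "arch": "x86_64", "lab": "lab-2"}]
best = pick_system({"arch": "x86_64"}, systems, {"preferred_lab": "lab-2"})
print(best["id"])  # 2

queued = [{"id": 10, "arch": "x86_64", "group": "qa", "priority": 20},
          {"id": 11, "arch": "x86_64", "group": "dev", "priority": 50}]
chosen = pick_recipe({"arch": "x86_64"}, queued, {"preferred_group": "qa"})
print(chosen["id"])  # 10
```

Note the asymmetry: in the first case the recipe is fixed and systems compete; in the second the system is fixed and recipes compete, under a different set of preferences.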

The latter scheduling algorithm can deal with the deadlock problem by prioritising recipes that are part of a recipe set that already has some resources allocated on the relevant lab controller over those which are just part of the general queue.
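As a sketch of that priority rule (with an invented data model, purely to show the sort key):

```python
def deadlock_aware_order(queued, sets_with_resources_here):
    """Order queued recipes so that those whose recipe set already
    holds resources on this lab controller sort ahead of the rest,
    then fall back to priority and id."""
    return sorted(
        queued,
        key=lambda r: (r["recipeset_id"] not in sets_with_resources_here,
                       -r["priority"],
                       r["id"]),
    )

queued = [
    {"id": 10, "recipeset_id": 1, "priority": 50},
    {"id": 11, "recipeset_id": 2, "priority": 10},
]
# Recipe set 2 already has a system allocated on this lab controller,
# so recipe 11 jumps ahead of the higher-priority recipe 10.
print([r["id"] for r in deadlock_aware_order(queued, {2})])  # [11, 10]
```

Finishing partially-allocated recipe sets first releases their held systems sooner, which is what breaks the deadlock.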

The one downside is that users would be able to exploit this in order to jump the queue for access to rare resources, by also scheduling a recipe in the same recipe set that can run on readily available hardware. While we likely can't prevent such abuse, we should be able to provide tools to help detect it, and leave it up to organizational "acceptable use" policies to deal with it.

Cheers,
Nick.

--
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane

Python Applications Team Lead
Beaker Development Lead (http://beaker-project.org/)
GlobalSync Development Lead (http://pulpdist.readthedocs.org)
_______________________________________________
Beaker-devel mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/beaker-devel
