Status: Accepted
Owner: ----
CC: [email protected],  [email protected]
Labels: Type-Other Milestone-Release2.12 Priority-High

New issue 747 by [email protected]: Honor job priorities when handing out locks
http://code.google.com/p/ganeti/issues/detail?id=747

The current design[1] of lock handling in 2.12 does not take job
priorities into account; locks are handed out to whichever job asks
first. However, it is desirable to hand out each lock to the most
important job waiting for it.

This will require wconfd to keep track of the waiting jobs. This is
necessary anyway, for at least two reasons:
- the set of waiting jobs is a queryable property of the cluster, and
- tracking them makes it possible to avoid unnecessary polling.
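A waiting list of this kind could look roughly like the Python sketch below. It is only an illustration: wconfd itself is written in Haskell, and `WaitingList`, `most_important_for` and the lock names are hypothetical. It assumes nice-style priorities (a smaller number means a more important job), which is how we understand Ganeti's opcode priorities to work, and breaks ties by arrival order so that equal-priority jobs are still served first come, first served.

```python
import itertools


class WaitingList:
    """Hypothetical record, kept by wconfd, of jobs waiting for locks.

    Assumes nice-style priorities: a smaller number is a more important job.
    Jobs of equal priority are ordered by arrival (first come, first served).
    """

    def __init__(self):
        # Entries: (priority, arrival_seq, job_id, frozenset_of_lock_names).
        # arrival_seq is unique, so sorting never has to compare lock sets.
        self._entries = []
        self._seq = itertools.count()

    def add(self, job_id, priority, locks):
        self._entries.append((priority, next(self._seq), job_id, frozenset(locks)))

    def remove(self, job_id):
        self._entries = [e for e in self._entries if e[2] != job_id]

    def waiting_jobs(self):
        """Queryable view: (job_id, priority, sorted locks) in service order."""
        return [(job, prio, sorted(locks))
                for prio, _, job, locks in sorted(self._entries)]

    def most_important_for(self, lock):
        """The job that should get `lock` next, or None if nobody waits for it."""
        for _, _, job, locks in sorted(self._entries):
            if lock in locks:
                return job
        return None
```

With jobs 1 (priority 0), 2 (priority -10) and 3 (priority 0) all waiting for `instance/a` in that arrival order, `most_important_for("instance/a")` returns 2, whereas the current first-come policy would pick job 1.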

While adding this functionality to wconfd is not hard, the design
needs to clarify how a job is informed that its locks have become
available. There are at least two ways to do this.

(a) Support blocking lock allocation, where a request is only
    answered once the locks are available. The current design already
    hints at this form of implementation, but certain changes are
    necessary.
    - The timeout for connections to wconfd has to be increased
      significantly, as not all locks are granted within 60s.
    - We have to ensure that the number of open connections to wconfd
      does not become a bottleneck. With the default limit of 25
      concurrently running jobs, this should not be a problem, but it
      would limit scaling Ganeti to larger clusters with more jobs.
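Option (a) could be sketched in Python as follows; this is an illustration under stated assumptions, not wconfd's API (wconfd is Haskell, and all names here are hypothetical). Each call to `acquire` blocks the calling thread, which stands in for one connection held open to wconfd, until the requesting job is the most important waiter whose locks are all free, or until `timeout` seconds pass, mirroring the connection timeout discussed above. Priorities are again assumed to be nice-style (smaller is more important).

```python
import itertools
import threading


class BlockingLockManager:
    """Hypothetical sketch of option (a): a request returns only once its
    locks are granted, or with False once `timeout` expires."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owner = {}    # lock name -> owning job id
        self._waiting = []  # (priority, arrival_seq, job_id, frozenset_of_locks)
        self._seq = itertools.count()

    def _free(self, locks):
        return not any(name in self._owner for name in locks)

    def _next_job(self):
        # Most important waiter (smaller priority first, then arrival order)
        # whose requested locks are all free right now.
        for _, _, job, locks in sorted(self._waiting):
            if self._free(locks):
                return job
        return None

    def acquire(self, job_id, priority, locks, timeout):
        """Block until `locks` are granted to `job_id`; return False on timeout."""
        entry = (priority, next(self._seq), job_id, frozenset(locks))
        with self._cond:
            self._waiting.append(entry)
            # wait_for returns the predicate's last value, so False means timeout.
            granted = self._cond.wait_for(lambda: self._next_job() == job_id,
                                          timeout=timeout)
            self._waiting.remove(entry)
            if granted:
                for name in entry[3]:
                    self._owner[name] = job_id
            # Our leaving the queue may let another waiter proceed.
            self._cond.notify_all()
            return granted

    def release(self, job_id):
        with self._cond:
            for name in [n for n, j in self._owner.items() if j == job_id]:
                del self._owner[name]
            self._cond.notify_all()
```

This predicate lets a less important job go ahead while a more important one is still blocked on some other lock; a stricter policy would reserve the locks of blocked waiters. Note also that every blocked `acquire` occupies a thread for its whole wait, which is exactly the scaling concern raised above.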

(b) Support asynchronous lock allocation: a job can ask wconfd to allocate
    a given set of locks as soon as they become available (and wconfd will
    honor the job priority). The answer to such a request signals one of
    two possible outcomes: either "the locks are available right now" or
    "your request is in the waiting list". In the latter case, wconfd would
    send a signal to the job once the locks are ready; the job would then
    query the locks it owns to verify the meaning of the signal. To guard
    against lost signals, the job would additionally poll with "listlocks"
    at a very low frequency (e.g., once every 2 minutes).
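A rough Python sketch of option (b) follows, again assuming nice-style priorities and using only hypothetical names. `request` never blocks and answers "granted" or "queued"; `notify(job_id)` stands in for the signal wconfd would send, and is treated as unreliable. `list_locks` plays the role of the "listlocks" query, and the job-side `wait_until_owned` re-checks ownership after each wake-up or after `poll_interval` seconds (120 s, matching the "once every 2 minutes" suggestion), so a lost signal costs at most one poll interval.

```python
import itertools
import threading


class AsyncLockManager:
    """Hypothetical sketch of option (b): lock requests never block."""

    GRANTED, QUEUED = "granted", "queued"

    def __init__(self, notify):
        self._notify = notify          # stands in for wconfd's signal to a job
        self._mutex = threading.Lock()
        self._owner = {}               # lock name -> owning job id
        self._waiting = []             # (priority, arrival_seq, job_id, frozenset)
        self._seq = itertools.count()

    def _dispatch(self):
        """Grant waiters in priority order (caller holds the mutex). A waiter
        that cannot be served yet reserves its locks, so a less important job
        cannot overtake it by taking one of them first."""
        granted, reserved = [], set()
        for entry in sorted(self._waiting):
            _, _, job, locks = entry
            if not (locks & reserved) and all(n not in self._owner for n in locks):
                for name in locks:
                    self._owner[name] = job
                self._waiting.remove(entry)
                granted.append(job)
            else:
                reserved |= locks
        return granted

    def request(self, job_id, priority, locks):
        """Return GRANTED if the job holds the locks now, else QUEUED."""
        with self._mutex:
            self._waiting.append((priority, next(self._seq), job_id, frozenset(locks)))
            granted = self._dispatch()
        for job in granted:
            if job != job_id:
                self._notify(job)
        return self.GRANTED if job_id in granted else self.QUEUED

    def release(self, job_id):
        with self._mutex:
            for name in [n for n, j in self._owner.items() if j == job_id]:
                del self._owner[name]
            granted = self._dispatch()
        for job in granted:
            self._notify(job)  # may get lost; the job's slow poll covers that

    def list_locks(self, job_id):
        """The query a job uses to confirm what a notification meant."""
        with self._mutex:
            return sorted(n for n, j in self._owner.items() if j == job_id)


def wait_until_owned(list_locks, wanted, woken, poll_interval=120.0):
    """Job side: sleep until woken (a threading.Event) or until poll_interval
    seconds pass, then re-check which locks the job actually owns."""
    while not set(wanted) <= set(list_locks()):
        woken.wait(timeout=poll_interval)
        woken.clear()
```

Because the job always verifies ownership through `list_locks` rather than trusting the notification itself, a spurious or duplicated signal is harmless, and a missing one only delays the job.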

Before the release of 2.12, we need to decide
- which form of notification to use in 2.12
  (this also needs to be implemented), and
- which form of notification to use in the long run.


[1] http://docs.ganeti.org/ganeti/master/html/design-daemons.html
