On Mon, Dec 19, 2011 at 04:38:22PM +0100, Michael Hanselmann wrote:
> When an opcode is about to be processed its dependencies are
> evaluated using “_JobDependencyManager.CheckAndRegister”. Due
> to its nature that function requires a lock on the manager's
> internal structures. All of this happens while the job queue
> lock is held in shared mode (required for the job processor).
> 
> When a job has been processed any pending dependencies are re-added
> to the job workerpool. Before this patch that would require
> the manager's lock and then, for adding the jobs, the job queue
> lock. Since this is in reverse order it will lead to deadlocks.
> ---
>  lib/jqueue.py                  |   25 ++++++++++++++++++-------
>  test/ganeti.jqueue_unittest.py |    3 +++
>  2 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/jqueue.py b/lib/jqueue.py
> index d5ea3cb..cc06b70 100644
> --- a/lib/jqueue.py
> +++ b/lib/jqueue.py
> @@ -1439,28 +1439,39 @@ class _JobDependencyManager:
>                " not one of '%s' as required" %
>                (dep_job_id, status, utils.CommaJoin(dep_status)))
>  
> -  @locking.ssynchronized(_LOCK)
> +  def _RemoveEmptyWaitersUnlocked(self):
> +    """Remove all jobs without actual waiters.
> +
> +    """
> +    for job_id in [job_id for (job_id, waiters) in self._waiters.items()
> +                   if not waiters]:
> +      del self._waiters[job_id]
> +
>    def NotifyWaiters(self, job_id):
>      """Notifies all jobs waiting for a certain job ID.
>  
> +    @important: Do not call until L{CheckAndRegister} returned a status other
> +      than L{self.WAIT} for C{job_id}, or behaviour is undefined

I'm sorry, is this really serious? We deadlock simply due to calling
functions in the wrong state??

iustin

Reply via email to