Am 07.04.2020 um 13:56 hat Stefan Reiter geschrieben:
> All callers of job_txn_apply hold a single job's lock, but different
> jobs within a transaction can have different contexts, thus we need to
> lock each one individually before applying the callback function.
>
> Similar to job_completed_txn_abort this also requires releasing the
> caller's context before and reacquiring it after to avoid recursive
> locks which might break AIO_WAIT_WHILE in the callback. This is safe,
> since existing code would already have to take this into account, lest
> job_completed_txn_abort might have broken.
>
> This also brings to light a different issue: When a callback function
> in job_txn_apply moves its job to a different AIO context, callers
> will try to release the wrong lock (now that we re-acquire the lock
> correctly; previously it would just continue with the old lock,
> leaving the job unlocked for the rest of the return path). Fix this by
> not caching the job's context.
>
> This is only necessary for qmp_block_job_finalize, qmp_job_finalize
> and job_exit, since everyone else calls through job_exit.
job_cancel() doesn't go through job_exit(). It calls job_completed()
only for jobs that haven't started yet, and it sets job->cancelled, so
that job_completed() takes the job_completed_txn_abort() path, which is
not changed by this patch.

I _think_ this is okay, but it shows that the whole job completion
infrastructure is becoming way too complicated. We're late for 5.0, so
let's take this patch for now, but I think we should use the 5.1
release cycle to clean up this mess a bit.

> One test needed adapting, since it calls job_finalize directly, so it
> manually needs to acquire the correct context.
>
> Signed-off-by: Stefan Reiter <s.rei...@proxmox.com>

Kevin