Am 07.04.2020 um 13:56 hat Stefan Reiter geschrieben:
> All callers of job_txn_apply hold a single job's lock, but different
> jobs within a transaction can have different contexts, thus we need to
> lock each one individually before applying the callback function.
>
> Similar to job_completed_txn_abort this also requires releasing the
> caller's context before and reacquiring it after to avoid recursive
> locks which might break AIO_WAIT_WHILE in the callback. This is safe,
> since existing code would already have to take this into account, lest
> job_completed_txn_abort might have broken.
>
> This also brings to light a different issue: When a callback function
> in job_txn_apply moves its job to a different AIO context, callers
> will try to release the wrong lock (now that we re-acquire the lock
> correctly; previously it would just continue with the old lock,
> leaving the job unlocked for the rest of the return path). Fix this by
> not caching the job's context.
>
> This is only necessary for qmp_block_job_finalize, qmp_job_finalize
> and job_exit, since everyone else calls through job_exit.
job_cancel() doesn't go through job_exit(). It calls job_completed()
only for jobs that haven't started yet, and it sets job->cancelled, so
that job_completed() takes the job_completed_txn_abort() path, which is
not changed by this patch.

I _think_ this is okay, but it shows that the whole job completion
infrastructure is becoming way too complicated. We're late for 5.0, so
let's take this patch for now, but I think we should use the 5.1
release cycle to clean up this mess a bit.

> One test needed adapting, since it calls job_finalize directly, so it
> manually needs to acquire the correct context.
>
> Signed-off-by: Stefan Reiter <s.rei...@proxmox.com>

Kevin