On Wed, Jul 07, 2021 at 06:58:07PM +0200, Emanuele Giuseppe Esposito wrote: > This is a continuation on the work to reduce (and possibly get rid of) the > usage of AioContext lock, by introducing smaller granularity locks to keep > the thread safety. > > This series aims to: > 1) remove the aiocontext lock and substitute it with the already existing > global job_mutex > 2) fix what it looks like to be an oversight when moving the blockjob.c logic > into the more generic job.c: job_mutex was introduced especially to > protect job->busy flag, but it seems that it was not used in successive > patches, because there are multiple code sections that directly > access the field without any locking. > 3) use job_mutex instead of the aiocontext_lock > 4) extend the reach of the job_mutex to protect all shared fields > that the job structure has. > > The reason why we propose to use the existing job_mutex and not make one for > each job is to keep things as simple as possible for now, and because > the jobs are not in the execution critical path, so we can affort > some delays. > Having a lock per job would increase overall complexity and > increase the chances of deadlocks (one good example could be the job > transactions, where multiple jobs are grouped together). > Anyways, the per-job mutex can always be added in the future. > > Patch 1-4 are in preparation for patch 5. They try to simplify and clarify > the job_mutex usage. Patch 5 tries to add proper syncronization to the job > structure, replacing the AioContext lock when necessary. > Patch 6 just removes unnecessary AioContext locks that are now unneeded. > > > RFC: I am not sure the way I layed out the locks is ideal. > But their usage should not make deadlocks. I also made sure > the series passess all qemu_iotests. > > What is very clear from this patch is that it > is strictly related to the brdv_* and lower level calls, because > they also internally check or even use the aiocontext lock. > Therefore, in order to make it work, I temporarly added some > aiocontext_acquire/release pair around the function that > still assert for them or assume they are hold and temporarly > unlock (unlock() - lock()).
Sounds like the issue is that this patch series assumes AioContext locks are no longer required for calling the blk_*()/bdrv_*() APIs? That is not the case yet, so you had to then add those aio_context_lock() calls back in elsewhere. This approach introduces unnecessary risk. I think we should wait until blk_*()/bdrv_*() no longer requires the caller to hold the AioContext lock before applying this series. Stefan
signature.asc
Description: PGP signature