On Thu, Feb 24, 2022 at 01:45:48PM +0100, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 17/02/2022 15:48, Stefan Hajnoczi wrote:
> > On Tue, Feb 08, 2022 at 09:35:01AM -0500, Emanuele Giuseppe Esposito wrote:
> >> diff --git a/block/replication.c b/block/replication.c
> >> index 55c8f894aa..a03b28726e 100644
> >> --- a/block/replication.c
> >> +++ b/block/replication.c
> >> @@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
> >>      if (s->stage == BLOCK_REPLICATION_FAILOVER) {
> >>          commit_job = &s->commit_job->job;
> >>          assert(commit_job->aio_context == qemu_get_current_aio_context());
> > 
> > Is it safe to access commit_job->aio_context outside job_mutex?
> 
> No, but it is currently not done. Patch 18 takes care of protecting
> aio_context. Remember again that the job lock API is still a nop.
> 
> >> @@ -1838,7 +1840,9 @@ static void drive_backup_abort(BlkActionState *common)
> >>      aio_context = bdrv_get_aio_context(state->bs);
> >>      aio_context_acquire(aio_context);
> >> 
> >> -    job_cancel_sync(&state->job->job, true);
> >> +    WITH_JOB_LOCK_GUARD() {
> >> +        job_cancel_sync(&state->job->job, true);
> >> +    }
> > 
> > Maybe job_cancel_sync() should take the lock internally since all
> > callers in this patch seem to need the lock?
> 
> The _locked version is useful because it is used when lock guards are
> already present and cover multiple operations. There are only 3 places
> where a lock guard is added just to cover job_cancel_sync_locked(). Is
> it worth defining yet another function?
> 
> > I noticed this patch does not add WITH_JOB_LOCK_GUARD() to
> > tests/unit/test-blockjob.c:cancel_common(). Was that an oversight or is
> > there a reason why job_mutex is not needed around the job_cancel_sync()
> > call there?
> 
> No, locks in unit tests are added in patch 10 "jobs: protect jobs with
> job_lock/unlock".
I see, it's a question of how to split up the patches. When patches
leave the code in a state with broken invariants it becomes difficult
to review: I can't distinguish between actual bugs and temporary
violations that will be fixed in a later patch (unless they are clearly
marked). If you can structure the patches so they are self-contained
and don't leave broken invariants, that would make review easier. In
this case it is tricky, so if you cannot restructure the series I'll
do the best I can to review it as-is.

> >> @@ -252,7 +258,13 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
> >> 
> >>  static void block_job_on_idle(Notifier *n, void *opaque)
> >>  {
> >> +    /*
> >> +     * we can't kick with job_mutex held, but we also want
> >> +     * to protect the notifier list.
> >> +     */
> >> +    job_unlock();
> >>      aio_wait_kick();
> >> +    job_lock();
> > 
> > I don't understand this. aio_wait_kick() looks safe to call with a
> > mutex held?
> 
> You are right. It should be safe.
> 
> >> @@ -292,7 +304,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
> >>      job->speed = speed;
> >> 
> >>      if (drv->set_speed) {
> >> +        job_unlock();
> >>          drv->set_speed(job, speed);
> >> +        job_lock();
> > 
> > What guarantees that job stays alive during drv->set_speed(job)? We
> > don't hold a ref here. Maybe the assumption is that
> > block_job_set_speed() only gets called from the main loop thread and
> > nothing else will modify the jobs list while we're in drv->set_speed()?
> 
> What guaranteed this before?

I am not sure. I guess the reason is the one I suggested. It should be
documented in the comments.
> >> @@ -545,10 +566,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
> >>                                          action);
> >>      }
> >>      if (action == BLOCK_ERROR_ACTION_STOP) {
> >> -        if (!job->job.user_paused) {
> >> -            job_pause(&job->job);
> >> -            /* make the pause user visible, which will be resumed from QMP. */
> >> -            job->job.user_paused = true;
> >> +        WITH_JOB_LOCK_GUARD() {
> >> +            if (!job->job.user_paused) {
> >> +                job_pause(&job->job);
> >> +                /*
> >> +                 * make the pause user visible, which will be
> >> +                 * resumed from QMP.
> >> +                 */
> >> +                job->job.user_paused = true;
> >> +            }
> >>          }
> >>          block_job_iostatus_set_err(job, error);
> > 
> > Does this need the lock? If not, why is block_job_iostatus_reset()
> > called with the lock held?
> 
> block_job_iostatus_set_err does not touch any Job fields. On the other
> hand, block_job_iostatus_reset reads job.user_paused and job.pause_count.

BlockJob->iostatus requires no locking?