18.09.2019 23:31, John Snow wrote: > > > On 9/10/19 9:23 AM, John Snow wrote: >> >> >> On 9/10/19 4:19 AM, Stefan Hajnoczi wrote: >>> On Wed, Aug 21, 2019 at 04:01:52PM -0400, John Snow wrote: >>>> >>>> >>>> On 8/21/19 10:41 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>> 09.08.2019 23:13, John Snow wrote: >>>>>> Backup jobs may yield prior to installing their handler, because of the >>>>>> job_co_entry shim which guarantees that a job won't begin work until >>>>>> we are ready to start an entire transaction. >>>>>> >>>>>> Unfortunately, this makes proving correctness about transactional >>>>>> points-in-time for backup hard to reason about. Make it explicitly clear >>>>>> by moving the handler registration to creation time, and changing the >>>>>> write notifier to a no-op until the job is started. >>>>>> >>>>>> Reported-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >>>>>> Signed-off-by: John Snow <js...@redhat.com> >>>>>> --- >>>>>> block/backup.c | 32 +++++++++++++++++++++++--------- >>>>>> include/qemu/job.h | 5 +++++ >>>>>> job.c | 2 +- >>>>>> 3 files changed, 29 insertions(+), 10 deletions(-) >>>>>> >>>>>> diff --git a/block/backup.c b/block/backup.c >>>>>> index 07d751aea4..4df5b95415 100644 >>>>>> --- a/block/backup.c >>>>>> +++ b/block/backup.c >>>>>> @@ -344,6 +344,13 @@ static int coroutine_fn backup_before_write_notify( >>>>>> assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE)); >>>>>> assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE)); >>>>>> >>>>>> + /* The handler is installed at creation time; the actual >>>>>> point-in-time >>>>>> + * starts at job_start(). Transactions guarantee those two points >>>>>> are >>>>>> + * the same point in time. */ >>>>>> + if (!job_started(&job->common.job)) { >>>>>> + return 0; >>>>>> + } >>>>> >>>>> Hmm, sorry if it is a stupid question, I'm not good in multiprocessing >>>>> and in >>>>> Qemu iothreads.. >>>>> >>>>> job_started just reads job->co. If bs runs in iothread, and therefore >>>>> write-notifier >>>>> is in iothread, when job_start is called from main thread.. Is it >>>>> guaranteed that >>>>> write-notifier will see job->co variable change early enough to not miss >>>>> guest write? >>>>> Should not job->co be volatile for example or something like this? >>>>> >>>>> If not think about this patch looks good for me. >>>>> >>>> >>>> You know, it's a really good question. >>>> So good, in fact, that I have no idea. >>>> >>>> ¯\_(ツ)_/¯ >>>> >>>> I'm fairly certain that IO will not come in until the .clean phase of a >>>> qmp_transaction, because bdrv_drained_begin(bs) is called during >>>> .prepare, and we activate the handler (by starting the job) in .commit. >>>> We do not end the drained section until .clean. >>>> >>>> I'm not fully clear on what threading guarantees we have otherwise, >>>> though; is it possible that "Thread A" would somehow lift the bdrv_drain >>>> on an IO thread ("Thread B") and, after that, "Thread B" would somehow >>>> still be able to see an outdated version of job->co that was set by >>>> "Thread A"? >>>> >>>> I doubt it; but I can't prove it. >>> >>> In the qmp_backup() case (not qmp_transaction()) there is: >>> >>> void qmp_drive_backup(DriveBackup *arg, Error **errp) >>> { >>> >>> BlockJob *job; >>> job = do_drive_backup(arg, NULL, errp); >>> if (job) { >>> job_start(&job->job); >>> } >>> } >>> >>> job_start() is called without any thread synchronization, which is >>> usually fine because the coroutine doesn't run until job_start() calls >>> aio_co_enter(). >>> >>> Now that the before write notifier has been installed early, there is >>> indeed a race between job_start() and the write notifier accessing >>> job->co from an IOThread. >>> >>> The write before notifier might see job->co != NULL before job_start() >>> has finished. This could lead to issues if job_*() APIs are invoked by >>> the write notifier and access an in-between job state. >>> >> >> I see. I think in this case, as long as it sees != NULL, that the >> notifier is actually safe to run. I agree that this might be confusing >> to verify and could bite us in the future. The worry we had, too, is >> more the opposite: will it see NULL for too long? We want to make sure >> that it is registering as true *before the first yield*. >> >>> A safer approach is to set a BackupBlockJob variable at the beginning of >>> backup_run() and check it from the before write notifier. >>> >> >> That's too late, for reasons below. >> >>> That said, I don't understand the benefit of this patch and IMO it makes >>> the code harder to understand because now we need to think about the >>> created but not started state too. >>> >>> Stefan >>> >> >> It's always possible I've hyped myself up into believing there's a >> problem where there isn't one, but the fear is this: >> >> The point in time from a QMP transaction covers the job creation and the >> job start, but when we start the job it will actually yield before we >> get to backup_run -- and there is no guarantee that the handler will get >> installed synchronously, so the point in time ends before the handler >> activates. >> > > i.e., the handler might get installed AFTER the critical region of a > transaction. We could drop initial writes if we were unlucky. > > (I think.) > >> The yield occurs in job_co_entry as an intentional feature of forcing a >> yield and pause point at run time -- so it's harder to write a job that >> accidentally hogs the thread during initialization. >> >> This is an attempt to get the handler installed earlier to ensure the >> point of time stays synchronized with creation time to provide a >> stronger transactional guarantee. >> > > Squeaky wheel gets the grease. Any comment? >
Hmm, this all becomes difficult, I'd prefer to not worry and wait for backup-top filter applied. -- Best regards, Vladimir