Re: [Qemu-block] [Qemu-devel] backup bug or question
12.08.2019 20:46, John Snow wrote:
> [...]
>
> The idea is that by installing the write notifier at creation time, the
> write notifier can be switched on the instant job_start is called,
> regardless of whether we yield in the co_entry shim or not.
>
> That way, no matter when we yield or when the backup_run coroutine
> actually gets scheduled and executed, the write notifier is already
> active.
>
> Or, put another way: calling job_start() guarantees that the write
> notifier is active.

Oh, got it, I feel stupid :)

> I think using filters will save us too, but I don't know how ready those
> are. Do we still want a patch that guarantees this behavior in the
> meantime?

I think we do want it, of course; I will look at it tomorrow.

--
Best regards,
Vladimir
Re: [Qemu-block] [Qemu-devel] backup bug or question
On 8/10/19 7:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 23:13, John Snow wrote:
>> [...]
>>
>> I think we should do two things:
>>
>> 1. Move the handler installation to creation time.
>> 2. Modify backup_before_write_notify to return without invoking
>>    backup_do_cow if the job isn't started yet.
>
> Hmm, I don't see how it helps... A no-op write notifier will not save
> us from a guest write, will it?

The idea is that by installing the write notifier at creation time, the
write notifier can be switched on the instant job_start is called,
regardless of whether we yield in the co_entry shim or not.

That way, no matter when we yield or when the backup_run coroutine
actually gets scheduled and executed, the write notifier is already
active.

Or, put another way: calling job_start() guarantees that the write
notifier is active.

I think using filters will save us too, but I don't know how ready those
are. Do we still want a patch that guarantees this behavior in the
meantime?

--js
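To make that ordering concrete, here is a minimal sketch assembled from
the job_start() lines quoted in this thread. The body is abbreviated,
and the idea that the notifier keys off job->co is an assumption about
the proposed patch, not existing code:

    void job_start(Job *job)
    {
        /* Lines quoted earlier in the thread: */
        job->co = qemu_coroutine_create(job_co_entry, job);
        job->pause_count--;
        job->busy = true;
        job->paused = false;

        /*
         * Everything above runs synchronously in the caller.
         * aio_co_enter() may only *schedule* job_co_entry, which can
         * then yield in job_pause_point() long before backup_run()
         * executes -- but a write notifier installed at creation time,
         * gated on a "started" predicate such as job->co != NULL, is
         * live from this point on.
         */
        aio_co_enter(job->aio_context, job->co);
    }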
Re: [Qemu-block] [Qemu-devel] backup bug or question
09.08.2019 23:13, John Snow wrote:
> [...]
>
> I think we should do two things:
>
> 1. Move the handler installation to creation time.
> 2. Modify backup_before_write_notify to return without invoking
>    backup_do_cow if the job isn't started yet.

Hmm, I don't see how it helps... A no-op write notifier will not save
us from a guest write, will it?

--
Best regards,
Vladimir
Re: [Qemu-block] [Qemu-devel] backup bug or question
On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
>
> Hmm, hacking around backup I have a question:
>
> What prevents a guest write request after job_start but before the
> write notifier is set?
>
> code path:
>
> qmp_drive_backup or transaction with backup
>
> job_start
>     aio_co_enter(job_co_entry) /* may only schedule execution, right? */
>
> job_co_entry
>     job_pause_point() /* it definitely yields; isn't that bad? */
>     job->driver->run() /* backup_run */
>
> backup_run()
>     bdrv_add_before_write_notifier()
>
> ...

I think you're right... :(

We create jobs like this:

    job->paused = true;
    job->pause_count = 1;

And then job_start does this:

    job->co = qemu_coroutine_create(job_co_entry, job);
    job->pause_count--;
    job->busy = true;
    job->paused = false;

Which means that job_co_entry is being called before we lift the pause:

    assert(job && job->driver && job->driver->run);
    job_pause_point(job);
    job->ret = job->driver->run(job, &job->err);

...which means that we are definitely yielding in job_pause_point.

Yeah, that's a race condition waiting to happen.

> And what guarantees do we give to the user? Is it guaranteed that the
> write notifier is set when the QMP command returns?
>
> And I guess, if we start several backups in a transaction, it should
> be guaranteed that the set of backups is consistent and corresponds to
> one point in time...

I would have hoped that maybe the drain_all, coupled with the individual
jobs taking drain_start and drain_end, would save us, but I guess we
simply don't have a guarantee that all backup jobs WILL have installed
their handler by the time the transaction ends.

Or, if there is that guarantee, I don't know what provides it, so I
think we shouldn't count on it accidentally working anymore.

I think we should do two things:

1. Move the handler installation to creation time.
2. Modify backup_before_write_notify to return without invoking
   backup_do_cow if the job isn't started yet.

I'll send a patch in just a moment ...

--js
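For concreteness, a rough sketch of what those two steps might look like
in block/backup.c. The names follow the functions mentioned above, but
the started-state check and the exact placement of the installation in
backup_job_create() are assumptions, not the actual patch:

    /* Step 2 (sketch): make the notifier a no-op until the job has
     * started, so that it is safe to install it at creation time. */
    static int coroutine_fn backup_before_write_notify(
            NotifierWithReturn *notifier, void *opaque)
    {
        BackupBlockJob *job = container_of(notifier, BackupBlockJob,
                                           before_write);
        BdrvTrackedRequest *req = opaque;

        /* Assumed gating check: before job_start(), let guest writes
         * through without triggering copy-on-write. */
        if (!job_started(&job->common.job)) {
            return 0;
        }

        return backup_do_cow(job, req->offset, req->bytes, NULL, true);
    }

    /*
     * Step 1 (sketch): in backup_job_create(), install the notifier as
     * soon as the job exists, instead of waiting for backup_run():
     *
     *     job->before_write.notify = backup_before_write_notify;
     *     bdrv_add_before_write_notifier(bs, &job->before_write);
     */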