Re: [Qemu-block] [Qemu-devel] backup bug or question

2019-08-12 Thread Vladimir Sementsov-Ogievskiy
12.08.2019 20:46, John Snow wrote:
> 
> 
> On 8/10/19 7:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 23:13, John Snow wrote:
>>>
>>>
>>> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
 Hi!

 Hmm, hacking around backup I have a question:

 What prevents a guest write request after job_start but before the
 write notifier is set?

 code path:

 qmp_drive_backup or transaction with backup

   job_start
     aio_co_enter(job_co_entry) /* may only schedule execution, right? */

 job_co_entry
   job_pause_point() /* it definitely yields; isn't that bad? */
   job->driver->run() /* backup_run */

 backup_run()
   bdrv_add_before_write_notifier()

 ...

>>>
>>> I think you're right... :(
>>>
>>>
>>> We create jobs like this:
>>>
>>> job->paused = true;
>>> job->pause_count = 1;
>>>
>>>
>>> And then job_start does this:
>>>
>>> job->co = qemu_coroutine_create(job_co_entry, job);
>>> job->pause_count--;
>>> job->busy = true;
>>> job->paused = false;
>>>
>>>
>>> Which means that job_co_entry is being called before we lift the pause:
>>>
>>> assert(job && job->driver && job->driver->run);
>>> job_pause_point(job);
>>> job->ret = job->driver->run(job, &job->err);
>>>
>>> ...Which means that we are definitely yielding in job_pause_point.
>>>
>>> Yeah, that's a race condition waiting to happen.
>>>
 And what guarantees do we give to the user? Is it guaranteed that the write
 notifier is set when the qmp command returns?

 And I guess, if we start several backups in a transaction, it should be
 guaranteed that the set of backups is consistent and corresponds to one
 point in time...

>>>
>>> I would have hoped that maybe the drain_all coupled with the individual
>>> jobs taking drain_start and drain_end would save us, but I guess we
>>> simply don't have a guarantee that all backup jobs WILL have installed
>>> their handler by the time the transaction ends.
>>>
>>> Or, if there is that guarantee, I don't know what provides it, so I
>>> think we shouldn't count on it accidentally working anymore.
>>>
>>>
>>>
>>> I think we should do two things:
>>>
>>> 1. Move the handler installation to creation time.
>>> 2. Modify backup_before_write_notify to return without invoking
>>> backup_do_cow if the job isn't started yet.
>>>
>>
>> Hmm, I don't see how it helps... A no-op write notifier will not save us from
>> a guest write, will it?
>>
>>
> 
> The idea is that by installing the write notifier during creation, the
> write notifier can be switched on the instant job_start is called,
> regardless of whether we yield in the co_entry shim or not.
> 
> That way, no matter when we yield or when the backup_run coroutine
> actually gets scheduled and executed, the write notifier is active already.
> 
> Or put another way: calling job_start() guarantees that the write
> notifier is active.


Oh, got it, I feel stupid :)

> 
> I think using filters will save us too, but I don't know how ready those
> are. Do we still want a patch that guarantees this behavior in the meantime?
> 

I think we do want it, of course; I will look at it tomorrow.


-- 
Best regards,
Vladimir


Re: [Qemu-block] [Qemu-devel] backup bug or question

2019-08-12 Thread John Snow



On 8/10/19 7:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 23:13, John Snow wrote:
>>
>>
>> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi!
>>>
>>> Hmm, hacking around backup I have a question:
>>>
>>> What prevents a guest write request after job_start but before the
>>> write notifier is set?
>>>
>>> code path:
>>>
>>> qmp_drive_backup or transaction with backup
>>>
>>>   job_start
>>>     aio_co_enter(job_co_entry) /* may only schedule execution, right? */
>>>
>>> job_co_entry
>>>   job_pause_point() /* it definitely yields; isn't that bad? */
>>>   job->driver->run() /* backup_run */
>>>
>>> backup_run()
>>>   bdrv_add_before_write_notifier()
>>>
>>> ...
>>>
>>
>> I think you're right... :(
>>
>>
>> We create jobs like this:
>>
>> job->paused = true;
>> job->pause_count = 1;
>>
>>
>> And then job_start does this:
>>
>> job->co = qemu_coroutine_create(job_co_entry, job);
>> job->pause_count--;
>> job->busy = true;
>> job->paused = false;
>>
>>
>> Which means that job_co_entry is being called before we lift the pause:
>>
>> assert(job && job->driver && job->driver->run);
>> job_pause_point(job);
>> job->ret = job->driver->run(job, &job->err);
>>
>> ...Which means that we are definitely yielding in job_pause_point.
>>
>> Yeah, that's a race condition waiting to happen.
>>
>>> And what guarantees do we give to the user? Is it guaranteed that the write
>>> notifier is set when the qmp command returns?
>>>
>>> And I guess, if we start several backups in a transaction, it should be
>>> guaranteed that the set of backups is consistent and corresponds to one
>>> point in time...
>>>
>>
>> I would have hoped that maybe the drain_all coupled with the individual
>> jobs taking drain_start and drain_end would save us, but I guess we
>> simply don't have a guarantee that all backup jobs WILL have installed
>> their handler by the time the transaction ends.
>>
>> Or, if there is that guarantee, I don't know what provides it, so I
>> think we shouldn't count on it accidentally working anymore.
>>
>>
>>
>> I think we should do two things:
>>
>> 1. Move the handler installation to creation time.
>> 2. Modify backup_before_write_notify to return without invoking
>> backup_do_cow if the job isn't started yet.
>>
> 
> Hmm, I don't see how it helps... A no-op write notifier will not save us from
> a guest write, will it?
> 
> 

The idea is that by installing the write notifier during creation, the
write notifier can be switched on the instant job_start is called,
regardless of whether we yield in the co_entry shim or not.

That way, no matter when we yield or when the backup_run coroutine
actually gets scheduled and executed, the write notifier is active already.

Or put another way: calling job_start() guarantees that the write
notifier is active.
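
Concretely, I'm picturing the notifier ending up with a shape like this (just a
sketch, not the patch itself; job->initialized is a made-up name for
"job_start() has been called", and how exactly that gets flipped is the detail
to pin down):

static int coroutine_fn backup_before_write_notify(
        NotifierWithReturn *notifier, void *opaque)
{
    BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
    BdrvTrackedRequest *req = opaque;

    if (!job->initialized) {
        /* The notifier is installed at creation time, so it can fire
         * before job_start(); until then there is nothing to preserve,
         * so let the guest write pass through untouched. */
        return 0;
    }

    return backup_do_cow(job, req->offset, req->bytes, NULL, true);
}

With that flag flipped synchronously as part of starting the job, any guest
write that lands after the QMP command returns already goes through
backup_do_cow().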

I think using filters will save us too, but I don't know how ready those
are. Do we still want a patch that guarantees this behavior in the meantime?

--js



Re: [Qemu-block] [Qemu-devel] backup bug or question

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 23:13, John Snow wrote:
> 
> 
> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi!
>>
>> Hmm, hacking around backup I have a question:
>>
>> What prevents a guest write request after job_start but before the
>> write notifier is set?
>>
>> code path:
>>
>> qmp_drive_backup or transaction with backup
>>
>>   job_start
>>     aio_co_enter(job_co_entry) /* may only schedule execution, right? */
>>
>> job_co_entry
>>   job_pause_point() /* it definitely yields; isn't that bad? */
>>   job->driver->run() /* backup_run */
>>
>> backup_run()
>>   bdrv_add_before_write_notifier()
>>
>> ...
>>
> 
> I think you're right... :(
> 
> 
> We create jobs like this:
> 
> job->paused = true;
> job->pause_count = 1;
> 
> 
> And then job_start does this:
> 
> job->co = qemu_coroutine_create(job_co_entry, job);
> job->pause_count--;
> job->busy = true;
> job->paused = false;
> 
> 
> Which means that job_co_entry is being called before we lift the pause:
> 
> assert(job && job->driver && job->driver->run);
> job_pause_point(job);
> job->ret = job->driver->run(job, &job->err);
> 
> ...Which means that we are definitely yielding in job_pause_point.
> 
> Yeah, that's a race condition waiting to happen.
> 
>> And what guarantees do we give to the user? Is it guaranteed that the write
>> notifier is set when the qmp command returns?
>>
>> And I guess, if we start several backups in a transaction, it should be
>> guaranteed that the set of backups is consistent and corresponds to one
>> point in time...
>>
> 
> I would have hoped that maybe the drain_all coupled with the individual
> jobs taking drain_start and drain_end would save us, but I guess we
> simply don't have a guarantee that all backup jobs WILL have installed
> their handler by the time the transaction ends.
> 
> Or, if there is that guarantee, I don't know what provides it, so I
> think we shouldn't count on it accidentally working anymore.
> 
> 
> 
> I think we should do two things:
> 
> 1. Move the handler installation to creation time.
> 2. Modify backup_before_write_notify to return without invoking
> backup_do_cow if the job isn't started yet.
> 

Hmm, I don't see how it helps... A no-op write notifier will not save us from
a guest write, will it?


-- 
Best regards,
Vladimir


Re: [Qemu-block] [Qemu-devel] backup bug or question

2019-08-09 Thread John Snow



On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> Hmm, hacking around backup I have a question:
> 
> What prevents a guest write request after job_start but before the
> write notifier is set?
> 
> code path:
> 
> qmp_drive_backup or transaction with backup
>
>   job_start
>     aio_co_enter(job_co_entry) /* may only schedule execution, right? */
>
> job_co_entry
>   job_pause_point() /* it definitely yields; isn't that bad? */
>   job->driver->run() /* backup_run */
>
> backup_run()
>   bdrv_add_before_write_notifier()
>
> ...
> 

I think you're right... :(


We create jobs like this:

job->paused = true;
job->pause_count = 1;


And then job_start does this:

job->co = qemu_coroutine_create(job_co_entry, job);
job->pause_count--;
job->busy = true;
job->paused = false;


Which means that job_co_entry is being called before we lift the pause:

assert(job && job->driver && job->driver->run);
job_pause_point(job);
job->ret = job->driver->run(job, &job->err);

...Which means that we are definitely yielding in job_pause_point.

Yeah, that's a race condition waiting to happen.
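
To spell the window out, the interleaving we have to worry about looks roughly
like this (hand-drawn illustration, not a captured trace):

main loop:   job_start()
                 qemu_coroutine_create(job_co_entry, job)
                 aio_co_enter()          /* may only schedule the coroutine */
             qmp command returns
guest:       write request               /* no before-write notifier yet */
coroutine:   job_co_entry()
                 job_pause_point()       /* may yield yet again */
                 backup_run()
                     bdrv_add_before_write_notifier()    /* too late */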

> And what guarantees do we give to the user? Is it guaranteed that the write
> notifier is set when the qmp command returns?
>
> And I guess, if we start several backups in a transaction, it should be
> guaranteed that the set of backups is consistent and corresponds to one
> point in time...
> 

I would have hoped that maybe the drain_all coupled with the individual
jobs taking drain_start and drain_end would save us, but I guess we
simply don't have a guarantee that all backup jobs WILL have installed
their handler by the time the transaction ends.

Or, if there is that guarantee, I don't know what provides it, so I
think we shouldn't count on it accidentally working anymore.



I think we should do two things:

1. Move the handler installation to creation time.
2. Modify backup_before_write_notify to return without invoking
backup_do_cow if the job isn't started yet.

I'll send a patch in just a moment ...
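
Roughly, I expect it to have this shape (a sketch only, not the actual patch;
"initialized" is a made-up flag name and the hunks are abbreviated):

/* 1) block/backup.c, backup_job_create(): install the notifier here,
 *    instead of waiting for backup_run() to get scheduled.            */
job->before_write.notify = backup_before_write_notify;
bdrv_add_before_write_notifier(bs, &job->before_write);

/* 2) backup_before_write_notify(): stay out of the way until the job
 *    has actually been started.                                       */
if (!job->initialized) {
    return 0;
}
return backup_do_cow(job, req->offset, req->bytes, NULL, true);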

--js