Re: RFC: Qemu backup interface plans

2021-05-25 Thread Vladimir Sementsov-Ogievskiy

25.05.2021 11:50, Max Reitz wrote:

On 19.05.21 08:11, Vladimir Sementsov-Ogievskiy wrote:

18.05.2021 19:39, Max Reitz wrote:


[...]


On 17.05.21 14:07, Vladimir Sementsov-Ogievskiy wrote:


[...]


Note also that there is another benefit to such a thing: we'll implement this
callback in the qcow2 driver, so that backup will read clusters not in guest
cluster order, but in host cluster order, to read more sequentially, which
should bring better performance on rotating disks.


I’m not exactly sure how you envision this to work, but block_status also 
already gives you the host offset in *map.



But block-status doesn't provide a way to read sequentially. For this, the
user would have to call block-status several times until the whole disk is
covered, and then sort the segments by host offset. I wonder whether it could
be implemented as some iterator, like

read_iter = bdrv_get_sequential_read_iter(source)

while (extents = bdrv_read_next(read_iter)):
    for ext in extents:
        start_writing_task(target, ext.offset, ext.bytes, ext.qiov)

where bdrv_read_next would read guest data in host-cluster order.


How would you implement this, though?


I don't know :) That's why I wrote "I wonder". Anyway, I have enough work with
all the previous steps.


qcow2 doesn’t have a reverse mapping either, so it too would need to read all 
L2 table entries and sort them, wouldn’t it?



Hmm, yes. With the current qcow2 format, that seems to be the only way. And
we'll be limited by memory, so probably: read several L2 tables, sort their
entries, return the sorted extents, read the next bunch of L2 tables, and so
on.

And then, I agree, we can implement it on top of the current block_status()
implementation.
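
For illustration, here is a minimal sketch of that batched approach in Python
pseudocode. The source object and its block_status() method are hypothetical
stand-ins (returning length, host offset and a mapped flag, roughly like
bdrv_block_status() does); this is not actual QEMU code:

def sequential_read_extents(source, batch_size=4096):
    offset = 0
    while offset < source.total_size:
        # Collect a bounded batch of mapped extents (roughly: a few L2
        # tables' worth), so memory use stays limited.
        batch = []
        while offset < source.total_size and len(batch) < batch_size:
            length, host_offset, mapped = source.block_status(offset)
            if mapped:
                batch.append((host_offset, offset, length))
            offset += length
        # Yield this batch in host-offset order, so reads hit the
        # underlying (rotating) disk sequentially.
        for host_offset, guest_offset, length in sorted(batch):
            yield guest_offset, length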

Or perhaps we could implement a reverse mapping in qcow2 as an extension. But
I doubt that it is worth the complexity. Still, it depends on how much extra
I/O it would cost (almost nothing, as it should work through the qcow2 cache)
and how much performance benefit it would bring (no idea, it should be
measured).

--
Best regards,
Vladimir




Re: RFC: Qemu backup interface plans

2021-05-25 Thread Max Reitz

On 19.05.21 08:11, Vladimir Sementsov-Ogievskiy wrote:

18.05.2021 19:39, Max Reitz wrote:


[...]


On 17.05.21 14:07, Vladimir Sementsov-Ogievskiy wrote:


[...]

Note also that there is another benefit to such a thing: we'll
implement this callback in the qcow2 driver, so that backup will read
clusters not in guest cluster order, but in host cluster order, to
read more sequentially, which should bring better performance on
rotating disks.


I’m not exactly sure how you envision this to work, but block_status 
also already gives you the host offset in *map.




But block-status doesn't provide a way to read sequentially. For
this, the user would have to call block-status several times until the
whole disk is covered, and then sort the segments by host offset. I
wonder whether it could be implemented as some iterator, like


read_iter = bdrv_get_sequential_read_iter(source)

while (extents = bdrv_read_next(read_iter)):
    for ext in extents:
        start_writing_task(target, ext.offset, ext.bytes, ext.qiov)

where bdrv_read_next would read guest data in host-cluster order.


How would you implement this, though?  qcow2 doesn’t have a reverse 
mapping either, so it too would need to read all L2 table entries and 
sort them, wouldn’t it?


Max



Re: RFC: Qemu backup interface plans

2021-05-21 Thread Vladimir Sementsov-Ogievskiy

17.05.2021 15:07, Vladimir Sementsov-Ogievskiy wrote:

3.1 and make it possible to modify the bitmap externally, so that the consumer
of fleecing can say to the backup-top filter: "I've already copied these
blocks, don't bother copying them to the temp image". This is to solve [2].

Still, how will the consumer of fleecing reset the shared bitmap after copying
blocks? I have the following idea: we make a "discard-bitmap-filter" filter
driver that owns a bitmap and, on a discard request, clears the corresponding
bits. Also, a read from a region with clear bits returns EINVAL immediately.
This way both consumers (the backup job and an NBD client) are able to use
this interface:

The backup job can simply call discard on the source; we can add an option for
this. An external backup tool will send a TRIM request after reading a region.
This way disk space is freed and no extra copy-before-write operations are
done. I also have a side idea that we could implement a READ_ONCE flag, so
that READ and TRIM can be done in one NBD command. But that works only for
clients that don't want to implement any kind of retrying.
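
To make the intended semantics concrete, here is a rough model in Python (all
names are illustrative, not a real QEMU driver): reads from regions whose bits
are already clear fail with EINVAL, and a discard clears the bits so no
further copy-before-write into that region is needed.

import errno

class DiscardBitmapFilter:
    def __init__(self, child, bitmap):
        self.child = child    # underlying node, e.g. the temp qcow2 image
        self.bitmap = bitmap  # shared with the copy-before-write filter

    def read(self, offset, nbytes):
        if not self.bitmap.all_set(offset, nbytes):
            # The consumer already copied (and discarded) this region.
            raise OSError(errno.EINVAL, "read from discarded region")
        return self.child.read(offset, nbytes)

    def discard(self, offset, nbytes):
        # "I'm done with these blocks": stop preserving them.
        self.bitmap.clear(offset, nbytes)
        self.child.discard(offset, nbytes)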



So, finally, here is how it will look (here I call backup-top by its new name,
and the "file" child is used instead of "backing", as I propose this change in
"[PATCH 00/21] block: publish backup-top filter"):

Pull backup:

┌────────────┐
│ NBD export │
└────────────┘
       │
       ▼
┌────────────────────────────────────┐  file   ┌────────────────┐  backing   ┌─────────────┐
│ discard-bitmap filter (bitmap=bm0) │ ──────▶ │ temp qcow2 img │ ─────────▶ │ active disk │
└────────────────────────────────────┘         └────────────────┘            └─────────────┘
                                                        ▲                           ▲
                                                 target │                      file │
                                                        │                           │
┌───────────┐       ┌───────────────────────────────────────┐                      │
│ guest blk │ ────▶ │ copy-before-write filter (bitmap=bm0) │ ──────────────────────┘
└───────────┘       └───────────────────────────────────────┘



Now I think that discard-bitmap is not really a good idea:

1. If it is separate from the internal BlockCopyState::copy_bitmap, then we'll
have to handle both bitmaps inside the block-copy code, which is not a small
complication.

2. Using BlockCopyState::copy_bitmap directly seems possible:

The user creates copy_bitmap by hand and passes it to the copy-before-write
filter as an option; block-copy will use this bitmap directly.

The same bitmap is passed to the discard-bitmap filter, so that it can clear
bits on discards.

Pros:
 - we don't make block-copy code more complicated

Note then that BlockCopyState::copy_bitmap is used in the block-copy process
as follows:

 - copy tasks are created to copy dirty areas
 - when a task is created, the corresponding dirty area is cleared (so that
tasks cannot be created twice for the same area)
 - if a task fails, the corresponding area becomes dirty again (so that it can
be retried)
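
Modeled as a toy in Python (the names are mine, not QEMU's), this lifecycle
looks like:

class BlockCopyStateModel:
    def __init__(self, dirty_clusters):
        self.dirty = set(dirty_clusters)  # dirty cluster indices
        self.in_flight = {}               # cluster -> running task

    def create_task(self, cluster, task):
        self.dirty.discard(cluster)       # clear: no second task here
        self.in_flight[cluster] = task

    def task_failed(self, cluster):
        del self.in_flight[cluster]
        self.dirty.add(cluster)           # dirty again, so it is retried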

So, we have
Cons:
 - if a discard comes while a task is in flight, the bits are already clear.
Then, if the task fails, the bits become dirty again. That's wrong. Not very
bad, and not worth the complications of [1]. But still, keep it in mind [*]
 - we have to use a separate bitmap in the discard filter to fail reads from
non-dirty areas (because BlockCopyState::copy_bitmap is cleared by the
block-copy process, not only by discards). That is worse. It just means that
the discard filter is a strange thing and we don't have a good understanding
of what it should do. Good documentation would help of course, but...

...this all leads me to a new idea:

Let's go without the discard-bitmap filter, with this simple logic:

If a discard is done on the block-copy target, we inform block-copy about it.
Then block-copy can:

 - clear the corresponding area in its internal bitmap
 - [not really important, but it addresses [*]]: cancel in-flight tasks in the
discarded area.

Pros: avoid the extra filter
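
Continuing the toy model from above (still illustrative Python, hypothetical
names), the target's discard handler would then be as simple as:

def on_target_discard(state, cluster):
    # Forget the area: the consumer already copied it away.
    state.dirty.discard(cluster)
    # Addressing [*]: cancel an overlapping in-flight task, so a later
    # task failure cannot re-dirty a discarded area.
    task = state.in_flight.pop(cluster, None)
    if task is not None:
        task.cancel()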

--
Best regards,
Vladimir




Re: RFC: Qemu backup interface plans

2021-05-19 Thread Vladimir Sementsov-Ogievskiy

19.05.2021 14:20, Kevin Wolf wrote:

On 19.05.2021 at 08:11, Vladimir Sementsov-Ogievskiy wrote:

2. Test that we can start a backup job with source = (target of the
backup-top filter), so that we have "push backup with fleecing".
Make an option for backup to start without a filter when we don't
need copy-before-write operations, to avoid creating an extra
superfluous filter.


OK, so the backup job is not really a backup job, but just anything
that copies data.


Not quite. For backup without a filter, we should protect the source
from changing by unsharing the WRITE permission on it.

I'll try to avoid adding an option. The logic should work like in the
commit job: if there are current writers on the source, we create the
filter. If there are no writers, we just unshare writes and go without
a filter. And for this, the copy-before-write filter should be able to
do WRITE_UNCHANGED in the fleecing case.


If we ever get to the point where we would make a block-copy job visible
to the user, I wouldn't copy the restrictions from the current jobs, but
keep it really generic to cover all cases.

There is no way for the QMP command starting the job to know what the
user is planning to do with the image in the future. Even if it's
currently read-only, the user may want to add a writer later.

I think this means that we want to always add a filter node, and then
possibly dynamically switch between modes if being read-only provides a
significant advantage for the job.

Kevin



Still, in the push-backup-with-fleecing scheme we really don't need the second
filter, so why insert an extra node into the block graph?

I still see your point that the user may want to add a writer later. Still,
I'd be surprised if such use cases exist now.

What about the following:

add a source-mode tri-state parameter for backup:

auto: insert the filter iff there are existing writers [default]
filtered: insert the filter unconditionally
immutable: don't insert the filter; this fails if there are existing writers,
and creating writers during the block job becomes impossible
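
As an illustrative sketch of the proposed decision logic (Python pseudocode;
insert_cbw_filter, has_writers and unshare_write_perm are hypothetical
stand-ins for what QEMU would actually do via the permission system):

def setup_backup_source(source, mode="auto"):
    if mode == "filtered":
        return insert_cbw_filter(source)   # always filter
    if mode == "immutable":
        if has_writers(source):
            raise RuntimeError("source already has writers")
        unshare_write_perm(source)         # no filter, source frozen
        return source
    # mode == "auto": behave like the commit job does today
    if has_writers(source):
        return insert_cbw_filter(source)
    unshare_write_perm(source)
    return source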

--
Best regards,
Vladimir



Re: RFC: Qemu backup interface plans

2021-05-19 Thread Kevin Wolf
On 19.05.2021 at 13:49, Vladimir Sementsov-Ogievskiy wrote:
> 19.05.2021 14:20, Kevin Wolf wrote:
> > On 19.05.2021 at 08:11, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 2. Test that we can start a backup job with source = (target of
> > > > > the backup-top filter), so that we have "push backup with fleecing".
> > > > > Make an option for backup to start without a filter when we don't
> > > > > need copy-before-write operations, to avoid creating an extra
> > > > > superfluous filter.
> > > > 
> > > > OK, so the backup job is not really a backup job, but just anything
> > > > that copies data.
> > > 
> > > Not quite. For backup without a filter, we should protect the source
> > > from changing by unsharing the WRITE permission on it.
> > > 
> > > I'll try to avoid adding an option. The logic should work like in the
> > > commit job: if there are current writers on the source, we create the
> > > filter. If there are no writers, we just unshare writes and go without
> > > a filter. And for this, the copy-before-write filter should be able to
> > > do WRITE_UNCHANGED in the fleecing case.
> > 
> > If we ever get to the point where we would make a block-copy job visible
> > to the user, I wouldn't copy the restrictions from the current jobs, but
> > keep it really generic to cover all cases.
> > 
> > There is no way for the QMP command starting the job to know what the
> > user is planning to do with the image in the future. Even if it's
> > currently read-only, the user may want to add a writer later.
> > 
> > I think this means that we want to always add a filter node, and then
> > possibly dynamically switch between modes if being read-only provides a
> > significant advantage for the job.
> 
> Still, in the push-backup-with-fleecing scheme we really don't need the
> second filter, so why insert an extra node into the block graph?
> 
> I still see your point that the user may want to add a writer later.
> Still, I'd be surprised if such use cases exist now.
> 
> What about the following:
> 
> add a source-mode tri-state parameter for backup:
> 
> auto: insert the filter iff there are existing writers [default]
> filtered: insert the filter unconditionally
> immutable: don't insert the filter; this fails if there are existing
> writers, and creating writers during the block job becomes impossible

Yes, that's an option, too.

Kevin



Re: RFC: Qemu backup interface plans

2021-05-19 Thread Kevin Wolf
On 19.05.2021 at 08:11, Vladimir Sementsov-Ogievskiy wrote:
> > > 2. Test that we can start a backup job with source = (target of the
> > > backup-top filter), so that we have "push backup with fleecing".
> > > Make an option for backup to start without a filter when we don't
> > > need copy-before-write operations, to avoid creating an extra
> > > superfluous filter.
> > 
> > OK, so the backup job is not really a backup job, but just anything
> > that copies data.
> 
> Not quite. For backup without a filter, we should protect the source
> from changing by unsharing the WRITE permission on it.
> 
> I'll try to avoid adding an option. The logic should work like in the
> commit job: if there are current writers on the source, we create the
> filter. If there are no writers, we just unshare writes and go without
> a filter. And for this, the copy-before-write filter should be able to
> do WRITE_UNCHANGED in the fleecing case.

If we ever get to the point where we would make a block-copy job visible
to the user, I wouldn't copy the restrictions from the current jobs, but
keep it really generic to cover all cases.

There is no way for the QMP command starting the job to know what the
user is planning to do with the image in the future. Even if it's
currently read-only, the user may want to add a writer later.

I think this means that we want to always add a filter node, and then
possibly dynamically switch between modes if being read-only provides a
significant advantage for the job.

Kevin



Re: RFC: Qemu backup interface plans

2021-05-19 Thread Vladimir Sementsov-Ogievskiy

18.05.2021 19:39, Max Reitz wrote:

Hi,

Your proposal sounds good to me in general.  Some small independent building
blocks that seem to make sense to me.


Thanks! I hope it's not too difficult to read and understand my English.




On 17.05.21 14:07, Vladimir Sementsov-Ogievskiy wrote:

[...]


What we lack in this scheme:

1. handling a dirty bitmap in the backup-top filter: backup-top does a
copy-before-write operation on any guest write, when actually we are
interested only in the "dirty" regions for incremental backup

A probable solution would be to allow specifying a bitmap for the sync=none
mode of backup, but I think what I propose below is better.

2. [actually it's a tricky part of 1]: the possibility to not do
copy-before-write operations for regions that were already copied to the final
backup. With the normal QEMU backup job, this is achieved by the fact that the
block-copy state with its internal bitmap is shared between the backup job and
the copy-before-write filter.

3. Not a real problem, but a fact: the backup block job does nothing in this
scheme; the whole job is done by the filter. So, it would be interesting to
have the possibility to simply insert/remove the filter, and to avoid
block-job creation and management entirely for external backup. (I'd like to
send another RFC on how to insert/remove filters; let's not discuss it here.)


Next, think about internal backup. It has one drawback too:
4. If the target is remote with a slow connection, copy-before-write
operations will slow down guest writes appreciably.

It may be solved with the help of image fleecing: we create a temporary qcow2
image, set up the fleecing scheme, and instead of exporting the temp image
through NBD we start a second backup with source = temporary image and target
= the real backup target (NBD, for example).
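
Roughly, the command flow could look like the sketch below (Python; qmp() is a
stand-in for a real QMP client, and node names like "disk0" are illustrative).
As noted further down, op-blockers currently reject the second job, so this is
the plan, not something that works today:

def qmp(cmd, arguments):
    print(cmd, arguments)  # stand-in for a real QMP client

# Temp image backed by the active disk, then fleecing, then the push job.
qmp("blockdev-add", {
    "node-name": "tmp", "driver": "qcow2",
    "file": {"driver": "file", "filename": "tmp.qcow2"},
    "backing": "disk0",
})
qmp("blockdev-backup", {"device": "disk0", "target": "tmp",
                        "sync": "none"})   # fleecing: CBW into temp image
qmp("blockdev-backup", {"device": "tmp", "target": "backup-target",
                        "sync": "full"})   # push to the real (slow) target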


How would a second backup work here?  Wouldn’t one want a mirror job to copy 
the data off to the real target?

(Because I see backup as something intrinsically synchronous, whereas mirror by 
default is rather lazy.)

[Note from future me where I read more below: I see you acknowledge that you’ll 
need to modify backup to do what you need here, i.e. not do any CBW operations. 
 So it’s effectively the same as a mirror that ignores new dirty areas.  Which 
could work without changing mirror if block-copy were to set 
BDRV_REQ_WRITE_UNCHANGED for the fleecing case, and bdrv_co_write_req_finish() 
would skip bdrv_set_dirty() for such writes.]
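
Rendered as pseudocode (Python-style, mirroring the C logic Max describes; the
flag value is a stand-in, and bdrv_set_dirty() here is just a placeholder for
QEMU's function of the same name):

BDRV_REQ_WRITE_UNCHANGED = 1 << 0  # stand-in value for illustration

def write_req_finish(bs, offset, nbytes, flags):
    # Fleecing writes marked WRITE_UNCHANGED don't change visible data,
    # so they must not re-dirty the bitmap that mirror watches.
    if not (flags & BDRV_REQ_WRITE_UNCHANGED):
        bdrv_set_dirty(bs, offset, nbytes)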


I just feel closer to the backup block job than to mirror :) In the end, yes,
there is no real difference in the interface. But in the implementation, I
prefer to continue developing block-copy. I hope that eventually all jobs and
img-convert will work through block-copy.

(And I'll need BDRV_REQ_WRITE_UNCHANGED anyway for fleecing, so the user can
use either mirror or backup.)



I mean, it still has the problem that the mirror job can't tell the CBW filter
which areas are already copied off and so don't need to be preserved anymore,
but...


Still, with such a solution the same problems [1, 2] remain, and 3 becomes
worse:


Not sure how 3 can become worse when you said above it isn’t a real problem (to 
which I agree).


It's my perfectionism :) Yes, it still isn't a problem, but the number of
extra user-visible objects in the architecture increases, which I think is
not good.




5. We'll have two jobs and two automatically inserted filters, when actually
one filter and one job are enough (as the first job is needed only to insert a
filter, and the second job doesn't need a filter at all).

Note also that this (starting two backup jobs to make push backup with
fleecing) doesn't work now; op-blockers will object. It's simple to fix (and
in Virtuozzo we live with a downstream-only patch which allows push backup
with fleecing based on starting two backup jobs), but I never sent a patch, as
I have a better plan, which will solve all the listed problems.


So, what I propose:

1. We make the backup-top filter public, so that it can be inserted/removed
wherever the user wants through QMP (I'll post another RFC on how to properly
insert/remove filters, as backup-top is not the only filter that can be
usefully inserted somewhere). For this first step I've sent a series today:

   subject: [PATCH 00/21] block: publish backup-top filter
   id: <20210517064428.16223-1-vsement...@virtuozzo.com>
   patchew: 
https://patchew.org/QEMU/20210517064428.16223-1-vsement...@virtuozzo.com/

(note that one of the things in this series is the rename
s/backup-top/copy-before-write/; still, I call it backup-top in this letter)

This solves [3]. [4, 5] are solved partly: we still have one extra filter,
created by the backup block job, and I also didn't test whether this works;
probably some op-blockers or permissions need to be tuned. So, let it be
step 2:

2. Test that we can start a backup job with source = (target of the
backup-top filter), so that we have "push backup with fleecing". Make an
option for backup to start without a filter when we don't need
copy-before-write operations, to avoid creating an extra superfluous filter.

Re: RFC: Qemu backup interface plans

2021-05-18 Thread Max Reitz

Hi,

Your proposal sounds good to me in general.  Some small independent
building blocks that seem to make sense to me.



On 17.05.21 14:07, Vladimir Sementsov-Ogievskiy wrote:

[...]


What we lack in this scheme:

1. handling a dirty bitmap in the backup-top filter: backup-top does a
copy-before-write operation on any guest write, when actually we are
interested only in the "dirty" regions for incremental backup

A probable solution would be to allow specifying a bitmap for the
sync=none mode of backup, but I think what I propose below is better.


2. [actually it's a tricky part of 1]: the possibility to not do
copy-before-write operations for regions that were already copied to the
final backup. With the normal QEMU backup job, this is achieved by the
fact that the block-copy state with its internal bitmap is shared between
the backup job and the copy-before-write filter.


3. Not a real problem, but a fact: the backup block job does nothing in
this scheme; the whole job is done by the filter. So, it would be
interesting to have the possibility to simply insert/remove the filter,
and to avoid block-job creation and management entirely for external
backup. (I'd like to send another RFC on how to insert/remove filters;
let's not discuss it here.)



Next, think about internal backup. It has one drawback too:
4. If the target is remote with a slow connection, copy-before-write
operations will slow down guest writes appreciably.


It may be solved with the help of image fleecing: we create a temporary
qcow2 image, set up the fleecing scheme, and instead of exporting the temp
image through NBD we start a second backup with source = temporary image
and target = the real backup target (NBD, for example).


How would a second backup work here?  Wouldn’t one want a mirror job to 
copy the data off to the real target?


(Because I see backup as something intrinsically synchronous, whereas 
mirror by default is rather lazy.)


[Note from future me where I read more below: I see you acknowledge that 
you’ll need to modify backup to do what you need here, i.e. not do any 
CBW operations.  So it’s effectively the same as a mirror that ignores 
new dirty areas.  Which could work without changing mirror if block-copy 
were to set BDRV_REQ_WRITE_UNCHANGED for the fleecing case, and 
bdrv_co_write_req_finish() would skip bdrv_set_dirty() for such writes.]


I mean, it still has the problem that the mirror job can't tell the CBW
filter which areas are already copied off and so don't need to be
preserved anymore, but...


Still, with such a solution the same problems [1, 2] remain, and 3 becomes
worse:


Not sure how 3 can become worse when you said above it isn’t a real 
problem (to which I agree).


5. We'll have two jobs and two automatically inserted filters, when
actually one filter and one job are enough (as the first job is needed
only to insert a filter, and the second job doesn't need a filter at all).


Note also that this (starting two backup jobs to make push backup with
fleecing) doesn't work now; op-blockers will object. It's simple to fix
(and in Virtuozzo we live with a downstream-only patch which allows push
backup with fleecing based on starting two backup jobs), but I never sent
a patch, as I have a better plan, which will solve all the listed
problems.



So, what I propose:

1. We make the backup-top filter public, so that it can be
inserted/removed wherever the user wants through QMP (I'll post another
RFC on how to properly insert/remove filters, as backup-top is not the
only filter that can be usefully inserted somewhere). For this first
step I've sent a series today:


   subject: [PATCH 00/21] block: publish backup-top filter
   id: <20210517064428.16223-1-vsement...@virtuozzo.com>
   patchew: 
https://patchew.org/QEMU/20210517064428.16223-1-vsement...@virtuozzo.com/


(note that one of the things in this series is the rename
s/backup-top/copy-before-write/; still, I call it backup-top in this
letter)


This solves [3]. [4, 5] are solved partly: we still have one extra
filter, created by the backup block job, and I also didn't test whether
this works; probably some op-blockers or permissions need to be tuned.
So, let it be step 2:


2. Test that we can start a backup job with source = (target of the
backup-top filter), so that we have "push backup with fleecing". Make an
option for backup to start without a filter when we don't need
copy-before-write operations, to avoid creating an extra superfluous
filter.


OK, so the backup job is not really a backup job, but just anything that 
copies data.



3. Support a bitmap in the backup-top filter, to solve [1]

3.1 and make it possible to modify the bitmap externally, so that the
consumer of fleecing can say to the backup-top filter: "I've already
copied these blocks, don't bother copying them to the temp image". This
is to solve [2].


Still, how will the consumer of fleecing reset the shared bitmap after
copying blocks? I have the following idea: we make a
"discard-bitmap-filter" filter driver that owns a bitmap and, on a
discard request, clears the corresponding bits.