Re: [Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-09 Thread Max Reitz
On 09.08.19 14:47, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 15:25, Max Reitz wrote:
>> On 09.08.19 09:50, Vladimir Sementsov-Ogievskiy wrote:
>>> 07.08.2019 21:01, Max Reitz wrote:
>>>> On 07.08.19 10:07, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Limit block_status querying to request bounds on write notifier to
>>>>> avoid extra seeking.
>>>>
>>>> I don’t understand this reasoning.  Checking whether something is
>>>> allocated for qcow2 should just mean an L2 cache lookup.  Which we have
>>>> to do anyway when we try to copy data off the source.
>>>
>>> But for raw it's seeking.
>>
>> (1) That’s a bug in block_status then, isn’t it?
>>
>> file-posix cannot determine the allocation status, or rather, everything
>> is allocated.  bdrv_co_block_status() should probably pass @want_zero on
>> to the driver’s implementation, and file-posix should just
>> unconditionally return DATA if it’s false.
>>
>> (2) Why would you even use sync=top for raw nodes?
>>
> 
> As I described in the parallel emails, raw was a bad example. NBD is a good one.

Does NBD support backing files?

> Anyway, now I'm refactoring cluster skipping more deeply for v2.
> 
> About top mode: eventually, block-status should be used to improve other
> modes too. In Virtuozzo we skip unallocated clusters for full mode too,
> for example.

But this patch here is about sync=top.

Skipping is an optimization; the block_status querying here happens
because copying anything that isn’t allocated in the top layer would be
wrong.

Max

> Unfortunately, backup is the most long-term thing for me to upstream...





Re: [Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-09 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 15:25, Max Reitz wrote:
> On 09.08.19 09:50, Vladimir Sementsov-Ogievskiy wrote:
>> 07.08.2019 21:01, Max Reitz wrote:
>>> On 07.08.19 10:07, Vladimir Sementsov-Ogievskiy wrote:
>>>> Limit block_status querying to request bounds on write notifier to
>>>> avoid extra seeking.
>>>
>>> I don’t understand this reasoning.  Checking whether something is
>>> allocated for qcow2 should just mean an L2 cache lookup.  Which we have
>>> to do anyway when we try to copy data off the source.
>>
>> But for raw it's seeking.
> 
> (1) That’s a bug in block_status then, isn’t it?
> 
> file-posix cannot determine the allocation status, or rather, everything
> is allocated.  bdrv_co_block_status() should probably pass @want_zero on
> to the driver’s implementation, and file-posix should just
> unconditionally return DATA if it’s false.
> 
> (2) Why would you even use sync=top for raw nodes?
> 

As I described in the parallel emails, raw was a bad example. NBD is a good one.
Anyway, now I'm refactoring cluster skipping more deeply for v2.

About top mode: eventually, block-status should be used to improve other
modes too. In Virtuozzo we skip unallocated clusters for full mode too,
for example.
Unfortunately, backup is the most long-term thing for me to upstream...

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-09 Thread Max Reitz
On 09.08.19 09:50, Vladimir Sementsov-Ogievskiy wrote:
> 07.08.2019 21:01, Max Reitz wrote:
>> On 07.08.19 10:07, Vladimir Sementsov-Ogievskiy wrote:
>>> Limit block_status querying to request bounds on write notifier to
>>> avoid extra seeking.
>>
>> I don’t understand this reasoning.  Checking whether something is
>> allocated for qcow2 should just mean an L2 cache lookup.  Which we have
>> to do anyway when we try to copy data off the source.
> 
> But for raw it's seeking.

(1) That’s a bug in block_status then, isn’t it?

file-posix cannot determine the allocation status, or rather, everything
is allocated.  bdrv_co_block_status() should probably pass @want_zero on
to the driver’s implementation, and file-posix should just
unconditionally return DATA if it’s false.
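
I.e., something like this near the top of file-posix’s block-status
implementation (an untested sketch; I’m assuming @want_zero actually
reaches the driver callback here):

    if (!want_zero) {
        /* Allocation-only query: for a raw file everything counts as
         * allocated, so report DATA right away instead of probing with
         * lseek(SEEK_DATA/SEEK_HOLE). */
        *pnum = bytes;
        *map = offset;
        *file = bs;
        return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
    }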

(2) Why would you even use sync=top for raw nodes?

Max





Re: [Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-09 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 12:12, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 10:50, Vladimir Sementsov-Ogievskiy wrote:
>> 07.08.2019 21:01, Max Reitz wrote:
>>> On 07.08.19 10:07, Vladimir Sementsov-Ogievskiy wrote:
>>>> Limit block_status querying to request bounds on write notifier to
>>>> avoid extra seeking.
>>>
>>> I don’t understand this reasoning.  Checking whether something is
>>> allocated for qcow2 should just mean an L2 cache lookup.  Which we have
>>> to do anyway when we try to copy data off the source.
>>
>> But for raw it's seeking. I think we just shouldn't do any unnecessary
>> operations in copy-before-write handling. So instead of calling
>> block_status(request_start, disk_end) I think it's better to call
>> block_status(request_start, request_end). And it is more applicable for
>> reusing this code too.
>>
>>>
>>> I could understand saying this makes the code easier, but I actually
>>> don’t think it does.  If you implemented checking the allocation status
>>> only for areas where the bitmap is reset (which I think this patch
>>> should), then it’d just duplicate the existing loop.
>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>>> ---
>>>>   block/backup.c | 38 +-
>>>>   1 file changed, 21 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/block/backup.c b/block/backup.c
>>>> index 11e27c844d..a4d37d2d62 100644
>>>> --- a/block/backup.c
>>>> +++ b/block/backup.c
>>>
>>> [...]
>>>
>>>> @@ -267,6 +267,18 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>   wait_for_overlapping_requests(job, start, end);
>>>>   cow_request_begin(&cow_request, job, start, end);
>>>> +    if (job->initializing_bitmap) {
>>>> +    int64_t off, chunk;
>>>> +
>>>> +    for (off = offset; offset < end; offset += chunk) {
>>>
>>> This is what I’m referring to, I think this loop should skip areas that
>>> are clean.
>>
>> Agree, will do.
> 
> Hmm, I remembered that I thought of it, but decided that it may not be
> efficient if bitmap fragmentation is higher than block-status
> fragmentation. So, if we don't know which is better, let's keep the
> simpler code.

Hmm2, but I see a compromise (maybe you meant exactly this): don't use the
next zero as the limit (always use request_end), but before each iteration
skip ahead to the next dirty offset.
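
Roughly like this (an untested sketch; I'm assuming something like
bdrv_dirty_bitmap_next_dirty_area() can be used to find the next dirty
offset, the exact helper may differ):

    if (job->initializing_bitmap) {
        int64_t off = offset, chunk;

        while (off < end) {
            uint64_t doff = off, dbytes = end - off;

            /* Skip clean areas entirely: no block-status query for them. */
            if (!bdrv_dirty_bitmap_next_dirty_area(job->copy_bitmap,
                                                   &doff, &dbytes)) {
                break;
            }
            off = doff;

            /* Keep request_end as the limit (not the next clean bit), so a
             * fragmented bitmap does not split the block-status query. */
            ret = backup_bitmap_reset_unallocated(job, off, end - off, &chunk);
            if (ret < 0) {
                chunk = job->cluster_size;
            }
            off += chunk;
        }
        ret = 0;
    }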

> 
>>
>>>
>>>> +    ret = backup_bitmap_reset_unallocated(job, off, end - off, &chunk);
>>>> +    if (ret < 0) {
>>>> +    chunk = job->cluster_size;
>>>> +    }
>>>> +    }
>>>> +    }
>>>> +    ret = 0;
>>>> +
>>>>   while (start < end) {
>>>>   int64_t dirty_end;
>>>> @@ -276,15 +288,6 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>   continue; /* already copied */
>>>>   }
>>>> -    if (job->initializing_bitmap) {
>>>> -    ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
>>>> -    if (ret == 0) {
>>>> -    trace_backup_do_cow_skip_range(job, start, skip_bytes);
>>>> -    start += skip_bytes;
>>>> -    continue;
>>>> -    }
>>>> -    }
>>>> -
>>>>   dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
>>>>   end - start);
>>>>   if (dirty_end < 0) {
>>>
>>> Hm, you resolved that conflict differently from me.
>>>
>>> I decided the bdrv_dirty_bitmap_next_zero() call should go before the
>>> backup_bitmap_reset_unallocated() call so that we can then do a
>>>
>>>    dirty_end = MIN(dirty_end, start + skip_bytes);
>>>
>>> because we probably don’t want to copy anything past what
>>> backup_bitmap_reset_unallocated() has inquired.
>>>
>>>
>>> But then again this patch eliminates the difference anyway...
>>>  >
>>>> @@ -546,7 +549,8 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>>   goto out;
>>>>   }
>>>> -    ret = backup_bitmap_reset_unallocated(s, offset, &count);
>>>> +    ret = backup_bitmap_reset_unallocated(s, offset, s->len - offset,
>>>> +  &count);
>>>>   if (ret < 0) {
>>>>   goto out;
>>>>   }

>>>
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-09 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 10:50, Vladimir Sementsov-Ogievskiy wrote:
> 07.08.2019 21:01, Max Reitz wrote:
>> On 07.08.19 10:07, Vladimir Sementsov-Ogievskiy wrote:
>>> Limit block_status querying to request bounds on write notifier to
>>> avoid extra seeking.
>>
>> I don’t understand this reasoning.  Checking whether something is
>> allocated for qcow2 should just mean an L2 cache lookup.  Which we have
>> to do anyway when we try to copy data off the source.
> 
> But for raw it's seeking. I think we just shouldn't do any unnecessary
> operations in copy-before-write handling. So instead of calling
> block_status(request_start, disk_end) I think it's better to call
> block_status(request_start, request_end). And it is more applicable for
> reusing this code too.

Oops, and seek lacks the limit anyway.

But anyway, we have an API with a count limit, and it seems natural to assume
that some driver may do more calculations for a bigger request. So a smaller
request is good for the copy-before-write operation, when we are in a hurry
to unfreeze the guest write as soon as possible.

Hmm, an example of such a driver may be NBD, which can concatenate
block-status results of the exported disk.


> 
>>
>> I could understand saying this makes the code easier, but I actually
>> don’t think it does.  If you implemented checking the allocation status
>> only for areas where the bitmap is reset (which I think this patch
>> should), then it’d just duplicate the existing loop.
>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/backup.c | 38 +-
>>>   1 file changed, 21 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 11e27c844d..a4d37d2d62 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>
>> [...]
>>
>>> @@ -267,6 +267,18 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>   wait_for_overlapping_requests(job, start, end);
>>>   cow_request_begin(&cow_request, job, start, end);
>>> +    if (job->initializing_bitmap) {
>>> +    int64_t off, chunk;
>>> +
>>> +    for (off = offset; offset < end; offset += chunk) {
>>
>> This is what I’m referring to, I think this loop should skip areas that
>> are clean.
> 
> Agree, will do.
> 
>>
>>> +    ret = backup_bitmap_reset_unallocated(job, off, end - off, &chunk);
>>> +    if (ret < 0) {
>>> +    chunk = job->cluster_size;
>>> +    }
>>> +    }
>>> +    }
>>> +    ret = 0;
>>> +
>>>   while (start < end) {
>>>   int64_t dirty_end;
>>> @@ -276,15 +288,6 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>   continue; /* already copied */
>>>   }
>>> -    if (job->initializing_bitmap) {
>>> -    ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
>>> -    if (ret == 0) {
>>> -    trace_backup_do_cow_skip_range(job, start, skip_bytes);
>>> -    start += skip_bytes;
>>> -    continue;
>>> -    }
>>> -    }
>>> -
>>>   dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
>>>   end - start);
>>>   if (dirty_end < 0) {
>>
>> Hm, you resolved that conflict differently from me.
>>
>> I decided the bdrv_dirty_bitmap_next_zero() call should go before the
>> backup_bitmap_reset_unallocated() call so that we can then do a
>>
>>    dirty_end = MIN(dirty_end, start + skip_bytes);
>>
>> because we probably don’t want to copy anything past what
>> backup_bitmap_reset_unallocated() has inquired.
>>
>>
>> But then again this patch eliminates the difference anyway...
>>  >
>>> @@ -546,7 +549,8 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>   goto out;
>>>   }
>>> -    ret = backup_bitmap_reset_unallocated(s, offset, &count);
>>> +    ret = backup_bitmap_reset_unallocated(s, offset, s->len - offset,
>>> +  &count);
>>>   if (ret < 0) {
>>>   goto out;
>>>   }
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-09 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 10:50, Vladimir Sementsov-Ogievskiy wrote:
> 07.08.2019 21:01, Max Reitz wrote:
>> On 07.08.19 10:07, Vladimir Sementsov-Ogievskiy wrote:
>>> Limit block_status querying to request bounds on write notifier to
>>> avoid extra seeking.
>>
>> I don’t understand this reasoning.  Checking whether something is
>> allocated for qcow2 should just mean an L2 cache lookup.  Which we have
>> to do anyway when we try to copy data off the source.
> 
> But for raw it's seeking. I think we just shouldn't do any unnecessary
> operations in copy-before-write handling. So instead of calling
> block_status(request_start, disk_end) I think it's better to call
> block_status(request_start, request_end). And it is more applicable for
> reusing this code too.
> 
>>
>> I could understand saying this makes the code easier, but I actually
>> don’t think it does.  If you implemented checking the allocation status
>> only for areas where the bitmap is reset (which I think this patch
>> should), then it’d just duplicate the existing loop.
>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/backup.c | 38 +-
>>>   1 file changed, 21 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 11e27c844d..a4d37d2d62 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>
>> [...]
>>
>>> @@ -267,6 +267,18 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>   wait_for_overlapping_requests(job, start, end);
>>>   cow_request_begin(&cow_request, job, start, end);
>>> +    if (job->initializing_bitmap) {
>>> +    int64_t off, chunk;
>>> +
>>> +    for (off = offset; offset < end; offset += chunk) {
>>
>> This is what I’m referring to, I think this loop should skip areas that
>> are clean.
> 
> Agree, will do.

Hmm, I remembered that I thought of it, but decided that it may not be
efficient if bitmap fragmentation is higher than block-status fragmentation.
So, if we don't know which is better, let's keep the simpler code.

> 
>>
>>> +    ret = backup_bitmap_reset_unallocated(job, off, end - off, &chunk);
>>> +    if (ret < 0) {
>>> +    chunk = job->cluster_size;
>>> +    }
>>> +    }
>>> +    }
>>> +    ret = 0;
>>> +
>>>   while (start < end) {
>>>   int64_t dirty_end;
>>> @@ -276,15 +288,6 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>   continue; /* already copied */
>>>   }
>>> -    if (job->initializing_bitmap) {
>>> -    ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
>>> -    if (ret == 0) {
>>> -    trace_backup_do_cow_skip_range(job, start, skip_bytes);
>>> -    start += skip_bytes;
>>> -    continue;
>>> -    }
>>> -    }
>>> -
>>>   dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
>>>   end - start);
>>>   if (dirty_end < 0) {
>>
>> Hm, you resolved that conflict differently from me.
>>
>> I decided the bdrv_dirty_bitmap_next_zero() call should go before the
>> backup_bitmap_reset_unallocated() call so that we can then do a
>>
>>    dirty_end = MIN(dirty_end, start + skip_bytes);
>>
>> because we probably don’t want to copy anything past what
>> backup_bitmap_reset_unallocated() has inquired.
>>
>>
>> But then again this patch eliminates the difference anyway...
>>  >
>>> @@ -546,7 +549,8 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>   goto out;
>>>   }
>>> -    ret = backup_bitmap_reset_unallocated(s, offset, &count);
>>> +    ret = backup_bitmap_reset_unallocated(s, offset, s->len - offset,
>>> +  &count);
>>>   if (ret < 0) {
>>>   goto out;
>>>   }
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-09 Thread Vladimir Sementsov-Ogievskiy
07.08.2019 21:01, Max Reitz wrote:
> On 07.08.19 10:07, Vladimir Sementsov-Ogievskiy wrote:
>> Limit block_status querying to request bounds on write notifier to
>> avoid extra seeking.
> 
> I don’t understand this reasoning.  Checking whether something is
> allocated for qcow2 should just mean an L2 cache lookup.  Which we have
> to do anyway when we try to copy data off the source.

But for raw it's seeking. I think we just shouldn't do any unnecessary
operations in copy-before-write handling. So instead of calling
block_status(request_start, disk_end) I think it's better to call
block_status(request_start, request_end). And it is more applicable for
reusing this code too.
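
In other words, for the write-notifier path only (a sketch; request_start
and request_end here are just the bounds of the triggering request, and the
query goes through bdrv_is_allocated() as in backup_is_cluster_allocated()):

    /* before: may seek far beyond the request that triggered the notifier */
    ret = bdrv_is_allocated(bs, request_start, s->len - request_start, &count);

    /* after: bounded by the triggering request */
    ret = bdrv_is_allocated(bs, request_start, request_end - request_start,
                            &count);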

> 
> I could understand saying this makes the code easier, but I actually
> don’t think it does.  If you implemented checking the allocation status
> only for areas where the bitmap is reset (which I think this patch
> should), then it’d just duplicate the existing loop.
> 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/backup.c | 38 +-
>>   1 file changed, 21 insertions(+), 17 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 11e27c844d..a4d37d2d62 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
> 
> [...]
> 
>> @@ -267,6 +267,18 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>   wait_for_overlapping_requests(job, start, end);
>>   cow_request_begin(&cow_request, job, start, end);
>>   
>> +if (job->initializing_bitmap) {
>> +int64_t off, chunk;
>> +
>> +for (off = offset; offset < end; offset += chunk) {
> 
> This is what I’m referring to, I think this loop should skip areas that
> are clean.

Agree, will do.

> 
>> +ret = backup_bitmap_reset_unallocated(job, off, end - off, &chunk);
>> +if (ret < 0) {
>> +chunk = job->cluster_size;
>> +}
>> +}
>> +}
>> +ret = 0;
>> +
>>   while (start < end) {
>>   int64_t dirty_end;
>>   
>> @@ -276,15 +288,6 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>   continue; /* already copied */
>>   }
>>   
>> -if (job->initializing_bitmap) {
>> -ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
>> -if (ret == 0) {
>> -trace_backup_do_cow_skip_range(job, start, skip_bytes);
>> -start += skip_bytes;
>> -continue;
>> -}
>> -}
>> -
>>   dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
>>   end - start);
>>   if (dirty_end < 0) {
> 
> Hm, you resolved that conflict differently from me.
> 
> I decided the bdrv_dirty_bitmap_next_zero() call should go before the
> backup_bitmap_reset_unallocated() call so that we can then do a
> 
>dirty_end = MIN(dirty_end, start + skip_bytes);
> 
> because we probably don’t want to copy anything past what
> backup_bitmap_reset_unallocated() has inquired.
> 
> 
> But then again this patch eliminates the difference anyway...
>  >
>> @@ -546,7 +549,8 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>   goto out;
>>   }
>>   
>> -ret = backup_bitmap_reset_unallocated(s, offset, &count);
>> +ret = backup_bitmap_reset_unallocated(s, offset, s->len - offset,
>> +  &count);
>>   if (ret < 0) {
>>   goto out;
>>   }
>>
> 
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-07 Thread Max Reitz
On 07.08.19 10:07, Vladimir Sementsov-Ogievskiy wrote:
> Limit block_status querying to request bounds on write notifier to
> avoid extra seeking.

I don’t understand this reasoning.  Checking whether something is
allocated for qcow2 should just mean an L2 cache lookup.  Which we have
to do anyway when we try to copy data off the source.

I could understand saying this makes the code easier, but I actually
don’t think it does.  If you implemented checking the allocation status
only for areas where the bitmap is reset (which I think this patch
should), then it’d just duplicate the existing loop.

> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/backup.c | 38 +-
>  1 file changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 11e27c844d..a4d37d2d62 100644
> --- a/block/backup.c
> +++ b/block/backup.c

[...]

> @@ -267,6 +267,18 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>  wait_for_overlapping_requests(job, start, end);
>  cow_request_begin(&cow_request, job, start, end);
>  
> +if (job->initializing_bitmap) {
> +int64_t off, chunk;
> +
> +for (off = offset; offset < end; offset += chunk) {

This is what I’m referring to, I think this loop should skip areas that
are clean.

> +ret = backup_bitmap_reset_unallocated(job, off, end - off, &chunk);
> +if (ret < 0) {
> +chunk = job->cluster_size;
> +}
> +}
> +}
> +ret = 0;
> +
>  while (start < end) {
>  int64_t dirty_end;
>  
> @@ -276,15 +288,6 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>  continue; /* already copied */
>  }
>  
> -if (job->initializing_bitmap) {
> -ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
> -if (ret == 0) {
> -trace_backup_do_cow_skip_range(job, start, skip_bytes);
> -start += skip_bytes;
> -continue;
> -}
> -}
> -
>  dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
>  end - start);
>  if (dirty_end < 0) {

Hm, you resolved that conflict differently from me.

I decided the bdrv_dirty_bitmap_next_zero() call should go before the
backup_bitmap_reset_unallocated() call so that we can then do a

  dirty_end = MIN(dirty_end, start + skip_bytes);

because we probably don’t want to copy anything past what
backup_bitmap_reset_unallocated() has inquired.
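
I.e., on top of the pre-patch code, something like this (untested):

        dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
                                                end - start);
        if (dirty_end < 0) {
            dirty_end = end;
        }

        if (job->initializing_bitmap) {
            ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
            if (ret == 0) {
                trace_backup_do_cow_skip_range(job, start, skip_bytes);
                start += skip_bytes;
                continue;
            }
            if (ret > 0) {
                /* Do not copy past what the allocation query covered. */
                dirty_end = MIN(dirty_end, start + skip_bytes);
            }
        }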


But then again this patch eliminates the difference anyway...

Max

> @@ -546,7 +549,8 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>  goto out;
>  }
>  
> -ret = backup_bitmap_reset_unallocated(s, offset, &count);
> +ret = backup_bitmap_reset_unallocated(s, offset, s->len - offset,
> +  &count);
>  if (ret < 0) {
>  goto out;
>  }
> 






[Qemu-block] [PATCH 4/8] block/backup: improve unallocated clusters skipping

2019-08-07 Thread Vladimir Sementsov-Ogievskiy
Limit block_status querying to request bounds on write notifier to
avoid extra seeking.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/backup.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 11e27c844d..a4d37d2d62 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -180,14 +180,14 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
  * return via pnum the number of contiguous clusters sharing this allocation.
  */
 static int backup_is_cluster_allocated(BackupBlockJob *s, int64_t offset,
-   int64_t *pnum)
+   int64_t bytes, int64_t *pnum)
 {
 BlockDriverState *bs = blk_bs(s->common.blk);
 int64_t count, total_count = 0;
-int64_t bytes = s->len - offset;
 int ret;
 
 assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+bytes = MIN(s->len - offset, bytes);
 
 while (true) {
 ret = bdrv_is_allocated(bs, offset, bytes, &count);
@@ -224,12 +224,13 @@ static int backup_is_cluster_allocated(BackupBlockJob *s, int64_t offset,
  * 1 otherwise, and -ret on error.
  */
 static int64_t backup_bitmap_reset_unallocated(BackupBlockJob *s,
-   int64_t offset, int64_t *count)
+   int64_t offset, int64_t bytes,
+   int64_t *pnum)
 {
 int ret;
-int64_t clusters, bytes, estimate;
+int64_t clusters, estimate;
 
-ret = backup_is_cluster_allocated(s, offset, &clusters);
+ret = backup_is_cluster_allocated(s, offset, bytes, &clusters);
 if (ret < 0) {
 return ret;
 }
@@ -242,7 +243,7 @@ static int64_t backup_bitmap_reset_unallocated(BackupBlockJob *s,
 job_progress_set_remaining(&s->common.job, estimate);
 }
 
-*count = bytes;
+*pnum = bytes;
 return ret;
 }
 
@@ -255,7 +256,6 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 int ret = 0;
 int64_t start, end; /* bytes */
 void *bounce_buffer = NULL;
-int64_t skip_bytes;
 
 qemu_co_rwlock_rdlock(&job->flush_rwlock);
 
@@ -267,6 +267,18 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 wait_for_overlapping_requests(job, start, end);
 cow_request_begin(&cow_request, job, start, end);
 
+if (job->initializing_bitmap) {
+int64_t off, chunk;
+
+for (off = offset; offset < end; offset += chunk) {
+ret = backup_bitmap_reset_unallocated(job, off, end - off, &chunk);
+if (ret < 0) {
+chunk = job->cluster_size;
+}
+}
+}
+ret = 0;
+
 while (start < end) {
 int64_t dirty_end;
 
@@ -276,15 +288,6 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 continue; /* already copied */
 }
 
-if (job->initializing_bitmap) {
-ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
-if (ret == 0) {
-trace_backup_do_cow_skip_range(job, start, skip_bytes);
-start += skip_bytes;
-continue;
-}
-}
-
 dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
 end - start);
 if (dirty_end < 0) {
@@ -546,7 +549,8 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
 goto out;
 }
 
-ret = backup_bitmap_reset_unallocated(s, offset, &count);
+ret = backup_bitmap_reset_unallocated(s, offset, s->len - offset,
+  &count);
 if (ret < 0) {
 goto out;
 }
-- 
2.18.0