Re: [Qemu-devel] [PATCH 1/4] block: Fix dirty bitmap in bdrv_co_discard

2015-05-11 Thread Fam Zheng
On Wed, 05/06 12:21, Paolo Bonzini wrote:
 
 
 On 06/05/2015 11:50, Fam Zheng wrote:
   #   src can_write_zeroes_with_unmap   target 
  can_write_zeroes_with_unmap
  
   1  true   true
   2  true   false
   3  false  true
   4  false  false
 
 1 should replicate WRITE SAME, in case the unmap granularity of the
 target is different from that of the source.  In that case, a discard on
 the target might leave some sectors untouched.  Writing zeroes would
 ensure matching data between the source and the target.
 
 2 should _not_ discard: it should write zeroes even at the cost of
 making the target fully provisioned.  Perhaps you can optimize it by
 looking at bdrv_get_block_status for the target, and checking the answer
 for BDRV_ZERO.
 
 3 and 4 can use discard on the target.
 
 So it looks like only the source setting matters.
 
 We need to check the cost of bdrv_co_get_block_status for raw, too.
 If it's too expensive, that can be a problem.

In my test, bdrv_is_allocated_above() takes 10~20us on ext4 raw image on my
laptop SATA.  And also note that:

/*
 * ...
 *
 * 'pnum' is set to the number of sectors (including and immediately 
following
 * the specified sector) that are known to be in the same
 * allocated/unallocated state.
 *
 * 'nb_sectors' is the max value 'pnum' should be set to.  If 
nb_sectors goes
 * beyond the end of the disk image it will be clamped.
 */
static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState 
*bs,

So we currently don't coalesce the sequential dirty sectors.

Was that intentional?

Fam

 
 Paolo
 
  For case 2  3 it's probably better to mirror the actual reading of source.
  
  I'm not sure about 4.
 
 Even in case 1, discard could be UNMAP and write zeroes could be
 WRITE SAME.  If the unmap granularity of the target is For unaligned
 sectors, UNMAP might leave some sectors aside while WRITE SAME will
 write with zeroes.



Re: [Qemu-devel] [PATCH 1/4] block: Fix dirty bitmap in bdrv_co_discard

2015-05-11 Thread Paolo Bonzini


On 11/05/2015 10:02, Fam Zheng wrote:
 
   /*
* ...
*
* 'pnum' is set to the number of sectors (including and immediately 
 following
* the specified sector) that are known to be in the same
* allocated/unallocated state.
*
* 'nb_sectors' is the max value 'pnum' should be set to.  If 
 nb_sectors goes
* beyond the end of the disk image it will be clamped.
*/
   static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState 
 *bs,
 
 So we currently don't coalesce the sequential dirty sectors.
 
 Was that intentional?

The idea, I think, is that for some drivers bdrv_co_get_block_status is
O(nb_sectors).

It's okay to let a driver return *pnum  nb_sectors.  You do need to
audit the callers, though.  It would be nice also to add TODOs whenever
the code is fine but it could explot *pnum  nb_sectors to get better
performance.

Paolo



Re: [Qemu-devel] [PATCH 1/4] block: Fix dirty bitmap in bdrv_co_discard

2015-05-06 Thread Paolo Bonzini


On 06/05/2015 03:45, Fam Zheng wrote:
  This is not enough, you also have to do the discard in block/mirror.c,
  otherwise the destination image could even become fully provisioned!
 I wasn't sure what if src and dest have different can_write_zeroes_with_unmap
 value, but your argument is stronger.
 
 I will add discard in mirror.  Besides that, do we need to compare the
 can_write_zeroes_with_unmap?

Hmm, if can_write_zeroes_with_unmap is set, it's probably better to
write zeroes instead of discarding, in case the guest is relying on it.

Paolo



Re: [Qemu-devel] [PATCH 1/4] block: Fix dirty bitmap in bdrv_co_discard

2015-05-06 Thread Paolo Bonzini


On 06/05/2015 11:50, Fam Zheng wrote:
  #   src can_write_zeroes_with_unmap   target can_write_zeroes_with_unmap
 
  1  true   true
  2  true   false
  3  false  true
  4  false  false

1 should replicate WRITE SAME, in case the unmap granularity of the
target is different from that of the source.  In that case, a discard on
the target might leave some sectors untouched.  Writing zeroes would
ensure matching data between the source and the target.

2 should _not_ discard: it should write zeroes even at the cost of
making the target fully provisioned.  Perhaps you can optimize it by
looking at bdrv_get_block_status for the target, and checking the answer
for BDRV_ZERO.

3 and 4 can use discard on the target.

So it looks like only the source setting matters.

We need to check the cost of bdrv_co_get_block_status for raw, too.
If it's too expensive, that can be a problem.

Paolo

 For case 2  3 it's probably better to mirror the actual reading of source.
 
 I'm not sure about 4.

Even in case 1, discard could be UNMAP and write zeroes could be
WRITE SAME.  If the unmap granularity of the target is For unaligned
sectors, UNMAP might leave some sectors aside while WRITE SAME will
write with zeroes.



Re: [Qemu-devel] [PATCH 1/4] block: Fix dirty bitmap in bdrv_co_discard

2015-05-06 Thread Fam Zheng
On Wed, 05/06 10:59, Paolo Bonzini wrote:
 
 
 On 06/05/2015 03:45, Fam Zheng wrote:
   This is not enough, you also have to do the discard in block/mirror.c,
   otherwise the destination image could even become fully provisioned!
  I wasn't sure what if src and dest have different 
  can_write_zeroes_with_unmap
  value, but your argument is stronger.
  
  I will add discard in mirror.  Besides that, do we need to compare the
  can_write_zeroes_with_unmap?
 
 Hmm, if can_write_zeroes_with_unmap is set, it's probably better to
 write zeroes instead of discarding, in case the guest is relying on it.
 

I think there are four cases:

 #   src can_write_zeroes_with_unmap   target can_write_zeroes_with_unmap

 1  true   true
 2  true   false
 3  false  true
 4  false  false

I think replicating discard only works for 1, because both side would then read
0.  For case 2  3 it's probably better to mirror the actual reading of source.

I'm not sure about 4.

What do you think?

Fam



[Qemu-devel] [PATCH 1/4] block: Fix dirty bitmap in bdrv_co_discard

2015-05-05 Thread Fam Zheng
Unsetting dirty globally with discard is not very correct. The discard may zero
out sectors (depending on can_write_zeroes_with_unmap), we should replicate
this change to destinition side to make sure that the guest sees the same data.

Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator
doesn't expect unsetting of bits after current position.

So let's do it the opposite way which fixes both problems: set the dirty bits
if we are to discard it.

Reported-by: wangxiaol...@ucloud.cn
Signed-off-by: Fam Zheng f...@redhat.com
---
 block/io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1ce62c4..809688b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2343,8 +2343,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 return -EROFS;
 }
 
-bdrv_reset_dirty(bs, sector_num, nb_sectors);
-
 /* Do nothing if disabled.  */
 if (!(bs-open_flags  BDRV_O_UNMAP)) {
 return 0;
@@ -2354,6 +2352,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 return 0;
 }
 
+bdrv_set_dirty(bs, sector_num, nb_sectors);
+
 max_discard = MIN_NON_ZERO(bs-bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
 while (nb_sectors  0) {
 int ret;
-- 
1.9.3




Re: [Qemu-devel] [PATCH 1/4] block: Fix dirty bitmap in bdrv_co_discard

2015-05-05 Thread Paolo Bonzini


On 05/05/2015 14:46, Fam Zheng wrote:
 Unsetting dirty globally with discard is not very correct. The discard may 
 zero
 out sectors (depending on can_write_zeroes_with_unmap), we should replicate
 this change to destinition side to make sure that the guest sees the same 
 data.
 
 Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator
 doesn't expect unsetting of bits after current position.
 
 So let's do it the opposite way which fixes both problems: set the dirty bits
 if we are to discard it.

This is not enough, you also have to do the discard in block/mirror.c,
otherwise the destination image could even become fully provisioned!

Paolo

 Reported-by: wangxiaol...@ucloud.cn
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block/io.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/block/io.c b/block/io.c
 index 1ce62c4..809688b 100644
 --- a/block/io.c
 +++ b/block/io.c
 @@ -2343,8 +2343,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
 int64_t sector_num,
  return -EROFS;
  }
  
 -bdrv_reset_dirty(bs, sector_num, nb_sectors);
 -
  /* Do nothing if disabled.  */
  if (!(bs-open_flags  BDRV_O_UNMAP)) {
  return 0;
 @@ -2354,6 +2352,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
 int64_t sector_num,
  return 0;
  }
  
 +bdrv_set_dirty(bs, sector_num, nb_sectors);
 +
  max_discard = MIN_NON_ZERO(bs-bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
  while (nb_sectors  0) {
  int ret;
 



Re: [Qemu-devel] [PATCH 1/4] block: Fix dirty bitmap in bdrv_co_discard

2015-05-05 Thread Fam Zheng
On Tue, 05/05 15:06, Paolo Bonzini wrote:
 
 
 On 05/05/2015 14:46, Fam Zheng wrote:
  Unsetting dirty globally with discard is not very correct. The discard may 
  zero
  out sectors (depending on can_write_zeroes_with_unmap), we should replicate
  this change to destinition side to make sure that the guest sees the same 
  data.
  
  Calling bdrv_reset_dirty also troubles mirror job because the hbitmap 
  iterator
  doesn't expect unsetting of bits after current position.
  
  So let's do it the opposite way which fixes both problems: set the dirty 
  bits
  if we are to discard it.
 
 This is not enough, you also have to do the discard in block/mirror.c,
 otherwise the destination image could even become fully provisioned!

I wasn't sure what if src and dest have different can_write_zeroes_with_unmap
value, but your argument is stronger.

I will add discard in mirror.  Besides that, do we need to compare the
can_write_zeroes_with_unmap?

Fam

 
 Paolo
 
  Reported-by: wangxiaol...@ucloud.cn
  Signed-off-by: Fam Zheng f...@redhat.com
  ---
   block/io.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
  
  diff --git a/block/io.c b/block/io.c
  index 1ce62c4..809688b 100644
  --- a/block/io.c
  +++ b/block/io.c
  @@ -2343,8 +2343,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState 
  *bs, int64_t sector_num,
   return -EROFS;
   }
   
  -bdrv_reset_dirty(bs, sector_num, nb_sectors);
  -
   /* Do nothing if disabled.  */
   if (!(bs-open_flags  BDRV_O_UNMAP)) {
   return 0;
  @@ -2354,6 +2352,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState 
  *bs, int64_t sector_num,
   return 0;
   }
   
  +bdrv_set_dirty(bs, sector_num, nb_sectors);
  +
   max_discard = MIN_NON_ZERO(bs-bl.max_discard, 
  BDRV_REQUEST_MAX_SECTORS);
   while (nb_sectors  0) {
   int ret;