Re: [Qemu-block] [Qemu-devel] [PATCH COLO v3 10/14] util/hbitmap: Add an API to reset all set bits in hbitmap

2015-05-07 Thread John Snow


On 05/06/2015 10:20 PM, Wen Congyang wrote:
 On 05/02/2015 12:47 AM, John Snow wrote:


 On 04/03/2015 07:05 AM, Paolo Bonzini wrote:


 On 03/04/2015 12:01, Wen Congyang wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 ---
   include/qemu/hbitmap.h |  8 
   tests/test-hbitmap.c   | 39 +++
   util/hbitmap.c | 16 
   3 files changed, 63 insertions(+)

 diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
 index 550d7ce..95a55e4 100644
 --- a/include/qemu/hbitmap.h
 +++ b/include/qemu/hbitmap.h
 @@ -109,6 +109,14 @@ void hbitmap_set(HBitmap *hb, uint64_t start, 
 uint64_t count);
   void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);

   /**
 + * hbitmap_reset_all:
 + * @hb: HBitmap to operate on.
 + *
 + * Reset all bits in an HBitmap.
 + */
 +void hbitmap_reset_all(HBitmap *hb);
 +
 +/**
* hbitmap_get:
* @hb: HBitmap to operate on.
* @item: Bit to query (0-based).
 diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
 index 8c902f2..1f0078a 100644
 --- a/tests/test-hbitmap.c
 +++ b/tests/test-hbitmap.c
 @@ -11,6 +11,7 @@

   #include glib.h
   #include stdarg.h
 +#include string.h
   #include qemu/hbitmap.h

   #define LOG_BITS_PER_LONG  (BITS_PER_LONG == 32 ? 5 : 6)
 @@ -143,6 +144,23 @@ static void hbitmap_test_reset(TestHBitmapData *data,
   }
   }

 +static void hbitmap_test_reset_all(TestHBitmapData *data)
 +{
 +size_t n;
 +
 +hbitmap_reset_all(data-hb);
 +
 +n = (data-size + BITS_PER_LONG - 1) / BITS_PER_LONG;
 +if (n == 0) {
 +n = 1;
 +}
 +memset(data-bits, 0, n * sizeof(unsigned long));
 +
 +if (data-granularity == 0) {
 +hbitmap_test_check(data, 0);
 +}
 +}
 +
   static void hbitmap_test_check_get(TestHBitmapData *data)
   {
   uint64_t count = 0;
 @@ -323,6 +341,26 @@ static void test_hbitmap_reset(TestHBitmapData *data,
   hbitmap_test_set(data, L3 / 2, L3);
   }

 +static void test_hbitmap_reset_all(TestHBitmapData *data,
 +   const void *unused)
 +{
 +hbitmap_test_init(data, L3 * 2, 0);
 +hbitmap_test_set(data, L1 - 1, L1 + 2);
 +hbitmap_test_reset_all(data);
 +hbitmap_test_set(data, 0, L1 * 3);
 +hbitmap_test_reset_all(data);
 +hbitmap_test_set(data, L2, L1);
 +hbitmap_test_reset_all(data);
 +hbitmap_test_set(data, L2, L3 - L2 + 1);
 +hbitmap_test_reset_all(data);
 +hbitmap_test_set(data, L3 - 1, 3);
 +hbitmap_test_reset_all(data);
 +hbitmap_test_set(data, 0, L3 * 2);
 +hbitmap_test_reset_all(data);
 +hbitmap_test_set(data, L3 / 2, L3);
 +hbitmap_test_reset_all(data);
 +}
 +
   static void test_hbitmap_granularity(TestHBitmapData *data,
const void *unused)
   {
 @@ -394,6 +432,7 @@ int main(int argc, char **argv)
   hbitmap_test_add(/hbitmap/set/overlap, test_hbitmap_set_overlap);
   hbitmap_test_add(/hbitmap/reset/empty, test_hbitmap_reset_empty);
   hbitmap_test_add(/hbitmap/reset/general, test_hbitmap_reset);
 +hbitmap_test_add(/hbitmap/reset/all, test_hbitmap_reset_all);
   hbitmap_test_add(/hbitmap/granularity, test_hbitmap_granularity);
   g_test_run();

 diff --git a/util/hbitmap.c b/util/hbitmap.c
 index ab13971..acce93c 100644
 --- a/util/hbitmap.c
 +++ b/util/hbitmap.c
 @@ -353,6 +353,22 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, 
 uint64_t count)
   hb_reset_between(hb, HBITMAP_LEVELS - 1, start, last);
   }

 +void hbitmap_reset_all(HBitmap *hb)
 +{
 +uint64_t size = hb-size;
 +unsigned int i;
 +
 +/* Same as hbitmap_alloc() except memset() */
 +for (i = HBITMAP_LEVELS; --i = 1; ) {
 +size = MAX((size + BITS_PER_LONG - 1)  BITS_PER_LEVEL, 1);
 +memset(hb-levels[i], 0, size * sizeof(unsigned long));
 +}
 +

 For what it's worth, I recently added in a hb-sizes[i] cache to store the 
 size of each array so you don't have to recompute this all the time.
 
 Yes, will fix it in the next version.
 
 Thanks
 Wen Congyang
 

Since the reset stuff is useful all by itself, you can send that patch
by itself, CC me, and I'll review it.

You can update the existing call in block.c:

bdrv_clear_dirty_bitmap() {
hbitmap_reset(bitmap-bitmap, 0, bitmap-size);
}

to using your faster hbitmap_reset_all call.

Thanks,

--js


 +assert(size == 1);
 +hb-levels[0][0] = 1UL  (BITS_PER_LONG - 1);
 +hb-count = 0;
 +}
 +
   bool hbitmap_get(const HBitmap *hb, uint64_t item)
   {
   /* Compute position and bit in the last layer.  */


 Acked-by: Paolo Bonzini pbonz...@redhat.com

 .

 
 



Re: [Qemu-block] [PATCH 3/7] qcow2: use an LRU algorithm to replace entries from the L2 cache

2015-05-07 Thread Stefan Hajnoczi
On Wed, May 06, 2015 at 04:39:27PM +0300, Alberto Garcia wrote:
 The current algorithm to evict entries from the cache gives always
 preference to those in the lowest positions. As the size of the cache
 increases, the chances of the later elements of being removed decrease
 exponentially.
 
 In a scenario with random I/O and lots of cache misses, entries in
 positions 8 and higher are rarely (if ever) evicted. This can be seen
 even with the default cache size, but with larger caches the problem
 becomes more obvious.
 
 Using an LRU algorithm makes the chances of being removed from the
 cache independent from the position.
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 ---
  block/qcow2-cache.c | 31 +++
  1 file changed, 15 insertions(+), 16 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpaKPHOt8X3o.pgp
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH 6/7] qcow2: make qcow2_cache_put() a void function

2015-05-07 Thread Stefan Hajnoczi
On Wed, May 06, 2015 at 04:39:30PM +0300, Alberto Garcia wrote:
 This function never receives an invalid table pointer, so we can make
 it void and remove all the error checking code.
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 ---
  block/qcow2-cache.c|  7 +--
  block/qcow2-cluster.c  | 50 
 ++
  block/qcow2-refcount.c | 29 +
  block/qcow2.h  |  2 +-
  4 files changed, 17 insertions(+), 71 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgp5d0sh16YYy.pgp
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH 7/7] qcow2: style fixes in qcow2-cache.c

2015-05-07 Thread Stefan Hajnoczi
On Wed, May 06, 2015 at 04:39:31PM +0300, Alberto Garcia wrote:
 Fix pointer declaration to make it consistent with the rest of the
 code.
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 ---
  block/qcow2-cache.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpdlnUS45yrg.pgp
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-07 Thread Paolo Bonzini


On 06/05/2015 19:23, Max Reitz wrote:
 The guest sees whatever has been written into reply-error, and that
 field hasn't been written by this function in that case. It has been
 written by nbd_receive_reply() in nbd.c, and that value comes directly
 from the server. In case of qemu-nbd being the server, a write beyond
 the EOF should be caught by blk_check_byte_request() in
 block/block-backend.c, which returns -EIO. So that's where the EIO comes
 from.

Fair enough.  This makes sense, but then we have to create ENOSPC elsewhere.

 I don't know whether this EIO is subsequently converted to ENOSPC
 because of werror=enospc, but considering that
 https://bugzilla.redhat.com/show_bug.cgi?id=1090713 did not override
 werror, it doesn't look like it.

No, it shouldn't indeed.

Could alloc_clusters_noref do bdrv_truncate and return ENOSPC if it
fails?  That's how for example qcow and vhdx work.  vdi has the same
problem.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/6] block: Mirror discarded sectors

2015-05-07 Thread Fam Zheng
On Thu, 05/07 14:20, Stefan Hajnoczi wrote:
 On Wed, May 06, 2015 at 12:52:02PM +0800, Fam Zheng wrote:
  v2: Fix typo and add Eric's rev-by in patch 3.
  Add patch 1 to discard target in mirror job. (Paolo)
  Add patch 6 to improve iotests.wait_ready. (John)
  
  This fixes the mirror assert failure reported by wangxiaolong:
  
  https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg04458.html
  
  The direct cause is that hbitmap code couldn't handle unset of bits *after*
  iterator's current position. We could fix that, but the bdrv_reset_dirty() 
  call
  is more questionable:
  
  Before, if guest discarded some sectors during migration, it could see
  different data after moving to dest side, depending on block backends of the
  src and the dest. This is IMO worse than mirroring the actual reading as 
  done
  in this series, because we don't know what the guest is doing.
  
  For example if a guest first issues WRITE SAME to wipe out the area then 
  issues
  UNMAP to discard it, just to get rid of some sensitive data completely, we 
  may
  miss both operations and leave stale data on dest image.
  
  
  Fam Zheng (6):
mirror: Discard target sectors if not allocated at source side
block: Fix dirty bitmap in bdrv_co_discard
block: Remove bdrv_reset_dirty
qemu-iotests: Make block job methods common
qemu-iotests: Add test case for mirror with unmap
iotests: Use event_wait in wait_ready
  
   block.c   | 12 
   block/io.c|  4 +--
   block/mirror.c| 12 ++--
   include/block/block_int.h |  2 --
   tests/qemu-iotests/041| 66 
  ++-
   tests/qemu-iotests/131| 59 ++
   tests/qemu-iotests/131.out|  5 
   tests/qemu-iotests/group  |  1 +
   tests/qemu-iotests/iotests.py | 23 +++
   9 files changed, 115 insertions(+), 69 deletions(-)
   create mode 100644 tests/qemu-iotests/131
   create mode 100644 tests/qemu-iotests/131.out
 
 CCing Jeff Cody for block jobs

Thanks,

Note that in the discussion of v1 Paolo pointed out that discard on target is
not safe in some cases, I need to fix that in v3.

Fam



Re: [Qemu-block] [PATCH v2 0/6] block: Mirror discarded sectors

2015-05-07 Thread Stefan Hajnoczi
On Wed, May 06, 2015 at 12:52:02PM +0800, Fam Zheng wrote:
 v2: Fix typo and add Eric's rev-by in patch 3.
 Add patch 1 to discard target in mirror job. (Paolo)
 Add patch 6 to improve iotests.wait_ready. (John)
 
 This fixes the mirror assert failure reported by wangxiaolong:
 
 https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg04458.html
 
 The direct cause is that hbitmap code couldn't handle unset of bits *after*
 iterator's current position. We could fix that, but the bdrv_reset_dirty() 
 call
 is more questionable:
 
 Before, if guest discarded some sectors during migration, it could see
 different data after moving to dest side, depending on block backends of the
 src and the dest. This is IMO worse than mirroring the actual reading as done
 in this series, because we don't know what the guest is doing.
 
 For example if a guest first issues WRITE SAME to wipe out the area then 
 issues
 UNMAP to discard it, just to get rid of some sensitive data completely, we may
 miss both operations and leave stale data on dest image.
 
 
 Fam Zheng (6):
   mirror: Discard target sectors if not allocated at source side
   block: Fix dirty bitmap in bdrv_co_discard
   block: Remove bdrv_reset_dirty
   qemu-iotests: Make block job methods common
   qemu-iotests: Add test case for mirror with unmap
   iotests: Use event_wait in wait_ready
 
  block.c   | 12 
  block/io.c|  4 +--
  block/mirror.c| 12 ++--
  include/block/block_int.h |  2 --
  tests/qemu-iotests/041| 66 
 ++-
  tests/qemu-iotests/131| 59 ++
  tests/qemu-iotests/131.out|  5 
  tests/qemu-iotests/group  |  1 +
  tests/qemu-iotests/iotests.py | 23 +++
  9 files changed, 115 insertions(+), 69 deletions(-)
  create mode 100644 tests/qemu-iotests/131
  create mode 100644 tests/qemu-iotests/131.out

CCing Jeff Cody for block jobs


pgpod9AO_m2Gc.pgp
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-07 Thread Kevin Wolf
Am 07.05.2015 um 14:47 hat Paolo Bonzini geschrieben:
 
 
 On 07/05/2015 14:29, Kevin Wolf wrote:
   No, it shouldn't indeed.
   
   Could alloc_clusters_noref do bdrv_truncate and return ENOSPC if it
   fails?  That's how for example qcow and vhdx work.  vdi has the same
   problem.
  
  If you want NBD to return -ENOSPC for writes after EOF, change the
  nbd block driver to do just that. There's no reason to add additional
  overhead to qcow2 (even over raw-posix) for that.
 
 Does ENOSPC over LVM (dm-linear) work at all, and who generates the
 ENOSPC there?

The LVM use case is what oVirt uses, so I'm pretty sure that it works.
I'm now sure who generates the ENOSPC, but it's not qemu anyway. If I
had to guess, I'd say that the kernel block layer might just forbid
writing after EOF for any block device.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH] nvme: support NVME_VOLATILE_WRITE_CACHE feature

2015-05-07 Thread Kevin Wolf
[ Cc: qemu-block ]

Am 30.04.2015 um 16:03 hat Keith Busch geschrieben:
 On Thu, 30 Apr 2015, Christoph Hellwig wrote:
 The SCSI emulation in the Linux NVMe driver really wants to know
 if a device has a volatile write cache.  Given that qemu has moved
 away from a model where we report the backing store WCE bit to
 one where the WCE bit is supposed to be part of the migratable
 guest-visible state we always return 1 here.
 
 Thanks, this fix was long overdue and already incorporated in my tree. I
 really need to get my act together for a legit pull request with all the
 other 1.0, 1.1 and 1.2 features.
 
 Acked-by: Keith Busch keith.bu...@intel.com

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-07 Thread Paolo Bonzini


On 07/05/2015 16:34, Kevin Wolf wrote:
 Am 07.05.2015 um 16:16 hat Paolo Bonzini geschrieben:


 On 07/05/2015 16:07, Kevin Wolf wrote:
 This is not right for two reasons: The first is that this is
 BlockBackend code

 I think it would take effect for the qemu-nbd case though.
 
 Oh, you want to change the server code rather than the client?

Yes.

 Wait... Are you saying that NBD sends a (platform specific) errno value
 over the network? :-/

Yes. :/  That said, at least the error codes that Linux places in
/usr/include/asm/errno-base.h seem to be pretty much standard---at least
Windows and most Unices share them---with the exception of EAGAIN.

I'll send a patch to NBD to standardize the set of error codes that it
sends.

 In theory, what error code the NBD server needs to send should be
 specified by the NBD protocol. Am I right to assume that it doesn't do
 that?

Nope.

 In any case, I'm not sure whether qemu's internal error code
 should change just for NBD. Producing the right error code for the
 protocol is the job of nbd_co_receive_request().

Ok, so it shouldn't reach blk_check_request at all.  But then, we should
aim at making blk_check_request's checks assertions.

 and it wouldn't even take effect for the qcow2 case
 where we're writing past EOF only on the protocol layer. The second is
 that -ENOSPC is only for writes and not for reads.

 This is right.

 Reads in the kernel return 0, but in QEMU we do not want that.  The code
 currently returns -EIO, but perhaps -EINVAL is a better match.  It also
 happens to be what Linux returns for discards.
 
 Perhaps it is, yes. It shouldn't make a difference for guests anyway.
 (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
 most likely not what we want.)

Yes, we want the check duplicated in all BlockBackend users.  Most of
them already do it, see the work that Markus did last year I think.

Paolo



Re: [Qemu-block] [PATCH 3/7] qcow2: use an LRU algorithm to replace entries from the L2 cache

2015-05-07 Thread Alberto Garcia
On Thu 07 May 2015 04:55:21 PM CEST, Eric Blake wrote:

  /* Give the table some hits for the start so that it won't be replaced
   * immediately. The number 32 is completely arbitrary. */
 -c-entries[i].cache_hits = 32;
  c-entries[i].offset = offset;

 The comment is now dead.

Oh! Feel free to remove it when committing the patch, otherwise I can
send a new series.

Berto



Re: [Qemu-block] [PATCH 3/7] qcow2: use an LRU algorithm to replace entries from the L2 cache

2015-05-07 Thread Eric Blake
On 05/06/2015 07:39 AM, Alberto Garcia wrote:
 The current algorithm to evict entries from the cache gives always
 preference to those in the lowest positions. As the size of the cache
 increases, the chances of the later elements of being removed decrease
 exponentially.
 
 In a scenario with random I/O and lots of cache misses, entries in
 positions 8 and higher are rarely (if ever) evicted. This can be seen
 even with the default cache size, but with larger caches the problem
 becomes more obvious.
 
 Using an LRU algorithm makes the chances of being removed from the
 cache independent from the position.
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 ---
  block/qcow2-cache.c | 31 +++
  1 file changed, 15 insertions(+), 16 deletions(-)
 

 @@ -318,12 +315,10 @@ static int qcow2_cache_do_get(BlockDriverState *bs, 
 Qcow2Cache *c,
  
  /* Give the table some hits for the start so that it won't be replaced
   * immediately. The number 32 is completely arbitrary. */
 -c-entries[i].cache_hits = 32;
  c-entries[i].offset = offset;

The comment is now dead.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-07 Thread Kevin Wolf
Am 07.05.2015 um 16:16 hat Paolo Bonzini geschrieben:
 
 
 On 07/05/2015 16:07, Kevin Wolf wrote:
  This is not right for two reasons: The first is that this is
  BlockBackend code
 
 I think it would take effect for the qemu-nbd case though.

Oh, you want to change the server code rather than the client?

Wait... Are you saying that NBD sends a (platform specific) errno value
over the network? :-/

In theory, what error code the NBD server needs to send should be
specified by the NBD protocol. Am I right to assume that it doesn't do
that? In any case, I'm not sure whether qemu's internal error code
should change just for NBD. Producing the right error code for the
protocol is the job of nbd_co_receive_request().

  and it wouldn't even take effect for the qcow2 case
  where we're writing past EOF only on the protocol layer. The second is
  that -ENOSPC is only for writes and not for reads.
 
 This is right.
 
 Reads in the kernel return 0, but in QEMU we do not want that.  The code
 currently returns -EIO, but perhaps -EINVAL is a better match.  It also
 happens to be what Linux returns for discards.

Perhaps it is, yes. It shouldn't make a difference for guests anyway.
(Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
most likely not what we want.)

Kevin



[Qemu-block] [PATCH v4] block/vdi: Use bdrv_flush after metadata updates

2015-05-07 Thread Zhe Qiu
In reference to b0ad5a45...078a458e, metadata writes to 
qcow2/cow/qcow/vpc/vmdk are all synced prior to succeeding writes.

Only when write is successful that bdrv_flush is called.

Signed-off-by: Zhe Qiu phoea...@gmail.com
---
 block/vdi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/vdi.c b/block/vdi.c
index 7642ef3..dfe8ade 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -713,6 +713,9 @@ static int vdi_co_write(BlockDriverState *bs,
 logout(will write %u block map sectors starting from entry %u\n,
n_sectors, bmap_first);
 ret = bdrv_write(bs-file, offset, base, n_sectors);
+if (ret = 0) {
+ret = bdrv_flush(bs-file);
+}
 }
 
 return ret;
-- 
2.4.0




Re: [Qemu-block] [PATCH 2/7] qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()

2015-05-07 Thread Stefan Hajnoczi
On Wed, May 06, 2015 at 04:39:26PM +0300, Alberto Garcia wrote:
 Since all tables are now stored together, it is possible to obtain
 the position of a particular table directly from its address, so the
 operation becomes O(1).
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 ---
  block/qcow2-cache.c | 32 +++-
  1 file changed, 15 insertions(+), 17 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpWU7F3fWhBP.pgp
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-07 Thread Kevin Wolf
Am 07.05.2015 um 14:20 hat Paolo Bonzini geschrieben:
 
 
 On 06/05/2015 19:23, Max Reitz wrote:
  The guest sees whatever has been written into reply-error, and that
  field hasn't been written by this function in that case. It has been
  written by nbd_receive_reply() in nbd.c, and that value comes directly
  from the server. In case of qemu-nbd being the server, a write beyond
  the EOF should be caught by blk_check_byte_request() in
  block/block-backend.c, which returns -EIO. So that's where the EIO comes
  from.
 
 Fair enough.  This makes sense, but then we have to create ENOSPC elsewhere.
 
  I don't know whether this EIO is subsequently converted to ENOSPC
  because of werror=enospc, but considering that
  https://bugzilla.redhat.com/show_bug.cgi?id=1090713 did not override
  werror, it doesn't look like it.
 
 No, it shouldn't indeed.
 
 Could alloc_clusters_noref do bdrv_truncate and return ENOSPC if it
 fails?  That's how for example qcow and vhdx work.  vdi has the same
 problem.

If you want NBD to return -ENOSPC for writes after EOF, change the
nbd block driver to do just that. There's no reason to add additional
overhead to qcow2 (even over raw-posix) for that.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v3 01/10] qapi: Add transaction support to block-dirty-bitmap operations

2015-05-07 Thread John Snow


On 05/07/2015 10:54 AM, Stefan Hajnoczi wrote:
 On Wed, Apr 22, 2015 at 08:04:44PM -0400, John Snow wrote:
 +static void block_dirty_bitmap_clear_prepare(BlkTransactionState
 *common, + Error
 **errp) +{ +BlockDirtyBitmapState *state =
 DO_UPCAST(BlockDirtyBitmapState, +
 common, common); +BlockDirtyBitmap *action; + +action =
 common-action-block_dirty_bitmap_clear; +state-bitmap =
 block_dirty_bitmap_lookup(action-node, +
 action-name, +
 state-bs, +
 state-aio_context, +
 errp); +if (!state-bitmap) { +return; +} + +
 if (bdrv_dirty_bitmap_frozen(state-bitmap)) { +
 error_setg(errp, Cannot modify a frozen bitmap); +
 return; +} else if
 (!bdrv_dirty_bitmap_enabled(state-bitmap)) { +
 error_setg(errp, Cannot clear a disabled bitmap); +
 return; +} + +/* AioContext is released in .clean() */ 
 +} + +static void
 block_dirty_bitmap_clear_commit(BlkTransactionState *common) +{ +
 BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState, +
 common, common); +bdrv_clear_dirty_bitmap(state-bitmap); +}
 
 These semantics don't work in this example:
 
 [block-dirty-bitmap-clear, drive-backup]
 
 Since drive-backup starts the blockjob in .prepare() but 
 block-dirty-bitmap-clear only clears the bitmap in .commit() the
 order is wrong.
 
 .prepare() has to do something non-destructive, like stashing away
 the HBitmap and replacing it with an empty one.  Then .commit() can
 discard the old bitmap while .abort() can move the old bitmap back
 to undo the operation.
 
 Stefan
 

Hmm, that's sort of gross. That means that any transactional command
*ever* destined to be used with drive-backup in any conceivable way
needs to move a lot more of its action forward to .prepare().

That sort of defeats the premise of .prepare() and .commit(), no? And
all because drive-backup jumped the gun.

That's going to get hard to maintain as we add more transactions.

--js