Re: [Qemu-block] [PATCH v4 11/14] vmdk: Return extent's file in bdrv_get_block_status

2016-01-04 Thread Max Reitz
On 24.12.2015 06:50, Fam Zheng wrote:
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Fam Zheng 
> ---
>  block/vmdk.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index f5a56fd..b60a5af 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1265,6 +1265,7 @@ static int64_t coroutine_fn 
> vmdk_co_get_block_status(BlockDriverState *bs,
>   0, 0);
>  qemu_co_mutex_unlock(>lock);
>  
> +index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
>  switch (ret) {
>  case VMDK_ERROR:
>  ret = -EIO;
> @@ -1276,15 +1277,13 @@ static int64_t coroutine_fn 
> vmdk_co_get_block_status(BlockDriverState *bs,
>  ret = BDRV_BLOCK_ZERO;
>  break;
>  case VMDK_OK:
> -ret = BDRV_BLOCK_DATA;
> -if (extent->file == bs->file && !extent->compressed) {
> -ret |= BDRV_BLOCK_OFFSET_VALID | offset;
> -}
> -
> +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
> +& BDRV_BLOCK_OFFSET_MASK;
> +*file = extent->file->bs;

What if the extent is compressed?

Max

>  break;
>  }
>  
> -index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
>  n = extent->cluster_sectors - index_in_cluster;
>  if (n > nb_sectors) {
>  n = nb_sectors;
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4 01/14] block: Add "file" output parameter to block status query functions

2016-01-04 Thread Max Reitz
On 24.12.2015 06:50, Fam Zheng wrote:
> The added parameter can be used to return the BDS pointer which the
> valid offset is referring to. It's value should be ignored unless

*Its

> BDRV_BLOCK_OFFSET_VALID in ret is set.
> 
> Until block drivers fill in the right value, let's clear it explicitly
> right before calling .bdrv_get_block_status.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Fam Zheng 
> ---
>  block/io.c| 42 --
>  block/iscsi.c |  6 --
>  block/mirror.c|  3 ++-
>  block/parallels.c |  2 +-
>  block/qcow.c  |  2 +-
>  block/qcow2.c |  2 +-
>  block/qed.c   |  3 ++-
>  block/raw-posix.c |  3 ++-
>  block/raw_bsd.c   |  3 ++-
>  block/sheepdog.c  |  2 +-
>  block/vdi.c   |  2 +-
>  block/vmdk.c  |  2 +-
>  block/vpc.c   |  2 +-
>  block/vvfat.c |  2 +-
>  include/block/block.h |  6 --
>  include/block/block_int.h |  3 ++-
>  qemu-img.c|  7 +--
>  17 files changed, 59 insertions(+), 33 deletions(-)
> 

[...]

> diff --git a/include/block/block.h b/include/block/block.h
> index db8e096..70b4984 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h

The comment explaining BDRV_BLOCK_OFFSET_VALID should be changed
accordingly (you could also say "fixed", because apparently it wasn't
always bs->file; sometimes it was bs itself (in case of raw-posix, iscsi
and sheepdog)).

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4 00/14] qemu-img map: Allow driver to return file of the allocated block

2016-01-04 Thread Max Reitz
On 24.12.2015 06:50, Fam Zheng wrote:
> v4: Rebase and resend, adding Eric's and Stefan's reviewed-by.
> 
> Fix one typo in patch 13.
> 
> Drop previous patch 14 for a later rework because it is not a hard
> requirement, but it is pending on Eric's QAPI-to-JSON visitor series:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg03929.html
> 
> v3: Add Eric's rev-by in patches 6, 7, 13, 14.
> 12: New, split out from the previous 13.
> 12->13: Refactor "entry_mergable" from imp_map().
> Don't mess the merge conditions. [Paolo]
> Address Eric's comments:
> - Check has_foo before using foo.
> - Remove blank line between comments and definition in schema.
> - Use PRId64 instead of %ld.
> - Merge short lines.
> 
> v2: Add Eric's rev-by in patches 2, 4, 5.
> 01: Refering -> referring in commit message. [Eric]
> Recurse to "file" for sensible "zero" flag. [Paolo]
> 12: New. Make MapEntry a QAPI struct. [Paolo, Markus]
> 
> Original cover letter
> -
> 
> I stumbled upon this when looking at external bitmap formats.
> 
> Current "qemu-img map" command only displays filename if the data is allocated
> in bs (bs->file) itself, or in the backing chain. Otherwise, it displays an
> unfriendly error message:
> 
> $ qemu-img create -f vmdk -o subformat=monolithicFlat /tmp/test.vmdk 1G
> 
> $ qemu-img map /tmp/test.vmdk
> Offset  Length  Mapped to   File
> qemu-img: File contains external, encrypted or compressed clusters.
> 
> This can be improved. This series extends the .bdrv_co_get_block_status
> callback, to let block driver return the BDS of file; then updates all driver
> to implement it; and lastly, it changes qemu-img to use this information in
> "map" command:
> 
> 
> $ qemu-img map /tmp/test.vmdk
> Offset  Length  Mapped to   File
> 0   0x4000  0   /tmp/test-flat.vmdk
> 
> $ qemu-img map --output json /tmp/test.vmdk
> [{"length": 1073741824, "start": 0, "zero": false, "offset": 0, "depth": 
> 0,
>   "file": "/tmp/test-flat.vmdk", "data": true}
> ]
> 
> Fam Zheng (14):
>   block: Add "file" output parameter to block status query functions
>   qcow: Assign bs->file->bs to file in qcow_co_get_block_status
>   qcow2: Assign bs->file->bs to file in qcow2_co_get_block_status

Minor comment: I'd swap these two patches (2 and 3). Patch 1 breaks test
102, patch 3 fixes it again. It would be better to break it for as short
a time as possible.

Alternatively, in order not to break 102 at all, patch 1 would need to
leave the "if (bs->file &&" part of bdrv_co_get_block_status()
(@@ -1544,13 +1550,14 @@) as-is and change it only after the format
block drivers set *file.

Max

>   raw: Assign bs to file in raw_co_get_block_status
>   iscsi: Assign bs to file in iscsi_co_get_block_status
>   parallels: Assign bs->file->bs to file in
> parallels_co_get_block_status
>   qed: Assign bs->file->bs to file in bdrv_qed_co_get_block_status
>   sheepdog: Assign bs to file in sd_co_get_block_status
>   vdi: Assign bs->file->bs to file in vdi_co_get_block_status
>   vpc: Assign bs->file->bs to file in vpc_co_get_block_status
>   vmdk: Return extent's file in bdrv_get_block_status
>   qemu-img: In "map", use the returned "file" from bdrv_get_block_status
>   qemu-img: Make MapEntry a QAPI struct
>   iotests: Add "qemu-img map" test for VMDK extents
> 
>  block/io.c | 42 -
>  block/iscsi.c  |  9 --
>  block/mirror.c |  3 +-
>  block/parallels.c  |  3 +-
>  block/qcow.c   |  3 +-
>  block/qcow2.c  |  3 +-
>  block/qed.c|  6 +++-
>  block/raw-posix.c  |  4 ++-
>  block/raw_bsd.c|  4 ++-
>  block/sheepdog.c   |  5 ++-
>  block/vdi.c|  3 +-
>  block/vmdk.c   | 13 
>  block/vpc.c|  4 ++-
>  block/vvfat.c  |  2 +-
>  include/block/block.h  |  6 ++--
>  include/block/block_int.h  |  3 +-
>  qapi/block-core.json   | 27 
>  qemu-img.c | 78 
> --
>  tests/qemu-iotests/059 | 10 ++
>  tests/qemu-iotests/059.out | 38 ++
>  20 files changed, 198 insertions(+), 68 deletions(-)
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v5 4/6] expose floppy drive geometry and CMOS type

2016-01-04 Thread Michael S. Tsirkin
On Mon, Jan 04, 2016 at 03:44:42PM -0500, John Snow wrote:
> 
> 
> On 12/30/2015 03:11 PM, Roman Kagan wrote:
> > Make it possible to query the geometry and the CMOS type of a floppy
> > drive outside of the respective source files.
> > 
> > It will be useful, in particular, when dynamically building ACPI tables,
> > and will allow to properly populate the corresponding ACPI objects and
> > thus enable BIOS-less systems to access the floppy drives.
> > 
> > Signed-off-by: Roman Kagan 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Eduardo Habkost 
> > Cc: Igor Mammedov 
> > Cc: John Snow 
> > Cc: Kevin Wolf 
> > Cc: Paolo Bonzini 
> > Cc: Richard Henderson 
> > Cc: qemu-block@nongnu.org
> > Cc: qemu-sta...@nongnu.org
> > ---
> > no changes since v4
> > 
> > changes since v3:
> >  - split out into a separate patch to faciliate review
> > 
> >  hw/block/fdc.c | 11 +++
> >  hw/i386/pc.c   |  2 +-
> >  include/hw/block/fdc.h |  2 ++
> >  include/hw/i386/pc.h   |  1 +
> >  4 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> > index 4292ece..c858c5f 100644
> > --- a/hw/block/fdc.c
> > +++ b/hw/block/fdc.c
> > @@ -2408,6 +2408,17 @@ FDriveType isa_fdc_get_drive_type(ISADevice *fdc, 
> > int i)
> >  return isa->state.drives[i].drive;
> >  }
> >  
> > +void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
> > +uint8_t *heads, uint8_t *sectors)
> > +{
> > +FDCtrlISABus *isa = ISA_FDC(fdc);
> > +FDrive *drv = >state.drives[i];
> > +
> > +*cylinders = drv->max_track;
> > +*heads = (drv->flags & FDISK_DBL_SIDES) ? 2 : 1;
> > +*sectors = drv->last_sect;
> > +}
> > +
> >  static const VMStateDescription vmstate_isa_fdc ={
> >  .name = "fdc",
> >  .version_id = 2,
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index c36b8cf..99fab83 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -199,7 +199,7 @@ static void pic_irq_request(void *opaque, int irq, int 
> > level)
> >  
> >  #define REG_EQUIPMENT_BYTE  0x14
> >  
> > -static int cmos_get_fd_drive_type(FDriveType fd0)
> > +int cmos_get_fd_drive_type(FDriveType fd0)
> >  {
> >  int val;
> >  
> > diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
> > index d48b2f8..adaf3dc 100644
> > --- a/include/hw/block/fdc.h
> > +++ b/include/hw/block/fdc.h
> > @@ -22,5 +22,7 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
> > DriveInfo **fds, qemu_irq *fdc_tc);
> >  
> >  FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
> > +void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
> > +uint8_t *heads, uint8_t *sectors);
> >  
> >  #endif
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index 819..d044a9a 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -268,6 +268,7 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
> >  void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
> >  
> >  ISADevice *pc_find_fdc0(void);
> > +int cmos_get_fd_drive_type(FDriveType fd0);
> >  
> >  /* acpi_piix.c */
> >  
> > 
> 
> Patches 1,4:
> 
> Reviewed-by: John Snow 
> 
> Aside: Why did they have you split out the test changes to be separate
> from the code? Doesn't that introduce commits where the tests now fail?
> 
> --js

It's only a warning not a failure.



Re: [Qemu-block] [PATCH v4 4/5] qmp: Add blockdev-mirror command

2016-01-04 Thread Max Reitz
On 24.12.2015 05:45, Fam Zheng wrote:
> This will start a mirror job from a named device to another named
> device, its relation with drive-mirror is similar with blockdev-backup
> to drive-backup.
> 
> In blockdev-mirror, the target node should be prepared by blockdev-add,
> which will be responsible for assigning a name to the new node, so
> we don't have 'node-name' parameter.
> 
> Signed-off-by: Fam Zheng 
> Acked-by: Markus Armbruster 
> ---
>  blockdev.c   | 62 
> 
>  qapi/block-core.json | 48 
>  qmp-commands.hx  | 50 +-
>  3 files changed, 159 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v5 4/6] expose floppy drive geometry and CMOS type

2016-01-04 Thread John Snow


On 12/30/2015 03:11 PM, Roman Kagan wrote:
> Make it possible to query the geometry and the CMOS type of a floppy
> drive outside of the respective source files.
> 
> It will be useful, in particular, when dynamically building ACPI tables,
> and will allow to properly populate the corresponding ACPI objects and
> thus enable BIOS-less systems to access the floppy drives.
> 
> Signed-off-by: Roman Kagan 
> Cc: "Michael S. Tsirkin" 
> Cc: Eduardo Habkost 
> Cc: Igor Mammedov 
> Cc: John Snow 
> Cc: Kevin Wolf 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: qemu-block@nongnu.org
> Cc: qemu-sta...@nongnu.org
> ---
> no changes since v4
> 
> changes since v3:
>  - split out into a separate patch to faciliate review
> 
>  hw/block/fdc.c | 11 +++
>  hw/i386/pc.c   |  2 +-
>  include/hw/block/fdc.h |  2 ++
>  include/hw/i386/pc.h   |  1 +
>  4 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 4292ece..c858c5f 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -2408,6 +2408,17 @@ FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int 
> i)
>  return isa->state.drives[i].drive;
>  }
>  
> +void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
> +uint8_t *heads, uint8_t *sectors)
> +{
> +FDCtrlISABus *isa = ISA_FDC(fdc);
> +FDrive *drv = >state.drives[i];
> +
> +*cylinders = drv->max_track;
> +*heads = (drv->flags & FDISK_DBL_SIDES) ? 2 : 1;
> +*sectors = drv->last_sect;
> +}
> +
>  static const VMStateDescription vmstate_isa_fdc ={
>  .name = "fdc",
>  .version_id = 2,
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index c36b8cf..99fab83 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -199,7 +199,7 @@ static void pic_irq_request(void *opaque, int irq, int 
> level)
>  
>  #define REG_EQUIPMENT_BYTE  0x14
>  
> -static int cmos_get_fd_drive_type(FDriveType fd0)
> +int cmos_get_fd_drive_type(FDriveType fd0)
>  {
>  int val;
>  
> diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
> index d48b2f8..adaf3dc 100644
> --- a/include/hw/block/fdc.h
> +++ b/include/hw/block/fdc.h
> @@ -22,5 +22,7 @@ void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
> DriveInfo **fds, qemu_irq *fdc_tc);
>  
>  FDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
> +void isa_fdc_get_drive_geometry(ISADevice *fdc, int i, uint8_t *cylinders,
> +uint8_t *heads, uint8_t *sectors);
>  
>  #endif
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 819..d044a9a 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -268,6 +268,7 @@ typedef void (*cpu_set_smm_t)(int smm, void *arg);
>  void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
>  
>  ISADevice *pc_find_fdc0(void);
> +int cmos_get_fd_drive_type(FDriveType fd0);
>  
>  /* acpi_piix.c */
>  
> 

Patches 1,4:

Reviewed-by: John Snow 

Aside: Why did they have you split out the test changes to be separate
from the code? Doesn't that introduce commits where the tests now fail?

--js



[Qemu-block] [PATCH 05/13] block: Hide HBitmap in block dirty bitmap interface

2016-01-04 Thread Fam Zheng
HBitmap is an implementation detail of block dirty bitmap that should be hidden
from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying
HBitmapIter.

A small difference in the interface is, before, an HBitmapIter is initialized
in place, now the new BdrvDirtyBitmapIter must be dynamically allocated because
the structure definition is in block.c.

Two current users are converted too.

Signed-off-by: Fam Zheng 
---
 block/backup.c   | 14 --
 block/dirty-bitmap.c | 36 +++-
 block/mirror.c   | 14 --
 include/block/dirty-bitmap.h |  7 +--
 include/qemu/typedefs.h  |  1 +
 5 files changed, 53 insertions(+), 19 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 56ddec0..2388039 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -326,14 +326,14 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 int64_t end;
 int64_t last_cluster = -1;
 BlockDriverState *bs = job->common.bs;
-HBitmapIter hbi;
+BdrvDirtyBitmapIter *dbi;
 
 granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
 clusters_per_iter = MAX((granularity / BACKUP_CLUSTER_SIZE), 1);
-bdrv_dirty_iter_init(job->sync_bitmap, );
+dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
 
 /* Find the next dirty sector(s) */
-while ((sector = hbitmap_iter_next()) != -1) {
+while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
 cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
 
 /* Fake progress updates for any clusters we skipped */
@@ -345,7 +345,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 for (end = cluster + clusters_per_iter; cluster < end; cluster++) {
 do {
 if (yield_and_check(job)) {
-return ret;
+goto out;
 }
 ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
 BACKUP_SECTORS_PER_CLUSTER, _is_read,
@@ -353,7 +353,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 if ((ret < 0) &&
 backup_error_action(job, error_is_read, -ret) ==
 BLOCK_ERROR_ACTION_REPORT) {
-return ret;
+goto out;
 }
 } while (ret < 0);
 }
@@ -361,7 +361,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 /* If the bitmap granularity is smaller than the backup granularity,
  * we need to advance the iterator pointer to the next cluster. */
 if (granularity < BACKUP_CLUSTER_SIZE) {
-bdrv_set_dirty_iter(, cluster * BACKUP_SECTORS_PER_CLUSTER);
+bdrv_set_dirty_iter(dbi, cluster * BACKUP_SECTORS_PER_CLUSTER);
 }
 
 last_cluster = cluster - 1;
@@ -373,6 +373,8 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 job->common.offset += ((end - last_cluster - 1) * BACKUP_CLUSTER_SIZE);
 }
 
+out:
+bdrv_dirty_iter_free(dbi);
 return ret;
 }
 
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 7924c38..53cf88d 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -41,9 +41,15 @@ struct BdrvDirtyBitmap {
 char *name; /* Optional non-empty unique ID */
 int64_t size;   /* Size of the bitmap (Number of sectors) */
 bool disabled;  /* Bitmap is read-only */
+int active_iterators;   /* How many iterators are active */
 QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
+struct BdrvDirtyBitmapIter {
+HBitmapIter hbi;
+BdrvDirtyBitmap *bitmap;
+};
+
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
 {
 BdrvDirtyBitmap *bm;
@@ -221,6 +227,7 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, 
BdrvDirtyBitmap *bitmap)
 BdrvDirtyBitmap *bm, *next;
 QLIST_FOREACH_SAFE(bm, >dirty_bitmaps, list, next) {
 if (bm == bitmap) {
+assert(!bitmap->active_iterators);
 assert(!bdrv_dirty_bitmap_frozen(bm));
 QLIST_REMOVE(bitmap, list);
 hbitmap_free(bitmap->bitmap);
@@ -299,9 +306,29 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap 
*bitmap)
 return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
 }
 
-void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
+BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
+ uint64_t first_sector)
 {
-hbitmap_iter_init(hbi, bitmap->bitmap, 0);
+BdrvDirtyBitmapIter *iter = g_new(BdrvDirtyBitmapIter, 1);
+hbitmap_iter_init(>hbi, bitmap->bitmap, first_sector);
+iter->bitmap = bitmap;
+bitmap->active_iterators++;
+return iter;
+}
+
+void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)
+{
+if (!iter) {

[Qemu-block] [PATCH 12/13] block: BdrvDirtyBitmap serialization interface

2016-01-04 Thread Fam Zheng
From: Vladimir Sementsov-Ogievskiy 

Several functions to provide necessary access to BdrvDirtyBitmap for
block-migration.c

Signed-off-by: Vladimir Sementsov-Ogievskiy 
[Add the "finish" parameters. - Fam]
Signed-off-by: Fam Zheng 
---
 block/dirty-bitmap.c | 37 +
 include/block/dirty-bitmap.h | 14 ++
 2 files changed, 51 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 60ee965..d524549 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -428,6 +428,43 @@ void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, 
HBitmap *in)
 hbitmap_free(tmp);
 }
 
+uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
+  uint64_t start, uint64_t count)
+{
+return hbitmap_serialization_size(bitmap->bitmap, start, count);
+}
+
+uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap *bitmap)
+{
+return hbitmap_serialization_granularity(bitmap->bitmap);
+}
+
+void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
+  uint8_t *buf, uint64_t start,
+  uint64_t count)
+{
+hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
+}
+
+void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
+uint8_t *buf, uint64_t start,
+uint64_t count, bool finish)
+{
+hbitmap_deserialize_part(bitmap->bitmap, buf, start, count, finish);
+}
+
+void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
+  uint64_t start, uint64_t count,
+  bool finish)
+{
+hbitmap_deserialize_zeroes(bitmap->bitmap, start, count, finish);
+}
+
+void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
+{
+hbitmap_deserialize_finish(bitmap->bitmap);
+}
+
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
 int nr_sectors)
 {
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index e36efb6..d4a8bf1 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -53,4 +53,18 @@ void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t 
sector_num);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
 
+uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
+  uint64_t start, uint64_t count);
+uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap *bitmap);
+void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
+  uint8_t *buf, uint64_t start,
+  uint64_t count);
+void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
+uint8_t *buf, uint64_t start,
+uint64_t count, bool finish);
+void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
+  uint64_t start, uint64_t count,
+  bool finish);
+void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
+
 #endif
-- 
2.4.3




[Qemu-block] [PATCH 03/13] block: Move block dirty bitmap code to separate files

2016-01-04 Thread Fam Zheng
The only change is making bdrv_dirty_bitmap_truncate public. It is used in
block.c.

Signed-off-by: Fam Zheng 
---
 block.c  | 339 ---
 block/Makefile.objs  |   2 +-
 block/dirty-bitmap.c | 366 +++
 include/block/block.h|  37 +
 include/block/dirty-bitmap.h |  42 +
 5 files changed, 410 insertions(+), 376 deletions(-)
 create mode 100644 block/dirty-bitmap.c
 create mode 100644 include/block/dirty-bitmap.h

diff --git a/block.c b/block.c
index 411edbf..b544190 100644
--- a/block.c
+++ b/block.c
@@ -55,23 +55,6 @@
 #include 
 #endif
 
-/**
- * A BdrvDirtyBitmap can be in three possible states:
- * (1) successor is NULL and disabled is false: full r/w mode
- * (2) successor is NULL and disabled is true: read only mode ("disabled")
- * (3) successor is set: frozen mode.
- * A frozen bitmap cannot be renamed, deleted, anonymized, cleared, set,
- * or enabled. A frozen bitmap can only abdicate() or reclaim().
- */
-struct BdrvDirtyBitmap {
-HBitmap *bitmap;/* Dirty sector bitmap implementation */
-BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
-char *name; /* Optional non-empty unique ID */
-int64_t size;   /* Size of the bitmap (Number of sectors) */
-bool disabled;  /* Bitmap is read-only */
-QLIST_ENTRY(BdrvDirtyBitmap) list;
-};
-
 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress 
*/
 
 struct BdrvStates bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states);
@@ -87,7 +70,6 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const 
char *filename,
  BlockDriverState *parent,
  const BdrvChildRole *child_role, Error **errp);
 
-static void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -3375,327 +3357,6 @@ void bdrv_lock_medium(BlockDriverState *bs, bool locked)
 }
 }
 
-BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
-{
-BdrvDirtyBitmap *bm;
-
-assert(name);
-QLIST_FOREACH(bm, >dirty_bitmaps, list) {
-if (bm->name && !strcmp(name, bm->name)) {
-return bm;
-}
-}
-return NULL;
-}
-
-void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
-{
-assert(!bdrv_dirty_bitmap_frozen(bitmap));
-g_free(bitmap->name);
-bitmap->name = NULL;
-}
-
-BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
-  uint32_t granularity,
-  const char *name,
-  Error **errp)
-{
-int64_t bitmap_size;
-BdrvDirtyBitmap *bitmap;
-uint32_t sector_granularity;
-
-assert((granularity & (granularity - 1)) == 0);
-
-if (name && bdrv_find_dirty_bitmap(bs, name)) {
-error_setg(errp, "Bitmap already exists: %s", name);
-return NULL;
-}
-sector_granularity = granularity >> BDRV_SECTOR_BITS;
-assert(sector_granularity);
-bitmap_size = bdrv_nb_sectors(bs);
-if (bitmap_size < 0) {
-error_setg_errno(errp, -bitmap_size, "could not get length of device");
-errno = -bitmap_size;
-return NULL;
-}
-bitmap = g_new0(BdrvDirtyBitmap, 1);
-bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity));
-bitmap->size = bitmap_size;
-bitmap->name = g_strdup(name);
-bitmap->disabled = false;
-QLIST_INSERT_HEAD(>dirty_bitmaps, bitmap, list);
-return bitmap;
-}
-
-bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
-{
-return bitmap->successor;
-}
-
-bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
-{
-return !(bitmap->disabled || bitmap->successor);
-}
-
-DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap)
-{
-if (bdrv_dirty_bitmap_frozen(bitmap)) {
-return DIRTY_BITMAP_STATUS_FROZEN;
-} else if (!bdrv_dirty_bitmap_enabled(bitmap)) {
-return DIRTY_BITMAP_STATUS_DISABLED;
-} else {
-return DIRTY_BITMAP_STATUS_ACTIVE;
-}
-}
-
-/**
- * Create a successor bitmap destined to replace this bitmap after an 
operation.
- * Requires that the bitmap is not frozen and has no successor.
- */
-int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
-   BdrvDirtyBitmap *bitmap, Error **errp)
-{
-uint64_t granularity;
-BdrvDirtyBitmap *child;
-
-if (bdrv_dirty_bitmap_frozen(bitmap)) {
-error_setg(errp, "Cannot create a successor for a bitmap that is "
-   "currently frozen");
-return -1;
-}
-assert(!bitmap->successor);
-
-/* Create an anonymous successor */
-granularity = bdrv_dirty_bitmap_granularity(bitmap);
-child = 

[Qemu-block] [PATCH 11/13] hbitmap: serialization

2016-01-04 Thread Fam Zheng
From: Vladimir Sementsov-Ogievskiy 

Functions to serialize / deserialize(restore) HBitmap. HBitmap should be
saved to linear sequence of bits independently of endianness and bitmap
array element (unsigned long) size. Therefore Little Endian is chosen.

These functions are appropriate for dirty bitmap migration, restoring
the bitmap in several steps is available. To save performance, every
step writes only the last level of the bitmap. All other levels are
restored by hbitmap_deserialize_finish() as a last step of restoring.
So, HBitmap is inconsistent while restoring.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
[Fix left shift operand to 1UL; add "finish" parameter. - Fam]
Signed-off-by: Fam Zheng 
---
 include/qemu/hbitmap.h |  76 +++
 util/hbitmap.c | 136 +
 2 files changed, 212 insertions(+)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index ed672e7..3414113 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -149,6 +149,82 @@ void hbitmap_reset_all(HBitmap *hb);
 bool hbitmap_get(const HBitmap *hb, uint64_t item);
 
 /**
+ * hbitmap_data_size:
+ * @hb: HBitmap to operate on.
+ * @count: Number of bits
+ *
+ * Grunularity of serialization chunks, used by other serializetion functions.
+ * For every chunk:
+ * 1. Chunk start should be aligned to this granularity.
+ * 2. Chunk size should be aligned too, except for last chunk (for which
+ *  start + count == hb->size)
+ */
+uint64_t hbitmap_serialization_granularity(const HBitmap *hb);
+
+/**
+ * hbitmap_data_size:
+ * @hb: HBitmap to operate on.
+ * @count: Number of bits
+ *
+ * Return amount of bytes hbitmap_(de)serialize_part needs
+ */
+uint64_t hbitmap_serialization_size(const HBitmap *hb,
+uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_serialize_part
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to store serialized bitmap.
+ * @start: First bit to store.
+ * @count: Number of bits to store.
+ *
+ * Stores HBitmap data corresponding to given region. The format of saved data
+ * is linear sequence of bits, so it can be used by hbitmap_deserialize_part
+ * independently of endianness and size of HBitmap level array elements
+ */
+void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
+uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_deserialize_part
+ * @hb: HBitmap to operate on.
+ * @buf: Buffer to restore bitmap data from.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ * @finish: Whether to call hbitmap_deserialize_finish automatically.
+ *
+ * Restores HBitmap data corresponding to given region. The format is the same
+ * as for hbitmap_serialize_part.
+ *
+ * if @finish is false, caller must call hbitmap_serialize_finish before using
+ * the bitmap.
+ */
+void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
+  uint64_t start, uint64_t count,
+  bool finish);
+
+/**
+ * hbitmap_deserialize_zeroes
+ * @hb: HBitmap to operate on.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ * @finish: Whether to call hbitmap_deserialize_finish automatically.
+ *
+ * Same as hbitmap_serialize_part, but fills the bitmap with zeroes.
+ */
+void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count,
+bool finish);
+
+/**
+ * hbitmap_deserialize_finish
+ * @hb: HBitmap to operate on.
+ *
+ * Repair HBitmap after calling hbitmap_deserialize_data. Actually, all HBitmap
+ * layers are restored here.
+ */
+void hbitmap_deserialize_finish(HBitmap *hb);
+
+/**
  * hbitmap_free:
  * @hb: HBitmap to operate on.
  *
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 55d3182..ab8cd81 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -397,6 +397,142 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
 return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
 }
 
+uint64_t hbitmap_serialization_granularity(const HBitmap *hb)
+{
+return BITS_PER_LONG << hb->granularity;
+}
+
+/* serilization chunk start should be aligned to serialization granularity.
+ * Serilization chunk size scould be aligned to serialization granularity too,
+ * except for last chunk.
+ */
+static void serialization_chunk(const HBitmap *hb,
+uint64_t start, uint64_t count,
+unsigned long **first_el, size_t *el_count)
+{
+uint64_t last = start + count - 1;
+uint64_t gran = hbitmap_serialization_granularity(hb);
+
+assert(((start + count - 1) >> hb->granularity) < hb->size);
+assert((start & (gran - 1)) == 0);
+if ((last >> hb->granularity) != hb->size - 1) {
+assert((count & (gran - 1)) == 0);
+}
+
+start = (start >> hb->granularity) >> BITS_PER_LEVEL;
+

[Qemu-block] [PATCH 13/13] tests: Add test code for hbitmap serialization

2016-01-04 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 tests/test-hbitmap.c | 139 +++
 1 file changed, 139 insertions(+)

diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 19136e7..4f61961 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include "qemu/hbitmap.h"
+#include "qemu/bitmap.h"
 #include "block/block.h"
 
 #define LOG_BITS_PER_LONG  (BITS_PER_LONG == 32 ? 5 : 6)
@@ -737,6 +738,16 @@ static void test_hbitmap_meta_one(TestHBitmapData *data, 
const void *unused)
 }
 }
 
+static void test_hbitmap_serialize_granularity(TestHBitmapData *data,
+   const void *unused)
+{
+int r;
+
+hbitmap_test_init(data, L3 * 2, 3);
+r = hbitmap_serialization_granularity(data->hb);
+g_assert_cmpint(r, ==, BITS_PER_LONG << 3);
+}
+
 static void test_hbitmap_meta_zero(TestHBitmapData *data, const void *unused)
 {
 hbitmap_test_init_meta(data, 0, 0, 1);
@@ -744,6 +755,125 @@ static void test_hbitmap_meta_zero(TestHBitmapData *data, 
const void *unused)
 hbitmap_check_meta(data, 0, 0);
 }
 
+static void hbitmap_test_serialize_range(TestHBitmapData *data,
+ uint8_t *buf, size_t buf_size,
+ uint64_t pos, uint64_t count)
+{
+size_t i;
+
+assert(hbitmap_granularity(data->hb) == 0);
+hbitmap_reset_all(data->hb);
+memset(buf, 0, buf_size);
+if (count) {
+hbitmap_set(data->hb, pos, count);
+}
+hbitmap_serialize_part(data->hb, buf, 0, data->size);
+for (i = 0; i < data->size; i++) {
+int is_set = test_bit(i, (unsigned long *)buf);
+if (i >= pos && i < pos + count) {
+g_assert(is_set);
+} else {
+g_assert(!is_set);
+}
+}
+hbitmap_reset_all(data->hb);
+hbitmap_deserialize_part(data->hb, buf, 0, data->size, true);
+
+for (i = 0; i < data->size; i++) {
+int is_set = hbitmap_get(data->hb, i);
+if (i >= pos && i < pos + count) {
+g_assert(is_set);
+} else {
+g_assert(!is_set);
+}
+}
+}
+
+static void test_hbitmap_serialize_basic(TestHBitmapData *data,
+ const void *unused)
+{
+int i, j;
+size_t buf_size;
+uint8_t *buf;
+uint64_t positions[] = { 0, 1, L1 - 1, L1, L2 - 1, L2, L2 + 1, L3 - 1 };
+int num_positions = sizeof(positions) / sizeof(positions[0]);
+
+hbitmap_test_init(data, L3, 0);
+buf_size = hbitmap_serialization_size(data->hb, 0, data->size);
+buf = g_malloc0(buf_size);
+
+for (i = 0; i < num_positions; i++) {
+for (j = 0; j < num_positions; j++) {
+hbitmap_test_serialize_range(data, buf, buf_size,
+ positions[i],
+ MIN(positions[j], L3 - positions[i]));
+}
+}
+
+g_free(buf);
+}
+
+static void test_hbitmap_serialize_part(TestHBitmapData *data,
+const void *unused)
+{
+int i, j, k;
+size_t buf_size;
+uint8_t *buf;
+uint64_t positions[] = { 0, 1, L1 - 1, L1, L2 - 1, L2, L2 + 1, L3 - 1 };
+int num_positions = sizeof(positions) / sizeof(positions[0]);
+
+hbitmap_test_init(data, L3, 0);
+buf_size = L2;
+buf = g_malloc0(buf_size);
+
+for (i = 0; i < num_positions; i++) {
+hbitmap_set(data->hb, positions[i], 1);
+}
+
+for (i = 0; i < data->size; i += buf_size) {
+hbitmap_serialize_part(data->hb, buf, i, buf_size);
+for (j = 0; j < buf_size; j++) {
+bool should_set = false;
+for (k = 0; k < num_positions; k++) {
+if (positions[k] == j + i) {
+should_set = true;
+break;
+}
+}
+g_assert_cmpint(should_set, ==, test_bit(j, (unsigned long *)buf));
+}
+}
+
+g_free(buf);
+}
+
+static void test_hbitmap_serialize_zeroes(TestHBitmapData *data,
+  const void *unused)
+{
+int i;
+HBitmapIter iter;
+int64_t next;
+uint64_t positions[] = { 0, L1, L2, L3 - L1};
+int num_positions = sizeof(positions) / sizeof(positions[0]);
+
+hbitmap_test_init(data, L3, 0);
+
+for (i = 0; i < num_positions; i++) {
+hbitmap_set(data->hb, positions[i], L1);
+}
+
+for (i = 0; i < num_positions; i++) {
+hbitmap_deserialize_zeroes(data->hb, positions[i], L1, true);
+hbitmap_iter_init(, data->hb, 0);
+next = hbitmap_iter_next();
+if (i == num_positions - 1) {
+g_assert_cmpint(next, ==, -1);
+} else {
+g_assert_cmpint(next, ==, positions[i + 1]);
+}
+}
+}
+
 static void hbitmap_test_add(const char *testpath,
void 

[Qemu-block] [PATCH 01/13] backup: Use Bitmap to replace "s->bitmap"

2016-01-04 Thread Fam Zheng
"s->bitmap" tracks done sectors, we only check bit states without using any
iterator which HBitmap is good for. Switch to "Bitmap" which is simpler and
more memory efficient.

Meanwhile, rename it to done_bitmap, to reflect the intention.

Signed-off-by: Fam Zheng 
Reviewed-by: John Snow 
---
 block/backup.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 705bb77..56ddec0 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -22,6 +22,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
 #include "sysemu/block-backend.h"
+#include "qemu/bitmap.h"
 
 #define BACKUP_CLUSTER_BITS 16
 #define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
@@ -47,7 +48,7 @@ typedef struct BackupBlockJob {
 BlockdevOnError on_target_error;
 CoRwlock flush_rwlock;
 uint64_t sectors_read;
-HBitmap *bitmap;
+unsigned long *done_bitmap;
 QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;
 
@@ -113,7 +114,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
 cow_request_begin(_request, job, start, end);
 
 for (; start < end; start++) {
-if (hbitmap_get(job->bitmap, start)) {
+if (test_bit(start, job->done_bitmap)) {
 trace_backup_do_cow_skip(job, start);
 continue; /* already copied */
 }
@@ -164,7 +165,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
 goto out;
 }
 
-hbitmap_set(job->bitmap, start, 1);
+set_bit(start, job->done_bitmap);
 
 /* Publish progress, guest I/O counts as progress too.  Note that the
  * offset field is an opaque progress value, it is not a disk offset.
@@ -394,7 +395,7 @@ static void coroutine_fn backup_run(void *opaque)
 start = 0;
 end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
 
-job->bitmap = hbitmap_alloc(end, 0);
+job->done_bitmap = bitmap_new(end);
 
 bdrv_set_enable_write_cache(target, true);
 if (target->blk) {
@@ -475,7 +476,7 @@ static void coroutine_fn backup_run(void *opaque)
 /* wait until pending backup_do_cow() calls have completed */
 qemu_co_rwlock_wrlock(>flush_rwlock);
 qemu_co_rwlock_unlock(>flush_rwlock);
-hbitmap_free(job->bitmap);
+g_free(job->done_bitmap);
 
 if (target->blk) {
 blk_iostatus_disable(target->blk);
-- 
2.4.3




[Qemu-block] [PATCH 00/13] Dirty bitmap changes for migration/persistence work

2016-01-04 Thread Fam Zheng
Two major features are added to block dirty bitmap (and underlying HBitmap) in
this series: meta bitmap and serialization, together with all other supportive
patches.

Both operations are common in dirty bitmap migration and persistence: they need
to find whether and which part of the dirty bitmap in question has changed with
meta dirty bitmap, and they need to write it to the target with serialization.

Fam Zheng (11):
  backup: Use Bitmap to replace "s->bitmap"
  typedefs: Add BdrvDirtyBitmap and HBitmapIter
  block: Move block dirty bitmap code to separate files
  block: Remove unused typedef of BlockDriverDirtyHandler
  block: Hide HBitmap in block dirty bitmap interface
  HBitmap: Introduce "meta" bitmap to track bit changes
  tests: Add test code for meta bitmap
  block: Support meta dirty bitmap
  block: Add two dirty bitmap getters
  block: Assert that bdrv_release_dirty_bitmap succeeded
  tests: Add test code for hbitmap serialization

Vladimir Sementsov-Ogievskiy (2):
  hbitmap: serialization
  block: BdrvDirtyBitmap serialization interface

 block.c  | 339 -
 block/Makefile.objs  |   2 +-
 block/backup.c   |  25 ++-
 block/dirty-bitmap.c | 491 +++
 block/mirror.c   |  14 +-
 include/block/block.h|  39 +---
 include/block/dirty-bitmap.h |  70 ++
 include/qemu/hbitmap.h   |  84 
 include/qemu/typedefs.h  |   3 +
 tests/test-hbitmap.c | 252 ++
 util/hbitmap.c   | 197 +++--
 11 files changed, 1106 insertions(+), 410 deletions(-)
 create mode 100644 block/dirty-bitmap.c
 create mode 100644 include/block/dirty-bitmap.h

-- 
2.4.3




[Qemu-block] [PATCH 02/13] typedefs: Add BdrvDirtyBitmap and HBitmapIter

2016-01-04 Thread Fam Zheng
Following patches to refactor and move block dirty bitmap code could use this.

Signed-off-by: Fam Zheng 
---
 include/qemu/typedefs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 78fe6e8..e83934e 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -10,6 +10,7 @@ typedef struct AddressSpace AddressSpace;
 typedef struct AioContext AioContext;
 typedef struct AllwinnerAHCIState AllwinnerAHCIState;
 typedef struct AudioState AudioState;
+typedef struct BdrvDirtyBitmap BdrvDirtyBitmap;
 typedef struct BlockBackend BlockBackend;
 typedef struct BlockBackendRootState BlockBackendRootState;
 typedef struct BlockDriverState BlockDriverState;
@@ -28,6 +29,7 @@ typedef struct EventNotifier EventNotifier;
 typedef struct FWCfgIoState FWCfgIoState;
 typedef struct FWCfgMemState FWCfgMemState;
 typedef struct FWCfgState FWCfgState;
+typedef struct HBitmapIter HBitmapIter;
 typedef struct HCIInfo HCIInfo;
 typedef struct I2CBus I2CBus;
 typedef struct I2SCodec I2SCodec;
-- 
2.4.3




[Qemu-block] [PATCH 06/13] HBitmap: Introduce "meta" bitmap to track bit changes

2016-01-04 Thread Fam Zheng
Upon each bit toggle, the corresponding bit in the meta bitmap will be
set.

Signed-off-by: Fam Zheng 
---
 include/qemu/hbitmap.h |  8 +++
 util/hbitmap.c | 61 +-
 2 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index bb94a00..ed672e7 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -181,6 +181,14 @@ void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap 
*hb, uint64_t first);
  */
 unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi);
 
+/* hbitmap_create_meta
+ * Create a "meta" hbitmap to track dirtiness of the bits in this HBitmap.
+ *
+ * @hb: The HBitmap to operate on.
+ * @chunk_size: How many bits in @hb does one bit in the meta track.
+ */
+HBitmap *hbitmap_create_meta(HBitmap *hb, int chunk_size);
+
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 50b888f..55d3182 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -81,6 +81,9 @@ struct HBitmap {
  */
 int granularity;
 
+/* A meta dirty bitmap to track the dirtiness of bits in this HBitmap. */
+HBitmap *meta;
+
 /* A number of progressively less coarse bitmaps (i.e. level 0 is the
  * coarsest).  Each bit in level N represents a word in level N+1 that
  * has a set bit, except the last level where each bit represents the
@@ -212,25 +215,27 @@ static uint64_t hb_count_between(HBitmap *hb, uint64_t 
start, uint64_t last)
 }
 
 /* Setting starts at the last layer and propagates up if an element
- * changes from zero to non-zero.
+ * changes.
  */
 static inline bool hb_set_elem(unsigned long *elem, uint64_t start, uint64_t 
last)
 {
 unsigned long mask;
-bool changed;
+unsigned long old;
 
 assert((last >> BITS_PER_LEVEL) == (start >> BITS_PER_LEVEL));
 assert(start <= last);
 
 mask = 2UL << (last & (BITS_PER_LONG - 1));
 mask -= 1UL << (start & (BITS_PER_LONG - 1));
-changed = (*elem == 0);
+old = *elem;
 *elem |= mask;
-return changed;
+return old != *elem;
 }
 
-/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
-static void hb_set_between(HBitmap *hb, int level, uint64_t start, uint64_t 
last)
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)...
+ * Returns true if at least one bit is changed. */
+static bool hb_set_between(HBitmap *hb, int level, uint64_t start,
+   uint64_t last)
 {
 size_t pos = start >> BITS_PER_LEVEL;
 size_t lastpos = last >> BITS_PER_LEVEL;
@@ -259,22 +264,27 @@ static void hb_set_between(HBitmap *hb, int level, 
uint64_t start, uint64_t last
 if (level > 0 && changed) {
 hb_set_between(hb, level - 1, pos, lastpos);
 }
+return changed;
 }
 
 void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count)
 {
 /* Compute range in the last layer.  */
+uint64_t first, n;
 uint64_t last = start + count - 1;
 
 trace_hbitmap_set(hb, start, count,
   start >> hb->granularity, last >> hb->granularity);
 
-start >>= hb->granularity;
+first = start >> hb->granularity;
 last >>= hb->granularity;
-count = last - start + 1;
+n = last - first + 1;
 
-hb->count += count - hb_count_between(hb, start, last);
-hb_set_between(hb, HBITMAP_LEVELS - 1, start, last);
+hb->count += n - hb_count_between(hb, first, last);
+if (hb_set_between(hb, HBITMAP_LEVELS - 1, first, last) &&
+hb->meta) {
+hbitmap_set(hb->meta, start, count);
+}
 }
 
 /* Resetting works the other way round: propagate up if the new
@@ -295,8 +305,10 @@ static inline bool hb_reset_elem(unsigned long *elem, 
uint64_t start, uint64_t l
 return blanked;
 }
 
-/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
-static void hb_reset_between(HBitmap *hb, int level, uint64_t start, uint64_t 
last)
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)...
+ * Returns true if at least one bit is changed. */
+static bool hb_reset_between(HBitmap *hb, int level, uint64_t start,
+ uint64_t last)
 {
 size_t pos = start >> BITS_PER_LEVEL;
 size_t lastpos = last >> BITS_PER_LEVEL;
@@ -339,21 +351,28 @@ static void hb_reset_between(HBitmap *hb, int level, 
uint64_t start, uint64_t la
 if (level > 0 && changed) {
 hb_reset_between(hb, level - 1, pos, lastpos);
 }
+
+return changed;
+
 }
 
 void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count)
 {
 /* Compute range in the last layer.  */
+uint64_t first;
 uint64_t last = start + count - 1;
 
 trace_hbitmap_reset(hb, start, count,
 start >> hb->granularity, last >> hb->granularity);
 
-start >>= hb->granularity;
+first = start >> hb->granularity;
 last >>= hb->granularity;
 
-hb->count -= 

[Qemu-block] [PATCH 09/13] block: Add two dirty bitmap getters

2016-01-04 Thread Fam Zheng
For dirty bitmap users to get the size and the name of a
BdrvDirtyBitmap.

Signed-off-by: Fam Zheng 
---
 block/dirty-bitmap.c | 10 ++
 include/block/dirty-bitmap.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 4314659..9cac794 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -153,6 +153,16 @@ void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
 hbitmap_reset(bitmap->meta, sector, nb_sectors);
 }
 
+int64_t bdrv_dirty_bitmap_size(BdrvDirtyBitmap *bitmap)
+{
+return bitmap->size;
+}
+
+const char *bdrv_dirty_bitmap_name(BdrvDirtyBitmap *bitmap)
+{
+return bitmap->name;
+}
+
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
 {
 return bitmap->successor;
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 0715220..e36efb6 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -31,6 +31,8 @@ uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState 
*bs);
 uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
+const char *bdrv_dirty_bitmap_name(BdrvDirtyBitmap *bitmap);
+int64_t bdrv_dirty_bitmap_size(BdrvDirtyBitmap *bitmap);
 DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t 
sector);
 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-- 
2.4.3




[Qemu-block] [PATCH 08/13] block: Support meta dirty bitmap

2016-01-04 Thread Fam Zheng
The added group of operations enables tracking of the changed bits in
the dirty bitmap.

Signed-off-by: Fam Zheng 
---
 block/dirty-bitmap.c | 51 
 include/block/dirty-bitmap.h |  9 
 2 files changed, 60 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 53cf88d..4314659 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -37,6 +37,7 @@
  */
 struct BdrvDirtyBitmap {
 HBitmap *bitmap;/* Dirty sector bitmap implementation */
+HBitmap *meta;  /* Meta dirty bitmap */
 BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
 char *name; /* Optional non-empty unique ID */
 int64_t size;   /* Size of the bitmap (Number of sectors) */
@@ -102,6 +103,56 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState 
*bs,
 return bitmap;
 }
 
+/* bdrv_create_meta_dirty_bitmap
+ *
+ * Create a meta dirty bitmap that tracks the changes of bits in @bitmap. I.e.
+ * when a dirty status bit in @bitmap is changed (either from reset to set or
+ * the other way around), its respective meta dirty bitmap bit will be marked
+ * dirty as well.
+ *
+ * @bitmap: the block dirty bitmap for which to create a meta dirty bitmap.
+ * @granularity: how many bytes of bitmap data does each bit in the meta bitmap
+ * track.
+ */
+void bdrv_create_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+   int granularity)
+{
+assert(!bitmap->meta);
+bitmap->meta = hbitmap_create_meta(bitmap->bitmap,
+   BDRV_SECTOR_SIZE * BITS_PER_BYTE);
+}
+
+void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+assert(bitmap->meta);
+hbitmap_free(bitmap->meta);
+bitmap->meta = NULL;
+}
+
+int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap, int64_t sector,
+   int nb_sectors)
+{
+uint64_t i;
+int gran = bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
+
+/* To optimize: we can make hbitmap to internally check the range in a
+ * coarse level, or at least do it word by word. */
+for (i = sector; i < sector + nb_sectors; i += gran) {
+if (hbitmap_get(bitmap->meta, i)) {
+return true;
+}
+}
+return false;
+}
+
+void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
+  BdrvDirtyBitmap *bitmap, int64_t sector,
+  int nb_sectors)
+{
+hbitmap_reset(bitmap->meta, sector, nb_sectors);
+}
+
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
 {
 return bitmap->successor;
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 16bb15a..0715220 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -8,6 +8,9 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
   uint32_t granularity,
   const char *name,
   Error **errp);
+void bdrv_create_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+   int granularity);
+void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap,
Error **errp);
@@ -34,6 +37,12 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int nr_sectors);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t cur_sector, int nr_sectors);
+int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap, int64_t sector,
+   int nb_sectors);
+void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
+  BdrvDirtyBitmap *bitmap, int64_t sector,
+  int nb_sectors);
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
  uint64_t first_sector);
 void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
-- 
2.4.3




[Qemu-block] [PATCH 07/13] tests: Add test code for meta bitmap

2016-01-04 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 tests/test-hbitmap.c | 113 +++
 1 file changed, 113 insertions(+)

diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index abcea0c..19136e7 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include "qemu/hbitmap.h"
+#include "block/block.h"
 
 #define LOG_BITS_PER_LONG  (BITS_PER_LONG == 32 ? 5 : 6)
 
@@ -23,6 +24,7 @@
 
 typedef struct TestHBitmapData {
 HBitmap   *hb;
+HBitmap   *meta;
 unsigned long *bits;
 size_t size;
 size_t old_size;
@@ -94,6 +96,14 @@ static void hbitmap_test_init(TestHBitmapData *data,
 }
 }
 
+static void hbitmap_test_init_meta(TestHBitmapData *data,
+   uint64_t size, int granularity,
+   int meta_chunk)
+{
+hbitmap_test_init(data, size, granularity);
+data->meta = hbitmap_create_meta(data->hb, meta_chunk);
+}
+
 static inline size_t hbitmap_test_array_size(size_t bits)
 {
 size_t n = (bits + BITS_PER_LONG - 1) / BITS_PER_LONG;
@@ -637,6 +647,103 @@ static void 
test_hbitmap_truncate_shrink_large(TestHBitmapData *data,
 hbitmap_test_truncate(data, size, -diff, 0);
 }
 
+static void hbitmap_check_meta(TestHBitmapData *data,
+   int64_t start, int count)
+{
+int64_t i;
+
+for (i = 0; i < data->size; i++) {
+if (i >= start && i < start + count) {
+g_assert(hbitmap_get(data->meta, i));
+} else {
+g_assert(!hbitmap_get(data->meta, i));
+}
+}
+}
+
+static void hbitmap_test_meta(TestHBitmapData *data,
+  int64_t start, int count,
+  int64_t check_start, int check_count)
+{
+hbitmap_reset_all(data->hb);
+hbitmap_reset_all(data->meta);
+
+/* Test "unset" -> "unset" will not update meta. */
+hbitmap_reset(data->hb, start, count);
+hbitmap_check_meta(data, 0, 0);
+
+/* Test "unset" -> "set" will update meta */
+hbitmap_set(data->hb, start, count);
+hbitmap_check_meta(data, check_start, check_count);
+
+/* Test "set" -> "set" will not update meta */
+hbitmap_reset_all(data->meta);
+hbitmap_set(data->hb, start, count);
+hbitmap_check_meta(data, 0, 0);
+
+/* Test "set" -> "unset" will update meta */
+hbitmap_reset_all(data->meta);
+hbitmap_reset(data->hb, start, count);
+hbitmap_check_meta(data, check_start, check_count);
+}
+
+static void hbitmap_test_meta_do(TestHBitmapData *data, int chunk_size)
+{
+uint64_t size = chunk_size * 100;
+hbitmap_test_init_meta(data, size, 0, chunk_size);
+
+hbitmap_test_meta(data, 0, 1, 0, chunk_size);
+hbitmap_test_meta(data, 0, chunk_size, 0, chunk_size);
+hbitmap_test_meta(data, chunk_size - 1, 1, 0, chunk_size);
+hbitmap_test_meta(data, chunk_size - 1, 2, 0, chunk_size * 2);
+hbitmap_test_meta(data, chunk_size - 1, chunk_size + 1, 0, chunk_size * 2);
+hbitmap_test_meta(data, chunk_size - 1, chunk_size + 2, 0, chunk_size * 3);
+hbitmap_test_meta(data, 7 * chunk_size - 1, chunk_size + 2,
+  6 * chunk_size, chunk_size * 3);
+hbitmap_test_meta(data, size - 1, 1, size - chunk_size, chunk_size);
+hbitmap_test_meta(data, 0, size, 0, size);
+}
+
+static void test_hbitmap_meta_byte(TestHBitmapData *data, const void *unused)
+{
+hbitmap_test_meta_do(data, BITS_PER_BYTE);
+}
+
+static void test_hbitmap_meta_word(TestHBitmapData *data, const void *unused)
+{
+hbitmap_test_meta_do(data, BITS_PER_LONG);
+}
+
+static void test_hbitmap_meta_sector(TestHBitmapData *data, const void *unused)
+{
+hbitmap_test_meta_do(data, BDRV_SECTOR_SIZE * BITS_PER_BYTE);
+}
+
+/**
+ * Create an HBitmap and test set/unset.
+ */
+static void test_hbitmap_meta_one(TestHBitmapData *data, const void *unused)
+{
+int i;
+int64_t offsets[] = {
+0, 1, L1 - 1, L1, L1 + 1, L2 - 1, L2, L2 + 1, L3 - 1, L3, L3 + 1
+};
+
+hbitmap_test_init_meta(data, L3 * 2, 0, 1);
+for (i = 0; i < ARRAY_SIZE(offsets); i++) {
+hbitmap_test_meta(data, offsets[i], 1, offsets[i], 1);
+hbitmap_test_meta(data, offsets[i], L1, offsets[i], L1);
+hbitmap_test_meta(data, offsets[i], L2, offsets[i], L2);
+}
+}
+
+static void test_hbitmap_meta_zero(TestHBitmapData *data, const void *unused)
+{
+hbitmap_test_init_meta(data, 0, 0, 1);
+
+hbitmap_check_meta(data, 0, 0);
+}
+
 static void hbitmap_test_add(const char *testpath,
void (*test_func)(TestHBitmapData *data, 
const void *user_data))
 {
@@ -686,6 +793,12 @@ int main(int argc, char **argv)
  test_hbitmap_truncate_grow_large);
 hbitmap_test_add("/hbitmap/truncate/shrink/large",
  test_hbitmap_truncate_shrink_large);
+
+hbitmap_test_add("/hbitmap/meta/zero", 

[Qemu-block] [PATCH 04/13] block: Remove unused typedef of BlockDriverDirtyHandler

2016-01-04 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 include/block/block.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 97e9b5e..0f42964 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -320,8 +320,6 @@ BlockDriverState *check_to_replace_node(BlockDriverState 
*parent_bs,
 const char *node_name, Error **errp);
 
 /* async block I/O */
-typedef void BlockDriverDirtyHandler(BlockDriverState *bs, int64_t sector,
- int sector_num);
 BlockAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque);
-- 
2.4.3




[Qemu-block] [PATCH 10/13] block: Assert that bdrv_release_dirty_bitmap succeeded

2016-01-04 Thread Fam Zheng
This makes sure we don't leak a dirty bitmap in any case.

Signed-off-by: Fam Zheng 
---
 block/dirty-bitmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 9cac794..60ee965 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -297,6 +297,7 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, 
BdrvDirtyBitmap *bitmap)
 return;
 }
 }
+abort();
 }
 
 void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
-- 
2.4.3




Re: [Qemu-block] [PATCH v4 01/14] block: Add "file" output parameter to block status query functions

2016-01-04 Thread Fam Zheng
On Mon, 01/04 22:00, Max Reitz wrote:
> On 24.12.2015 06:50, Fam Zheng wrote:
> > The added parameter can be used to return the BDS pointer which the
> > valid offset is referring to. It's value should be ignored unless
> 
> *Its
> 
> > BDRV_BLOCK_OFFSET_VALID in ret is set.
> > 
> > Until block drivers fill in the right value, let's clear it explicitly
> > right before calling .bdrv_get_block_status.
> > 
> > Reviewed-by: Stefan Hajnoczi 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/io.c| 42 --
> >  block/iscsi.c |  6 --
> >  block/mirror.c|  3 ++-
> >  block/parallels.c |  2 +-
> >  block/qcow.c  |  2 +-
> >  block/qcow2.c |  2 +-
> >  block/qed.c   |  3 ++-
> >  block/raw-posix.c |  3 ++-
> >  block/raw_bsd.c   |  3 ++-
> >  block/sheepdog.c  |  2 +-
> >  block/vdi.c   |  2 +-
> >  block/vmdk.c  |  2 +-
> >  block/vpc.c   |  2 +-
> >  block/vvfat.c |  2 +-
> >  include/block/block.h |  6 --
> >  include/block/block_int.h |  3 ++-
> >  qemu-img.c|  7 +--
> >  17 files changed, 59 insertions(+), 33 deletions(-)
> > 
> 
> [...]
> 
> > diff --git a/include/block/block.h b/include/block/block.h
> > index db8e096..70b4984 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> 
> The comment explaining BDRV_BLOCK_OFFSET_VALID should be changed
> accordingly (you could also say "fixed", because apparently it wasn't
> always bs->file; sometimes it was bs itself (in case of raw-posix, iscsi
> and sheepdog)).

Yes, good point! Will fix this, and the typo above.

Fam



Re: [Qemu-block] [PATCH v4 11/14] vmdk: Return extent's file in bdrv_get_block_status

2016-01-04 Thread Fam Zheng
On Mon, 01/04 21:48, Max Reitz wrote:
> On 24.12.2015 06:50, Fam Zheng wrote:
> > Reviewed-by: Stefan Hajnoczi 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/vmdk.c | 11 +--
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/block/vmdk.c b/block/vmdk.c
> > index f5a56fd..b60a5af 100644
> > --- a/block/vmdk.c
> > +++ b/block/vmdk.c
> > @@ -1265,6 +1265,7 @@ static int64_t coroutine_fn 
> > vmdk_co_get_block_status(BlockDriverState *bs,
> >   0, 0);
> >  qemu_co_mutex_unlock(>lock);
> >  
> > +index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
> >  switch (ret) {
> >  case VMDK_ERROR:
> >  ret = -EIO;
> > @@ -1276,15 +1277,13 @@ static int64_t coroutine_fn 
> > vmdk_co_get_block_status(BlockDriverState *bs,
> >  ret = BDRV_BLOCK_ZERO;
> >  break;
> >  case VMDK_OK:
> > -ret = BDRV_BLOCK_DATA;
> > -if (extent->file == bs->file && !extent->compressed) {
> > -ret |= BDRV_BLOCK_OFFSET_VALID | offset;
> > -}
> > -
> > +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> > +ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
> > +& BDRV_BLOCK_OFFSET_MASK;
> > +*file = extent->file->bs;
> 
> What if the extent is compressed?
> 

You're right, the offset shouldn't be set if compressed. Will fix.

Thanks!

Fam



Re: [Qemu-block] [Patch v12 resend 05/10] docs: block replication's description

2016-01-04 Thread Dr. David Alan Gilbert
* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> > +== Failure Handling ==
> > +There are 6 internal errors when block replication is running:
> > +1. I/O error on primary disk
> > +2. Forwarding primary write requests failed
> > +3. Backup failed
> > +4. I/O error on secondary disk
> > +5. I/O error on active disk
> > +6. Making active disk or hidden disk empty failed
> > +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
> > +4 and 6, we just report block replication's error to FT/HA manager (which
> > +decides when to do a new checkpoint, when to do failover).
> > +There is no internal error when doing failover.
> 
> Not sure this is true.
> 
> Below it says the following for failover: "We will flush the Disk buffer
> into Secondary Disk and stop block replication".  Flushing the disk
> buffer can result in I/O errors.  This means that failover operations
> are not guaranteed to succeed.
> 
> In practice I think this is similar to a successful failover followed by
> immediately getting I/O errors on the new Primary Disk.  It means that
> right after failover there is another failure and the system may not be
> able to continue.

Yes, I think that's true.

> So this really only matters in the case where there is a new Secondary
> ready after failover.  In that case the user might expect failover to
> continue to the new Secondary (Host 3):
> 
>[X][X]
>   Host 1 <-> Host 2 <-> Host 3

Since COLO is just doing a 1+1 redundency, I think it's not expecting to
cope with a double host failure; it's going to take some time (seconds?) to
sync Host 3 back in when you add it after a failover and the aim would
be not to have distrubed the application for that long, so it should
already be running on Host 2 during that resync.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [Qemu-devel] [PATCH 00/10] qcow2: Implement image locking

2016-01-04 Thread Max Reitz
On 24.12.2015 06:41, Denis V. Lunev wrote:
> On 12/24/2015 02:19 AM, Max Reitz wrote:
>> So the benefits of a qcow2 flag are only minor ones. However, I
>> personally believe that automatic unlock on crash is a very minor
>> benefit as well. That should never happen in practice anyway, and a
>> crashing qemu is such a great inconvenience that I as a user wouldn't
>> really mind having to unlock the image afterwards.
> IMHO you are wrong. This is VERY important. The situation would be exactly
> the same after node poweroff, which could happen and really happens in
> the real life from time to time.
> 
> In this cases VMs should start automatically and ASAP if configured this
> way. Any manual interaction here is a REAL pain.

Thanks, that's a good example.

However, I don't know much about management at that layer, so this is
probably where I'm out of the discussion.

(For instance, I don't know which kind of node you are talking about; I
presume it is a physical node, because if it was a virtual node, you'd
just kill the qemu instance in question by sending a QMP quit command.)

>> In fact, libvirt could even do that manually, couldn't it? If qemu
>> crashes, it just invokes qemu-img force-unlock on any qcow2 image which
>> was attached R/W to the VM.
> 
> in the situation above libvirt does not have the information or this
> information could be unreliable.

Well, then s/libvirt/any of the management layers/. As far as I know,
qemu-img commands are still used pretty high up in the stack.

>>> As an alternative, can we introduce .bdrv_flock() in protocol
>>> drivers, with
>>> similar semantics to flock(2) or lockf(3)? That way all formats can
>>> benefit,
>>> and a program crash will automatically drop the lock.
>> Making other formats benefit from addressing this issue is a good point,
>> but it too is a minor point. Formats other than qcow2 and raw are only
>> supported for compatibility anyway, and we don't need this feature for
>> raw.
> I would like to have this covered by flock and this indeed working for
> years with Parallels.
> 
>>
>> I feel like most of the question which approach to take revolves around
>> "But what if qemu crashes?". You (and others) are right in that having
>> to manually unlock the image then is cumbersome, however, I think that:
>> (1) qemu should never crash anyway.
>> (2) If qemu does crash, having to unlock the image is probably the
>>  least of your worries.
>> (3) If you are using libvirt, it should actually be possible for
>>  libvirt to automatically force-unlock images on qemu crash.
>>
>> This is why I don't think that keeping a locked image behind on qemu
>> crash is actually an issue.
>>
>> Max
>>
> pls see above. Node failure and unexpected power loss really matters.

Good points indeed (maybe, I can't actually judge, but I'll trust you).

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] ping Re: [PATCH v12] block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2016-01-04 Thread Max Reitz
On 29.12.2015 01:27, Programmingkid wrote:
> I do realize you are busy Kevin, but I would
> appreciate knowing my patch is in line 
> for review.

Primarily, he's been on holiday since before christmas until next week.

(I'm telling you so you don't wonder why nothing happens.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] ping Re: [PATCH v12] block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2016-01-04 Thread Programmingkid

On Jan 4, 2016, at 11:35 AM, Max Reitz wrote:

> On 29.12.2015 01:27, Programmingkid wrote:
>> I do realize you are busy Kevin, but I would
>> appreciate knowing my patch is in line 
>> for review.
> 
> Primarily, he's been on holiday since before christmas until next week.
> 
> (I'm telling you so you don't wonder why nothing happens.)
> 
> Max
> 

Thank you very much. I guess I have been a little frustrated with
this patch. I have been trying to have it submitted into QEMU
since August 2015!



Re: [Qemu-block] [Patch v12 resend 00/10] Block replication for continuous checkpoints

2016-01-04 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
> On 12/23/2015 06:04 PM, Stefan Hajnoczi wrote:
> > On Thu, Dec 17, 2015 at 02:22:14PM +0800, Wen Congyang wrote:
> >> Stefan:Ping...
> >>
> >> What about this feature? I have worked for it about 1 year, but it is 
> >> still in the
> >> way...
> > 
> > The code still has TODOs.  What is the plan for supporting replication
> > after failover?  This feature seems critical because anyone who wants FT
> > won't be able to use this code unless it supports FT after the first
> > failure.
> 
> We have implemented it based on an old version qemu. To keep the logical
> simple, we don't post them now. We will post them after this feature is merged
> into qemu.

It's probably best to post them, or at least say how you intend to do it,
so people can get an understanding of which way you're going.

However, why is anything new needed to continue replication after failover?
Shouldn't it be possible to build the secondary's disk structure in a way
that it can (by device/disk add/remove) into a structure that looks the same
as a primary, and then we just start a new migration to the new secondary?

(There's a separate problem of getting that to work with the rest of COLO as
well)

Dave

> 
> > 
> > ---
> > 
> > Adding new block layer APIs that are replication-specific is not clean.
> > Only the replication block driver cares about the start/stop/checkpoint
> > interface.
> > 
> > It is cleaner to have a separate API and data structure for block
> > replication.
> > 
> > The replication code should define its own BlockReplicationOps struct
> > and allow objects to register themselves.  Then it's no longer necessary
> > to modify the core block layer to forward start/stop/checkpoint calls.
> > 
> > Something like:
> > 
> > typedef struct BlockReplicationOps BlockReplicationOps;
> > typedef struct BlockReplicationState {
> > const BlockReplicationOps *ops;
> > QLIST_ENTRY(BlockReplicationState) list;
> > } BlockReplicationState;
> > 
> > typedef struct {
> > void start(BlockReplicationState *brs, Error **errp);
> > void stop(BlockReplicationState *brs, Error **errp);
> > void checkpoint(BlockReplicationState *brs, Error **errp);
> > } BlockReplicationOps;
> > 
> > static QLIST_HEAD(BlockReplicationState) block_replication_states;
> > 
> > void block_replication_add(BlockReplicationState *brs);
> > void block_replication_remove(BlockReplicationState *brs);
> > 
> > The replication block driver would add/remove itself.  The quorum block
> > driver probably doesn't need to be modified (I think in your current
> > patches you modify it just to forward the start/stop/checkpoint calls to
> > a particular quorum child).
> 
> Yes, it is the major purpose. We also do some check in the quorum driver: 
> we don't allow more than one child support block replication.
> 
> Thanks
> Wen Congyang
> 
> > 
> > Stefan
> > 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK