[Qemu-block] [PATCH v9 01/16] block: Add "file" output parameter to block status query functions

2016-01-25 Thread Fam Zheng
The added parameter can be used to return the BDS pointer which the
valid offset is referring to. Its value should be ignored unless
BDRV_BLOCK_OFFSET_VALID in ret is set.

Until block drivers fill in the right value, let's clear it explicitly
right before calling .bdrv_get_block_status.

The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest
case 102 passing, and will be fixed once all drivers return the right file
pointer.

Signed-off-by: Fam Zheng 
---
 block/io.c| 40 
 block/iscsi.c |  6 --
 block/mirror.c|  3 ++-
 block/parallels.c |  2 +-
 block/qcow.c  |  2 +-
 block/qcow2.c |  2 +-
 block/qed.c   |  3 ++-
 block/raw-posix.c |  3 ++-
 block/raw_bsd.c   |  3 ++-
 block/sheepdog.c  |  2 +-
 block/vdi.c   |  2 +-
 block/vmdk.c  |  2 +-
 block/vpc.c   |  2 +-
 block/vvfat.c |  2 +-
 include/block/block.h | 11 +++
 include/block/block_int.h |  3 ++-
 qemu-img.c| 13 +
 17 files changed, 66 insertions(+), 35 deletions(-)

diff --git a/block/io.c b/block/io.c
index 5bb353a..ea040be 100644
--- a/block/io.c
+++ b/block/io.c
@@ -664,6 +664,7 @@ int bdrv_write_zeroes(BlockDriverState *bs, int64_t 
sector_num,
 int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
 {
 int64_t target_sectors, ret, nb_sectors, sector_num = 0;
+BlockDriverState *file;
 int n;
 
 target_sectors = bdrv_nb_sectors(bs);
@@ -676,7 +677,7 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags 
flags)
 if (nb_sectors <= 0) {
 return 0;
 }
-ret = bdrv_get_block_status(bs, sector_num, nb_sectors, );
+ret = bdrv_get_block_status(bs, sector_num, nb_sectors, , );
 if (ret < 0) {
 error_report("error getting block status at sector %" PRId64 ": 
%s",
  sector_num, strerror(-ret));
@@ -1466,6 +1467,7 @@ int bdrv_flush_all(void)
 typedef struct BdrvCoGetBlockStatusData {
 BlockDriverState *bs;
 BlockDriverState *base;
+BlockDriverState **file;
 int64_t sector_num;
 int nb_sectors;
 int *pnum;
@@ -1487,10 +1489,14 @@ typedef struct BdrvCoGetBlockStatusData {
  *
  * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
  * beyond the end of the disk image it will be clamped.
+ *
+ * If returned value is positive and BDRV_BLOCK_OFFSET_VALID bit is set, 'file'
+ * points to the BDS which the sector range is allocated in.
  */
 static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
  int64_t sector_num,
- int nb_sectors, int *pnum)
+ int nb_sectors, int *pnum,
+ BlockDriverState **file)
 {
 int64_t total_sectors;
 int64_t n;
@@ -1520,7 +1526,9 @@ static int64_t coroutine_fn 
bdrv_co_get_block_status(BlockDriverState *bs,
 return ret;
 }
 
-ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
+*file = NULL;
+ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
+file);
 if (ret < 0) {
 *pnum = 0;
 return ret;
@@ -1529,7 +1537,7 @@ static int64_t coroutine_fn 
bdrv_co_get_block_status(BlockDriverState *bs,
 if (ret & BDRV_BLOCK_RAW) {
 assert(ret & BDRV_BLOCK_OFFSET_VALID);
 return bdrv_get_block_status(bs->file->bs, ret >> BDRV_SECTOR_BITS,
- *pnum, pnum);
+ *pnum, pnum, file);
 }
 
 if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
@@ -1549,10 +1557,11 @@ static int64_t coroutine_fn 
bdrv_co_get_block_status(BlockDriverState *bs,
 if (bs->file &&
 (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
 (ret & BDRV_BLOCK_OFFSET_VALID)) {
+BlockDriverState *file2;
 int file_pnum;
 
 ret2 = bdrv_co_get_block_status(bs->file->bs, ret >> BDRV_SECTOR_BITS,
-*pnum, _pnum);
+*pnum, _pnum, );
 if (ret2 >= 0) {
 /* Ignore errors.  This is just providing extra information, it
  * is useful but not necessary.
@@ -1577,14 +1586,15 @@ static int64_t coroutine_fn 
bdrv_co_get_block_status_above(BlockDriverState *bs,
 BlockDriverState *base,
 int64_t sector_num,
 int nb_sectors,
-int *pnum)
+int *pnum,
+BlockDriverState **file)
 {
 BlockDriverState *p;
 int64_t ret = 0;
 
 assert(bs != base);
 for (p = bs; p != base; p = backing_bs(p)) {
-ret = 

Re: [Qemu-block] [Qemu-devel] [PATCH] iotests: Limit supported formats for 118

2016-01-25 Thread Markus Armbruster
Max Reitz  writes:

> Image formats used in test 118 need to support image creation.
>
> Reported-by: Markus Armbruster 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/118 | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
> index 114d0e2..beb69d0 100755
> --- a/tests/qemu-iotests/118
> +++ b/tests/qemu-iotests/118
> @@ -717,4 +717,6 @@ if __name__ == '__main__':
>  # We need floppy and IDE CD-ROM
>  iotests.notrun('not suitable for this machine type: %s' %
> iotests.qemu_default_machine)
> -iotests.main()
> +# Need to support image creation
> +iotests.main(supported_fmts=['vpc', 'parallels', 'qcow', 'vdi', 'qcow2',
> + 'vmdk', 'raw', 'vhdx', 'qed'])

Reviewed-by: Markus Armbruster 



Re: [Qemu-block] [PATCH v2 08/13] block: Support meta dirty bitmap

2016-01-25 Thread Vladimir Sementsov-Ogievskiy

On 26.01.2016 09:25, Fam Zheng wrote:

On Fri, 01/22 15:05, Vladimir Sementsov-Ogievskiy wrote:

In my migration series I need iterators, get granularity, and
something like hbitmap_count  for meta bitmaps. You can add them
here if you want, or I can add them in my series.

Okay, I can add that.

I have one more question on the interface: what dirty bitmaps are going to be
migrated? At this moment there are dirty bitmaps created by block jobs
(drive-mirror), and in-memory dirty bitmaps created for incremental backup;
later there will be persistent dirty bitmaps owned by block drivers (extended
version of qcow2, and in QBM driver I'm working on). Which of them are subject
of migration in your series?

I'm asking because I want to know whether we need to implement multiple meta
bitmaps on one block dirty bitmap.

Fam
Only named bitmaps are migrated. For now, only qmp-created bitmaps are 
named. So, it can be in-memory dirty bitmaps or, in future, persistent 
dirty bitmaps.


Why multiple meta bitmaps? What is your plan about meta, should it be 
used for something other than migration? There was ideas about async 
loading persistent metabitmaps - maybe here? We can disallow processes 
using meta bitmaps to be simultaneous.


--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.




[Qemu-block] [PATCH v9 03/16] qcow2: Assign bs->file->bs to file in qcow2_co_get_block_status

2016-01-25 Thread Fam Zheng
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Fam Zheng 
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index d4ea6b4..8babecd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1349,6 +1349,7 @@ static int64_t coroutine_fn 
qcow2_co_get_block_status(BlockDriverState *bs,
 !s->cipher) {
 index_in_cluster = sector_num & (s->cluster_sectors - 1);
 cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+*file = bs->file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
 }
 if (ret == QCOW2_CLUSTER_ZERO) {
-- 
2.4.3




Re: [Qemu-block] [PATCH v8 11/15] vmdk: Return extent's file in bdrv_get_block_status

2016-01-25 Thread Fam Zheng
On Mon, 01/25 14:28, Kevin Wolf wrote:
> Am 25.01.2016 um 03:44 hat Fam Zheng geschrieben:
> > Signed-off-by: Fam Zheng 
> > Reviewed-by: Max Reitz 
> > ---
> >  block/vmdk.c | 10 ++
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/vmdk.c b/block/vmdk.c
> > index e1d3e27..f8f7fcf 100644
> > --- a/block/vmdk.c
> > +++ b/block/vmdk.c
> > @@ -1274,6 +1274,7 @@ static int64_t coroutine_fn 
> > vmdk_co_get_block_status(BlockDriverState *bs,
> >   0, 0);
> >  qemu_co_mutex_unlock(>lock);
> >  
> > +index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
> >  switch (ret) {
> >  case VMDK_ERROR:
> >  ret = -EIO;
> > @@ -1286,14 +1287,15 @@ static int64_t coroutine_fn 
> > vmdk_co_get_block_status(BlockDriverState *bs,
> >  break;
> >  case VMDK_OK:
> >  ret = BDRV_BLOCK_DATA;
> > -if (extent->file == bs->file && !extent->compressed) {
> > -ret |= BDRV_BLOCK_OFFSET_VALID | offset;
> > +if (!extent->compressed) {
> > +ret |= BDRV_BLOCK_OFFSET_VALID;
> > +ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
> > +& BDRV_BLOCK_OFFSET_MASK;
> >  }
> > -
> > +*file = extent->file->bs;
> >  break;
> >  }
> >  
> > -index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
> >  n = extent->cluster_sectors - index_in_cluster;
> >  if (n > nb_sectors) {
> >  n = nb_sectors;
> 
> The commit message made me expect that this does exactly the same as the
> other patches, i.e. fill the file argument and nothing else.
> 
> Instead, this extends the functioniality to work on any kind of extents
> instead of just the the main VMDK file, and it changes the calculation
> of the offset (seems to be a hidden bug fix?)
> 
> This needs a non-empty commit message, and depending on what you're
> going to write there, possibly splitting the patch.

Will split it.

Fam



Re: [Qemu-block] [PATCH v8 15/15] iotests: Add "qemu-img map" test for VMDK extents

2016-01-25 Thread Fam Zheng
On Mon, 01/25 15:11, Max Reitz wrote:
> On 25.01.2016 03:44, Fam Zheng wrote:
> > Signed-off-by: Fam Zheng 
> > ---
> >  tests/qemu-iotests/059 | 10 ++
> >  tests/qemu-iotests/059.out | 26 ++
> >  2 files changed, 36 insertions(+)
> > 
> > diff --git a/tests/qemu-iotests/059 b/tests/qemu-iotests/059
> > index 0ded0c3..0332bbb 100755
> > --- a/tests/qemu-iotests/059
> > +++ b/tests/qemu-iotests/059
> > @@ -133,6 +133,16 @@ $QEMU_IO -c "write -P 0xa 900G 512" "$TEST_IMG" | 
> > _filter_qemu_io
> >  $QEMU_IO -c "read -v 900G 1024" "$TEST_IMG" | _filter_qemu_io
> >  
> >  echo
> > +echo "=== Testing qemu-img map on extents ==="
> > +for fmt in monolithicSparse twoGbMaxExtentSparse; do
> > +IMGOPTS="subformat=$fmt" _make_test_img 31G
> > +$QEMU_IO -c "write 65024 1k" "$TEST_IMG" | _filter_qemu_io
> > +$QEMU_IO -c "write 2147483136 1k" "$TEST_IMG" | _filter_qemu_io
> > +$QEMU_IO -c "write 5G 1k" "$TEST_IMG" | _filter_qemu_io
> > +$QEMU_IMG map "$TEST_IMG" | _filter_testdir
> > +done
> > +
> > +echo
> >  echo "=== Testing afl image with a very large capacity ==="
> >  _use_sample_img afl9.vmdk.bz2
> >  _img_info
> > diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
> > index 9d506cb..5e041d7 100644
> > --- a/tests/qemu-iotests/059.out
> > +++ b/tests/qemu-iotests/059.out
> > @@ -2050,6 +2050,7 @@ wrote 512/512 bytes at offset 0
> >  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >  wrote 512/512 bytes at offset 10240
> >  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > +qemu-img: Could not open 
> > '/home/fam/build/last/tests/qemu-iotests/scratch/t.vmdk': VMDK version 3 
> > must be read only
> 
> I'd rather have the test fail than include this apparently wrong line
> (according to
> http://lists.nongnu.org/archive/html/qemu-block/2016-01/msg00710.html)
> here, but I'll leave that up to you.

It snuck in because of copy when updating 059.out, I'll fix.

Fam



Re: [Qemu-block] [PATCH v8 04/15] raw: Assign bs to file in raw_co_get_block_status

2016-01-25 Thread Fam Zheng
On Mon, 01/25 14:17, Kevin Wolf wrote:
> Am 25.01.2016 um 03:44 hat Fam Zheng geschrieben:
> > diff --git a/block/raw_bsd.c b/block/raw_bsd.c
> > index 9a8933b..fd355d5 100644
> > --- a/block/raw_bsd.c
> > +++ b/block/raw_bsd.c
> > @@ -119,6 +119,7 @@ static int64_t coroutine_fn 
> > raw_co_get_block_status(BlockDriverState *bs,
> >  BlockDriverState **file)
> >  {
> >  *pnum = nb_sectors;
> > +*file = bs;
> 
> This should be bs->file->bs.
> 

Yes, will fix. I was confused about whether this is a format or a protocol.

Fam



Re: [Qemu-block] RFC: Operation Blockers in QEMU Block Nodes

2016-01-25 Thread Alberto Garcia
Hi,

I'm late to the party but I wanted to say that I generally like the idea
of expressing the notion of op blockers in terms of the specific things
that they allow and require in each one of the affected nodes.

I don't know if all current operations can be expressed easily with
these semantics, though.

I also noticed this:

On Sat 19 Dec 2015 06:42:00 AM CET, Jeff Cody  wrote:

 [block-streaming from B to C]
>  A <- B <- C <--- virtio-blk
>
>
> 1. Data is read from (B) and written into (C)
>
>  - This is not a "Write Visible Data" operation, because a read
>from (C) will return the same results as it did prior to the
>operation.
>
> 2. The backing file in (C) is modified to reflect (A) as the new
>backing file
>
>  - This is what would require the "Write Metadata".  Not
>because it doesn't require "Write Visible Data", but
>because we change the backing file of (C).
>
> 3. Node (B) is dropped from the internal BDS graph, and (C)
>becomes the direct overlay of (A).
>
>   - This would require the "Graph Reconfiguration"
>
>
> Here is how I would see the full set of Require / Allow flags for each
> node involved in your hypothetical block job:
>
>  Nodes involved: (A), (B), (C).
>
> --
> |   Node A |   Node B |   Node C |
> | Req  : Allow | Req  : Allow | Req  : Allow |
> =|
> Modify Visible  |   0  :  1|   0  :  0|   0  :  1|
> Modify Metadata |   0  :  1|   0  :  1|   1  :  0|
> Graph Reconfig  |   1  :  0|   0  :  1|   1  :  0|
> Read Visible|   0  :  1|   1  :  1|   0  :  1|
> Read Metadata   |   1  :  1|   1  :  1|   1  :  1|

Why does this require "graph reconfig" in (A) but not in (B)? The result
of the operation is that (B) is dropped and (C) now points to (A), but
(A) itself doesn't change.

Berto



Re: [Qemu-block] [PATCH v8 04/15] raw: Assign bs to file in raw_co_get_block_status

2016-01-25 Thread Kevin Wolf
Am 25.01.2016 um 03:44 hat Fam Zheng geschrieben:
> Reviewed-by: Eric Blake 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Fam Zheng 
> ---
>  block/raw-posix.c | 1 +
>  block/raw_bsd.c   | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 3ef9b25..8866121 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1861,6 +1861,7 @@ static int64_t coroutine_fn 
> raw_co_get_block_status(BlockDriverState *bs,
>  *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
>  ret = BDRV_BLOCK_ZERO;
>  }
> +*file = bs;
>  return ret | BDRV_BLOCK_OFFSET_VALID | start;
>  }
>  
> diff --git a/block/raw_bsd.c b/block/raw_bsd.c
> index 9a8933b..fd355d5 100644
> --- a/block/raw_bsd.c
> +++ b/block/raw_bsd.c
> @@ -119,6 +119,7 @@ static int64_t coroutine_fn 
> raw_co_get_block_status(BlockDriverState *bs,
>  BlockDriverState **file)
>  {
>  *pnum = nb_sectors;
> +*file = bs;

This should be bs->file->bs.

>  return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
> (sector_num << BDRV_SECTOR_BITS);
>  }

Kevin



Re: [Qemu-block] [PATCH v8 00/15] qemu-img map: Allow driver to return file of the allocated block

2016-01-25 Thread Kevin Wolf
Am 25.01.2016 um 03:44 hat Fam Zheng geschrieben:
> v8: Fix patch 15. [Max]
> Add Max's rev-by in patch 1.
> 
> v7: Rebase, update patch 1 for two new bdrv_get_block_status_above() callers 
> in
> qemu-img.c. [Max]
> Add Max's rev-by in patch 12.
> 
> Original cover letter
> -
> 
> I stumbled upon this when looking at external bitmap formats.
> 
> Current "qemu-img map" command only displays filename if the data is allocated
> in bs (bs->file) itself, or in the backing chain. Otherwise, it displays an
> unfriendly error message:
> 
> $ qemu-img create -f vmdk -o subformat=monolithicFlat /tmp/test.vmdk 1G
> 
> $ qemu-img map /tmp/test.vmdk
> Offset  Length  Mapped to   File
> qemu-img: File contains external, encrypted or compressed clusters.
> 
> This can be improved. This series extends the .bdrv_co_get_block_status
> callback, to let block driver return the BDS of file; then updates all driver
> to implement it; and lastly, it changes qemu-img to use this information in
> "map" command:
> 
> 
> $ qemu-img map /tmp/test.vmdk
> Offset  Length  Mapped to   File
> 0   0x4000  0   /tmp/test-flat.vmdk
> 
> $ qemu-img map --output json /tmp/test.vmdk
> [{"length": 1073741824, "start": 0, "zero": false, "offset": 0, "depth": 
> 0,
>   "file": "/tmp/test-flat.vmdk", "data": true}
> ]

Commented on patches 1, 4 and 11. The rest looks good to me.

Kevin



Re: [Qemu-block] [PATCH v8 01/15] block: Add "file" output parameter to block status query functions

2016-01-25 Thread Kevin Wolf
Am 25.01.2016 um 03:44 hat Fam Zheng geschrieben:
> The added parameter can be used to return the BDS pointer which the
> valid offset is referring to. Its value should be ignored unless
> BDRV_BLOCK_OFFSET_VALID in ret is set.
> 
> Until block drivers fill in the right value, let's clear it explicitly
> right before calling .bdrv_get_block_status.
> 
> The "bs->file" condition in bdrv_co_get_block_status is kept now to keep 
> iotest
> case 102 passing, and will be fixed once all drivers return the right file
> pointer.
> 
> Signed-off-by: Fam Zheng 
> Reviewed-by: Max Reitz 
> ---
>  block/io.c| 38 ++
>  block/iscsi.c |  6 --
>  block/mirror.c|  3 ++-
>  block/parallels.c |  2 +-
>  block/qcow.c  |  2 +-
>  block/qcow2.c |  2 +-
>  block/qed.c   |  3 ++-
>  block/raw-posix.c |  3 ++-
>  block/raw_bsd.c   |  3 ++-
>  block/sheepdog.c  |  2 +-
>  block/vdi.c   |  2 +-
>  block/vmdk.c  |  2 +-
>  block/vpc.c   |  2 +-
>  block/vvfat.c |  2 +-
>  include/block/block.h | 11 +++
>  include/block/block_int.h |  3 ++-
>  qemu-img.c| 13 +
>  17 files changed, 64 insertions(+), 35 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 5bb353a..0836991 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -664,6 +664,7 @@ int bdrv_write_zeroes(BlockDriverState *bs, int64_t 
> sector_num,
>  int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
>  {
>  int64_t target_sectors, ret, nb_sectors, sector_num = 0;
> +BlockDriverState *file;
>  int n;
>  
>  target_sectors = bdrv_nb_sectors(bs);
> @@ -676,7 +677,7 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags 
> flags)
>  if (nb_sectors <= 0) {
>  return 0;
>  }
> -ret = bdrv_get_block_status(bs, sector_num, nb_sectors, );
> +ret = bdrv_get_block_status(bs, sector_num, nb_sectors, , );
>  if (ret < 0) {
>  error_report("error getting block status at sector %" PRId64 ": 
> %s",
>   sector_num, strerror(-ret));
> @@ -1466,6 +1467,7 @@ int bdrv_flush_all(void)
>  typedef struct BdrvCoGetBlockStatusData {
>  BlockDriverState *bs;
>  BlockDriverState *base;
> +BlockDriverState **file;
>  int64_t sector_num;
>  int nb_sectors;
>  int *pnum;
> @@ -1490,7 +1492,8 @@ typedef struct BdrvCoGetBlockStatusData {
>   */
>  static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>   int64_t sector_num,
> - int nb_sectors, int 
> *pnum)
> + int nb_sectors, int 
> *pnum,
> + BlockDriverState **file)

This function has a comment explaining all parameters. Except **file
now.

My impression at the moment is that it generally returns bs->file for
format drivers and bs itself for protocol drivers if the sector is
allocated in this layer.

>  {
>  int64_t total_sectors;
>  int64_t n;
> @@ -1520,16 +1523,19 @@ static int64_t coroutine_fn 
> bdrv_co_get_block_status(BlockDriverState *bs,
>  return ret;
>  }
>  
> -ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, 
> pnum);
> +*file = NULL;
> +ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
> +file);
>  if (ret < 0) {
>  *pnum = 0;
> +*file = NULL;

You specified that file is to be ignored unless BDRV_BLOCK_OFFSET_VALID
is set. Why reset it here then? It's not reset in other error cases.

>  return ret;
>  }
>  
>  if (ret & BDRV_BLOCK_RAW) {
>  assert(ret & BDRV_BLOCK_OFFSET_VALID);
>  return bdrv_get_block_status(bs->file->bs, ret >> BDRV_SECTOR_BITS,
> - *pnum, pnum);
> + *pnum, pnum, file);
>  }
>  
>  if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {

> diff --git a/block/vvfat.c b/block/vvfat.c
> index 2ea5a4a..b8d29e1 100644
> --- a/block/vvfat.c
> +++ b/block/vvfat.c
> @@ -2884,7 +2884,7 @@ static coroutine_fn int vvfat_co_write(BlockDriverState 
> *bs, int64_t sector_num,
>  }
>  
>  static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
> - int64_t sector_num, int nb_sectors, int* n)
> + int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file)
>  {
>  BDRVVVFATState* s = bs->opaque;
>  *n = s->sector_count - sector_num;

This still returns NULL at the end of the series. Shouldn't it return bs
like other protocol drivers do?

Kevin



Re: [Qemu-block] [PATCH v8 11/15] vmdk: Return extent's file in bdrv_get_block_status

2016-01-25 Thread Kevin Wolf
Am 25.01.2016 um 03:44 hat Fam Zheng geschrieben:
> Signed-off-by: Fam Zheng 
> Reviewed-by: Max Reitz 
> ---
>  block/vmdk.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index e1d3e27..f8f7fcf 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1274,6 +1274,7 @@ static int64_t coroutine_fn 
> vmdk_co_get_block_status(BlockDriverState *bs,
>   0, 0);
>  qemu_co_mutex_unlock(>lock);
>  
> +index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
>  switch (ret) {
>  case VMDK_ERROR:
>  ret = -EIO;
> @@ -1286,14 +1287,15 @@ static int64_t coroutine_fn 
> vmdk_co_get_block_status(BlockDriverState *bs,
>  break;
>  case VMDK_OK:
>  ret = BDRV_BLOCK_DATA;
> -if (extent->file == bs->file && !extent->compressed) {
> -ret |= BDRV_BLOCK_OFFSET_VALID | offset;
> +if (!extent->compressed) {
> +ret |= BDRV_BLOCK_OFFSET_VALID;
> +ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
> +& BDRV_BLOCK_OFFSET_MASK;
>  }
> -
> +*file = extent->file->bs;
>  break;
>  }
>  
> -index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
>  n = extent->cluster_sectors - index_in_cluster;
>  if (n > nb_sectors) {
>  n = nb_sectors;

The commit message made me expect that this does exactly the same as the
other patches, i.e. fill the file argument and nothing else.

Instead, this extends the functioniality to work on any kind of extents
instead of just the the main VMDK file, and it changes the calculation
of the offset (seems to be a hidden bug fix?)

This needs a non-empty commit message, and depending on what you're
going to write there, possibly splitting the patch.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH 8/8] iotests: Add test for a nonexistent NBD export

2016-01-25 Thread Eric Blake
On 01/25/2016 11:41 AM, Max Reitz wrote:
> Trying to connect to a nonexistent NBD export should not crash the
> server.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/143 | 73 
> ++
>  tests/qemu-iotests/143.out |  7 +
>  tests/qemu-iotests/group   |  1 +
>  3 files changed, 81 insertions(+)
>  create mode 100755 tests/qemu-iotests/143
>  create mode 100644 tests/qemu-iotests/143.out
> 

Reviewed-by: Eric Blake 


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] COLO: how to flip a secondary to a primary?

2016-01-25 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
> On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I've been looking at what's needed to add a new secondary after
> > a primary failed; from the block side it doesn't look as hard
> > as I'd expected, perhaps you can tell me if I'm missing something!
> > 
> > The normal primary setup is:
> > 
> >quorum
> >   Real disk
> >   nbd client
> 
> quorum
>real disk
>replication
>   nbd client
> 
> > 
> > The normal secondary setup is:
> >replication
> >   active-disk
> >   hidden-disk
> >   Real-disk
> 
> IIRC, we can do it like this:
> quorum
>replication
>   active-disk
>   hidden-disk
>   real-disk

Yes.

> > With a couple of minor code hacks; I changed the secondary to be:
> > 
> >quorum
> >   replication
> > active-disk
> > hidden-disk
> > Real-disk
> >   dummy-disk
> 
> after failover,
> quorum
>replicaion(old, mode is secondary)
>  active-disk
>  hidden-disk*
>  real-disk*
>replication(new, mode is primary)
>  nbd-client

Do you need to keep the old secondary-replication?
Does that just pass straight through?

> In the newest version, we active commit active-disk to real-disk.
> So it will be:
> quorum
>replicaion(old, mode is secondary)
>  active-disk(it is real disk now)
>replication(new, mode is primary)
>  nbd-client

How does that active-commit work?  I didn't think you
could change the real disk until you had the full checkpoint,
since you don't know whether the primary or secondaries
changes need to be written?

> > and then after the primary fails, I start a new secondary
> > on another host and then on the old secondary do:
> > 
> >   nbd_server_stop
> >   stop
> >   x_block_change top-quorum -d children.0 # deletes use of real 
> > disk, leaves dummy
> >   drive_del active-disk0
> >   x_block_change top-quorum -a node-real-disk
> >   x_block_change top-quorum -d children.1 # Seems to have deleted 
> > the dummy?!, the disk is now child 0
> >   drive_add buddy 
> > driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
> >   x_block_change top-quorum -a nbd-client
> >   c
> >   migrate_set_capability x-colo on
> >   migrate -d -b tcp:ibpair:
> > 
> > and I think that means what was the secondary, has the same disk
> > structure as a normal primary.
> > That's not quite happy yet, and I've not figured out why - but the
> > order/structure of the block devices looks right?
> > 
> > Notes:
> >a) The dummy serves two purposes, 1) it works around the segfault
> >   I reported in the other mail, 2) when I delete the real disk in the
> >   first x_block_change it means the quorum still has 1 disk so doesn't
> >   get upset.
> 
> I don't understand the purpose 2.

quorum wont allow you to delete all it's members ('The number of children 
cannot be lower than the vote threshold 1')
and it's very tricky getting the order correct with add/delete; for example
I tried:

drive_add buddy 
driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
# gets children.1
x_block_change top-quorum -a nbd-client
# deletes the secondary replication
x_block_change top-quorum -d children.0
drive_del active-disk0
# ends up as children.0 but in the 2nd slot
x_block_change top-quorum -a node-real-disk

info block shows me:
top-quorum (#block615): json:{"children": [
{"driver": "replication", "mode": "primary", "file": {"port": "8889", 
"host": "ibpair", "driver": "nbd", "export": "colo-disk0"}},
{"driver": "raw", "file": {"driver": "file", "filename": 
"/home/localvms/bugzilla.raw"}}
   ],
   "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, 
"vote-threshold": 1} (quorum)
Cache mode:   writeback

that has the replication first and the file second; that's the opposite
from the normal primary startup - does it matter?

I can't add node-real-disk until I drive_del active-disk0 (which
previously used it);  and I can't drive_del until I remove
it from the quorum; but I can't remove that from the quorum first,
because that leaves an empty quorum.

> >b) I had to remove the restriction in quorum_start_replication
> >   on which mode it would run in. 
> 
> IIRC, this check will be removed.
> 
> >c) I'm not really sure everything knows it's in secondary mode yet, and
> >   I'm not convinced whether the replication is doing the right thing.
> >d) The migrate -d -b   eventually fails on the destination, not worked 
> > out why
> >   yet.
> 
> Can you give me the error message?

I need to repeat it to check; it was something like a bad flag from the block 
migration
code; it happened after the block migration hit 100%.

> >e) Adding/deleting children on quorum is hard having to 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/8] nbd: client_close on error in nbd_co_client_start

2016-01-25 Thread Eric Blake
On 01/25/2016 11:41 AM, Max Reitz wrote:
> Use client_close() if an error in nbd_co_client_start() occurs instead
> of manually inlining parts of it. This fixes an assertion error on the
> server side if nbd_negotiate() fails.
> 
> Signed-off-by: Max Reitz 
> ---
>  nbd/server.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 4/8] iotests: Move _filter_nbd into common.filter

2016-01-25 Thread Eric Blake
On 01/25/2016 11:41 AM, Max Reitz wrote:
> _filter_nbd can be useful for other NBD tests, too, therefore it should
> reside in common.filter.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/083   | 12 
>  tests/qemu-iotests/common.filter | 12 
>  2 files changed, 12 insertions(+), 12 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH 5/8] iotests: Make _filter_nbd drop log lines

2016-01-25 Thread Max Reitz
The NBD log lines ("/your/source/dir/nbd.c:function():line: error")
should not be converted to empty lines but removed altogether.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/083.out   | 10 --
 tests/qemu-iotests/common.filter |  2 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/083.out b/tests/qemu-iotests/083.out
index 78cc49a..ef3d1e3 100644
--- a/tests/qemu-iotests/083.out
+++ b/tests/qemu-iotests/083.out
@@ -51,7 +51,6 @@ no file open, try 'help open'
 
 === Check disconnect after neg2 ===
 
-
 read failed: Input/output error
 
 === Check disconnect 8 neg2 ===
@@ -66,42 +65,34 @@ no file open, try 'help open'
 
 === Check disconnect before request ===
 
-
 read failed: Input/output error
 
 === Check disconnect after request ===
 
-
 read failed: Input/output error
 
 === Check disconnect before reply ===
 
-
 read failed: Input/output error
 
 === Check disconnect after reply ===
 
-
 read failed: Input/output error
 
 === Check disconnect 4 reply ===
 
-
 read failed: Input/output error
 
 === Check disconnect 8 reply ===
 
-
 read failed: Input/output error
 
 === Check disconnect before data ===
 
-
 read failed: Input/output error
 
 === Check disconnect after data ===
 
-
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -132,7 +123,6 @@ no file open, try 'help open'
 
 === Check disconnect after neg-classic ===
 
-
 read failed: Input/output error
 
 *** done
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 33ed1e4..e1bfdb7 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -237,7 +237,7 @@ _filter_nbd()
 # receive callbacks sometimes, making them unreliable.
 #
 # Filter out the TCP port number since this changes between runs.
-sed -e 's#^.*nbd/.*\.c:.*##g' \
+sed -e '/nbd\/.*\.c:/d' \
 -e 's#nbd:127\.0\.0\.1:[^:]*:#nbd:127\.0\.0\.1:PORT:#g' \
 -e 's#\(exportname=foo\|PORT\): Failed to .*$#\1#'
 }
-- 
2.7.0




[Qemu-block] [PATCH 4/8] iotests: Move _filter_nbd into common.filter

2016-01-25 Thread Max Reitz
_filter_nbd can be useful for other NBD tests, too, therefore it should
reside in common.filter.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/083   | 12 
 tests/qemu-iotests/common.filter | 12 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/083 b/tests/qemu-iotests/083
index 36e6de8..aa99278 100755
--- a/tests/qemu-iotests/083
+++ b/tests/qemu-iotests/083
@@ -49,18 +49,6 @@ wait_for_tcp_port() {
done
 }
 
-_filter_nbd()
-{
-# nbd.c error messages contain function names and line numbers that are
-# prone to change.  Message ordering depends on timing between send and
-# receive callbacks sometimes, making them unreliable.
-#
-# Filter out the TCP port number since this changes between runs.
-sed -e 's#^.*nbd/.*\.c:.*##g' \
--e 's#nbd:127\.0\.0\.1:[^:]*:#nbd:127\.0\.0\.1:PORT:#g' \
--e 's#\(exportname=foo\|PORT\): Failed to .*$#\1#'
-}
-
 check_disconnect() {
event=$1
when=$2
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index cfdb633..33ed1e4 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -230,5 +230,17 @@ _filter_qemu_img_map()
 -e 's/Mapped to *//' | _filter_testdir | _filter_imgfmt
 }
 
+_filter_nbd()
+{
+# nbd.c error messages contain function names and line numbers that are
+# prone to change.  Message ordering depends on timing between send and
+# receive callbacks sometimes, making them unreliable.
+#
+# Filter out the TCP port number since this changes between runs.
+sed -e 's#^.*nbd/.*\.c:.*##g' \
+-e 's#nbd:127\.0\.0\.1:[^:]*:#nbd:127\.0\.0\.1:PORT:#g' \
+-e 's#\(exportname=foo\|PORT\): Failed to .*$#\1#'
+}
+
 # make sure this script returns success
 true
-- 
2.7.0




[Qemu-block] [PATCH 6/8] iotests: Make _filter_nbd support more URL types

2016-01-25 Thread Max Reitz
This function should support URLs of the "nbd://" format (without
swallowing the export name), and for "nbd:///" URLs it should replace
"?socket=$TEST_DIR" by "?socket=TEST_DIR" because putting the Unix
socket files into the test directory makes sense.

Signed-off-by: Max Reitz 
Reviewed-by: John Snow 
Reviewed-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 tests/qemu-iotests/common.filter | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index e1bfdb7..84b7434 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -238,7 +238,8 @@ _filter_nbd()
 #
 # Filter out the TCP port number since this changes between runs.
 sed -e '/nbd\/.*\.c:/d' \
--e 's#nbd:127\.0\.0\.1:[^:]*:#nbd:127\.0\.0\.1:PORT:#g' \
+-e 's#nbd:\(//\)\?127\.0\.0\.1:[0-9]*#nbd:\1127.0.0.1:PORT#g' \
+-e "s#?socket=$TEST_DIR#?socket=TEST_DIR#g" \
 -e 's#\(exportname=foo\|PORT\): Failed to .*$#\1#'
 }
 
-- 
2.7.0




[Qemu-block] [PATCH 8/8] iotests: Add test for a nonexistent NBD export

2016-01-25 Thread Max Reitz
Trying to connect to a nonexistent NBD export should not crash the
server.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/143 | 73 ++
 tests/qemu-iotests/143.out |  7 +
 tests/qemu-iotests/group   |  1 +
 3 files changed, 81 insertions(+)
 create mode 100755 tests/qemu-iotests/143
 create mode 100644 tests/qemu-iotests/143.out

diff --git a/tests/qemu-iotests/143 b/tests/qemu-iotests/143
new file mode 100755
index 000..6207368
--- /dev/null
+++ b/tests/qemu-iotests/143
@@ -0,0 +1,73 @@
+#!/bin/bash
+#
+# Test case for connecting to a non-existing NBD export name
+#
+# Copyright (C) 2016 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+rm -f "$TEST_DIR/nbd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt generic
+_supported_proto generic
+_supported_os Linux
+
+keep_stderr=y \
+_launch_qemu 2> >(_filter_nbd)
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'qmp_capabilities' }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'nbd-server-start',
+   'arguments': { 'addr': { 'type': 'unix',
+'data': { 'path': '$TEST_DIR/nbd' " \
+'return'
+
+# This should just result in a client error, not in the server crashing
+$QEMU_IO_PROG -f raw -c quit \
+"nbd+unix:///no_such_export?socket=$TEST_DIR/nbd" 2>&1 \
+| _filter_qemu_io | _filter_nbd
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{ 'execute': 'quit' }" \
+'return'
+
+wait=1 _cleanup_qemu
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/143.out b/tests/qemu-iotests/143.out
new file mode 100644
index 000..dad2024
--- /dev/null
+++ b/tests/qemu-iotests/143.out
@@ -0,0 +1,7 @@
+QA output created by 143
+{"return": {}}
+{"return": {}}
+can't open device nbd+unix:///no_such_export?socket=TEST_DIR/nbd: Failed to 
read export length
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN"}
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index d6e9219..ac6a959 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -142,3 +142,4 @@
 138 rw auto quick
 139 rw auto quick
 142 auto
+143 auto quick
-- 
2.7.0




[Qemu-block] [PATCH 0/8] nbd: Fix failed assertion on negotiation error

2016-01-25 Thread Max Reitz
An error during negotiation, e.g. by the client trying to open an export
that does not exist, should not lead to a crash of the server process.

The middle six patches of this series are taken from my series
"block: Rework bdrv_close_all()", so here is a git-backport-diff against
v7 of that series:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/8:[down] 'nbd: client_close on error in nbd_co_client_start'
002/8:[] [--] 'iotests: Rename filter_nbd to _filter_nbd in 083'
003/8:[0004] [FC] 'iotests: Change coding style of _filter_nbd in 083'
004/8:[0004] [FC] 'iotests: Move _filter_nbd into common.filter'
005/8:[0004] [FC] 'iotests: Make _filter_nbd drop log lines'
006/8:[] [-C] 'iotests: Make _filter_nbd support more URL types'
007/8:[] [--] 'iotests: Make redirecting qemu's stderr optional'
008/8:[down] 'iotests: Add test for a nonexistent NBD export'


Max Reitz (8):
  nbd: client_close on error in nbd_co_client_start
  iotests: Rename filter_nbd to _filter_nbd in 083
  iotests: Change coding style of _filter_nbd in 083
  iotests: Move _filter_nbd into common.filter
  iotests: Make _filter_nbd drop log lines
  iotests: Make _filter_nbd support more URL types
  iotests: Make redirecting qemu's stderr optional
  iotests: Add test for a nonexistent NBD export

 nbd/server.c |  3 +-
 tests/qemu-iotests/083   | 13 +--
 tests/qemu-iotests/083.out   | 10 --
 tests/qemu-iotests/143   | 73 
 tests/qemu-iotests/143.out   |  7 
 tests/qemu-iotests/common.filter | 13 +++
 tests/qemu-iotests/common.qemu   | 15 +++--
 tests/qemu-iotests/group |  1 +
 8 files changed, 109 insertions(+), 26 deletions(-)
 create mode 100755 tests/qemu-iotests/143
 create mode 100644 tests/qemu-iotests/143.out

-- 
2.7.0




[Qemu-block] [PATCH 3/8] iotests: Change coding style of _filter_nbd in 083

2016-01-25 Thread Max Reitz
In order to be able to move _filter_nbd to common.filter in the next
patch, its coding style needs to be adapted to that of common.filter.
That means, we have to convert tabs to four spaces, adjust the alignment
of the last line (done with spaces already, assuming one tab equals
eight spaces), fix the line length of the comment, and add a line break
before the opening brace.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/083 | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/083 b/tests/qemu-iotests/083
index 13495bc..36e6de8 100755
--- a/tests/qemu-iotests/083
+++ b/tests/qemu-iotests/083
@@ -49,15 +49,16 @@ wait_for_tcp_port() {
done
 }
 
-_filter_nbd() {
-   # nbd.c error messages contain function names and line numbers that are 
prone
-   # to change.  Message ordering depends on timing between send and 
receive
-   # callbacks sometimes, making them unreliable.
-   #
-   # Filter out the TCP port number since this changes between runs.
-   sed -e 's#^.*nbd/.*\.c:.*##g' \
-   -e 's#nbd:127\.0\.0\.1:[^:]*:#nbd:127\.0\.0\.1:PORT:#g' \
--e 's#\(exportname=foo\|PORT\): Failed to .*$#\1#'
+_filter_nbd()
+{
+# nbd.c error messages contain function names and line numbers that are
+# prone to change.  Message ordering depends on timing between send and
+# receive callbacks sometimes, making them unreliable.
+#
+# Filter out the TCP port number since this changes between runs.
+sed -e 's#^.*nbd/.*\.c:.*##g' \
+-e 's#nbd:127\.0\.0\.1:[^:]*:#nbd:127\.0\.0\.1:PORT:#g' \
+-e 's#\(exportname=foo\|PORT\): Failed to .*$#\1#'
 }
 
 check_disconnect() {
-- 
2.7.0




[Qemu-block] [PATCH 1/8] nbd: client_close on error in nbd_co_client_start

2016-01-25 Thread Max Reitz
Use client_close() if an error in nbd_co_client_start() occurs instead
of manually inlining parts of it. This fixes an assertion error on the
server side if nbd_negotiate() fails.

Signed-off-by: Max Reitz 
---
 nbd/server.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 2265cb0..5169b59 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1080,8 +1080,7 @@ static coroutine_fn void nbd_co_client_start(void *opaque)
 nbd_export_get(exp);
 }
 if (nbd_negotiate(data)) {
-shutdown(client->sock, 2);
-client->close(client);
+client_close(client);
 goto out;
 }
 qemu_co_mutex_init(>send_lock);
-- 
2.7.0




[Qemu-block] [PATCH 2/8] iotests: Rename filter_nbd to _filter_nbd in 083

2016-01-25 Thread Max Reitz
In the patch after the next, this function is moved to common.filter.
Therefore, its name should be preceded by an underscore to signify its
global availability.

To keep the code motion patch clean, we cannot rename it in the same
patch, so we need to choose some order of renaming vs. motion. It is
better to keep a supposedly global function used by only a single test
in that test than to keep a supposedly local function in a common* file
and use it from a test, so we should rename the function before moving
it.

Signed-off-by: Max Reitz 
Reviewed-by: John Snow 
Reviewed-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 tests/qemu-iotests/083 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/083 b/tests/qemu-iotests/083
index 566da99..13495bc 100755
--- a/tests/qemu-iotests/083
+++ b/tests/qemu-iotests/083
@@ -49,7 +49,7 @@ wait_for_tcp_port() {
done
 }
 
-filter_nbd() {
+_filter_nbd() {
# nbd.c error messages contain function names and line numbers that are 
prone
# to change.  Message ordering depends on timing between send and 
receive
# callbacks sometimes, making them unreliable.
@@ -84,7 +84,7 @@ EOF
 
$PYTHON nbd-fault-injector.py $extra_args "127.0.0.1:$port" 
"$TEST_DIR/nbd-fault-injector.conf" 2>&1 >/dev/null &
wait_for_tcp_port "127\\.0\\.0\\.1:$port"
-   $QEMU_IO -c "read 0 512" "$nbd_url" 2>&1 | _filter_qemu_io | filter_nbd
+   $QEMU_IO -c "read 0 512" "$nbd_url" 2>&1 | _filter_qemu_io | _filter_nbd
 
echo
 }
-- 
2.7.0




Re: [Qemu-block] [RFC PATCH 4/4] tsan: various fixes for make check

2016-01-25 Thread Paolo Bonzini


On 25/01/2016 17:49, Alex Bennée wrote:
> After building with the ThreadSanitizer I ran make check and started
> going through the failures reported. Most are failures to use atomic
> primitives to access variables previously atomically set. While this
> likely will work on x86 it could cause problems on other architectures.

It will work on other architectures, because there are memory barriers
(either explicit, or implicit in other atomic_* ops).  In fact, using
atomic_read/atomic_set would actually fix bugs in x86 :) if it weren't
for commit 3bbf572 ("atomics: add explicit compiler fence in __atomic
memory barriers", 2015-06-03).

>  - async: use atomic reads for scheduled/notify_me
>  - thread-pool: use atomic_mb_read/set to for thread ->state

Please use atomic_read/atomic_set and keep the memory barriers.  That's
a fine way to convert code that uses non-atomic accesses together with
memory barriers (not just thread-pool, also e.g. virtio).

Otherwise looks sane, thanks!

Paolo

>  - test-thread-pool: use atomic read for data.n
> 
> Signed-off-by: Alex Bennée 
> ---
>  async.c  | 4 ++--
>  tests/test-thread-pool.c | 2 +-
>  thread-pool.c| 9 -
>  3 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/async.c b/async.c
> index e106072..8d5f810 100644
> --- a/async.c
> +++ b/async.c
> @@ -165,7 +165,7 @@ aio_compute_timeout(AioContext *ctx)
>  QEMUBH *bh;
>  
>  for (bh = ctx->first_bh; bh; bh = bh->next) {
> -if (!bh->deleted && bh->scheduled) {
> +if (!bh->deleted && atomic_read(>scheduled)) {
>  if (bh->idle) {
>  /* idle bottom halves will be polled at least
>   * every 10ms */
> @@ -286,7 +286,7 @@ void aio_notify(AioContext *ctx)
>   * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>   */
>  smp_mb();
> -if (ctx->notify_me) {
> +if (atomic_read(>notify_me)) {
>  event_notifier_set(>notifier);
>  atomic_mb_set(>notified, true);
>  }
> diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
> index ccdee39..5694ad9 100644
> --- a/tests/test-thread-pool.c
> +++ b/tests/test-thread-pool.c
> @@ -46,7 +46,7 @@ static void test_submit(void)
>  {
>  WorkerTestData data = { .n = 0 };
>  thread_pool_submit(pool, worker_cb, );
> -while (data.n == 0) {
> +while (atomic_read() == 0) {
>  aio_poll(ctx, true);
>  }
>  g_assert_cmpint(data.n, ==, 1);
> diff --git a/thread-pool.c b/thread-pool.c
> index 402c778..97b2c0c 100644
> --- a/thread-pool.c
> +++ b/thread-pool.c
> @@ -99,15 +99,14 @@ static void *worker_thread(void *opaque)
>  
>  req = QTAILQ_FIRST(>request_list);
>  QTAILQ_REMOVE(>request_list, req, reqs);
> -req->state = THREAD_ACTIVE;
> +atomic_mb_set(>state, THREAD_ACTIVE);
>  qemu_mutex_unlock(>lock);
>  
>  ret = req->func(req->arg);
>  
>  req->ret = ret;
>  /* Write ret before state.  */
> -smp_wmb();
> -req->state = THREAD_DONE;
> +atomic_mb_set(>state, THREAD_DONE);
>  
>  qemu_mutex_lock(>lock);
>  
> @@ -167,7 +166,7 @@ static void thread_pool_completion_bh(void *opaque)
>  
>  restart:
>  QLIST_FOREACH_SAFE(elem, >head, all, next) {
> -if (elem->state != THREAD_DONE) {
> +if (atomic_read(>state) != THREAD_DONE) {
>  continue;
>  }
>  
> @@ -201,7 +200,7 @@ static void thread_pool_cancel(BlockAIOCB *acb)
>  trace_thread_pool_cancel(elem, elem->common.opaque);
>  
>  qemu_mutex_lock(>lock);
> -if (elem->state == THREAD_QUEUED &&
> +if (atomic_mb_read(>state) == THREAD_QUEUED &&
>  /* No thread has yet started working on elem. we can try to "steal"
>   * the item from the worker if we can get a signal from the
>   * semaphore.  Because this is non-blocking, we can do it with
> 



Re: [Qemu-block] [Qemu-devel] [PATCH 1/8] nbd: client_close on error in nbd_co_client_start

2016-01-25 Thread Daniel P. Berrange
On Mon, Jan 25, 2016 at 07:41:08PM +0100, Max Reitz wrote:
> Use client_close() if an error in nbd_co_client_start() occurs instead
> of manually inlining parts of it. This fixes an assertion error on the
> server side if nbd_negotiate() fails.
> 
> Signed-off-by: Max Reitz 
> ---
>  nbd/server.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 2265cb0..5169b59 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1080,8 +1080,7 @@ static coroutine_fn void nbd_co_client_start(void 
> *opaque)
>  nbd_export_get(exp);
>  }
>  if (nbd_negotiate(data)) {
> -shutdown(client->sock, 2);
> -client->close(client);
> +client_close(client);
>  goto out;
>  }
>  qemu_co_mutex_init(>send_lock);

The same as my fix last week :-)

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03409.html

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-block] [Qemu-devel] [Questions] Several questions about incremental backup

2016-01-25 Thread John Snow


On 01/25/2016 02:35 AM, Rudy Zhang wrote:
> I am reading and testing the function: incremental backup in qemu-2.5.
> But I have serveral questions about it.
> 1. If I want to start image backup, at first I need to start full mode backup
>and then, add a bitmap to trace io, next start incremental backup via the
>bitmap before we added. But when the first incremental backup over, it will
>abdicate the bitmap. How can I start the second incremental backup without
>the bitmap to trace io? I don't know why abdicate the bitmap.
>Is it only an incremental backup?

Check out https://github.com/qemu/qemu/blob/master/docs/bitmaps.md for
some examples of workflow, hopefully this is useful.

When you create a new bitmap object, it's useful to sync it to a full
backup of some kind so that it's useful, you can do this several ways.
Either QMP commands while the VM is paused, or QMP transactions when the
VM is running. See /docs/bitmaps.md for more information.

When you issue a backup command using an incremental bitmap object, QEMU
actually creates a new bitmap to replace the one you are using right away.

Before a backup: [bitmap0]
During a backup: [bitmap0 <-- "successor"]

The bitmap you create is given an anonymous child bitmap (via the
successor pointer) that records new writes while the backup is in
progress. Two things can happen at this point:

(1) The backup succeeds

The "abdicate" function is run and "bitmap0" is deleted and the
anonymous child becomes the new "bitmap0."

(2) The backup fails

It's not safe to delete bitmap0, so QEMU merges the anonymous child back
into bitmap0.


This means that after the backup you'll always have "bitmap0" attached
to the same drive.

It shouldn't be necessary to manually create new incremental bitmap objects.

> 2. When abdicating the bitmap, it seems leak memory about bitmap->successor.
> 

I guess this function looks very suspicious, but what's happening in
actuality is the successor inherits the name of the parent (e.g.
"bitmap0") and then the parent is freed/deleted.

This "promotes" the successor to the new standalone bitmap object,
because both the parent and the successor are part of the list of
bitmaps attached to the drive object -- we did not lose our reference to
the successor.



Re: [Qemu-block] COLO: how to flip a secondary to a primary?

2016-01-25 Thread Li Zhijian



On 01/26/2016 04:20 AM, Dr. David Alan Gilbert wrote:

* Li Zhijian (lizhij...@cn.fujitsu.com) wrote:



On 01/25/2016 09:32 AM, Wen Congyang wrote:

f) I've not thought about the colo-proxy that much yet - I guess that
   existing connections need to keep their sequence number offset but


Strictly speaking, after failover, we only need to keep servicing for the tcp 
connections which are
established after the last checkpoint but not all existing connections. Because 
after a checkpoint
(primary and secondary node works well), primary vm and secondary vm is same, 
that means the existing
tcp connection has the same sequence。


   new connections made by what is now the primary dont need to do anything
   special.

Yes, you are right.


I wonder whether we need to do something special to the new-secondary;
consider this:

1 primary (P1) & secondary (S1) run together
2 New connection opened
3secondary records an offset
4 
5 primary (P1) fails; do failover to secondary
6 secondary (S1) still rewrites sequence for connection opened at (2)
7 Start new-secondary (S2), send checkpoint from S1->S2
8 S2 has same guest contents as S1; so the
  sequence numbers are still offset compared to the outside world.

So S2 needs to be sent the offsets for existing connections, otherwise
is S1 was then to fail, S2 would send the wrong output on the existing
connection?


Thanks for the example.
Sure, if we support continuous FT, colo proxy need to implement migration_save 
and migration_load.
At the beginning of (7), we need to save colo_proxy info(including connection 
info and sequence offset) at S1
and load colo_proxy at S2. S1/S2 need to keep doing tcp re-writer for the 
connections opened at (2)
until they are closed.

Thanks
Li Zhijian



Dave





Hailiang or Zhijian can answer this question.



Thanks
Li Zhijian



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


.



--
Best regards.
Li Zhijian (8555)





Re: [Qemu-block] COLO: how to flip a secondary to a primary?

2016-01-25 Thread Wen Congyang
On 01/26/2016 02:59 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>> On 01/23/2016 03:35 AM, Dr. David Alan Gilbert wrote:
>>> Hi,
>>>   I've been looking at what's needed to add a new secondary after
>>> a primary failed; from the block side it doesn't look as hard
>>> as I'd expected, perhaps you can tell me if I'm missing something!
>>>
>>> The normal primary setup is:
>>>
>>>quorum
>>>   Real disk
>>>   nbd client
>>
>> quorum
>>real disk
>>replication
>>   nbd client
>>
>>>
>>> The normal secondary setup is:
>>>replication
>>>   active-disk
>>>   hidden-disk
>>>   Real-disk
>>
>> IIRC, we can do it like this:
>> quorum
>>replication
>>   active-disk
>>   hidden-disk
>>   real-disk
> 
> Yes.
> 
>>> With a couple of minor code hacks; I changed the secondary to be:
>>>
>>>quorum
>>>   replication
>>> active-disk
>>> hidden-disk
>>> Real-disk
>>>   dummy-disk
>>
>> after failover,
>> quorum
>>replicaion(old, mode is secondary)
>>  active-disk
>>  hidden-disk*
>>  real-disk*
>>replication(new, mode is primary)
>>  nbd-client
> 
> Do you need to keep the old secondary-replication?
> Does that just pass straight through?

Yes, the old secondary-replication can work in the newest mode.
For example, we don't start colo again after failover, we do nothing.

> 
>> In the newest version, we active commit active-disk to real-disk.
>> So it will be:
>> quorum
>>replicaion(old, mode is secondary)
>>  active-disk(it is real disk now)
>>replication(new, mode is primary)
>>  nbd-client
> 
> How does that active-commit work?  I didn't think you
> could change the real disk until you had the full checkpoint,
> since you don't know whether the primary or secondaries
> changes need to be written?

I start the active-commit work when doing failover. After failover,
the primary changes after last checkpoint should be dropped(How to cancel
the inprogress write ops?).

> 
>>> and then after the primary fails, I start a new secondary
>>> on another host and then on the old secondary do:
>>>
>>>   nbd_server_stop
>>>   stop
>>>   x_block_change top-quorum -d children.0 # deletes use of real 
>>> disk, leaves dummy
>>>   drive_del active-disk0
>>>   x_block_change top-quorum -a node-real-disk
>>>   x_block_change top-quorum -d children.1 # Seems to have deleted 
>>> the dummy?!, the disk is now child 0
>>>   drive_add buddy 
>>> driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
>>>   x_block_change top-quorum -a nbd-client
>>>   c
>>>   migrate_set_capability x-colo on
>>>   migrate -d -b tcp:ibpair:
>>>
>>> and I think that means what was the secondary, has the same disk
>>> structure as a normal primary.
>>> That's not quite happy yet, and I've not figured out why - but the
>>> order/structure of the block devices looks right?
>>>
>>> Notes:
>>>a) The dummy serves two purposes, 1) it works around the segfault
>>>   I reported in the other mail, 2) when I delete the real disk in the
>>>   first x_block_change it means the quorum still has 1 disk so doesn't
>>>   get upset.
>>
>> I don't understand the purpose 2.
> 
> quorum wont allow you to delete all it's members ('The number of children 
> cannot be lower than the vote threshold 1')
> and it's very tricky getting the order correct with add/delete; for example
> I tried:
> 
> drive_add buddy 
> driver=replication,mode=primary,file.driver=nbd,file.host=ibpair,file.port=8889,file.export=colo-disk0,node-name=nbd-client,if=none,cache=none
> # gets children.1
> x_block_change top-quorum -a nbd-client
> # deletes the secondary replication
> x_block_change top-quorum -d children.0
> drive_del active-disk0

The active-disk0 contains some data, and you should not delete it.
If we do active-commit after failover, the active-disk0 is the real disk.

> # ends up as children.0 but in the 2nd slot
> x_block_change top-quorum -a node-real-disk
> 
> info block shows me:
> top-quorum (#block615): json:{"children": [
> {"driver": "replication", "mode": "primary", "file": {"port": "8889", 
> "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}},
> {"driver": "raw", "file": {"driver": "file", "filename": 
> "/home/localvms/bugzilla.raw"}}
>],
>"driver": "quorum", "blkverify": false, "rewrite-corrupted": false, 
> "vote-threshold": 1} (quorum)
> Cache mode:   writeback
> 
> that has the replication first and the file second; that's the opposite
> from the normal primary startup - does it matter?

it is OK. But reading from children.0 always fails and will read data from 
children.1

> 
> I can't add node-real-disk until I drive_del active-disk0 (which
> previously used it);  and I can't drive_del until I remove
> it from the quorum; but I can't remove that from the quorum 

Re: [Qemu-block] [PATCH v3 2/4] blockdev: Fix 'change' for slot devices

2016-01-25 Thread Alberto Garcia
On Fri 22 Jan 2016 11:50:48 PM CET, Max Reitz  wrote:
> 'change' and related operations did not work when used on guest devices
> featuring removable media but no actual tray, because
> blk_dev_is_tray_open() always returned false for them and the
> blockdev-{insert,remove}-medium commands required it to return true.
>
> Fix this by making blockdev-{insert,remove}-medium work on tray-less
> devices. Also, blockdev-{open,close}-tray are now explicitly no-ops when
> invoked on such devices, and blk_dev_change_media_cb() is instead
> called by blockdev-{insert,remove}-medium (for tray-less devices only).
>
> Reported-by: Peter Maydell 
> Cc: qemu-stable 
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH] vmdk: Fix converting to streamOptimized

2016-01-25 Thread Kevin Wolf
Am 25.01.2016 um 03:26 hat Fam Zheng geschrieben:
> Commit d62d9dc4b8 lifted streamOptimized images's version to 3, but we
> now refuse to open version 3 images read-write.  We need to make
> streamOptimized an exception to allow converting to it. This fixes the
> accidentally broken iotests case 059 for the same reason.
> 
> Signed-off-by: Fam Zheng 

How different are version 3 images for other subformats? Are we
arbitrarily restrictring their use or is it really that they don't work
with our driver? And if they don't work with our driver, are we sure
that streamOptimized images can't use the features we don't support?

Or is the version defined per subformat and doesn't necessarily exist
for other types?

Kevin



Re: [Qemu-block] [PATCH v3 0/4] blockdev: Fix 'change' for slot devices

2016-01-25 Thread Kevin Wolf
Am 22.01.2016 um 23:50 hat Max Reitz geschrieben:
> The series "BlockBackend and media" intended all block devices with
> removable media to implement a tray model; if the devices does not have
> a tray, it should emulate one.
> 
> While this may make sense from a technical perspective (blockdev-*-tray
> are guest device controlling operations, invoking
> blk_dev_change_media_cb(); blockdev-*-medium are operations concerning
> the block layer, controlling the BB-BDS link), it is (probably)
> unintuitive to users, and it requires said implementation of an emulated
> tray for each of the slot devices (floppy disk drives and SD card
> readers).
> 
> We can get rid of those virtual trays by special-casing tray-less
> devices in blockdev-*-tray (those operations are no-ops there) and in
> blockdev-*-medium (those operations then have to invoke
> blk_dev_change_media_cb()). With this change, changing the medium
> inserted into a slot device will no longer emit TRAY_MOVED events (which
> seems like a bugfix to me, because slot devices actually do not have
> trays).

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-25 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
> On 01/22/2016 11:14 PM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I can trigger a segfault if I wire in the block replication together with
> > a quorum instance; it only triggers with both of them present but,
> > it looks like the problem is a disagreement about the number of quorum
> > members;  I'm triggering this on the 'colo-v2.4-periodic-mode' branch
> > that is posted in the colo-framework set that I think includes this set
> > (from https://github.com/coloft/qemu.git).
> > 
> > To trigger:
> > ./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S
> > 
> > (qemu) drive_add 0 
> > if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
> > (qemu) drive_add 1 
> > if=none,id=active-disk0,throttling.bps-total=7000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
> > (qemu) drive_add 2 
> > if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
> > (qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9
> > 
> > *** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': 
> > free(): invalid pointer: 0x55a8fdf0 ***
> > === Backtrace: =
> > /lib64/libc.so.6(+0x7cfe1)[0x7110ffe1]
> > /lib64/libglib-2.0.so.0(g_free+0xf)[0x71ecc36f]
> > /root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
> > Program received signal SIGABRT, Aborted.
> > 0x710c85f7 in raise () from /lib64/libc.so.6
> > (gdb) where
> > #0  0x710c85f7 in raise () from /lib64/libc.so.6
> > #1  0x710c9ce8 in abort () from /lib64/libc.so.6
> > #2  0x71108317 in __libc_message () from /lib64/libc.so.6
> > #3  0x7110ffe1 in _int_free () from /lib64/libc.so.6
> > #4  0x71ecc36f in g_free () from /lib64/libglib-2.0.so.0
> > #5  0x559dfdd7 in qemu_iovec_destroy (qiov=0x57815410) at 
> > /root/colo/jan-2016/qemu/util/iov.c:378
> > #6  0x55989cce in quorum_aio_finalize (acb=0x57815350) at 
> > /root/colo/jan-2016/qemu/block/quorum.c:171
> > 171 qemu_iovec_destroy(>qcrs[i].qiov);
> > (gdb) list
> > 166 
> > 167 if (acb->is_read) {
> > 168 /* on the quorum case acb->child_iter == s->num_children - 1 */
> > 169 for (i = 0; i <= acb->child_iter; i++) {
> > 170 qemu_vfree(acb->qcrs[i].buf);
> > 171 qemu_iovec_destroy(>qcrs[i].qiov);
> > 172 }
> > 173 }
> > 174 
> > 175 g_free(acb->qcrs);
> > (gdb) p acb->child_iter
> > $1 = 1
> > (gdb) p i
> > $3 = 1
> > 
> > #7  0x5598afca in quorum_aio_cb (opaque=, ret=-5)
> > at /root/colo/jan-2016/qemu/block/quorum.c:302
> > #8  0x559990ee in bdrv_co_complete (acb=0x57815410) at 
> > /root/colo/jan-2016/qemu/block/io.c:2122
> > .
> > 
> > So I guess acb->child_iter is wrong, since we only have one child on that 
> > quorum?
> > and we're trying to do a destroy on the second child.
> 
> Can you try the following patch:
> From 3f2c5ec288cd9a36afb392b4bba24029f3e9345a Mon Sep 17 00:00:00 2001
> From: Wen Congyang 
> Date: Mon, 25 Jan 2016 09:18:09 +0800
> Subject: [PATCH] quorum: fix segfault when read fails in fifo mode
> 
> Signed-off-by: Wen Congyang 
> ---
>  block/quorum.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index a5ae4b8..0965277 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -295,6 +295,9 @@ static void quorum_aio_cb(void *opaque, int ret)
>  quorum_copy_qiov(acb->qiov, >qcrs[acb->child_iter].qiov);
>  }
>  acb->vote_ret = ret;
> +if (ret < 0) {
> +acb->child_iter--;
> +}
>  quorum_aio_finalize(acb);
>  return;
>  }

Yes, that seems to fix it; thanks.

Dave

> -- 
> 2.5.0
> 
> 
> 
> > 
> > Dave
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > 
> > 
> > .
> > 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-01-25 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
> On 01/22/2016 11:14 PM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I can trigger a segfault if I wire in the block replication together with
> > a quorum instance; it only triggers with both of them present but,
> > it looks like the problem is a disagreement about the number of quorum
> > members;  I'm triggering this on the 'colo-v2.4-periodic-mode' branch
> > that is posted in the colo-framework set that I think includes this set
> > (from https://github.com/coloft/qemu.git).
> > 
> > To trigger:
> > ./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S
> > 
> > (qemu) drive_add 0 
> > if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
> > (qemu) drive_add 1 
> > if=none,id=active-disk0,throttling.bps-total=7000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
> > (qemu) drive_add 2 
> > if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
> > (qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9
> > 
> > *** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': 
> > free(): invalid pointer: 0x55a8fdf0 ***
> > === Backtrace: =
> > /lib64/libc.so.6(+0x7cfe1)[0x7110ffe1]
> > /lib64/libglib-2.0.so.0(g_free+0xf)[0x71ecc36f]
> > /root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
> > Program received signal SIGABRT, Aborted.
> > 0x710c85f7 in raise () from /lib64/libc.so.6
> > (gdb) where
> > #0  0x710c85f7 in raise () from /lib64/libc.so.6
> > #1  0x710c9ce8 in abort () from /lib64/libc.so.6
> > #2  0x71108317 in __libc_message () from /lib64/libc.so.6
> > #3  0x7110ffe1 in _int_free () from /lib64/libc.so.6
> > #4  0x71ecc36f in g_free () from /lib64/libglib-2.0.so.0
> > #5  0x559dfdd7 in qemu_iovec_destroy (qiov=0x57815410) at 
> > /root/colo/jan-2016/qemu/util/iov.c:378
> > #6  0x55989cce in quorum_aio_finalize (acb=0x57815350) at 
> > /root/colo/jan-2016/qemu/block/quorum.c:171
> > 171 qemu_iovec_destroy(>qcrs[i].qiov);
> > (gdb) list
> > 166 
> > 167 if (acb->is_read) {
> > 168 /* on the quorum case acb->child_iter == s->num_children - 1 */
> > 169 for (i = 0; i <= acb->child_iter; i++) {
> > 170 qemu_vfree(acb->qcrs[i].buf);
> > 171 qemu_iovec_destroy(>qcrs[i].qiov);
> > 172 }
> > 173 }
> > 174 
> > 175 g_free(acb->qcrs);
> > (gdb) p acb->child_iter
> > $1 = 1
> > (gdb) p i
> > $3 = 1
> 
> Thanks for your test. Can you give me the following information:
> 1. acb->ret's value

(gdb) p acb->ret
There is no member named ret.
(gdb) p acb->vote_ret
$2 = -5

> 2. s->num_children

(gdb) p ((BDRVQuorumState *)acb->common.bs->opaque)->num_children
$5 = 1

Dave

> 
> I think it is quorum's bug, and acb->ret is < 0.
> 
> Thanks
> Wen Congyang
> 
> > 
> > #7  0x5598afca in quorum_aio_cb (opaque=, ret=-5)
> > at /root/colo/jan-2016/qemu/block/quorum.c:302
> > #8  0x559990ee in bdrv_co_complete (acb=0x57815410) at 
> > /root/colo/jan-2016/qemu/block/io.c:2122
> > .
> > 
> > So I guess acb->child_iter is wrong, since we only have one child on that 
> > quorum?
> > and we're trying to do a destroy on the second child.
> > 
> > Dave
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > 
> > 
> > .
> > 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [PATCH] vmdk: Fix converting to streamOptimized

2016-01-25 Thread Fam Zheng
On Mon, 01/25 12:16, Kevin Wolf wrote:
> Am 25.01.2016 um 03:26 hat Fam Zheng geschrieben:
> > Commit d62d9dc4b8 lifted streamOptimized images's version to 3, but we
> > now refuse to open version 3 images read-write.  We need to make
> > streamOptimized an exception to allow converting to it. This fixes the
> > accidentally broken iotests case 059 for the same reason.
> > 
> > Signed-off-by: Fam Zheng 
> 
> How different are version 3 images for other subformats? Are we
> arbitrarily restrictring their use or is it really that they don't work
> with our driver? And if they don't work with our driver, are we sure
> that streamOptimized images can't use the features we don't support?
> 
> Or is the version defined per subformat and doesn't necessarily exist
> for other types?

Version 3 images are undocumented except in the VMware KB article mentioned in
the comment around this line (http://kb.vmware.com/kb/2064959). A few years
ago, when users complained that QEMU doesn't support version 3 images, we
presumed from the article that reading is okay, as the new feature is
"persistent changed block tracking" (although it didn't say it is the only
feature enabled by version 3), and went ahead enabling it.

This time, it seems newer VMware products only accept version 3 if the
subformat is streamOptimized.  Again, without any documentation/specification
update. Then our users complains again, so we add another exception to
mitigate. As this subformat doesn't allow overwrite, the only use case is
qemu-img converting to it.  So this is pretty safe - it's always operating a
new image - and the approach is tested by multiple users (both upstream and
downstream).

Fam