Re: [Qemu-block] [PATCH v6 2/4] qcow2: add option to clean unused cache entries after some time

2015-07-08 Thread Kevin Wolf
Am 05.06.2015 um 18:27 hat Alberto Garcia geschrieben:
 This adds a new 'cache-clean-interval' option that cleans all qcow2
 cache entries that haven't been used in a certain interval, given in
 seconds.
 
 This allows setting a large L2 cache size so it can handle scenarios
 with lots of I/O and at the same time use little memory during periods
 of inactivity.
 
 This feature currently relies on MADV_DONTNEED to free that memory, so
 it is not useful in systems that don't follow that behavior.
 
 Signed-off-by: Alberto Garcia be...@igalia.com
 Reviewed-by: Max Reitz mre...@redhat.com

This patch doesn't apply cleanly any more. Can you please rebase and
change the QAPI documentation to say "since 2.5"?

Kevin



Re: [Qemu-block] [PATCH 1/2] block: vpc - prevent overflow if max_table_entries = 0x40000000

2015-07-08 Thread Kevin Wolf
Am 26.06.2015 um 11:57 hat Stefan Hajnoczi geschrieben:
 On Thu, Jun 25, 2015 at 11:05:20AM -0400, Jeff Cody wrote:
  On Thu, Jun 25, 2015 at 03:28:35PM +0100, Stefan Hajnoczi wrote:
   On Wed, Jun 24, 2015 at 03:54:27PM -0400, Jeff Cody wrote:
@@ -269,7 +270,9 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
 goto fail;
 }
 
-s->pagetable = qemu_try_blockalign(bs->file, s->max_table_entries * 4);
+pagetable_size = (size_t) s->max_table_entries * 4;
+
+s->pagetable = qemu_try_blockalign(bs->file, pagetable_size);
   
   On 32-bit hosts size_t is 32-bit so the overflow hasn't been solved.
   
   Does it make sense to impose a limit on pagetable_size?
  
  Good point.  Yes, it does.
  
  The VHD spec says that the Max Table Entries field should be equal
  to the disk size / block size.  I don't know if there are images out
  there that treat that as >= disk size / block size rather than ==,
  however.  But if we assume a max size of 2TB for a VHD disk, and a
  minimal block size of 512 bytes, that would give us a
  max_table_entries of 0x100000000, which exceeds 32 bits by itself.
  
  For pagetable_size to fit in 32 bits, that means to support 2TB on a
  32-bit host in the current implementation, the minimum block size is
  4096.
  
  We could check during open / create that
  (disk_size / block_size) * 4 < SIZE_MAX, and refuse to open if this is
  not true (and also validate max_table_entries to fit in the same).
 
 Sounds good.

Jeff, I couldn't find a v2 anywhere. Are you still planning to send it?

Kevin




Re: [Qemu-block] [Qemu-devel] [PATCH v2] block/curl: Don't lose original error when a connection fails.

2015-07-08 Thread Richard W.M. Jones
On Wed, Jul 08, 2015 at 02:01:30PM +0200, Kevin Wolf wrote:
 The guest can't cause it, but once the connection is down, I expect
 every request to fail. You don't have to have a malicious guest for
 filling up the log file, it just needs to be careless enough to continue
 trying new requests instead of offlining the device.

How about something along these lines?

[I'm not clear if atomic is necessary here, nor if there is already
some mechanism for suppressing log messages - a cursory grep of the
qemu source didn't find anything.]

diff --git a/block/curl.c b/block/curl.c
index 2fd7c06..33c14d8 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -299,11 +299,20 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 /* ACBs for successful messages get completed in curl_read_cb */
 if (msg->data.result != CURLE_OK) {
 int i;
+static int errcount = 100;
+int current_errcount;
 
 /* Don't lose the original error message from curl, since
  * it contains extra data.
  */
-error_report("curl: %s", state->errmsg);
+current_errcount = atomic_fetch_add(&errcount, 0);
+if (current_errcount > 0) {
+error_report("curl: %s", state->errmsg);
+if (current_errcount == 1) {
+error_report("curl: further errors suppressed");
+}
+atomic_dec(&errcount);
+}
 
 for (i = 0; i < CURL_NUM_ACB; i++) {
 CURLAIOCB *acb = state->acb[i];

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



Re: [Qemu-block] [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll

2015-07-08 Thread Stefan Hajnoczi
On Wed, Jul 08, 2015 at 09:01:27AM +0800, Fam Zheng wrote:
 On Tue, 07/07 16:08, Stefan Hajnoczi wrote:
   +#define EPOLL_BATCH 128
   +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
   +{
   +AioHandler *node;
   +bool was_dispatching;
   +int i, ret;
   +bool progress;
   +int64_t timeout;
   +struct epoll_event events[EPOLL_BATCH];
   +
   +aio_context_acquire(ctx);
   +was_dispatching = ctx->dispatching;
   +progress = false;
   +
   +/* aio_notify can avoid the expensive event_notifier_set if
   + * everything (file descriptors, bottom halves, timers) will
   + * be re-evaluated before the next blocking poll().  This is
   + * already true when aio_poll is called with blocking == false;
   + * if blocking == true, it is only true after poll() returns.
   + *
   + * If we're in a nested event loop, ctx->dispatching might be true.
   + * In that case we can restore it just before returning, but we
   + * have to clear it now.
   + */
   +aio_set_dispatching(ctx, !blocking);
   +
   +ctx->walking_handlers++;
   +
   +timeout = blocking ? aio_compute_timeout(ctx) : 0;
   +
   +if (timeout > 0) {
   +timeout = DIV_ROUND_UP(timeout, 1000000);
   +}
  
  I think you already posted the timerfd code in an earlier series.  Why
  degrade to millisecond precision?  It needs to be fixed up anyway if the
  main loop uses aio_poll() in the future.
 
 Because of a little complication: timeout here is always -1 for iothread, and
 what is interesting is that -1 actually requires an explicit
 
 timerfd_settime(timerfd, flags, &(struct itimerspec){{0, 0}}, NULL)
 
 to disable timerfd for this aio_poll(), which costs something. Passing -1 to
 epoll_wait() without this doesn't work because the timerfd is already added to
 the epollfd and may have an unexpected timeout set before.
 
 Of course we can cache the state and optimize, but I've not reasoned about what
 if another thread happens to call aio_poll() when we're in epoll_wait(), for
 example when the first aio_poll() has a positive timeout but the second one has
 -1.

I'm not sure I understand the threads scenario since aio_poll_epoll()
has a big aio_context_acquire()/release() region that protects it, but I
guess the nested aio_poll() case is similar.  Care needs to be taken so
the extra timerfd state is consistent.

The optimization can be added later unless the timerfd_settime() syscall
is so expensive that it defeats the advantage of epoll().




Re: [Qemu-block] [PATCH] raw-posix.c: remove raw device access for cdrom

2015-07-08 Thread Kevin Wolf
Am 08.07.2015 um 12:47 hat Laurent Vivier geschrieben:
 
 
 On 08/07/2015 12:31, Kevin Wolf wrote:
  Am 02.07.2015 um 16:18 hat Laurent Vivier geschrieben:
 
 
  On 02/07/2015 16:03, Paolo Bonzini wrote:
 
 
  On 02/07/2015 15:58, Laurent Vivier wrote:
  Since any /dev entry can be treated as a raw disk image, it is worth
  noting which devices can be accessed when and how. /dev/rdisk nodes are
  character-special devices, but are raw in the BSD sense and force
  block-aligned I/O. They are closer to the physical disk than the buffer
  cache. /dev/disk nodes, on the other hand, are buffered block-special
  devices and are used primarily by the kernel's filesystem code.
 
  So the right thing to do would not be just to set need_alignment, but to
  probe it like we do on Linux for BDRV_O_NO_CACHE.
 
  I'm okay with doing the simple thing, but it needs a comment for 
  non-BSDers.
 
  So, what we have to do, in our case, for MacOS X cdrom, is something like:
 
  ... GetBSDPath ...
  ...
  if (flags & BDRV_O_NOCACHE) {
  strcat(bsdPath, "r");
  }
  ...
  
  I would avoid such magic. What we could do is rejecting /dev/rdisk nodes
  without BDRV_O_NOCACHE.
 
 It's not how it works...
 
 Look in hdev_open().
 
 If user provides /dev/cdrom on the command line, in the case of MacOS X,
 QEMU searches for a cdrom drive in the system and set filename to
 /dev/rdiskX according to the result.

Oh, we're already playing such games... I guess you're right then.

It even seems to be not only for '/dev/cdrom', but for everything
starting with this string. Does anyone know what's the reason for that?

Also, I guess before doing strcat() on bsdPath, we should check the
buffer length...

 Perhaps this part should be removed.
 
 But if we just want to correct the bug, we must not set filename to
 /dev/rdiskX if NOCACHE is not set but to /dev/diskX
 
 It's the aim of this change.

Yes, that looks right.

Kevin



Re: [Qemu-block] [PATCH RFC 3/4] aio: Introduce aio_context_setup

2015-07-08 Thread Stefan Hajnoczi
On Wed, Jul 08, 2015 at 09:15:57AM +0800, Fam Zheng wrote:
 On Tue, 07/07 15:35, Stefan Hajnoczi wrote:
  On Tue, Jun 30, 2015 at 09:19:44PM +0800, Fam Zheng wrote:
   diff --git a/async.c b/async.c
   index 06971f4..1d70cfd 100644
   --- a/async.c
   +++ b/async.c
   @@ -290,12 +290,17 @@ AioContext *aio_context_new(Error **errp)
{
int ret;
AioContext *ctx;
   +Error *local_err = NULL;
   +
 ctx = (AioContext *) g_source_new(&aio_source_funcs, 
   sizeof(AioContext));
   +aio_context_setup(ctx, local_err);
   +if (local_err) {
   +error_propagate(errp, local_err);
  
  Is there any reason to introduce local_err?  errp can be passed directly
  into aio_context_setup().
 
 It's used for catching failure of aio_context_setup, because the convention is
 errp can be NULL.

You are right, I missed that aio_context_setup() has a void return type.




Re: [Qemu-block] [PATCH] raw-posix.c: remove raw device access for cdrom

2015-07-08 Thread Kevin Wolf
Am 02.07.2015 um 16:18 hat Laurent Vivier geschrieben:
 
 
 On 02/07/2015 16:03, Paolo Bonzini wrote:
  
  
  On 02/07/2015 15:58, Laurent Vivier wrote:
  Since any /dev entry can be treated as a raw disk image, it is worth
  noting which devices can be accessed when and how. /dev/rdisk nodes are
  character-special devices, but are raw in the BSD sense and force
  block-aligned I/O. They are closer to the physical disk than the buffer
  cache. /dev/disk nodes, on the other hand, are buffered block-special
  devices and are used primarily by the kernel's filesystem code.
  
  So the right thing to do would not be just to set need_alignment, but to
  probe it like we do on Linux for BDRV_O_NO_CACHE.
  
  I'm okay with doing the simple thing, but it needs a comment for non-BSDers.
 
 So, what we have to do, in our case, for MacOS X cdrom, is something like:
 
 ... GetBSDPath ...
 ...
 if (flags & BDRV_O_NOCACHE) {
 strcat(bsdPath, "r");
 }
 ...

I would avoid such magic. What we could do is rejecting /dev/rdisk nodes
without BDRV_O_NOCACHE.

Kevin



[Qemu-block] [PULL v2 02/16] Revert "dataplane: allow virtio-1 devices"

2015-07-08 Thread Michael S. Tsirkin
From: Cornelia Huck cornelia.h...@de.ibm.com

This reverts commit f5a5628cf0b65b223fa0c9031714578dfac4cf04.

This was an old patch that had already been superseded by b0e5d90eb
("dataplane: endianness-aware accesses").

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
Acked-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/virtio/dataplane/vring.c | 47 -
 1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index bed9b11..07fd69c 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -158,18 +158,15 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
 }
 
 
-static int get_desc(VirtIODevice *vdev, Vring *vring, VirtQueueElement *elem,
+static int get_desc(Vring *vring, VirtQueueElement *elem,
 struct vring_desc *desc)
 {
 unsigned *num;
 struct iovec *iov;
 hwaddr *addr;
 MemoryRegion *mr;
-int is_write = virtio_tswap16(vdev, desc->flags) & VRING_DESC_F_WRITE;
-uint32_t len = virtio_tswap32(vdev, desc->len);
-uint64_t desc_addr = virtio_tswap64(vdev, desc->addr);
 
-if (is_write) {
+if (desc->flags & VRING_DESC_F_WRITE) {
 num = &elem->in_num;
 iov = &elem->in_sg[*num];
 addr = &elem->in_addr[*num];
@@ -193,17 +190,18 @@ static int get_desc(VirtIODevice *vdev, Vring *vring, VirtQueueElement *elem,
 }
 
 /* TODO handle non-contiguous memory across region boundaries */
-iov->iov_base = vring_map(&mr, desc_addr, len, is_write);
+iov->iov_base = vring_map(&mr, desc->addr, desc->len,
+  desc->flags & VRING_DESC_F_WRITE);
 if (!iov->iov_base) {
 error_report("Failed to map descriptor addr %#" PRIx64 " len %u",
- (uint64_t)desc_addr, len);
+ (uint64_t)desc->addr, desc->len);
 return -EFAULT;
 }
 
 /* The MemoryRegion is looked up again and unref'ed later, leave the
  * ref in place.  */
-iov->iov_len = len;
-*addr = desc_addr;
+iov->iov_len = desc->len;
+*addr = desc->addr;
 *num += 1;
 return 0;
 }
@@ -225,23 +223,21 @@ static int get_indirect(VirtIODevice *vdev, Vring *vring,
 struct vring_desc desc;
 unsigned int i = 0, count, found = 0;
 int ret;
-uint32_t len = virtio_tswap32(vdev, indirect->len);
-uint64_t addr = virtio_tswap64(vdev, indirect->addr);
 
 /* Sanity check */
-if (unlikely(len % sizeof(desc))) {
+if (unlikely(indirect->len % sizeof(desc))) {
 error_report("Invalid length in indirect descriptor: "
  "len %#x not multiple of %#zx",
- len, sizeof(desc));
+ indirect->len, sizeof(desc));
 vring->broken = true;
 return -EFAULT;
 }
 
-count = len / sizeof(desc);
+count = indirect->len / sizeof(desc);
 /* Buffers are chained via a 16 bit next field, so
  * we can have at most 2^16 of these. */
 if (unlikely(count > USHRT_MAX + 1)) {
-error_report("Indirect buffer length too big: %d", len);
+error_report("Indirect buffer length too big: %d", indirect->len);
 vring->broken = true;
 return -EFAULT;
 }
@@ -252,12 +248,12 @@ static int get_indirect(VirtIODevice *vdev, Vring *vring,
 
 /* Translate indirect descriptor */
 desc_ptr = vring_map(mr,
- addr + found * sizeof(desc),
+ indirect->addr + found * sizeof(desc),
  sizeof(desc), false);
 if (!desc_ptr) {
 error_report("Failed to map indirect descriptor "
  "addr %#" PRIx64 " len %zu",
- (uint64_t)addr + found * sizeof(desc),
+ (uint64_t)indirect->addr + found * sizeof(desc),
  sizeof(desc));
 vring->broken = true;
 return -EFAULT;
@@ -275,20 +271,19 @@ static int get_indirect(VirtIODevice *vdev, Vring *vring,
 return -EFAULT;
 }
 
-if (unlikely(virtio_tswap16(vdev, desc.flags)
-  & VRING_DESC_F_INDIRECT)) {
+if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
 error_report("Nested indirect descriptor");
 vring->broken = true;
 return -EFAULT;
 }
 
-ret = get_desc(vdev, vring, elem, &desc);
+ret = get_desc(vring, elem, &desc);
 if (ret < 0) {
 vring->broken |= (ret == -EFAULT);
 return ret;
 }
-i = virtio_tswap16(vdev, desc.next);
-} while (virtio_tswap16(vdev, desc.flags) & VRING_DESC_F_NEXT);
+i = desc.next;
+} while (desc.flags & VRING_DESC_F_NEXT);
 return 0;
 }
 
@@ -389,7 +384,7 @@ int vring_pop(VirtIODevice *vdev, Vring *vring,
 /* Ensure descriptor is loaded 

Re: [Qemu-block] NVMe volatile write cache fixes

2015-07-08 Thread Kevin Wolf
Am 11.06.2015 um 12:01 hat Christoph Hellwig geschrieben:
 The first patch ensures that flush actually flushes data so that
 data integrity is preserved, the second fixe up WCE reporting.

Thanks, applied to the block branch for 2.4-rc1.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH RFC 0/4] aio: Use epoll_wait in aio_poll

2015-07-08 Thread Christian Borntraeger
Am 08.07.2015 um 03:02 schrieb Fam Zheng:
 On Tue, 07/07 16:54, Christian Borntraeger wrote:
 Am 30.06.2015 um 15:19 schrieb Fam Zheng:
 epoll is more scalable than ppoll. It performs faster than ppoll when the
 number of polled fds is high.

 See patch 4 for an example of the scenario and some benchmark data.

 Note: it is only effective on iothread (dataplane), while the main loop cannot
 benefit from this yet, because the iohandler and chardev GSource's don't easily
 fit into this epoll interface style (that's why main loop uses qemu_poll_ns
 directly instead of aio_poll()).
 
 There is hardly any timer activity in iothreads for now, as a result the
 timeout is always 0 or -1. Therefore, timerfd, or the said nanosecond
 epoll_pwait1 interface, which fixes the timeout granularity deficiency, is not
 immediately necessary at this point, but still that will be simple to add.

 Please review!

 Is there a branch somewhere, so that I could give it a spin?

 
 Here:
 
 https://github.com/famz/qemu/tree/aio-posix-epoll
 
In file included from /home/cborntra/REPOS/qemu/include/qemu/option.h:31:0,
 from /home/cborntra/REPOS/qemu/include/qemu-common.h:44,
 from /home/cborntra/REPOS/qemu/async.c:25:
/home/cborntra/REPOS/qemu/async.c: In function 'aio_context_new':
/home/cborntra/REPOS/qemu/include/qapi/error.h:57:20: error: 'ret' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
 error_set_errno(errp, os_error, ERROR_CLASS_GENERIC_ERROR, \
^
/home/cborntra/REPOS/qemu/async.c:291:9: note: 'ret' was declared here
 int ret;
 ^
cc1: all warnings being treated as errors

With that fixed, it seems to work. Still looking at the performance.




[Qemu-block] [PATCH v4] block/curl: Don't lose original error when a connection fails.

2015-07-08 Thread Richard W.M. Jones
Currently if qemu is connected to a curl source (eg. web server), and
the web server fails / times out / dies, you always see a bogus EIO
Input/output error.

For example, choose a large file located on any local webserver which
you control:

  $ qemu-img convert -p http://example.com/large.iso /tmp/test

Once it starts copying the file, stop the webserver and you will see
qemu-img fail with:

  qemu-img: error while reading sector 61440: Input/output error

This patch does two things: Firstly print the actual error from curl
so it doesn't get lost.  Secondly, change EIO to EPROTO.  EPROTO is a
POSIX.1 compatible errno which more accurately reflects that there was
a protocol error, rather than some kind of hardware failure.

After this patch is applied, the error changes to:

  $ qemu-img convert -p http://example.com/large.iso /tmp/test
  qemu-img: curl: transfer closed with 469989 bytes remaining to read
  qemu-img: error while reading sector 16384: Protocol error

Signed-off-by: Richard W.M. Jones rjo...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/curl.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/curl.c b/block/curl.c
index 3a2b63e..032cc8a 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "qemu-common.h"
+#include "qemu/error-report.h"
 #include "block/block_int.h"
 #include "qapi/qmp/qbool.h"
 #include "qapi/qmp/qstring.h"
@@ -298,6 +299,18 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 /* ACBs for successful messages get completed in curl_read_cb */
 if (msg->data.result != CURLE_OK) {
 int i;
+static int errcount = 100;
+
+/* Don't lose the original error message from curl, since
+ * it contains extra data.
+ */
+if (errcount > 0) {
+error_report("curl: %s", state->errmsg);
+if (--errcount == 0) {
+error_report("curl: further errors suppressed");
+}
+}
+
 for (i = 0; i < CURL_NUM_ACB; i++) {
 CURLAIOCB *acb = state->acb[i];
 
@@ -305,7 +318,7 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 continue;
 }
 
-acb->common.cb(acb->common.opaque, -EIO);
+acb->common.cb(acb->common.opaque, -EPROTO);
 qemu_aio_unref(acb);
 state->acb[i] = NULL;
 }
-- 
2.3.1




[Qemu-block] [PATCH v3 2/2] block/curl: Don't lose original error when a connection fails.

2015-07-08 Thread Richard W.M. Jones
Currently if qemu is connected to a curl source (eg. web server), and
the web server fails / times out / dies, you always see a bogus EIO
Input/output error.

For example, choose a large file located on any local webserver which
you control:

  $ qemu-img convert -p http://example.com/large.iso /tmp/test

Once it starts copying the file, stop the webserver and you will see
qemu-img fail with:

  qemu-img: error while reading sector 61440: Input/output error

This patch does two things: Firstly print the actual error from curl
so it doesn't get lost.  Secondly, change EIO to EPROTO.  EPROTO is a
POSIX.1 compatible errno which more accurately reflects that there was
a protocol error, rather than some kind of hardware failure.

After this patch is applied, the error changes to:

  $ qemu-img convert -p http://example.com/large.iso /tmp/test
  qemu-img: curl: transfer closed with 469989 bytes remaining to read
  qemu-img: error while reading sector 16384: Protocol error

Signed-off-by: Richard W.M. Jones rjo...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/curl.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/curl.c b/block/curl.c
index 3a2b63e..b72d505 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -22,6 +22,8 @@
  * THE SOFTWARE.
  */
 #include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qemu/no-flood.h"
 #include "block/block_int.h"
 #include "qapi/qmp/qbool.h"
 #include "qapi/qmp/qstring.h"
@@ -298,6 +300,15 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 /* ACBs for successful messages get completed in curl_read_cb */
 if (msg->data.result != CURLE_OK) {
 int i;
+FLOOD_COUNTER(errcount, 100);
+
+/* Don't lose the original error message from curl, since
+ * it contains extra data.
+ */
+NO_FLOOD(errcount,
+ error_report("curl: %s", state->errmsg),
+ error_report("curl: further errors suppressed"));
+
 for (i = 0; i < CURL_NUM_ACB; i++) {
 CURLAIOCB *acb = state->acb[i];
 
@@ -305,7 +316,7 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 continue;
 }
 
-acb->common.cb(acb->common.opaque, -EIO);
+acb->common.cb(acb->common.opaque, -EPROTO);
 qemu_aio_unref(acb);
 state->acb[i] = NULL;
 }
-- 
2.3.1




[Qemu-block] [PATCH v3 1/2] Add a simple mechanism to protect against error message floods.

2015-07-08 Thread Richard W.M. Jones
You can now write code like this:

  FLOOD_COUNTER(errcount, 100);
  ...
  NO_FLOOD(errcount,
   error_report("oops, something bad happened"),
   error_report("further errors suppressed"));

which would print the "oops, ..." error message up to 100 times,
followed by the "further errors suppressed" message once, and then
nothing else.

Signed-off-by: Richard W.M. Jones rjo...@redhat.com
---
 include/qemu/no-flood.h | 34 ++
 1 file changed, 34 insertions(+)
 create mode 100644 include/qemu/no-flood.h

diff --git a/include/qemu/no-flood.h b/include/qemu/no-flood.h
new file mode 100644
index 000..90a24da
--- /dev/null
+++ b/include/qemu/no-flood.h
@@ -0,0 +1,34 @@
+/*
+ * Suppress error floods.
+ *
+ * Copyright (C) 2015 Red Hat, Inc.
+ *
+ * Author: Richard W.M. Jones rjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef __QEMU_NO_FLOOD_H
+#define __QEMU_NO_FLOOD_H 1
+
+#include "qemu/atomic.h"
+
+/* Suggested initial value is 100 */
+#define FLOOD_COUNTER(counter_name, initial) \
+  static int counter_name = (initial)
+
+#define NO_FLOOD(counter_name, code, suppress_message) \
+  do { \
+int t_##counter_name = atomic_fetch_add(&counter_name, 0); \
+if (t_##counter_name > 0) {\
+code;  \
+if (t_##counter_name == 1) {   \
+suppress_message;  \
+}  \
+atomic_dec(&counter_name); \
+}  \
+} while (0)
+
+#endif
-- 
2.3.1




Re: [Qemu-block] [PATCH v3 1/2] Add a simple mechanism to protect against error message floods.

2015-07-08 Thread Kevin Wolf
Am 08.07.2015 um 14:50 hat Richard W.M. Jones geschrieben:
 You can now write code like this:
 
   FLOOD_COUNTER(errcount, 100);
   ...
    NO_FLOOD(errcount,
 error_report("oops, something bad happened"),
 error_report("further errors suppressed"));
  
  which would print the "oops, ..." error message up to 100 times,
  followed by the "further errors suppressed" message once, and then
  nothing else.
 
 Signed-off-by: Richard W.M. Jones rjo...@redhat.com
 ---
  include/qemu/no-flood.h | 34 ++
  1 file changed, 34 insertions(+)
  create mode 100644 include/qemu/no-flood.h
 
 diff --git a/include/qemu/no-flood.h b/include/qemu/no-flood.h
 new file mode 100644
 index 000..90a24da
 --- /dev/null
 +++ b/include/qemu/no-flood.h
 @@ -0,0 +1,34 @@
 +/*
 + * Suppress error floods.
 + *
 + * Copyright (C) 2015 Red Hat, Inc.
 + *
 + * Author: Richard W.M. Jones rjo...@redhat.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +#ifndef __QEMU_NO_FLOOD_H
 +#define __QEMU_NO_FLOOD_H 1
 +
  +#include "qemu/atomic.h"
 +
 +/* Suggested initial value is 100 */
 +#define FLOOD_COUNTER(counter_name, initial) \
 +  static int counter_name = (initial)
 +
 +#define NO_FLOOD(counter_name, code, suppress_message) \
 +  do { \
  +int t_##counter_name = atomic_fetch_add(&counter_name, 0); \
  +if (t_##counter_name > 0) {\
  +code;  \
  +if (t_##counter_name == 1) {   \
  +suppress_message;  \
  +}  \
  +atomic_dec(&counter_name); \
 +}  \
 +} while (0)
 +
 +#endif

I don't think that you actually achieve thread safety with the atomic
operations you use here. The counter may change between your check and
the atomic_dec().

It doesn't matter for curl because we don't get concurrency there
anyway, but it might confuse others if it looks as if it was thread safe
when it fact it isn't.

So I'd suggest that if you have an easy fix for thread safety, do it,
but otherwise don't bother and just remove the atomics.

Other than that, I like this series.

Kevin



Re: [Qemu-block] [PATCH] raw-posix.c: remove raw device access for cdrom

2015-07-08 Thread Programmingkid

On Jul 8, 2015, at 7:01 AM, Kevin Wolf wrote:

 Am 08.07.2015 um 12:47 hat Laurent Vivier geschrieben:
 
 
 On 08/07/2015 12:31, Kevin Wolf wrote:
 Am 02.07.2015 um 16:18 hat Laurent Vivier geschrieben:
 
 
 On 02/07/2015 16:03, Paolo Bonzini wrote:
 
 
 On 02/07/2015 15:58, Laurent Vivier wrote:
 Since any /dev entry can be treated as a raw disk image, it is worth
 noting which devices can be accessed when and how. /dev/rdisk nodes are
 character-special devices, but are raw in the BSD sense and force
 block-aligned I/O. They are closer to the physical disk than the buffer
 cache. /dev/disk nodes, on the other hand, are buffered block-special
 devices and are used primarily by the kernel's filesystem code.
 
 So the right thing to do would not be just to set need_alignment, but to
 probe it like we do on Linux for BDRV_O_NO_CACHE.
 
 I'm okay with doing the simple thing, but it needs a comment for 
 non-BSDers.
 
 So, what we have to do, in our case, for MacOS X cdrom, is something like:
 
 ... GetBSDPath ...
 ...
if (flags & BDRV_O_NOCACHE) {
strcat(bsdPath, "r");
}
 ...
 
 I would avoid such magic. What we could do is rejecting /dev/rdisk nodes
 without BDRV_O_NOCACHE.
 
 It's not how it works...
 
 Look in hdev_open().
 
 If user provides /dev/cdrom on the command line, in the case of MacOS X,
 QEMU searches for a cdrom drive in the system and set filename to
 /dev/rdiskX according to the result.
 
 Oh, we're already playing such games... I guess you're right then.
 
 It even seems to be not only for '/dev/cdrom', but for everything
 starting with this string. Does anyone know what's the reason for that?
 
 Also, I guess before doing strcat() on bsdPath, we should check the
 buffer length...

By buffer, do you mean the bsdPath variable?


[Qemu-block] [PATCH for-2.5 2/4] block-backend: add blk_get_max_iov()

2015-07-08 Thread Stefan Hajnoczi
Add a function to query BlockLimits.max_iov.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/block-backend.c  | 5 +
 include/sysemu/block-backend.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index aee8a12..5108aec 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -777,6 +777,11 @@ int blk_get_max_transfer_length(BlockBackend *blk)
 return blk->bs->bl.max_transfer_length;
 }
 
+int blk_get_max_iov(BlockBackend *blk)
+{
+return blk->bs->bl.max_iov;
+}
+
 void blk_set_guest_block_size(BlockBackend *blk, int align)
 {
 bdrv_set_guest_block_size(blk->bs, align);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8fc960f..c114440 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -135,6 +135,7 @@ void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
 int blk_get_max_transfer_length(BlockBackend *blk);
+int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_blockalign(BlockBackend *blk, size_t size);
 bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp);
-- 
2.4.3




[Qemu-block] [PATCH for-2.5 1/4] block: add BlockLimits.max_iov field

2015-07-08 Thread Stefan Hajnoczi
The maximum number of struct iovec elements depends on the
BlockDriverState.  The raw-posix protocol has a maximum of IOV_MAX but
others could have different values.

Instead of assuming raw-posix and hardcoding IOV_MAX in several places,
put the limit into BlockLimits.

Cc: Peter Lieven p...@kamp.de
Suggested-by: Kevin Wolf kw...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
Peter Lieven: I think the SCSI LUN level does not have a maximum
scatter-gather segments constraint.  That is probably only at the HBA
level.  CCed you anyway in case you think block/iscsi.c should set the
max_iov field.

Kevin: The default is now INT_MAX.  This means non-raw-posix users will
now be able to merge more requests than before.  They were limited to
IOV_MAX previously.  This could expose limits in other BlockDrivers
which we weren't aware of...
---
 block/io.c| 3 +++
 block/raw-posix.c | 1 +
 include/block/block_int.h | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/block/io.c b/block/io.c
index e295992..6750de6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -165,9 +165,11 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
 bs->bl.min_mem_alignment = bs->file->bl.min_mem_alignment;
 bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
+bs->bl.max_iov = bs->file->bl.max_iov;
 } else {
 bs->bl.min_mem_alignment = 512;
 bs->bl.opt_mem_alignment = getpagesize();
+bs->bl.max_iov = INT_MAX;
 }
 
 if (bs->backing_hd) {
@@ -188,6 +190,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 bs->bl.min_mem_alignment =
 MAX(bs->bl.min_mem_alignment,
 bs->backing_hd->bl.min_mem_alignment);
+bs->bl.max_iov = MIN(bs->bl.max_iov, bs->backing_hd->bl.max_iov);
 }
 
 /* Then let the driver override it */
diff --git a/block/raw-posix.c b/block/raw-posix.c
index cbe6574..faa6ae0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -735,6 +735,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 raw_probe_alignment(bs, s->fd, errp);
 bs->bl.min_mem_alignment = s->buf_align;
 bs->bl.opt_mem_alignment = MAX(s->buf_align, getpagesize());
+bs->bl.max_iov = IOV_MAX; /* limit comes from preadv()/pwritev()/etc */
 }
 
 static int check_for_dasd(int fd)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b0476fc..767e83d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -315,6 +315,9 @@ typedef struct BlockLimits {
 
 /* memory alignment for bounce buffer */
 size_t opt_mem_alignment;
+
+/* maximum number of iovec elements */
+int max_iov;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
-- 
2.4.3




[Qemu-block] [PATCH for-2.5 4/4] block/mirror: replace IOV_MAX with blk_get_max_iov()

2015-07-08 Thread Stefan Hajnoczi
Use blk_get_max_iov() instead of hardcoding IOV_MAX, which may not apply
to all BlockDrivers.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/mirror.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 985ad00..0f4445c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -160,11 +160,13 @@ static void mirror_read_complete(void *opaque, int ret)
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->common.bs;
-    int nb_sectors, sectors_per_chunk, nb_chunks;
+    int nb_sectors, sectors_per_chunk, nb_chunks, max_iov;
     int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
     uint64_t delay_ns = 0;
     MirrorOp *op;
 
+    max_iov = MIN(source->bl.max_iov, s->target->bl.max_iov);
+
     s->sector_num = hbitmap_iter_next(&s->hbi);
     if (s->sector_num < 0) {
         bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
@@ -241,7 +243,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
             break;
         }
-        if (IOV_MAX < nb_chunks + added_chunks) {
+        if (max_iov < nb_chunks + added_chunks) {
             trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
             break;
         }
-- 
2.4.3
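The clamping done in mirror_iteration() above amounts to taking the stricter of the source and target limits and refusing to grow an op past it. A small sketch under those assumptions (the helper name is illustrative, not part of the patch):

```c
/* Hypothetical helper mirroring the patch's logic: an op currently
 * holding nb_chunks iovec elements may take another added_chunks only
 * if the total stays within the smaller of the two nodes' limits. */
static int op_can_grow(int source_max_iov, int target_max_iov,
                       int nb_chunks, int added_chunks)
{
    int max_iov = source_max_iov < target_max_iov ? source_max_iov
                                                  : target_max_iov;
    return nb_chunks + added_chunks <= max_iov;
}
```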




[Qemu-block] [PATCH for-2.5 3/4] block: replace IOV_MAX with BlockLimits.max_iov

2015-07-08 Thread Stefan Hajnoczi
Request merging must not result in a huge request that exceeds the
maximum number of iovec elements.  Use BlockLimits.max_iov instead of
hardcoding IOV_MAX.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/io.c| 3 ++-
 hw/block/virtio-blk.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 6750de6..baffd02 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1818,7 +1818,8 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
             merge = 1;
         }
 
-        if (reqs[outidx].qiov->niov + reqs[i].qiov->niov + 1 > IOV_MAX) {
+        if (reqs[outidx].qiov->niov + reqs[i].qiov->niov + 1 >
+            bs->bl.max_iov) {
             merge = 0;
         }
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6aefda4..a186956 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -407,7 +407,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
         bool merge = true;
 
         /* merge would exceed maximum number of IOVs */
-        if (niov + req->qiov.niov > IOV_MAX) {
+        if (niov + req->qiov.niov > blk_get_max_iov(blk)) {
             merge = false;
         }
 
-- 
2.4.3
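The guard in multiwrite_merge() reserves one extra element on top of the two requests' iovec counts. As a standalone sketch (illustrative function, not the real API):

```c
/* Hypothetical check matching the patch: requests a and b may be merged
 * only if their combined iovec count, plus one spare element, does not
 * exceed the node's max_iov limit. */
static int can_merge(int niov_a, int niov_b, int max_iov)
{
    return niov_a + niov_b + 1 <= max_iov;
}
```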




[Qemu-block] [PATCH 1/2] ignore bdrv_flush operation when no qcow2 cache item is dirty

2015-07-08 Thread Qingshu Chen
qcow2_cache_flush() writes dirty cache entries to the disk and invokes
bdrv_flush() to make the data durable. But even if there is no dirty cache
entry, qcow2_cache_flush() still invokes bdrv_flush(). In fact, bdrv_flush()
invokes fdatasync(), which is an expensive operation. This patch skips
bdrv_flush() when there are no dirty cache entries. The reason I modified the
return value of qcow2_cache_flush() is that qcow2_co_flush_to_os() needs to
know whether the flush operation was actually performed. Following is the patch:

From 23f9f83da4178e8fbb53d2cffe128f5a2d3a239a Mon Sep 17 00:00:00 2001
From: Qingshu Chen qingshu.chen...@gmail.com
Date: Wed, 1 Jul 2015 14:45:23 +0800
Subject: [PATCH 1/2] ignore bdrv_flush operation when no qcow2 cache item is
 dirty
Signed-off-by: Qingshu Chen qingshu.chen...@gmail.com

---
 block/qcow2-cache.c | 9 -
 block/qcow2.c   | 2 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index ed92a09..57c0601 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -174,6 +174,7 @@ int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
     int result = 0;
     int ret;
     int i;
+    int flag = 0;

     trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);

@@ -182,15 +183,21 @@ int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
         if (ret < 0 && result != -ENOSPC) {
             result = ret;
         }
+        if (c->entries[i].dirty && c->entries[i].offset) {
+            flag = 1;
+        }
     }

-    if (result == 0) {
+    if (result == 0 && flag == 1) {
         ret = bdrv_flush(bs->file);
         if (ret < 0) {
             result = ret;
         }
     }

+    if (flag == 0 && result >= 0) {
+        result = 1;
+    }
+
     return result;
 }

diff --git a/block/qcow2.c b/block/qcow2.c
index d522ec7..ab4613a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2504,6 +2504,8 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
     if (ret < 0) {
         qemu_co_mutex_unlock(&s->lock);
         return ret;
+    } else if (ret == 1) {
+        bdrv_flush(bs->file);
     }

     if (qcow2_need_accurate_refcounts(s)) {
--
1.9.1
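The idea behind the patch can be sketched independently of qcow2. This is a hedged model with illustrative types; in the real code the dirty flag lives in the cache table entries and the final durable flush is bdrv_flush() on bs->file:

```c
/* Hypothetical model: write back only dirty entries, and report 1
 * (rather than 0) when nothing was dirty, so the caller can skip the
 * expensive fdatasync-backed flush. */
struct entry {
    int dirty;
    long offset;
};

static int cache_flush(struct entry *entries, int n)
{
    int flag = 0;   /* did we write anything back? */
    int i;

    for (i = 0; i < n; i++) {
        if (entries[i].dirty && entries[i].offset) {
            entries[i].dirty = 0;   /* stands in for the real writeback */
            flag = 1;
        }
    }
    return flag ? 0 : 1;    /* 0: caller must flush; 1: nothing dirty */
}
```

A second call on the same cache returns 1 immediately, which is exactly the case the patch wants to make cheap.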





-
Qingshu Chen,
Institute of Parallel and Distributed Systems (IPADS),
School of Software,
Shanghai Jiao Tong University
---


[Qemu-block] [PATCH for-2.4 4/5] block: Reorder cleanups in bdrv_close()

2015-07-08 Thread Kevin Wolf
Block drivers may still want to access their child nodes in their
.bdrv_close handler. If they unref and/or detach a child by themselves,
this should not result in a double free.

There is additional code for backing files, which are just a special
case of child nodes. The same applies for them.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index b723cf2..d5c9f03 100644
--- a/block.c
+++ b/block.c
@@ -1901,6 +1901,14 @@ void bdrv_close(BlockDriverState *bs)
     if (bs->drv) {
         BdrvChild *child, *next;
 
+        bs->drv->bdrv_close(bs);
+
+        if (bs->backing_hd) {
+            BlockDriverState *backing_hd = bs->backing_hd;
+            bdrv_set_backing_hd(bs, NULL);
+            bdrv_unref(backing_hd);
+        }
+
         QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
             /* TODO Remove bdrv_unref() from drivers' close function and use
              * bdrv_unref_child() here */
@@ -1910,12 +1918,6 @@ void bdrv_close(BlockDriverState *bs)
             bdrv_detach_child(child);
         }
 
-        if (bs->backing_hd) {
-            BlockDriverState *backing_hd = bs->backing_hd;
-            bdrv_set_backing_hd(bs, NULL);
-            bdrv_unref(backing_hd);
-        }
-        bs->drv->bdrv_close(bs);
         g_free(bs->opaque);
         bs->opaque = NULL;
         bs->drv = NULL;
-- 
1.8.3.1




[Qemu-block] [PATCH for-2.4 5/5] block: Fix backing file child when modifying graph

2015-07-08 Thread Kevin Wolf
This patch moves bdrv_attach_child() from the individual places that add
a backing file to a BDS to bdrv_set_backing_hd(), which is called by all
of them. It also adds bdrv_detach_child() there.

For normal operation (starting with one backing file chain and not
changing it until the topmost image is closed) and live snapshots, this
constitutes no change in behaviour.

For all other cases, this is a fix for the bug that the old backing file
was still referenced as a child, and the new one wasn't referenced.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c   | 5 +++--
 include/block/block_int.h | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index d5c9f03..d088ee0 100644
--- a/block.c
+++ b/block.c
@@ -1141,6 +1141,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     if (bs->backing_hd) {
         assert(bs->backing_blocker);
         bdrv_op_unblock_all(bs->backing_hd, bs->backing_blocker);
+        bdrv_detach_child(bs->backing_child);
     } else if (backing_hd) {
         error_setg(&bs->backing_blocker,
                    "node is used as backing hd of '%s'",
@@ -1151,8 +1152,10 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     if (!backing_hd) {
         error_free(bs->backing_blocker);
         bs->backing_blocker = NULL;
+        bs->backing_child = NULL;
         goto out;
     }
+    bs->backing_child = bdrv_attach_child(bs, backing_hd, &child_backing);
     bs->open_flags &= ~BDRV_O_NO_BACKING;
     pstrcpy(bs->backing_file, sizeof(bs->backing_file), backing_hd->filename);
     pstrcpy(bs->backing_format, sizeof(bs->backing_format),
@@ -1236,7 +1239,6 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
         goto free_exit;
     }
 
-    bdrv_attach_child(bs, backing_hd, &child_backing);
     bdrv_set_backing_hd(bs, backing_hd);
 
 free_exit:
@@ -2171,7 +2173,6 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
     /* The contents of 'tmp' will become bs_top, as we are
      * swapping bs_new and bs_top contents. */
     bdrv_set_backing_hd(bs_top, bs_new);
-    bdrv_attach_child(bs_top, bs_new, &child_backing);
 }
 
 static void bdrv_delete(BlockDriverState *bs)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8996baf..2cc973c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -379,6 +379,7 @@ struct BlockDriverState {
     char exact_filename[PATH_MAX];
 
     BlockDriverState *backing_hd;
+    BdrvChild *backing_child;
     BlockDriverState *file;
 
     NotifierList close_notifiers;
-- 
1.8.3.1




[Qemu-block] [PATCH 3/3] mirror: Speed up bitmap initial scanning

2015-07-08 Thread Fam Zheng
Limiting each bdrv_is_allocated_above() call to sectors_per_chunk is slow,
because the underlying protocol driver issues many more queries than
necessary. We should coalesce the queries.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block/mirror.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index ca55578..e8cb592 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -371,7 +371,7 @@ static void coroutine_fn mirror_run(void *opaque)
     MirrorBlockJob *s = opaque;
     MirrorExitData *data;
     BlockDriverState *bs = s->common.bs;
-    int64_t sector_num, end, sectors_per_chunk, length;
+    int64_t sector_num, end, length;
     uint64_t last_pause_ns;
     BlockDriverInfo bdi;
     char backing_filename[2]; /* we only need 2 characters because we are only
@@ -425,7 +425,6 @@ static void coroutine_fn mirror_run(void *opaque)
         goto immediate_exit;
     }
 
-    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
     mirror_free_init(s);
 
     last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -433,7 +432,9 @@ static void coroutine_fn mirror_run(void *opaque)
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
         BlockDriverState *base = s->base;
         for (sector_num = 0; sector_num < end; ) {
-            int64_t next = (sector_num | (sectors_per_chunk - 1)) + 1;
+            /* Just to make sure we are not exceeding int limit. */
+            int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
+                                 end - sector_num);
             int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 
             if (now - last_pause_ns > SLICE_TIME) {
@@ -444,9 +445,7 @@ static void coroutine_fn mirror_run(void *opaque)
             if (block_job_is_cancelled(&s->common)) {
                 goto immediate_exit;
             }
-
-            ret = bdrv_is_allocated_above(bs, base,
-                                          sector_num, next - sector_num, &n);
+            ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, &n);
 
             if (ret < 0) {
                 goto immediate_exit;
@@ -455,10 +454,8 @@ static void coroutine_fn mirror_run(void *opaque)
             assert(n > 0);
             if (ret == 1) {
                 bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
-                sector_num = next;
-            } else {
-                sector_num += n;
             }
+            sector_num += n;
         }
     }
 
-- 
2.4.3
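The coalesced query size used above is just the remaining length, clamped so the sector count still fits in an int worth of bytes. A sketch of that calculation (BDRV_SECTOR_BITS is 9 in QEMU; the helper name is illustrative):

```c
#include <stdint.h>

#define BDRV_SECTOR_BITS 9

/* Sketch of the nb_sectors computation from the patch: query as many
 * sectors as remain, but never more than INT_MAX >> BDRV_SECTOR_BITS,
 * so the corresponding byte count stays within int range. */
static int query_sectors(int64_t sector_num, int64_t end)
{
    int64_t max = INT32_MAX >> BDRV_SECTOR_BITS;
    int64_t remaining = end - sector_num;

    return (int)(remaining < max ? remaining : max);
}
```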




[Qemu-block] [PATCH 2/3] mirror: Use block_job_relax_cpu during bitmap scanning

2015-07-08 Thread Fam Zheng
Sleeping for 0 seconds may not be as effective as we want; use
block_job_relax_cpu instead.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block/mirror.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 62db031..ca55578 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -438,7 +438,7 @@ static void coroutine_fn mirror_run(void *opaque)
 
         if (now - last_pause_ns > SLICE_TIME) {
             last_pause_ns = now;
-            block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
+            block_job_relax_cpu(&s->common);
         }
 
         if (block_job_is_cancelled(&s->common)) {
-- 
2.4.3
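The pacing this patch plugs into can be modeled as a simple time gate; block_job_relax_cpu() itself is introduced earlier in the series, so the helper below only sketches when it fires (SLICE_TIME is 100 ms in mirror.c; the helper name is illustrative):

```c
#include <stdint.h>

#define SLICE_TIME 100000000LL  /* 100 ms in ns, as in mirror.c */

/* Returns nonzero when more than SLICE_TIME has elapsed since the last
 * pause, updating *last_pause_ns; the caller would then relax the CPU
 * so vcpu threads get a chance to run. */
static int should_relax(int64_t now_ns, int64_t *last_pause_ns)
{
    if (now_ns - *last_pause_ns > SLICE_TIME) {
        *last_pause_ns = now_ns;
        return 1;
    }
    return 0;
}
```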




Re: [Qemu-block] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint

2015-07-08 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 Cc: Jeff Cody jc...@redhat.com
 ---
  block/backup.c   | 13 +
  blockjob.c   | 10 ++
  include/block/blockjob.h | 12 
  3 files changed, 35 insertions(+)
 
 diff --git a/block/backup.c b/block/backup.c
 index d3c7d9f..ebb8a88 100644
 --- a/block/backup.c
 +++ b/block/backup.c
 @@ -211,11 +211,24 @@ static void backup_iostatus_reset(BlockJob *job)
  bdrv_iostatus_reset(s-target);
  }
  
 +static void backup_do_checkpoint(BlockJob *job, Error **errp)
 +{
 +BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
 +
 +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
 +        error_setg(errp, "this feature or command is not currently supported");

You use this text in a few different errors in the block code; we currently
have one test machine which is producing it and we haven't yet figured out
which of the exit paths is doing it.  Please make each one unique and include
an identifier, e.g. "checkpoint on block backup in sync ... for blockjob ..."
(I'm not sure what exactly makes sense for block jobs - but something like
that).

Dave
  
 +        return;
 +    }
 +
 +    hbitmap_reset_all(backup_job->bitmap);
 +}
 +
  static const BlockJobDriver backup_job_driver = {
  .instance_size  = sizeof(BackupBlockJob),
  .job_type   = BLOCK_JOB_TYPE_BACKUP,
  .set_speed  = backup_set_speed,
  .iostatus_reset = backup_iostatus_reset,
 +.do_checkpoint  = backup_do_checkpoint,
  };
  
  static BlockErrorAction backup_error_action(BackupBlockJob *job,
 diff --git a/blockjob.c b/blockjob.c
 index ec46fad..cb412d1 100644
 --- a/blockjob.c
 +++ b/blockjob.c
 @@ -400,3 +400,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
  
  qemu_bh_schedule(data-bh);
  }
 +
 +void block_job_do_checkpoint(BlockJob *job, Error **errp)
 +{
 +    if (!job->driver->do_checkpoint) {
 +        error_setg(errp, "this feature or command is not currently supported");
 +return;
 +}
 +
 +    job->driver->do_checkpoint(job, errp);
 +}
 diff --git a/include/block/blockjob.h b/include/block/blockjob.h
 index 57d8ef1..b832dc3 100644
 --- a/include/block/blockjob.h
 +++ b/include/block/blockjob.h
 @@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
   * manually.
   */
  void (*complete)(BlockJob *job, Error **errp);
 +
 +/** Optional callback for job types that support checkpoint. */
 +void (*do_checkpoint)(BlockJob *job, Error **errp);
  } BlockJobDriver;
  
  /**
 @@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
BlockJobDeferToMainLoopFn *fn,
void *opaque);
  
 +/**
 + * block_job_do_checkpoint:
 + * @job: The job.
 + * @errp: Error object.
 + *
 + * Do block checkpoint on the specified job.
 + */
 +void block_job_do_checkpoint(BlockJob *job, Error **errp);
 +
  #endif
 -- 
 2.4.3
 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
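The do_checkpoint dispatch under review follows the optional-callback pattern used throughout blockjob.c. A standalone model under stated assumptions (illustrative types; the real code reports failure through Error **errp rather than a return code):

```c
struct job;

struct job_driver {
    /* Optional hook; NULL when the job type has no checkpoint support. */
    int (*do_checkpoint)(struct job *job);
};

struct job {
    const struct job_driver *driver;
};

/* Returns the hook's result, or -1 ("not supported") when absent. */
static int job_do_checkpoint(struct job *job)
{
    if (!job->driver->do_checkpoint) {
        return -1;
    }
    return job->driver->do_checkpoint(job);
}

static int backup_checkpoint(struct job *job)
{
    (void)job;
    return 0;   /* stands in for hbitmap_reset_all() */
}

static const struct job_driver backup_driver = { backup_checkpoint };
static const struct job_driver stream_driver = { 0 };
```

Making each "not supported" path carry its own message, as requested in the review, only changes the error reported on the -1 branch.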



Re: [Qemu-block] [Qemu-devel] [PATCH COLO-BLOCK v7 00/17] Block replication for continuous checkpoints

2015-07-08 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
 On 07/08/2015 11:49 PM, Michael R. Hines wrote:
  On 07/07/2015 08:38 PM, Wen Congyang wrote:
  On 07/08/2015 12:56 AM, Michael R. Hines wrote:
  On 07/07/2015 04:23 AM, Paolo Bonzini wrote:
  On 07/07/2015 11:13, Dr. David Alan Gilbert wrote:
  This log is very strange. The NBD client connects to the NBD server, and 
  NBD server wants to read data
  from NBD client, but reading fails. It seems that the connection is 
  closed unexpectedly. Can you
  give me more log and how do you use it?
  That was the same failure I was getting.   I think it's that the NBD 
  server and client are in different
  modes, with one of them expecting the export.
  nbd_server_add always expects the export.
 
  Paolo
 
  OK, Wen, so your wiki finally does reflect this, but now we're back to 
  the export not found error.
 
  Again, here's the exact command line:
 
  1. First on the secondary VM:
 
  qemu-system-x86_64 .snip...  -drive 
  if=none,driver=qcow2,file=foo.qcow2,id=mc1,cache=none,aio=native -drive 
  if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=7000,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing.file.filename=hidden_disk.qcow2,file.backing.driver=qcow2,file.backing.allow-write-backing-file=on,file.backing.backing.backing_reference=mc1
 
  2. Then, then HMP commands:
 
  nbd_server_start 0:6262
  nbd_server_add -w mc1
 
  3. Then the primary VM:
 
  qemu-system-x86_64 .snip...  -drive 
  if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,children.0.file.filename=bar.qcow2,children.0.driver=qcow2,children.1.file.driver=nbd,children.1.file.host=127.0.0.1,children.1.file.port=6262,children.1.driver=replication,children.1.mode=primary,children.1.ignore-errors=on
 
  With the error: Server requires an export name
 
  *but*, your wiki has no export name on the primary VM size, so I added 
  the export name back which is on your old wiki:
 
  qemu-system-x86_64 .snip...  -drive 
  if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,children.0.file.filename=bar.qcow2,children.0.driver=qcow2,children.1.file.driver=nbd,children.1.file.export=mc1,children.1.file.host=127.0.0.1,children.1.file.port=6262,children.1.driver=replication,children.1.mode=primary,children.1.ignore-errors=on:
   Failed to read export length
 
  Hmm, I think you use v7 version. Is it right?
  In this version, the correct usage for the primary qemu is:
  1. qemu-system-x86_64 .snip...  -drive 
  if=virtio,driver=quorum,read-pattern=fifo,id=disk1,
 
  children.0.file.filename=bar.qcow2,children.0.driver=qcow2,
 
  Then run hmp monitor command(It should be run after you run nbd_server_add 
  in the secondary qemu):
  child_add disk1 
  child.driver=replication,child.mode=primary,child.file.host=127.0.0.1,child.file.port=6262,child.file.export=mc1,child.file.driver=nbd,child.ignore-errors=on
 
  Thanks
  Wen Congyang
 
  And server now says:
 
  nbd.c:nbd_handle_export_name():L416: export not found
  nbd.c:nbd_send_negotiate():L562: option negotiation failed
 
  .
 
 
  
  OK, I'm totally confused at this point. =)
  
  Maybe it would be easier to just wait for your next clean patchset + 
  documentation which is all consistent with each other.
 
 I have sent the v8. But the usage is not changed. You can setup the 
 environment according to the wiki.
 When we open nbd client, we need to connect to the nbd server. So I introduce 
 a new command child_add to add NBD client
 as a quorum child when the nbd server is ready.
 
 The nbd server is ready after you run the following command:
 nbd_server_start 0:6262 # the secondary qemu will listen to host:port
 nbd_server_add -w mc1   # the NBD server will export this disk. The export
 name is its id, mc1.
 # -w means writes to this disk are allowed.
 
 Then you can run the following command in the primary qemu:
 child_add disk1 
 child.driver=replication,child.mode=primary,child.file.host=127.0.0.1,child.file.port=6262,child.file.export=mc1,child.file.driver=nbd,child.ignore-errors=on
 
 After this monitor command, nbd client has connected to the nbd server.

Ah! The 'child.file.export=mc1' wasn't there previously; I see Yang added that 
to the wiki yesterday;
that probably explains the problem that we've been having.

Dave

 
 Thanks
 Wen Congyang
 
  
  - Michael
  
  .
  
 
 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan

2015-07-08 Thread Fam Zheng
This supersedes:

http://patchwork.ozlabs.org/patch/491415/

and [1] which is currently in Jeff's tree.

Although [1] fixed the QMP responsiveness, Alexandre DERUMIER reported that
guest responsiveness still suffers when we are busy in the initial dirty bitmap
scanning loop of the mirror job. That is because 1) we issue too many lseeks;
2) we only sleep for 0 ns, which turns out to be quite ineffective in yielding
the BQL to vcpu threads.  Both are fixed.

To reproduce: start a guest, attach a 10G raw image, then mirror it.  Your
guest will immediately start to stutter (with patch [1] testing on a local ext4
raw image, and while echo -n .; do sleep 0.05; done in guest console).

This series adds block_job_relax_cpu as suggested by Stefan Hajnoczi and uses
it in mirror job; and lets bdrv_is_allocated_above return a larger p_num as
suggested by Paolo Bonzini (although it's done without changing the API).

[1]: http://patchwork.ozlabs.org/patch/471656/ block/mirror: Sleep
 periodically during bitmap scanning

Fam Zheng (3):
  blockjob: Introduce block_job_relax_cpu
  mirror: Use block_job_relax_cpu during bitmap scanning
  mirror: Speed up bitmap initial scanning

 block/mirror.c   | 17 +++--
 include/block/blockjob.h | 16 
 2 files changed, 23 insertions(+), 10 deletions(-)

-- 
2.4.3




Re: [Qemu-block] [Qemu-devel] [PATCH COLO-BLOCK v7 00/17] Block replication for continuous checkpoints

2015-07-08 Thread Wen Congyang
On 07/09/2015 09:55 AM, Dr. David Alan Gilbert wrote:
 * Wen Congyang (we...@cn.fujitsu.com) wrote:
 On 07/08/2015 11:49 PM, Michael R. Hines wrote:
 On 07/07/2015 08:38 PM, Wen Congyang wrote:
 On 07/08/2015 12:56 AM, Michael R. Hines wrote:
 On 07/07/2015 04:23 AM, Paolo Bonzini wrote:
 On 07/07/2015 11:13, Dr. David Alan Gilbert wrote:
 This log is very strange. The NBD client connects to the NBD server, and 
 NBD server wants to read data
 from NBD client, but reading fails. It seems that the connection is 
 closed unexpectedly. Can you
 give me more log and how do you use it?
 That was the same failure I was getting.   I think it's that the NBD 
 server and client are in different
 modes, with one of them expecting the export.
 nbd_server_add always expects the export.

 Paolo

 OK, Wen, so your wiki finally does reflect this, but now we're back to 
 the export not found error.

 Again, here's the exact command line:

 1. First on the secondary VM:

 qemu-system-x86_64 .snip...  -drive 
 if=none,driver=qcow2,file=foo.qcow2,id=mc1,cache=none,aio=native -drive 
 if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=7000,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing.file.filename=hidden_disk.qcow2,file.backing.driver=qcow2,file.backing.allow-write-backing-file=on,file.backing.backing.backing_reference=mc1

 2. Then, then HMP commands:

 nbd_server_start 0:6262
 nbd_server_add -w mc1

 3. Then the primary VM:

 qemu-system-x86_64 .snip...  -drive 
 if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,children.0.file.filename=bar.qcow2,children.0.driver=qcow2,children.1.file.driver=nbd,children.1.file.host=127.0.0.1,children.1.file.port=6262,children.1.driver=replication,children.1.mode=primary,children.1.ignore-errors=on

 With the error: Server requires an export name

 *but*, your wiki has no export name on the primary VM size, so I added 
 the export name back which is on your old wiki:

 qemu-system-x86_64 .snip...  -drive 
 if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,children.0.file.filename=bar.qcow2,children.0.driver=qcow2,children.1.file.driver=nbd,children.1.file.export=mc1,children.1.file.host=127.0.0.1,children.1.file.port=6262,children.1.driver=replication,children.1.mode=primary,children.1.ignore-errors=on:
  Failed to read export length

 Hmm, I think you use v7 version. Is it right?
 In this version, the correct usage for the primary qemu is:
 1. qemu-system-x86_64 .snip...  -drive 
 if=virtio,driver=quorum,read-pattern=fifo,id=disk1,

 children.0.file.filename=bar.qcow2,children.0.driver=qcow2,

 Then run hmp monitor command(It should be run after you run nbd_server_add 
 in the secondary qemu):
 child_add disk1 
 child.driver=replication,child.mode=primary,child.file.host=127.0.0.1,child.file.port=6262,child.file.export=mc1,child.file.driver=nbd,child.ignore-errors=on

 Thanks
 Wen Congyang

 And server now says:

 nbd.c:nbd_handle_export_name():L416: export not found
 nbd.c:nbd_send_negotiate():L562: option negotiation failed

 .



 OK, I'm totally confused at this point. =)

 Maybe it would be easier to just wait for your next clean patchset + 
 documentation which is all consistent with each other.

 I have sent the v8. But the usage is not changed. You can setup the 
 environment according to the wiki.
 When we open nbd client, we need to connect to the nbd server. So I 
 introduce a new command child_add to add NBD client
 as a quorum child when the nbd server is ready.

 The nbd server is ready after you run the following command:
 nbd_server_start 0:6262 # the secondary qemu will listen to host:port
 nbd_server_add -w mc1   # the NBD server will know this disk is used as NBD 
 server. The export name is its id wc1.
 # -w means we allow to write to this disk.

 Then you can run the following command in the primary qemu:
 child_add disk1 
 child.driver=replication,child.mode=primary,child.file.host=127.0.0.1,child.file.port=6262,child.file.export=mc1,child.file.driver=nbd,child.ignore-errors=on

 After this monitor command, nbd client has connected to the nbd server.
 
 Ah! The 'child.file.export=mc1' wasn't there previously; I see Yang added 
 that to the wiki yesterday;
 that probably explains the problem that we've been having.

Sorry for this mistake.

Thanks
Wen Congyang

 
 Dave
 

 Thanks
 Wen Congyang


 - Michael

 .



 --
 Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
 .
 




Re: [Qemu-block] [Qemu-devel] [PATCH COLO-BLOCK v7 00/17] Block replication for continuous checkpoints

2015-07-08 Thread Wen Congyang
On 07/08/2015 11:49 PM, Michael R. Hines wrote:
 On 07/07/2015 08:38 PM, Wen Congyang wrote:
 On 07/08/2015 12:56 AM, Michael R. Hines wrote:
 On 07/07/2015 04:23 AM, Paolo Bonzini wrote:
 On 07/07/2015 11:13, Dr. David Alan Gilbert wrote:
 This log is very strange. The NBD client connects to the NBD server, and NBD 
 server wants to read data
 from NBD client, but reading fails. It seems that the connection is 
 closed unexpectedly. Can you
 give me more log and how do you use it?
 That was the same failure I was getting.   I think it's that the NBD 
 server and client are in different
 modes, with one of them expecting the export.
 nbd_server_add always expects the export.

 Paolo

 OK, Wen, so your wiki finally does reflect this, but now we're back to the 
 export not found error.

 Again, here's the exact command line:

 1. First on the secondary VM:

 qemu-system-x86_64 .snip...  -drive 
 if=none,driver=qcow2,file=foo.qcow2,id=mc1,cache=none,aio=native -drive 
 if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=7000,file.file.filename=active_disk.qcow2,file.driver=qcow2,file.backing.file.filename=hidden_disk.qcow2,file.backing.driver=qcow2,file.backing.allow-write-backing-file=on,file.backing.backing.backing_reference=mc1

 2. Then, then HMP commands:

 nbd_server_start 0:6262
 nbd_server_add -w mc1

 3. Then the primary VM:

 qemu-system-x86_64 .snip...  -drive 
 if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,children.0.file.filename=bar.qcow2,children.0.driver=qcow2,children.1.file.driver=nbd,children.1.file.host=127.0.0.1,children.1.file.port=6262,children.1.driver=replication,children.1.mode=primary,children.1.ignore-errors=on

 With the error: Server requires an export name

 *but*, your wiki has no export name on the primary VM size, so I added the 
 export name back which is on your old wiki:

 qemu-system-x86_64 .snip...  -drive 
 if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,children.0.file.filename=bar.qcow2,children.0.driver=qcow2,children.1.file.driver=nbd,children.1.file.export=mc1,children.1.file.host=127.0.0.1,children.1.file.port=6262,children.1.driver=replication,children.1.mode=primary,children.1.ignore-errors=on:
  Failed to read export length

 Hmm, I think you use v7 version. Is it right?
 In this version, the correct usage for the primary qemu is:
 1. qemu-system-x86_64 .snip...  -drive 
 if=virtio,driver=quorum,read-pattern=fifo,id=disk1,

 children.0.file.filename=bar.qcow2,children.0.driver=qcow2,

 Then run hmp monitor command(It should be run after you run nbd_server_add 
 in the secondary qemu):
 child_add disk1 
 child.driver=replication,child.mode=primary,child.file.host=127.0.0.1,child.file.port=6262,child.file.export=mc1,child.file.driver=nbd,child.ignore-errors=on

 Thanks
 Wen Congyang

 And server now says:

 nbd.c:nbd_handle_export_name():L416: export not found
 nbd.c:nbd_send_negotiate():L562: option negotiation failed

 .


 
 OK, I'm totally confused at this point. =)
 
 Maybe it would be easier to just wait for your next clean patchset + 
 documentation which is all consistent with each other.

I have sent the v8. But the usage is not changed. You can setup the environment 
according to the wiki.
When we open nbd client, we need to connect to the nbd server. So I introduce a 
new command child_add to add NBD client
as a quorum child when the nbd server is ready.

The nbd server is ready after you run the following command:
nbd_server_start 0:6262 # the secondary qemu will listen to host:port
nbd_server_add -w mc1   # the NBD server will export this disk. The export
name is its id, mc1.
# -w means writes to this disk are allowed.

Then you can run the following command in the primary qemu:
child_add disk1 
child.driver=replication,child.mode=primary,child.file.host=127.0.0.1,child.file.port=6262,child.file.export=mc1,child.file.driver=nbd,child.ignore-errors=on

After this monitor command, nbd client has connected to the nbd server.

Thanks
Wen Congyang

 
 - Michael
 
 .
 




Re: [Qemu-block] [PATCH COLO-BLOCK v8 09/18] Backup: clear all bitmap when doing block checkpoint

2015-07-08 Thread Wen Congyang
On 07/09/2015 09:28 AM, Dr. David Alan Gilbert wrote:
 * Wen Congyang (we...@cn.fujitsu.com) wrote:
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
 Signed-off-by: Gonglei arei.gong...@huawei.com
 Cc: Jeff Cody jc...@redhat.com
 ---
  block/backup.c   | 13 +
  blockjob.c   | 10 ++
  include/block/blockjob.h | 12 
  3 files changed, 35 insertions(+)

 diff --git a/block/backup.c b/block/backup.c
 index d3c7d9f..ebb8a88 100644
 --- a/block/backup.c
 +++ b/block/backup.c
 @@ -211,11 +211,24 @@ static void backup_iostatus_reset(BlockJob *job)
  bdrv_iostatus_reset(s-target);
  }
  
 +static void backup_do_checkpoint(BlockJob *job, Error **errp)
 +{
 +BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
 +
 +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
 +        error_setg(errp, "this feature or command is not currently supported");
 
 You use this text in a few different errors in the block code; we've currently
 got one test machine which is producing it and we haven't yet figured out
 which one of the exit paths is doing it.  Please make each one unique and 
 state
 an identifier; e.g. checkpoint on block backup in sync ... for blockjob ...
 (I'm not sure what exactly makes sense for block jobs - but something like 
 that).

OK, I will check all error messages.

Thanks
Wen Congyang

 
 Dave
   
 +return;
 +}
 +
 +    hbitmap_reset_all(backup_job->bitmap);
 +}
 +
  static const BlockJobDriver backup_job_driver = {
  .instance_size  = sizeof(BackupBlockJob),
  .job_type   = BLOCK_JOB_TYPE_BACKUP,
  .set_speed  = backup_set_speed,
  .iostatus_reset = backup_iostatus_reset,
 +.do_checkpoint  = backup_do_checkpoint,
  };
  
  static BlockErrorAction backup_error_action(BackupBlockJob *job,
 diff --git a/blockjob.c b/blockjob.c
 index ec46fad..cb412d1 100644
 --- a/blockjob.c
 +++ b/blockjob.c
 @@ -400,3 +400,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
  
  qemu_bh_schedule(data-bh);
  }
 +
 +void block_job_do_checkpoint(BlockJob *job, Error **errp)
 +{
 +    if (!job->driver->do_checkpoint) {
 +        error_setg(errp, "this feature or command is not currently supported");
 +return;
 +}
 +
 +    job->driver->do_checkpoint(job, errp);
 +}
 diff --git a/include/block/blockjob.h b/include/block/blockjob.h
 index 57d8ef1..b832dc3 100644
 --- a/include/block/blockjob.h
 +++ b/include/block/blockjob.h
 @@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
   * manually.
   */
  void (*complete)(BlockJob *job, Error **errp);
 +
 +/** Optional callback for job types that support checkpoint. */
 +void (*do_checkpoint)(BlockJob *job, Error **errp);
  } BlockJobDriver;
  
  /**
 @@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
BlockJobDeferToMainLoopFn *fn,
void *opaque);
  
 +/**
 + * block_job_do_checkpoint:
 + * @job: The job.
 + * @errp: Error object.
 + *
 + * Do block checkpoint on the specified job.
 + */
 +void block_job_do_checkpoint(BlockJob *job, Error **errp);
 +
  #endif
 -- 
 2.4.3

 --
 Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
 .
 




[Qemu-block] [PATCH for-2.4 0/5] block: Fix backing file child when modifying graph

2015-07-08 Thread Kevin Wolf
This series is extracted from my work towards removing bdrv_swap(),
which is targeted for 2.5. It contains a fix for dangling pointers when
modifying the BDS graph and its dependencies.

I didn't bother to split patches of which only a part is required, nor
did I remove references to the future bdrv_swap removal (after all, it
will happen, even if the patches will be delayed by the 2.4 freeze).

Specifically, bdrv_open/unref_child() are yet unused in this series; the
respective patches are included because of bdrv_attach/detach_child().

Kevin Wolf (5):
  block: Move bdrv_attach_child() calls up the call chain
  block: Introduce bdrv_open_child()
  block: Introduce bdrv_unref_child()
  block: Reorder cleanups in bdrv_close()
  block: Fix backing file child when modifying graph

 block.c   | 144 --
 include/block/block.h |   7 +++
 include/block/block_int.h |   1 +
 3 files changed, 108 insertions(+), 44 deletions(-)

-- 
1.8.3.1




[Qemu-block] [PATCH for-2.4 2/5] block: Introduce bdrv_open_child()

2015-07-08 Thread Kevin Wolf
It is the same as bdrv_open_image(), except that it doesn't only return
success or failure, but the newly created BdrvChild object for the new
child node.

As the BdrvChild object already contains a BlockDriverState pointer (and
this is supposed to become the only pointer so that bdrv_append() and
friends can just change a single pointer in BdrvChild), the pbs
parameter is removed for bdrv_open_child().

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.c   | 71 +--
 include/block/block.h |  6 +
 2 files changed, 58 insertions(+), 19 deletions(-)

diff --git a/block.c b/block.c
index 0398bff..029feeb 100644
--- a/block.c
+++ b/block.c
@@ -1102,9 +1102,9 @@ static int bdrv_fill_options(QDict **options, const char **pfilename,
 return 0;
 }
 
-static void bdrv_attach_child(BlockDriverState *parent_bs,
-  BlockDriverState *child_bs,
-  const BdrvChildRole *child_role)
+static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+BlockDriverState *child_bs,
+const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
@@ -1113,6 +1113,8 @@ static void bdrv_attach_child(BlockDriverState *parent_bs,
 };
 
 QLIST_INSERT_HEAD(&parent_bs->children, child, next);
+
+return child;
 }
 
 void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
@@ -1229,7 +1231,7 @@ free_exit:
  * device's options.
  *
  * If allow_none is true, no image will be opened if filename is false and no
- * BlockdevRef is given. *pbs will remain unchanged and 0 will be returned.
+ * BlockdevRef is given. NULL will be returned, but errp remains unset.
  *
  * bdrev_key specifies the key for the image's BlockdevRef in the options QDict.
  * That QDict has to be flattened; therefore, if the BlockdevRef is a QDict
@@ -1237,21 +1239,20 @@ free_exit:
  * BlockdevRef.
  *
  * The BlockdevRef will be removed from the options QDict.
- *
- * To conform with the behavior of bdrv_open(), *pbs has to be NULL.
  */
-int bdrv_open_image(BlockDriverState **pbs, const char *filename,
-QDict *options, const char *bdref_key,
-BlockDriverState* parent, const BdrvChildRole *child_role,
-bool allow_none, Error **errp)
+BdrvChild *bdrv_open_child(const char *filename,
+   QDict *options, const char *bdref_key,
+   BlockDriverState* parent,
+   const BdrvChildRole *child_role,
+   bool allow_none, Error **errp)
 {
+BdrvChild *c = NULL;
+BlockDriverState *bs;
 QDict *image_options;
 int ret;
 char *bdref_key_dot;
 const char *reference;
 
-assert(pbs);
-assert(*pbs == NULL);
 assert(child_role != NULL);
 
 bdref_key_dot = g_strdup_printf("%s.", bdref_key);
@@ -1260,28 +1261,60 @@ int bdrv_open_image(BlockDriverState **pbs, const char *filename,
 
 reference = qdict_get_try_str(options, bdref_key);
 if (!filename && !reference && !qdict_size(image_options)) {
-if (allow_none) {
-ret = 0;
-} else {
+if (!allow_none) {
 error_setg(errp, "A block device must be specified for \"%s\"",
bdref_key);
-ret = -EINVAL;
 }
 QDECREF(image_options);
 goto done;
 }
 
-ret = bdrv_open_inherit(pbs, filename, reference, image_options, 0,
+bs = NULL;
 +ret = bdrv_open_inherit(&bs, filename, reference, image_options, 0,
 parent, child_role, NULL, errp);
 if (ret  0) {
 goto done;
 }
 
-bdrv_attach_child(parent, *pbs, child_role);
+c = bdrv_attach_child(parent, bs, child_role);
 
 done:
 qdict_del(options, bdref_key);
-return ret;
+return c;
+}
+
+/*
+ * This is a version of bdrv_open_child() that returns 0/-EINVAL instead of
+ * a BdrvChild object.
+ *
+ * If allow_none is true, no image will be opened if filename is false and no
+ * BlockdevRef is given. *pbs will remain unchanged and 0 will be returned.
+ *
+ * To conform with the behavior of bdrv_open(), *pbs has to be NULL.
+ */
+int bdrv_open_image(BlockDriverState **pbs, const char *filename,
+QDict *options, const char *bdref_key,
+BlockDriverState* parent, const BdrvChildRole *child_role,
+bool allow_none, Error **errp)
+{
+Error *local_err = NULL;
+BdrvChild *c;
+
+assert(pbs);
+assert(*pbs == NULL);
+
+c = bdrv_open_child(filename, options, bdref_key, parent, child_role,
+allow_none, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+if (c != NULL) {
 +*pbs = c->bs;
+}
+
+return 0;
 }
 
 int