date:20160614

Re: [Qemu-block] [PATCH 8/9] mirror: use synch scheme for drive mirror

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> Block commit of the active image to the backing store on a slow disk
> could never end. For example with the guest with the following loop
> inside
> while true; do
> dd bs=1k count=1 if=/dev/zero of=x
> done
> running above slow storage could not complete the operation with a

s/with/within/

> resonable amount of time:

s/resonable/reasonable/

> virsh blockcommit rhel7 sda --active --shallow
> virsh qemu-monitor-event
> virsh qemu-monitor-command rhel7 \
> '{"execute":"block-job-complete",\
>   "arguments":{"device":"drive-scsi0-0-0-0"} }'
> virsh qemu-monitor-event
> Completion event is never received.
> 
> This problem could not be fixed easily with the current architecture. We
> should either prohibit guest writes (making dirty bitmap dirty) or switch
> to the sycnchronous scheme.

s/sycnchronous/synchronous/

> 
> This patch implements the latter. It adds mirror_before_write_notify
> callback. In this case all data written from the guest is synchnonously

s/synchnonously/synchronously/

> written to the mirror target. Though the problem is solved partially.
> We should switch from bdrv_dirty_bitmap to simple hbitmap. This will be
> done in the next patch.
> 

In other words, the mere act of mirroring a guest will now be
guest-visible in that the guest is auto-throttled while waiting for the
mirroring to be written out.  It seems like you would want to be able to
opt in or out of this scheme.  Is it something that can be toggled
mid-operation (try asynchronous, and switch to synchronous if a timeout
elapses)?

> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 78 
> ++
>  1 file changed, 78 insertions(+)
> 

I'll leave the actual idea to others to review, because there may be
some ramifications that I'm not thinking of.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 7/9] mirror: allow to save buffer for QEMUIOVector in MirrorOp

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:

In the subject: 'allow to save buffer' is not idiomatic English; better
would be 'allow saving the buffer' or simply 'save the buffer'

> Properly cook MirrorOp initialization/deinitialization. The field is not
> yet used actually.

Another "what" but not "why" - expanding the commit message to mention
"why" makes it easier to review.

> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 32 +++-
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index d8be80a..7471211 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -73,6 +73,7 @@ typedef struct MirrorOp {
>  QEMUIOVector qiov;
>  int64_t sector_num;
>  int nb_sectors;
> +void *buf;
>  } MirrorOp;
>  
>  static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> @@ -100,24 +101,28 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  s->in_flight--;
>  s->sectors_in_flight -= op->nb_sectors;
>  iov = op->qiov.iov;
> -for (i = 0; i < op->qiov.niov; i++) {
> -MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
> -QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next);
> -s->buf_free_count++;
> -}
>  
> -sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> -chunk_num = op->sector_num / sectors_per_chunk;
> -nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
> -bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
> -if (ret >= 0) {
> -if (s->cow_bitmap) {
> -bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
> +if (op->buf == NULL) {
> +for (i = 0; i < op->qiov.niov; i++) {
> +MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
> +QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next);
> +s->buf_free_count++;
> +}
> +
> +sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +chunk_num = op->sector_num / sectors_per_chunk;
> +nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
> +bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);

I still think it might be smarter to fix bitmap operations to work on
byte inputs (still sectors, or rather granularity chunks, under the
hood, but no need to make users scale things just to have it scaled again).

> +if (ret >= 0) {
> +if (s->cow_bitmap) {
> +bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
> +}
> +s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
>  }
> -s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
>  }
>  
>  qemu_iovec_destroy(>qiov);
> +g_free(op->buf);
>  g_free(op);
>  
>  if (s->waiting_for_io) {
> @@ -255,6 +260,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
> sector_num,
>  op->s = s;
>  op->sector_num = sector_num;
>  op->nb_sectors = nb_sectors;
> +op->buf = NULL;
>  
>  /* Now make a QEMUIOVector taking enough granularity-sized chunks
>   * from s->buf_free.
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 6/9] block: pass qiov into before_write notifier

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> Signed-off-by: Denis V. Lunev 

The commit message says what, but not why.  It's useful to give
reviewers a hint as to why a patch makes sense (such as a future patch
being able to use the write notifier to make mirroring more efficient if
it knows what is being mirrored).

> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/io.c| 12 +++-
>  include/block/block_int.h |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 

> @@ -2228,7 +2230,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
> int64_t sector_num,
>  return 0;
>  }
>  
> -tracked_request_begin(, bs, sector_num, nb_sectors,
> +tracked_request_begin(, bs, NULL, sector_num, nb_sectors,
>BDRV_TRACKED_DISCARD);
>  bdrv_set_dirty(bs, sector_num, nb_sectors);
>  
> @@ -2331,7 +2333,7 @@ static int bdrv_co_do_ioctl(BlockDriverState *bs, int 
> req, void *buf)
>  };
>  BlockAIOCB *acb;
>  
> -tracked_request_begin(_req, bs, 0, 0, BDRV_TRACKED_IOCTL);
> +tracked_request_begin(_req, bs, NULL, 0, 0, BDRV_TRACKED_IOCTL);
>  if (!drv || !drv->bdrv_aio_ioctl) {
>  co.ret = -ENOTSUP;
>  goto out;
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 30a9717..72f463a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -69,6 +69,7 @@ enum BdrvTrackedRequestType {
>  
>  typedef struct BdrvTrackedRequest {
>  BlockDriverState *bs;
> +QEMUIOVector *qiov;
>  int64_t offset;
>  unsigned int bytes;

I guess bytes and qiov->size are not always redundant, because you pass
NULL for qiov for a zero or discard operation (an alternative would be
to always pass a qiov, even for zero/discard, so that we only need a
single size).  But I've been pointing out our inconsistent use of qiov
for zeroes in multiple places, so I don't think it's worth changing in
this series, but in one of its own if we want to do it.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 5/9] mirror: improve performance of mirroring of empty disk

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> We should not take into account zero blocks for delay calculations.
> They are not read and thus IO throttling is not required. In the
> other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes
> days.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Seems reasonable, but I'll let others more familiar with throttling give
the final say.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 4/9] mirror: efficiently zero out target

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> With a bdrv_co_write_zeroes method on a target BDS zeroes will not be placed
> into the wire. Thus the target could be very efficiently zeroed out. This
> is should be done with the largest chunk possible.
> 
> This improves the performance of the live migration of the empty disk by
> 150 times if NBD supports write_zeroes.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index c7b3639..c2f8773 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -21,6 +21,7 @@
>  #include "qemu/ratelimit.h"
>  #include "qemu/bitmap.h"
>  
> +#define MIRROR_ZERO_CHUNK   (3u << (29 - BDRV_SECTOR_BITS))  /* 1.5 Gb */

Probably nicer to track this in bytes.  And do you really want a
hard-coded arbitrary limit, or is it better to live with
MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)?

> @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>  
>  end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
> -if (base == NULL && !bdrv_has_zero_init(target_bs)) {
> +if (base == NULL && !bdrv_has_zero_init(target_bs) &&
> +target_bs->drv->bdrv_co_write_zeroes == NULL) {

Indentation is off, although if checkpatch.pl doesn't complain I guess
it doesn't matter that much.

Why should you care whether the target_bs->drv implements a callback?
Can't you just rely on the normal bdrv_*() functions to do the dirty
work of picking the most efficient implementation without you having to
bypass the block layer?  In fact, isn't that the whole goal of
bdrv_make_zero() - why not call that instead of reimplementing it?

Patch needs rebasing - we've redone this into bdrv_co_pwrite_zeroes and
a byte interface, since upstream commit c1499a5e.

>  bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
>  return 0;
>  }
> @@ -546,6 +548,34 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>  }
>  sector_num += n;
>  }
> +
> +if (base != NULL || bdrv_has_zero_init(target_bs)) {

You're now repeating the conditional that used to be 'bool
mark_all_dirty' (well, this is !mark_all_dirty); is it worth keeping the
simpler bool around?

> +/* no need to zero out entire disk */
> +return 0;
> +}
> +
> +for (sector_num = 0; sector_num < end; ) {
> +int nb_sectors = MIN(MIRROR_ZERO_CHUNK, end - sector_num);

Why limit yourself to 1.5G? It's either too small for what you can
really do, or too large for what the device permits.  See my above
comment about MIN_NON_ZERO.

> +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +
> +if (now - last_pause_ns > SLICE_TIME) {
> +last_pause_ns = now;
> +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0);
> +}
> +
> +if (block_job_is_cancelled(>common)) {
> +return -EINTR;
> +}
> +
> +if (s->in_flight == MAX_IN_FLIGHT) {
> +trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1);
> +mirror_wait_for_io(s);
> +continue;
> +}

Hmm - I guess your mirror yield points are why you couldn't just
directly use bdrv_make_zero(); but is that something where some code
refactoring can share more code rather than duplicating it?

> +
> +mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
> +sector_num += nb_sectors;
> +}
>  return 0;
>  }
>  
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 3/9] mirror: optimize dirty bitmap filling in mirror_run a bit

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> There is no need to scan allocation tables if we have mark_all_dirty flag
> set. Just mark it all dirty.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 797659d..c7b3639 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -507,12 +507,16 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>  BlockDriverState *base = s->base;
>  BlockDriverState *bs = blk_bs(s->common.blk);
>  BlockDriverState *target_bs = blk_bs(s->target);
> -bool mark_all_dirty = base == NULL && !bdrv_has_zero_init(target_bs);
>  uint64_t last_pause_ns;
>  int ret, n;
>  
>  end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
> +if (base == NULL && !bdrv_has_zero_init(target_bs)) {
> +bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);

Won't work as written.  'end' is 64 bits, but bdrv_set_dirty_bitmap() is
limited to a 32-bit sector count.  Might be first worth updating
bdrv_set_dirty_bitmap() and friends to be byte-based rather than
sector-based (but still tracking a sector per bit, obviously), as well
as expand it to operate on 64-bit sizes rather than 32-bit.

I'm also worried slightly that the existing code repeated things in a
loop, and therefore had pause points every iteration and could thus
remain responsive to an early cancel.  But doing the entire operation in
one chunk (assuming you fix bitmap code to handle a 64-bit size) may end
up running for so long without interruption that you lose the benefits
of an early interruption that you have by virtue of a 32-bit limit.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 2/9] mirror: create mirror_dirty_init helper for mirror_run

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> The code inside the helper will be extended in the next patch. mirror_run
> itself is overbloated at the moment.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 83 
> ++
>  1 file changed, 49 insertions(+), 34 deletions(-)
> 

Looks like a nice split.
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH] backup: Fail early if cannot determine cluster size

2016-06-14 Thread Fam Zheng

On Tue, 06/14 16:49, Max Reitz wrote:
> On 18.05.2016 07:53, Fam Zheng wrote:
> > Otherwise the job is orphaned and block_job_cancel_sync in
> > bdrv_close_all() when quiting will hang.
> > 
> > A simple reproducer is running blockdev-backup from null-co:// to
> > null-co://.
> > 
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/backup.c | 34 ++
> >  1 file changed, 18 insertions(+), 16 deletions(-)
> 
> Sorry for having waited so long (I was thinking that maybe Jeff wanted
> to take this patch), but now the patch no longer applies and I don't
> feel comfortable with just fixing it up myself; git-backport-diff tells
> me the required changes are "[0015] [FC]".

Oh it turns out this patch is superseded by

commit 91ab68837933232bcef99da7c968e6d41900419b
Author: Kevin Wolf 
Date:   Thu Apr 14 12:59:55 2016 +0200

backup: Don't leak BackupBlockJob in error path

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 

So let's just drop it.

Fam

Re: [Qemu-block] [PATCH] backup: Fail early if cannot determine cluster size

2016-06-14 Thread Fam Zheng

On Tue, 06/14 16:49, Max Reitz wrote:
> On 18.05.2016 07:53, Fam Zheng wrote:
> > Otherwise the job is orphaned and block_job_cancel_sync in
> > bdrv_close_all() when quiting will hang.
> > 
> > A simple reproducer is running blockdev-backup from null-co:// to
> > null-co://.
> > 
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/backup.c | 34 ++
> >  1 file changed, 18 insertions(+), 16 deletions(-)
> 
> Sorry for having waited so long (I was thinking that maybe Jeff wanted
> to take this patch), but now the patch no longer applies and I don't
> feel comfortable with just fixing it up myself; git-backport-diff tells
> me the required changes are "[0015] [FC]".

No problem! I'll rebase it and submit again.

Fam

Re: [Qemu-block] [PATCH v3 08/14] block/nbd: Accept SocketAddress

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Add a new option "address" to the NBD block driver which accepts a
> SocketAddress.
> 
> "path", "host" and "port" are still supported as legacy options and are
> mapped to their corresponding SocketAddress representation.

The back-compat work is pretty slick.

> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c   | 97 
> ++-
>  tests/qemu-iotests/051.out|  4 +-
>  tests/qemu-iotests/051.pc.out |  4 +-
>  3 files changed, 64 insertions(+), 41 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 3adf302..9ae66ba 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -32,6 +32,8 @@
>  #include "qemu/uri.h"
>  #include "block/block_int.h"
>  #include "qemu/module.h"
> +#include "qapi-visit.h"
> +#include "qapi/qmp-input-visitor.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qjson.h"
>  #include "qapi/qmp/qint.h"
> @@ -128,7 +130,9 @@ static bool nbd_has_filename_options_conflict(QDict 
> *options, Error **errp)
>  if (!strcmp(e->key, "host")
>  || !strcmp(e->key, "port")
>  || !strcmp(e->key, "path")
> -|| !strcmp(e->key, "export"))
> +|| !strcmp(e->key, "export")
> +|| !strcmp(e->key, "address")
> +|| !strncmp(e->key, "address.", 8))

May need a tweak if you break before '||'

>  {
>  error_setg(errp, "Option '%s' cannot be used with a file name",
> e->key);
> @@ -202,46 +206,66 @@ out:
>  g_free(file);
>  }
>  
> +static bool nbd_process_legacy_socket_options(QDict *options, Error **errp)
> +{
> +if (qdict_haskey(options, "path") && qdict_haskey(options, "host")) {
> +error_setg(errp, "path and host may not be used at the same time");
> +return false;
> +} else if (qdict_haskey(options, "path")) {
> +if (qdict_haskey(options, "port")) {
> +error_setg(errp, "port may not be used without host");
> +return false;
> +}
> +
> +qdict_put(options, "address.type", qstring_from_str("unix"));
> +qdict_change_key(options, "path", "address.data.path");
> +} else if (qdict_haskey(options, "host")) {
> +qdict_put(options, "address.type", qstring_from_str("inet"));
> +qdict_change_key(options, "host", "address.data.host");
> +if (!qdict_change_key(options, "port", "address.data.port")) {
> +qdict_put(options, "address.data.port",
> +  qstring_from_str(stringify(NBD_DEFAULT_PORT)));
> +}

The rewrite from old to modern is rather nice.  I wonder if we could
then use a generated qapi_visit_SocketAddress instead of manually
handling all the complication of SocketAddress proper.

> +}
> +
> +return true;
> +}
> +
>  static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char 
> **export,
>   Error **errp)
>  {
> -SocketAddress *saddr;
> +SocketAddress *saddr = NULL;
> +QDict *addr = NULL;
> +QObject *crumpled_addr;
> +QmpInputVisitor *iv;
> +Error *local_err = NULL;
>  
> -if (qdict_haskey(options, "path") == qdict_haskey(options, "host")) {
> -if (qdict_haskey(options, "path")) {
> -error_setg(errp, "path and host may not be used at the same 
> time");
> -} else {
> -error_setg(errp, "one of path and host must be specified");
> -}
> -return NULL;
> +if (!nbd_process_legacy_socket_options(options, errp)) {
> +goto fail;
>  }
> -if (qdict_haskey(options, "port") && !qdict_haskey(options, "host")) {
> -error_setg(errp, "port may not be used without host");
> -return NULL;
> +
> +qdict_extract_subqdict(options, , "address.");
> +if (!qdict_size(addr)) {
> +error_setg(errp, "NBD server address missing");
> +goto fail;
>  }
>  
> -saddr = g_new0(SocketAddress, 1);
> +crumpled_addr = qdict_crumple(addr, true, errp);
> +if (!crumpled_addr) {
> +goto fail;
> +}
>  

Ah, so you ARE depending on Dan's qdict_crumple().

> -if (qdict_haskey(options, "path")) {
> -UnixSocketAddress *q_unix;
> -saddr->type = SOCKET_ADDRESS_KIND_UNIX;
> -q_unix = saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
> -q_unix->path = g_strdup(qdict_get_str(options, "path"));
> -qdict_del(options, "path");
> -} else {
> -InetSocketAddress *inet;
> -saddr->type = SOCKET_ADDRESS_KIND_INET;
> -inet = saddr->u.inet.data = g_new0(InetSocketAddress, 1);
> -inet->host = g_strdup(qdict_get_str(options, "host"));
> -if (!qdict_get_try_str(options, "port")) {
> -inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
> -} else {
> -inet->port = g_strdup(qdict_get_str(options, "port"));
> -}
> -

Re: [Qemu-block] [PATCH v3 07/14] block/nbd: "address" in nbd_refresh_filename()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> As of a future patch, the NBD block driver will accept a SocketAddress
> structure for a new "address" option. In order to support this,
> nbd_refresh_filename() needs some changes.
> 
> The two TODOs introduced by this patch will be removed in the very next
> one. They exist to explain that it is currently impossible for
> nbd_refresh_filename() to emit an "address.*" option (which the NBD
> block driver does not handle yet). The next patch will arm these code
> paths, but it will also enable handling of these options.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c | 84 
> -
>  1 file changed, 61 insertions(+), 23 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 1736f68..3adf302 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -441,37 +441,75 @@ static void nbd_attach_aio_context(BlockDriverState *bs,
>  static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
>  {
>  QDict *opts = qdict_new();
> -const char *path   = qdict_get_try_str(options, "path");
> -const char *host   = qdict_get_try_str(options, "host");
> -const char *port   = qdict_get_try_str(options, "port");
> +bool can_generate_filename = true;
> +const char *path = NULL, *host = NULL, *port = NULL;
>  const char *export = qdict_get_try_str(options, "export");
>  const char *tlscreds = qdict_get_try_str(options, "tls-creds");
>  
> -if (host && !port) {
> -port = stringify(NBD_DEFAULT_PORT);
> +if (qdict_get_try_str(options, "address.type")) {
> +/* This path will only be possible as of a future patch;
> + * TODO: Remove this note once it is */
> +
> +const char *type = qdict_get_str(options, "address.type");
> +

Oh, I'm so tempted to teach the QAPI generator how to make a
discriminated union have a default 'type' value (thus making the
discriminator optional), so that we don't need a layer of nesting behind
'address.'.

> +if (!strcmp(type, "unix")) {
> +path = qdict_get_str(options, "address.data.path");
> +} else if (!strcmp(type, "inet")) {
> +host = qdict_get_str(options, "address.data.host");
> +port = qdict_get_str(options, "address.data.port");

It's especially annoying that because SocketAddress is not flat, we have
to expose the 'data.' layer of nesting, even if we could avoid the
'address.' layer.

> +
> +can_generate_filename = !qdict_haskey(options, "address.data.to")
> + && !qdict_haskey(options, 
> "address.data.ipv4")
> + && !qdict_haskey(options, 
> "address.data.ipv6");
> +} else {
> +can_generate_filename = false;
> +}
> +} else {
> +path = qdict_get_try_str(options, "path");
> +host = qdict_get_try_str(options, "host");
> +port = qdict_get_try_str(options, "port");
> +
> +if (host && !port) {
> +port = stringify(NBD_DEFAULT_PORT);
> +}
>  }

Looks clean given the constraints of what you are able to use from QAPI.

> +
> +if (qdict_get_try_str(options, "address.type")) {
> +/* This path will only be possible as of a future patch;
> + * TODO: Remove this note once it is */
> +
> +const QDictEntry *e;
> +for (e = qdict_first(options); e; e = qdict_next(options, e)) {
> +if (!strncmp(e->key, "address.", 8)) {
> +qobject_incref(e->value);
> +qdict_put_obj(opts, e->key, e->value);
> +}
> +}

This part makes me wonder if we want Dan's qdict_crumple() working first.

>  } else {
> -qdict_put(opts, "host", qstring_from_str(host));
> -qdict_put(opts, "port", qstring_from_str(port));
> +if (path) {
> +qdict_put(opts, "path", qstring_from_str(path));
> +} else {
> +qdict_put(opts, "host", qstring_from_str(host));
> +qdict_put(opts, "port", qstring_from_str(port));
> +}
>  }
>  if (export) {
>  qdict_put(opts, "export", qstring_from_str(export));
> 

At this point, I'll reserve giving R-b until I've seen the whole series
(it may need rebasing anyways...)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v3 06/14] block/nbd: Add nbd_has_filename_options_conflict()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Right now, we have four possible options that conflict with specifying
> an NBD filename, and a future patch will add another one ("address").
> This future option is a nested QDict that is flattened at this point,
> requiring as to test each option whether its key has an "address."
> prefix. Therefore, we will then need to iterate through all options.
> 
> Adding this iteration logic now will simplify adding the new option
> later. A nice side effect is that the user will not receive a long list
> of five options which are not supposed to be specified with a filename,
> but we can actually print the problematic option.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c | 26 --
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index d12bcc6..1736f68 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -120,6 +120,25 @@ out:
>  return ret;
>  }
>  
> +static bool nbd_has_filename_options_conflict(QDict *options, Error **errp)
> +{
> +const QDictEntry *e;
> +
> +for (e = qdict_first(options); e; e = qdict_next(options, e)) {
> +if (!strcmp(e->key, "host")
> +|| !strcmp(e->key, "port")

I know there are already instances of breaking before || in this file,
but most of qemu breaks after, as in:

if (!strcmp(e->key, "host") ||
!strcmp(e->key, "port") ||
...


But choice of formatting is trivial, so:
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 1/9] mirror: fix calling of blk_aio_pwritev/blk_aio_preadv

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> 4th argument is flags rather than size. Fortunately flags occupies
> 5 less significant bits and they are always zero due to alignment.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

Duplicate of this patch, already on Kevin's block tree:
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03377.html

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v3 for-2.7 00/14] qapi: Allow blockdev-add for NBD

2016-06-14 Thread Eric Blake

On 05/03/2016 09:23 AM, Kevin Wolf wrote:
> Am 06.04.2016 um 20:28 hat Max Reitz geschrieben:
>> Turns out NBD is not so simple to do if you do it right. Anyway, this
>> series adds blockdev-add support for NBD clients.
> 
> What the series does seems to make sense to me, though things would be a
> bit nicer (especially on the command line) if SocketAddress had been a
> flat union from the beginning.  It's too late to change now, and I guess
> the ugliness isn't worth duplicating it.

I'm sorely tempted to make SocketAddress a flat union anyways (or even
an alternate that can be flat or nested at will), if only for
convenience.  But that's a pipe dream - there's probably not enough time
before 2.7 to actually achieve it.

> 
> I'll leave the in-detail review to our NBD folks.
> 
> Kevin
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v3 05/14] block/nbd: Use qdict_put()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Instead of inlining this nice macro (i.e. resorting to
> qdict_put_obj(..., QOBJECT(...))), use it.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v3 04/14] block/nbd: Default port in nbd_refresh_filename()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Instead of not emitting the port in nbd_refresh_filename(), just set it
> to the default if the user did not specify it. This makes the logic a
> bit simpler.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c | 18 +++---
>  1 file changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 2112ec0..efa5d3d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -433,6 +433,10 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
> QDict *options)
>  const char *export = qdict_get_try_str(options, "export");
>  const char *tlscreds = qdict_get_try_str(options, "tls-creds");
>  
> +if (host && !port) {
> +port = stringify(NBD_DEFAULT_PORT);
> +}

It would be nice to store the port as a number rather than a string -
but that's not your task (I've long thought that port should be a QAPI
alternate type, with both string and number branches, but it's a big
audit and code change to actually make it happen).

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Paolo Bonzini



On 14/06/2016 17:59, Alex Bligh wrote:
> 
>> On 14 Jun 2016, at 16:11, Paolo Bonzini  wrote:
>>
>>> To illustrate the problem, look consider what qemu itself would do as
>>> a server if it can't buffer the entire read issued to it.
>>
>> Return ENOMEM?
> 
> Well OK, qemu then 'works' on the basis it breaks another
> part of the spec, which is coping with long reads.

ENOMEM is a documented error code, and the limits extension will help
with that as well.

>> However, it looks like the
>> de facto status prior to structured replies is that the error is in the
>> spec, and this patch introduces a regression.
> 
> Well, I guess the patch makes it work the same as the
> reference server implementation and the spec, which I'd
> consider a fix. My view is that the error is in the
> kernel client.

... and QEMU and BSD.  What good is a server that doesn't interoperate
(albeit only in error cases) with any client?

Paolo

Re: [Qemu-block] [PATCH v3 01/14] qdict: Add qdict_change_key()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> This is a shorthand function for changing a QDict's entry's key.
> 
> Signed-off-by: Max Reitz 
> ---
>  include/qapi/qmp/qdict.h |  1 +
>  qobject/qdict.c  | 23 +++
>  2 files changed, 24 insertions(+)

Reviewed-by: Eric Blake 

However, it would be nice to enhance tests/check-qdict.c to cover this.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v3 02/14] block/nbd: Drop trailing "." in error messages

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c   | 4 ++--
>  tests/qemu-iotests/051.out| 4 ++--
>  tests/qemu-iotests/051.pc.out | 4 ++--
>  3 files changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Eric Blake 

Could go in via qemu-trivial, if desired.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 1/7] doc: move text describing --trace to specific .texi file

2016-06-14 Thread Eric Blake

On 06/14/2016 03:49 PM, Eric Blake wrote:
> On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
>> This text will be included to qemu-nbd/qemu-img mans in the next patches.
>>
>> Signed-off-by: Denis V. Lunev 
>> CC: Eric Blake 
>> CC: Paolo Bonzini 
>> CC: Stefan Hajnoczi 
>> CC: Kevin Wolf 
>> ---
>>  Makefile   |  3 ++-
>>  qemu-option-trace.texi | 25 +
>>  qemu-options.hx| 27 +--
>>  3 files changed, 28 insertions(+), 27 deletions(-)
>>  create mode 100644 qemu-option-trace.texi
> 
>> +++ b/qemu-options.hx
>> @@ -3669,32 +3669,7 @@ HXCOMM This line is not accurate, as some sub-options 
>> are backend-specific but
>>  HXCOMM HX does not support conditional compilation of text.
>>  @item -trace [events=@var{file}][,file=@var{file}]
>>  @findex -trace
> 
> As long as you are touching here, change the -trace line to match --help
> output:
> 
> -trace [[enable=]][,events=][,file=]

Nevermind - I see you do that in 2/7.

> 
> With that change:
> 
> Reviewed-by: Eric Blake 
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 5/7] trace: enable tracing in qemu-nbd

2016-06-14 Thread Eric Blake

On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
> Please note, trace_init_backends() must be called in the final process,
> i.e. after daemonization. This is necessary to keep tracing thread in
> the proper process.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Eric Blake 
> CC: Paolo Bonzini 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> ---
>  Makefile  |  2 +-
>  qemu-nbd.c| 18 +-
>  qemu-nbd.texi |  3 +++
>  3 files changed, 21 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 7/7] trace: enable tracing in qemu-img

2016-06-14 Thread Eric Blake

On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
> The command will work this way:
> qemu-img --trace qcow2* create -f qcow2 1.img 64G
> 
> Signed-off-by: Denis V. Lunev 
> Suggested by: Daniel P. Berrange 
> CC: Eric Blake 
> CC: Paolo Bonzini 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> ---
>  Makefile  |  2 +-
>  qemu-img.c| 18 +-
>  qemu-img.texi |  3 +++
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 1/7] doc: move text describing --trace to specific .texi file

2016-06-14 Thread Eric Blake

On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
> This text will be included to qemu-nbd/qemu-img mans in the next patches.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Eric Blake 
> CC: Paolo Bonzini 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> ---
>  Makefile   |  3 ++-
>  qemu-option-trace.texi | 25 +
>  qemu-options.hx| 27 +--
>  3 files changed, 28 insertions(+), 27 deletions(-)
>  create mode 100644 qemu-option-trace.texi

> +++ b/qemu-options.hx
> @@ -3669,32 +3669,7 @@ HXCOMM This line is not accurate, as some sub-options 
> are backend-specific but
>  HXCOMM HX does not support conditional compilation of text.
>  @item -trace [events=@var{file}][,file=@var{file}]
>  @findex -trace

As long as you are touching here, change the -trace line to match --help
output:

-trace [[enable=]][,events=][,file=]

With that change:

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH v2 17/17] block: Move request_alignment into BlockLimit

2016-06-14 Thread Eric Blake

It makes more sense to have ALL block size limit constraints
in the same struct.  Improve the documentation while at it.

Signed-off-by: Eric Blake 

---
v2: drop hacks for save/restore of alignment, now that earlier
patches handled setting up BlockLimits more sanely
---
 include/block/block_int.h | 22 +-
 block.c   |  2 +-
 block/blkdebug.c  |  4 ++--
 block/bochs.c |  2 +-
 block/cloop.c |  2 +-
 block/dmg.c   |  2 +-
 block/io.c| 12 ++--
 block/iscsi.c |  2 +-
 block/qcow2.c |  2 +-
 block/raw-posix.c | 16 
 block/raw-win32.c |  6 +++---
 block/vvfat.c |  2 +-
 12 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 88ef826..77f47d9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -324,6 +324,12 @@ struct BlockDriver {
 };

 struct BlockLimits {
+/* Alignment requirement, in bytes, for offset/length of I/O
+ * requests. Must be a power of 2 less than INT_MAX. A value of 0
+ * defaults to 1 for drivers with modern byte interfaces, and to
+ * 512 otherwise. */
+uint32_t request_alignment;
+
 /* maximum number of bytes that can be discarded at once (since it
  * is signed, it must be < 2G, if set), should be multiple of
  * pdiscard_alignment, but need not be power of 2. May be 0 if no
@@ -332,8 +338,8 @@ struct BlockLimits {

 /* optimal alignment for discard requests in bytes, must be power
  * of 2, less than max_discard if that is set, and multiple of
- * bs->request_alignment. May be 0 if bs->request_alignment is
- * good enough */
+ * bl.request_alignment. May be 0 if bl.request_alignment is good
+ * enough */
 uint32_t pdiscard_alignment;

 /* maximum number of bytes that can zeroized at once (since it is
@@ -343,12 +349,12 @@ struct BlockLimits {

 /* optimal alignment for write zeroes requests in bytes, must be
  * power of 2, less than max_pwrite_zeroes if that is set, and
- * multiple of bs->request_alignment. May be 0 if
- * bs->request_alignment is good enough */
+ * multiple of bl.request_alignment. May be 0 if
+ * bl.request_alignment is good enough */
 uint32_t pwrite_zeroes_alignment;

 /* optimal transfer length in bytes (must be power of 2, and
- * multiple of bs->request_alignment), or 0 if no preferred size */
+ * multiple of bl.request_alignment), or 0 if no preferred size */
 uint32_t opt_transfer;

 /* maximal transfer length in bytes (need not be power of 2, but
@@ -356,10 +362,10 @@ struct BlockLimits {
  * For now, anything larger than INT_MAX is clamped down. */
 uint32_t max_transfer;

-/* memory alignment so that no bounce buffer is needed */
+/* memory alignment, in bytes so that no bounce buffer is needed */
 size_t min_mem_alignment;

-/* memory alignment for bounce buffer */
+/* memory alignment, in bytes, for bounce buffer */
 size_t opt_mem_alignment;

 /* maximum number of iovec elements */
@@ -463,8 +469,6 @@ struct BlockDriverState {
 /* I/O Limits */
 BlockLimits bl;

-/* Alignment requirement for offset/length of I/O requests */
-unsigned int request_alignment;
 /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
 unsigned int supported_write_flags;
 /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
diff --git a/block.c b/block.c
index 61fe1df..d578df8 100644
--- a/block.c
+++ b/block.c
@@ -1016,7 +1016,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BdrvChild *file,

 assert(bdrv_opt_mem_align(bs) != 0);
 assert(bdrv_min_mem_align(bs) != 0);
-assert(is_power_of_2(bs->request_alignment) || bdrv_is_sg(bs));
+assert(is_power_of_2(bs->bl.request_alignment) || bdrv_is_sg(bs));

 qemu_opts_del(opts);
 return 0;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 1589fa7..5e32643 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -384,7 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,

 /* Set request alignment */
 align = qemu_opt_get_size(opts, "align",
-  bs->request_alignment ?: BDRV_SECTOR_SIZE);
+  bs->bl.request_alignment ?: BDRV_SECTOR_SIZE);
 if (align > 0 && align < INT_MAX && is_power_of_2(align)) {
 s->align = align;
 } else {
@@ -727,7 +727,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, 
Error **errp)
 BDRVBlkdebugState *s = bs->opaque;

 if (s->align) {
-bs->request_alignment = s->align;
+bs->bl.request_alignment = s->align;
 }
 }

diff --git a/block/bochs.c b/block/bochs.c
index 182c50b..4194f1d 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -190,7 +190,7 @@ fail:

 static void

[Qemu-block] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based

2016-06-14 Thread Eric Blake

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length.  Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation.  Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.

Signed-off-by: Eric Blake 

---
v2: Hoist unrelated cleanups earlier, use name 'max_transfer' in more
places, tweak iscsi calculations
---
 include/block/block_int.h  | 11 +++
 include/sysemu/block-backend.h |  2 +-
 block/block-backend.c  | 10 +-
 block/io.c | 23 +++
 block/iscsi.c  | 20 
 block/nbd.c|  2 +-
 block/raw-posix.c  |  3 ++-
 hw/block/virtio-blk.c  |  9 +
 hw/scsi/scsi-generic.c | 11 ++-
 qemu-img.c |  8 
 10 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 16c43e2..2b45d57 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -338,11 +338,14 @@ typedef struct BlockLimits {
  * power of 2, and less than max_pwrite_zeroes if that is set */
 uint32_t pwrite_zeroes_alignment;

-/* optimal transfer length in sectors */
-int opt_transfer_length;
+/* optimal transfer length in bytes (must be power of 2, and
+ * multiple of bs->request_alignment), or 0 if no preferred size */
+uint32_t opt_transfer;

-/* maximal transfer length in sectors */
-int max_transfer_length;
+/* maximal transfer length in bytes (need not be power of 2, but
+ * should be multiple of opt_transfer), or 0 for no 32-bit limit.
+ * For now, anything larger than INT_MAX is clamped down. */
+uint32_t max_transfer;

 /* memory alignment so that no bounce buffer is needed */
 size_t min_mem_alignment;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c04af8e..2469a1c 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -170,7 +170,7 @@ bool blk_is_available(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
-int blk_get_max_transfer_length(BlockBackend *blk);
+uint32_t blk_get_max_transfer(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
diff --git a/block/block-backend.c b/block/block-backend.c
index 1fb070b..e042544 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1303,16 +1303,16 @@ int blk_get_flags(BlockBackend *blk)
 }
 }

-/* Returns the maximum transfer length, in sectors; guaranteed nonzero */
-int blk_get_max_transfer_length(BlockBackend *blk)
+/* Returns the maximum transfer length, in bytes; guaranteed nonzero */
+uint32_t blk_get_max_transfer(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
-int max = 0;
+uint32_t max = 0;

 if (bs) {
-max = bs->bl.max_transfer_length;
+max = bs->bl.max_transfer;
 }
-return MIN_NON_ZERO(max, BDRV_REQUEST_MAX_SECTORS);
+return MIN_NON_ZERO(max, INT_MAX);
 }

 int blk_get_max_iov(BlockBackend *blk)
diff --git a/block/io.c b/block/io.c
index c425ce8..5e38ab4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -88,8 +88,8 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 error_propagate(errp, local_err);
 return;
 }
-bs->bl.opt_transfer_length = bs->file->bs->bl.opt_transfer_length;
-bs->bl.max_transfer_length = bs->file->bs->bl.max_transfer_length;
+bs->bl.opt_transfer = bs->file->bs->bl.opt_transfer;
+bs->bl.max_transfer = bs->file->bs->bl.max_transfer;
 bs->bl.min_mem_alignment = bs->file->bs->bl.min_mem_alignment;
 bs->bl.opt_mem_alignment = bs->file->bs->bl.opt_mem_alignment;
 bs->bl.max_iov = bs->file->bs->bl.max_iov;
@@ -107,12 +107,10 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error 
**errp)
 error_propagate(errp, local_err);
 return;
 }
-bs->bl.opt_transfer_length =
-MAX(bs->bl.opt_transfer_length,
-bs->backing->bs->bl.opt_transfer_length);
-bs->bl.max_transfer_length =
-MIN_NON_ZERO(bs->bl.max_transfer_length,
- bs->backing->bs->bl.max_transfer_length);
+bs->bl.opt_transfer = MAX(bs->bl.opt_transfer,
+  bs->backing->bs->bl.opt_transfer);
+bs->bl.max_transfer =

[Qemu-block] [PATCH v2 15/17] block: Switch discard length bounds to byte-based

2016-06-14 Thread Eric Blake

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment.  Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
is no longer needed; and the BlockLimits type is now completely
byte-based.

Signed-off-by: Eric Blake 

---
v2: rebase nbd and iscsi limits across earlier improvements
---
 include/block/block_int.h | 21 +++--
 block/io.c| 16 +---
 block/iscsi.c | 19 ++-
 block/nbd.c   |  2 +-
 qemu-img.c|  3 ++-
 5 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2b45d57..0169019 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -324,18 +324,27 @@ struct BlockDriver {
 };

 typedef struct BlockLimits {
-/* maximum number of sectors that can be discarded at once */
-int max_discard;
+/* maximum number of bytes that can be discarded at once (since it
+ * is signed, it must be < 2G, if set), should be multiple of
+ * pdiscard_alignment, but need not be power of 2. May be 0 if no
+ * inherent 32-bit limit */
+int32_t max_pdiscard;

-/* optimal alignment for discard requests in sectors */
-int64_t discard_alignment;
+/* optimal alignment for discard requests in bytes, must be power
+ * of 2, less than max_discard if that is set, and multiple of
+ * bs->request_alignment. May be 0 if bs->request_alignment is
+ * good enough */
+uint32_t pdiscard_alignment;

 /* maximum number of bytes that can zeroized at once (since it is
- * signed, it must be < 2G, if set) */
+ * signed, it must be < 2G, if set), should be multiple of
+ * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
 int32_t max_pwrite_zeroes;

 /* optimal alignment for write zeroes requests in bytes, must be
- * power of 2, and less than max_pwrite_zeroes if that is set */
+ * power of 2, less than max_pwrite_zeroes if that is set, and
+ * multiple of bs->request_alignment. May be 0 if
+ * bs->request_alignment is good enough */
 uint32_t pwrite_zeroes_alignment;

 /* optimal transfer length in bytes (must be power of 2, and
diff --git a/block/io.c b/block/io.c
index 5e38ab4..0b5c40d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2352,19 +2352,21 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
   BDRV_TRACKED_DISCARD);
 bdrv_set_dirty(bs, sector_num, nb_sectors);

-max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
+max_discard = MIN_NON_ZERO(bs->bl.max_pdiscard >> BDRV_SECTOR_BITS,
+   BDRV_REQUEST_MAX_SECTORS);
 while (nb_sectors > 0) {
 int ret;
 int num = nb_sectors;
+int discard_alignment = bs->bl.pdiscard_alignment >> BDRV_SECTOR_BITS;

 /* align request */
-if (bs->bl.discard_alignment &&
-num >= bs->bl.discard_alignment &&
-sector_num % bs->bl.discard_alignment) {
-if (num > bs->bl.discard_alignment) {
-num = bs->bl.discard_alignment;
+if (discard_alignment &&
+num >= discard_alignment &&
+sector_num % discard_alignment) {
+if (num > discard_alignment) {
+num = discard_alignment;
 }
-num -= sector_num % bs->bl.discard_alignment;
+num -= sector_num % discard_alignment;
 }

 /* limit request size */
diff --git a/block/iscsi.c b/block/iscsi.c
index 739d5ca..af9babf 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1696,13 +1696,6 @@ static void iscsi_close(BlockDriverState *bs)
 memset(iscsilun, 0, sizeof(IscsiLun));
 }

-static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun)
-{
-int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
-
-return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0;
-}
-
 static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 /* We don't actually refresh here, but just return data queried in
@@ -1722,14 +1715,14 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
 }

 if (iscsilun->lbp.lbpu) {
-if (iscsilun->bl.max_unmap < 0x) {
-bs->bl.max_discard =
-sector_limits_lun2qemu(iscsilun->bl.max_unmap, iscsilun);
+if (iscsilun->bl.max_unmap < 0x / iscsilun->block_size) {
+bs->bl.max_pdiscard =
+iscsilun->bl.max_unmap * iscsilun->block_size;
 }
-bs->bl.discard_alignment =
-

[Qemu-block] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

The raw block driver was blindly copying all limits from bs->file,
even though: 1. the main bdrv_refresh_limits() already does this
for many of gthe limits, and 2. blindly copying from the children
can weaken any stricter limits that were already inherited from
the backing dhain during the main bdrv_refresh_limits().  Also,
the next patch is about to move .request_alignment into
BlockLimits, and that is a limit that should NOT be copied from
other layers in the BDS chain.

Solve the issue by factoring out a new bdrv_merge_limits(),
and using that function to properly merge limits when comparing
two BlockDriverState objects.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 include/block/block.h |  1 +
 include/block/block_int.h |  4 ++--
 include/qemu/typedefs.h   |  1 +
 block/io.c| 31 +--
 block/raw_bsd.c   |  4 ++--
 5 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 733a8ec..c1d4648 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -262,6 +262,7 @@ int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
+void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
 int bdrv_change_backing_file(BlockDriverState *bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0169019..88ef826 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -323,7 +323,7 @@ struct BlockDriver {
 QLIST_ENTRY(BlockDriver) list;
 };

-typedef struct BlockLimits {
+struct BlockLimits {
 /* maximum number of bytes that can be discarded at once (since it
  * is signed, it must be < 2G, if set), should be multiple of
  * pdiscard_alignment, but need not be power of 2. May be 0 if no
@@ -364,7 +364,7 @@ typedef struct BlockLimits {

 /* maximum number of iovec elements */
 int max_iov;
-} BlockLimits;
+};

 typedef struct BdrvOpBlocker BdrvOpBlocker;

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index b113fcf..e6f72c2 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -14,6 +14,7 @@ typedef struct BdrvDirtyBitmap BdrvDirtyBitmap;
 typedef struct BlockBackend BlockBackend;
 typedef struct BlockBackendRootState BlockBackendRootState;
 typedef struct BlockDriverState BlockDriverState;
+typedef struct BlockLimits BlockLimits;
 typedef struct BusClass BusClass;
 typedef struct BusState BusState;
 typedef struct CharDriverState CharDriverState;
diff --git a/block/io.c b/block/io.c
index 0b5c40d..c6c1f7b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -67,6 +67,17 @@ static void bdrv_parent_drained_end(BlockDriverState *bs)
 }
 }

+void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
+{
+dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
+dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
+dst->opt_mem_alignment = MAX(dst->opt_mem_alignment,
+ src->opt_mem_alignment);
+dst->min_mem_alignment = MAX(dst->min_mem_alignment,
+ src->min_mem_alignment);
+dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
+}
+
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs->drv;
@@ -88,11 +99,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 error_propagate(errp, local_err);
 return;
 }
-bs->bl.opt_transfer = bs->file->bs->bl.opt_transfer;
-bs->bl.max_transfer = bs->file->bs->bl.max_transfer;
-bs->bl.min_mem_alignment = bs->file->bs->bl.min_mem_alignment;
-bs->bl.opt_mem_alignment = bs->file->bs->bl.opt_mem_alignment;
-bs->bl.max_iov = bs->file->bs->bl.max_iov;
+bdrv_merge_limits(>bl, >file->bs->bl);
 } else {
 bs->bl.min_mem_alignment = 512;
 bs->bl.opt_mem_alignment = getpagesize();
@@ -107,19 +114,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error 
**errp)
 error_propagate(errp, local_err);
 return;
 }
-bs->bl.opt_transfer = MAX(bs->bl.opt_transfer,
-  bs->backing->bs->bl.opt_transfer);
-bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
-   bs->backing->bs->bl.max_transfer);
-bs->bl.opt_mem_alignment =
-MAX(bs->bl.opt_mem_alignment,
-bs->backing->bs->bl.opt_mem_alignment);
-bs->bl.min_mem_alignment =
-MAX(bs->bl.min_mem_alignment,
-bs->backing->bs->bl.min_mem_alignment);
-bs->bl.max_iov =
-MIN(bs->bl.max_iov,
-

[Qemu-block] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

qemu-iotests 77 is particularly sensitive to the fact that we
can specify an artificial alignment override in blkdebug, and
that override must continue to work even when limits are
refreshed on an already open device.

A later patch will be altering when bs->request_alignment
defaults are set, so fall back to sector alignment if we have
not inherited anything yet.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/blkdebug.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 20d25bd..1589fa7 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -37,6 +37,7 @@
 typedef struct BDRVBlkdebugState {
 int state;
 int new_state;
+int align;

 QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX];
 QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
@@ -382,9 +383,10 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 }

 /* Set request alignment */
-align = qemu_opt_get_size(opts, "align", bs->request_alignment);
-if (align > 0 && align < INT_MAX && !(align & (align - 1))) {
-bs->request_alignment = align;
+align = qemu_opt_get_size(opts, "align",
+  bs->request_alignment ?: BDRV_SECTOR_SIZE);
+if (align > 0 && align < INT_MAX && is_power_of_2(align)) {
+s->align = align;
 } else {
 error_setg(errp, "Invalid alignment");
 ret = -EINVAL;
@@ -720,6 +722,15 @@ static void blkdebug_refresh_filename(BlockDriverState 
*bs, QDict *options)
 bs->full_open_options = opts;
 }

+static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+BDRVBlkdebugState *s = bs->opaque;
+
+if (s->align) {
+bs->request_alignment = s->align;
+}
+}
+
 static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue, Error **errp)
 {
@@ -738,6 +749,7 @@ static BlockDriver bdrv_blkdebug = {
 .bdrv_getlength = blkdebug_getlength,
 .bdrv_truncate  = blkdebug_truncate,
 .bdrv_refresh_filename  = blkdebug_refresh_filename,
+.bdrv_refresh_limits= blkdebug_refresh_limits,

 .bdrv_aio_readv = blkdebug_aio_readv,
 .bdrv_aio_writev= blkdebug_aio_writev,
-- 
2.5.5

[Qemu-block] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Now that all drivers have been updated to supply an override
of request_alignment during their .bdrv_refresh_limits(), as
needed, the block layer itself can defer setting the default
alignment until part of the overall bdrv_refresh_limits().

Signed-off-by: Eric Blake 
---
 block.c| 1 -
 block/io.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index b350794..61fe1df 100644
--- a/block.c
+++ b/block.c
@@ -937,7 +937,6 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild 
*file,
 goto fail_opts;
 }

-bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
 bs->read_only = !(bs->open_flags & BDRV_O_RDWR);

 if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
diff --git a/block/io.c b/block/io.c
index 383154f..c425ce8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -78,6 +78,9 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 return;
 }

+/* Default alignment based on whether driver has byte interface */
+bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
+
 /* Take some limits from the children as a default */
 if (bs->file) {
 bdrv_refresh_limits(bs->file->bs, _err);
-- 
2.5.5

[Qemu-block] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv()

2016-06-14 Thread Eric Blake

If the amount of data to read ends exactly on the total size
of the bs, then we were wasting time creating a local qiov
to read the data in preparation for what would normally be
appending zeroes beyond the end, even though this corner case
has nothing further to do.

Signed-off-by: Eric Blake 
Reviewed-by: Kevin Wolf 

---
v2: no change, add R-b
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 9191772..383154f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1024,7 +1024,7 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }

 max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
-if (bytes < max_bytes) {
+if (bytes <= max_bytes) {
 ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
 } else if (max_bytes > 0) {
 QEMUIOVector local_qiov;
-- 
2.5.5

[Qemu-block] [PATCH v2 12/17] block: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Add a .bdrv_refresh_limits() to all four of our legacy devices
that will always be sector-only (bochs, cloop, dmg, vvfat), in
spite of their recent conversion to expose a byte interface.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/bochs.c | 7 ++-
 block/cloop.c | 7 ++-
 block/dmg.c   | 7 ++-
 block/vvfat.c | 7 ++-
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/block/bochs.c b/block/bochs.c
index 6c8d0f3..182c50b 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -105,7 +105,6 @@ static int bochs_open(BlockDriverState *bs, QDict *options, 
int flags,
 int ret;

 bs->read_only = 1; // no write support yet
-bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

 ret = bdrv_pread(bs->file->bs, 0, , sizeof(bochs));
 if (ret < 0) {
@@ -189,6 +188,11 @@ fail:
 return ret;
 }

+static void bochs_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
 {
 BDRVBochsState *s = bs->opaque;
@@ -283,6 +287,7 @@ static BlockDriver bdrv_bochs = {
 .instance_size = sizeof(BDRVBochsState),
 .bdrv_probe= bochs_probe,
 .bdrv_open = bochs_open,
+.bdrv_refresh_limits = bochs_refresh_limits,
 .bdrv_co_preadv = bochs_co_preadv,
 .bdrv_close= bochs_close,
 };
diff --git a/block/cloop.c b/block/cloop.c
index ea5a92b..d574003 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -67,7 +67,6 @@ static int cloop_open(BlockDriverState *bs, QDict *options, 
int flags,
 int ret;

 bs->read_only = 1;
-bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

 /* read header */
 ret = bdrv_pread(bs->file->bs, 128, >block_size, 4);
@@ -199,6 +198,11 @@ fail:
 return ret;
 }

+static void cloop_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline int cloop_read_block(BlockDriverState *bs, int block_num)
 {
 BDRVCloopState *s = bs->opaque;
@@ -280,6 +284,7 @@ static BlockDriver bdrv_cloop = {
 .instance_size  = sizeof(BDRVCloopState),
 .bdrv_probe = cloop_probe,
 .bdrv_open  = cloop_open,
+.bdrv_refresh_limits = cloop_refresh_limits,
 .bdrv_co_preadv = cloop_co_preadv,
 .bdrv_close = cloop_close,
 };
diff --git a/block/dmg.c b/block/dmg.c
index 06eb513..1e53cd8 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -439,7 +439,6 @@ static int dmg_open(BlockDriverState *bs, QDict *options, 
int flags,
 int ret;

 bs->read_only = 1;
-bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

 s->n_chunks = 0;
 s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
@@ -547,6 +546,11 @@ fail:
 return ret;
 }

+static void dmg_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline int is_sector_in_chunk(BDRVDMGState* s,
 uint32_t chunk_num, uint64_t sector_num)
 {
@@ -720,6 +724,7 @@ static BlockDriver bdrv_dmg = {
 .instance_size  = sizeof(BDRVDMGState),
 .bdrv_probe = dmg_probe,
 .bdrv_open  = dmg_open,
+.bdrv_refresh_limits = dmg_refresh_limits,
 .bdrv_co_preadv = dmg_co_preadv,
 .bdrv_close = dmg_close,
 };
diff --git a/block/vvfat.c b/block/vvfat.c
index 6d2e21c..08b1aa3 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1180,7 +1180,6 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
 bs->read_only = 0;
 }

-bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
 bs->total_sectors = cyls * heads * secs;

 if (init_directories(s, dirname, heads, secs, errp)) {
@@ -1212,6 +1211,11 @@ fail:
 return ret;
 }

+static void vvfat_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline void vvfat_close_current_file(BDRVVVFATState *s)
 {
 if(s->current_mapping) {
@@ -3049,6 +3053,7 @@ static BlockDriver bdrv_vvfat = {

 .bdrv_parse_filename= vvfat_parse_filename,
 .bdrv_file_open = vvfat_open,
+.bdrv_refresh_limits= vvfat_refresh_limits,
 .bdrv_close = vvfat_close,

 .bdrv_co_preadv = vvfat_co_preadv,
-- 
2.5.5

[Qemu-block] [PATCH v2 10/17] qcow2: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/qcow2.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c40baca..393afa9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -975,9 +975,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, 
int flags,
 }

 bs->encrypted = 1;
-
-/* Encryption works on a sector granularity */
-bs->request_alignment = BDRV_SECTOR_SIZE;
 }

 s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
@@ -1196,6 +1193,10 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;

+if (bs->encrypted) {
+/* Encryption works on a sector granularity */
+bs->request_alignment = BDRV_SECTOR_SIZE;
+}
 bs->bl.pwrite_zeroes_alignment = s->cluster_size;
 }

-- 
2.5.5

[Qemu-block] [PATCH v2 11/17] raw-win32: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

In this case, raw_probe_alignment() already did what we needed,
so just fix its signature and wire it in correctly.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/raw-win32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/raw-win32.c b/block/raw-win32.c
index fd23891..88382d9 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -222,7 +222,7 @@ static void raw_attach_aio_context(BlockDriverState *bs,
 }
 }

-static void raw_probe_alignment(BlockDriverState *bs)
+static void raw_probe_alignment(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 DWORD sectorsPerCluster, freeClusters, totalClusters, count;
@@ -365,7 +365,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 win32_aio_attach_aio_context(s->aio, bdrv_get_aio_context(bs));
 }

-raw_probe_alignment(bs);
 ret = 0;
 fail:
 qemu_opts_del(opts);
@@ -550,6 +549,7 @@ BlockDriver bdrv_file = {
 .bdrv_needs_filename = true,
 .bdrv_parse_filename = raw_parse_filename,
 .bdrv_file_open = raw_open,
+.bdrv_refresh_limits = raw_probe_alignment,
 .bdrv_close = raw_close,
 .bdrv_create= raw_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-- 
2.5.5

[Qemu-block] [PATCH v2 06/17] iscsi: Advertise realistic limits to block layer

2016-06-14 Thread Eric Blake

The function sector_limits_lun2qemu() returns a value in units of
the block layer's 512-byte sector, and can be as large as
0x4000, which is much larger than the block layer's inherent
limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
handles '0' as a synonym to the inherent limit, and it is nicer
to return this value than it is to calculate an arbitrary
maximum, for two reasons: we want to ensure that the block layer
continues to special-case '0' as 'no limit beyond the inherent
limits'; and we want to be able to someday expand the block
layer to allow 64-bit limits, where auditing for uses of
BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
artificially constraining iscsi to old block layer limits.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/iscsi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 7e78ade..4290e41 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1697,7 +1697,9 @@ static void iscsi_close(BlockDriverState *bs)

 static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun)
 {
-return MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
+int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
+
+return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0;
 }

 static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
-- 
2.5.5

[Qemu-block] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev()

2016-06-14 Thread Eric Blake

For symmetry with bdrv_aligned_preadv(), assert that the caller
really has aligned things properly. This requires adding an align
parameter, which is used now only in the new asserts, but will
come in handy in a later patch that adds auto-fragmentation to the
max transfer size, since that value need not always be a multiple
of the alignment, and therefore must be rounded down.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/io.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 42e63f7..056a101 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1242,7 +1242,7 @@ fail:
  */
 static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
 BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
-QEMUIOVector *qiov, int flags)
+int64_t align, QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs->drv;
 bool waited;
@@ -1251,6 +1251,9 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 int64_t start_sector = offset >> BDRV_SECTOR_BITS;
 int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);

+assert(is_power_of_2(align));
+assert((offset & (align - 1)) == 0);
+assert((bytes & (align - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
 assert(!(flags & ~BDRV_REQ_MASK));
@@ -1337,7 +1340,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,

 memset(buf + head_padding_bytes, 0, zero_bytes);
 ret = bdrv_aligned_pwritev(bs, req, offset & ~(align - 1), align,
-   _qiov,
+   align, _qiov,
flags & ~BDRV_REQ_ZERO_WRITE);
 if (ret < 0) {
 goto fail;
@@ -1350,7 +1353,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 if (bytes >= align) {
 /* Write the aligned part in the middle. */
 uint64_t aligned_bytes = bytes & ~(align - 1);
-ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes,
+ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes, align,
NULL, flags);
 if (ret < 0) {
 goto fail;
@@ -1374,7 +1377,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_AFTER_TAIL);

 memset(buf, 0, bytes);
-ret = bdrv_aligned_pwritev(bs, req, offset, align,
+ret = bdrv_aligned_pwritev(bs, req, offset, align, align,
_qiov, flags & ~BDRV_REQ_ZERO_WRITE);
 }
 fail:
@@ -1499,7 +1502,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs,
 bytes = ROUND_UP(bytes, align);
 }

-ret = bdrv_aligned_pwritev(bs, , offset, bytes,
+ret = bdrv_aligned_pwritev(bs, , offset, bytes, align,
use_local_qiov ? _qiov : qiov,
flags);

-- 
2.5.5

[Qemu-block] [PATCH v2 04/17] nbd: Allow larger requests

2016-06-14 Thread Eric Blake

The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d8214885 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work.  It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 include/block/nbd.h | 1 +
 block/nbd-client.c  | 4 
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index b86a976..36dde24 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -76,6 +76,7 @@ enum {

 /* Maximum size of a single READ/WRITE data buffer */
 #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
+#define NBD_MAX_SECTORS (NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE)

 ssize_t nbd_wr_syncv(QIOChannel *ioc,
  struct iovec *iov,
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 4d13444..420bce8 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -269,10 +269,6 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t 
sector_num,
 return -reply.error;
 }

-/* qemu-nbd has a limit of slightly less than 1M per request.  Try to
- * remain aligned to 4K. */
-#define NBD_MAX_SECTORS 2040
-
 int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
-- 
2.5.5

[Qemu-block] [PATCH v2 00/17] Byte-based block limits

2016-06-14 Thread Eric Blake

BlockLimits is currently an ugly mix of byte limits vs.
sector limits.  Unify it.  Fix some bugs I found in
bdrv_aligned_preadv() while at it.

Prequisite: Kevin's ongoing work to migrate bdrv_aligned_preadv()
to be byte-based (commit 3de06b2 on his vmstate branch at the
time of this email, but that gets rebased):
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02832.html

Trivial contextual conflict in nbd.h with the pull request Paolo
will soon be posting (both series add a #define near the same
line; resolution is to add both):
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg0.html

Also available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-limits-v2

Since v1:
- drop things already done in Kevin's work
- rebase
- split out lots of cleanup work to bdrv_refresh_limits() so
that qemu-iotests does not gain any problems on 77

001/17:[down] 'block: Tighter assertions on bdrv_aligned_pwritev()'
002/17:[down] 'block: Document supported flags during bdrv_aligned_preadv()'
003/17:[down] 'block: Fix harmless off-by-one in bdrv_aligned_preadv()'
004/17:[down] 'nbd: Allow larger requests'
005/17:[down] 'nbd: Advertise realistic limits to block layer'
006/17:[down] 'iscsi: Advertise realistic limits to block layer'
007/17:[down] 'block: Give nonzero result to blk_get_max_transfer_length()'
008/17:[down] 'blkdebug: Set request_alignment during .bdrv_refresh_limits()'
009/17:[down] 'iscsi: Set request_alignment during .bdrv_refresh_limits()'
010/17:[down] 'qcow2: Set request_alignment during .bdrv_refresh_limits()'
011/17:[down] 'raw-win32: Set request_alignment during .bdrv_refresh_limits()'
012/17:[down] 'block: Set request_alignment during .bdrv_refresh_limits()'
013/17:[down] 'block: Set default request_alignment during 
bdrv_refresh_limits()'
014/17:[0061] [FC] 'block: Switch transfer length bounds to byte-based'
015/17:[0008] [FC] 'block: Switch discard length bounds to byte-based'
016/17:[down] 'block: Split bdrv_merge_limits() from bdrv_refresh_limits()'
017/17:[0044] [FC] 'block: Move request_alignment into BlockLimit'

Eric Blake (17):
  block: Tighter assertions on bdrv_aligned_pwritev()
  block: Document supported flags during bdrv_aligned_preadv()
  block: Fix harmless off-by-one in bdrv_aligned_preadv()
  nbd: Allow larger requests
  nbd: Advertise realistic limits to block layer
  iscsi: Advertise realistic limits to block layer
  block: Give nonzero result to blk_get_max_transfer_length()
  blkdebug: Set request_alignment during .bdrv_refresh_limits()
  iscsi: Set request_alignment during .bdrv_refresh_limits()
  qcow2: Set request_alignment during .bdrv_refresh_limits()
  raw-win32: Set request_alignment during .bdrv_refresh_limits()
  block: Set request_alignment during .bdrv_refresh_limits()
  block: Set default request_alignment during bdrv_refresh_limits()
  block: Switch transfer length bounds to byte-based
  block: Switch discard length bounds to byte-based
  block: Split bdrv_merge_limits() from bdrv_refresh_limits()
  block: Move request_alignment into BlockLimit

 include/block/block.h  |  1 +
 include/block/block_int.h  | 48 ++---
 include/block/nbd.h|  1 +
 include/qemu/typedefs.h|  1 +
 include/sysemu/block-backend.h |  2 +-
 block.c|  3 +-
 block/blkdebug.c   | 18 ++--
 block/block-backend.c  |  9 ++--
 block/bochs.c  |  7 ++-
 block/cloop.c  |  7 ++-
 block/dmg.c|  7 ++-
 block/io.c | 96 +++---
 block/iscsi.c  | 40 +-
 block/nbd-client.c |  4 --
 block/nbd.c|  4 +-
 block/qcow2.c  |  7 +--
 block/raw-posix.c  | 19 +
 block/raw-win32.c  | 10 ++---
 block/raw_bsd.c|  4 +-
 block/vvfat.c  |  7 ++-
 hw/block/virtio-blk.c  | 10 ++---
 hw/scsi/scsi-generic.c | 15 ---
 qemu-img.c |  9 ++--
 23 files changed, 195 insertions(+), 134 deletions(-)

-- 
2.5.5

[Qemu-block] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv()

2016-06-14 Thread Eric Blake

We don't pass any flags on to drivers to handle.  Tighten an
assert to explain why we pass 0 to bdrv_driver_preadv(), and add
some comments on things to be aware of if we want to turn on
per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
document that we may want to consider using unmap during
copy-on-read operations where the read is all zeroes.

Signed-off-by: Eric Blake 

---
v2: retitle from 'Honor flags during bdrv_aligned_preadv()', and
change scope of patch
---
 block/io.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 056a101..9191772 100644
--- a/block/io.c
+++ b/block/io.c
@@ -933,6 +933,9 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BlockDriverState *bs,

 if (drv->bdrv_co_pwrite_zeroes &&
 buffer_is_zero(bounce_buffer, iov.iov_len)) {
+/* FIXME: Should we (perhaps conditionally) be setting
+ * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy
+ * that still correctly reads as zero? */
 ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0);
 } else {
 /* This does not change the data on the disk, it is not necessary
@@ -975,7 +978,12 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 assert((bytes & (align - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
-assert(!(flags & ~BDRV_REQ_MASK));
+
+/* TODO: We would need a per-BDS .supported_read_flags and
+ * potential fallback support, if we ever implement any read flags
+ * to pass through to drivers.  For now, there aren't any
+ * passthrough flags.  */
+assert(!(flags & ~(BDRV_REQ_NO_SERIALISING | BDRV_REQ_COPY_ON_READ)));

 /* Handle Copy on Read and associated serialisation */
 if (flags & BDRV_REQ_COPY_ON_READ) {
-- 
2.5.5

Re: [Qemu-block] [Qemu-devel] [PATCH] macio: Use blk_drain instead of blk_drain_all

2016-06-14 Thread John Snow



On 06/13/2016 09:21 PM, Fam Zheng wrote:
> On Mon, 06/13 17:33, John Snow wrote:
>>
>>
>> On 06/12/2016 02:56 AM, Fam Zheng wrote:
>>> We only care about the associated backend, so blk_drain is more
>>> appropriate here.
>>>
>>> Signed-off-by: Fam Zheng 
>>> ---
>>>  hw/ide/macio.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
>>> index 78c10a0..a8c7321 100644
>>> --- a/hw/ide/macio.c
>>> +++ b/hw/ide/macio.c
>>> @@ -400,7 +400,7 @@ static void pmac_ide_flush(DBDMA_io *io)
>>>  IDEState *s = idebus_active_if(>bus);
>>>  
>>>  if (s->bus->dma->aiocb) {
>>> -blk_drain_all();
>>> +blk_drain(s->blk);
>>>  }
>>>  }
>>>  
>>>
>>
>> Reviewed-by: John Snow 
>>
>> Shall I take this through my tree?
> 
> I think that MAINTAINERS says so. :)
> 
> Fam
> 

For now, then:

Thanks, applied to my IDE tree:

https://github.com/jnsnow/qemu/commits/ide
https://github.com/jnsnow/qemu.git

--js

Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Eric Blake

On 06/13/2016 06:19 AM, Paolo Bonzini wrote:
> 
> 
> On 12/05/2016 00:39, Eric Blake wrote:
>> We have a few bugs in how we handle invalid client commands:
>>
>> - A client can send an NBD_CMD_DISC where from + len overflows,
>> convincing us to reply with an error and stay connected, even
>> though the protocol requires us to silently disconnect. Fix by
>> hoisting the special case sooner.
>>

> It's simpler to always set req->complete.  Putting everything together:
> 
> diff --git a/nbd/server.c b/nbd/server.c

> @@ -1213,12 +1218,9 @@ static void nbd_trip(void *opaque)
>  LOG("invalid request type (%" PRIu32 ") received", request.type);
>  reply.error = EINVAL;
>  error_reply:
> -/* We must disconnect after replying with an error to
> - * NBD_CMD_READ, since we choose not to send bogus filler
> - * data; likewise after NBD_CMD_WRITE if we did not read the
> - * payload. */
> -if (nbd_co_send_reply(req, , 0) < 0 || command == NBD_CMD_READ 
> ||
> -(command == NBD_CMD_WRITE && !req->complete)) {
> +/* We must disconnect after NBD_CMD_WRITE if we did not
> + * read the payload. */
> +if (nbd_co_send_reply(req, , 0) < 0 || !req->complete)) {

This doesn't even compile (too many ')').  I assume you'll fix that
before your actual pull request goes out.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/3] coccinelle: Clean up error checks and return value variables

2016-06-14 Thread Eduardo Habkost

On Tue, Jun 14, 2016 at 11:03:20AM +0200, Markus Armbruster wrote:
[...]
> > * Manual fixups
> 
> With the commit message of 3/3 amended, series
> Reviewed-by: Markus Armbruster 
> 
> My other suggested touch ups are optional.  If you don't object, I'll do
> them, and take the result through my error-next branch.

I'm OK with the changes you suggested. Thanks!

-- 
Eduardo

Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/3] coccinelle: Clean up error checks and return value variables

2016-06-14 Thread Markus Armbruster

Markus Armbruster  writes:

> Eduardo Habkost  writes:
>
>> Changes v1 -> v2:
>> * The Coccinelle scripts were simplified by using "when"
>>   constraints to detect when a variable is not used elsewhere
>>   inside the function.
>> * Added script to remove unnecessary variables for function
>>   return value.
>> * Coccinelle scripts added to scripts/coccinelle.
>>
>> Changes v2 -> v3 on patch 2/3:
>> * Remove unused metavariable from script
>>  * Do changes only if errp is not touched before the error_setg()
>>call (so we are sure *errp is not set and error_setg() won't
>>abort)
>>  * Changes dropped from v2:
>>* block.c:bdrv_create_co_entry()
>>* block.c:bdrv_create_file()
>>* blockdev.c:qmp_blockdev_mirror()
>>
>> Changes v2 -> v3 on patch 3/3:
>> * Not a RFC anymore
>> * Used --keep-comments option
>> * Instead of using:
>> - VAR = E;
>> + return E;
>>   use:
>> - VAR =
>> + return
>>   E
>>   This makes Coccinelle keep the existing formatting on some
>>   cases.
>
> Neat trick.
>
>> * Manual fixups
>
> With the commit message of 3/3 amended, series
> Reviewed-by: Markus Armbruster 
>
> My other suggested touch ups are optional.  If you don't object, I'll do
> them, and take the result through my error-next branch.

Applied to error-next, thanks!

Re: [Qemu-block] [PATCH v2] rbd:change error_setg() to error_setg_errno()

2016-06-14 Thread Vikhyat Umrao

On Tue, Jun 14, 2016 at 8:12 PM, Max Reitz  wrote:

> On 09.05.2016 09:51, Vikhyat Umrao wrote:
> > Ceph RBD block driver does not use error_setg_errno() where
> > it is possible to use. This patch replaces error_setg()
> > from error_setg_errno().
> >
> > Signed-off-by: Vikhyat Umrao 
> > ---
> >  block/rbd.c | 38 +++---
> >  1 file changed, 23 insertions(+), 15 deletions(-)
>
> Thanks, applied to my block tree:
>
> https://github.com/XanClic/qemu/commits/block
>
>
Thanks Max.


> Max
>
>

Re: [Qemu-block] [PATCH v3 2/2] block: export LUKS specific data to qemu-img info

2016-06-14 Thread Max Reitz

On 14.06.2016 18:24, Daniel P. Berrange wrote:
> The qemu-img info command has the ability to expose format
> specific metadata about volumes. Wire up this facility for
> the LUKS driver to report on cipher configuration and key
> slot usage.
> 
> $ qemu-img info ~/VirtualMachines/demo.luks
> image: /home/berrange/VirtualMachines/demo.luks
> file format: luks
> virtual size: 98M (102760448 bytes)
> disk size: 100M
> encrypted: yes
> Format specific information:
> ivgen alg: plain64
> hash alg: sha1
> cipher alg: aes-128
> uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
> cipher mode: xts
> slots:
> [0]:
> active: true
> iters: 572706
> key offset: 4096
> stripes: 4000
> [1]:
> active: false
> key offset: 135168
> [2]:
> active: false
> key offset: 266240
> [3]:
> active: false
> key offset: 397312
> [4]:
> active: false
> key offset: 528384
> [5]:
> active: false
> key offset: 659456
> [6]:
> active: false
> key offset: 790528
> [7]:
> active: false
> key offset: 921600
> payload offset: 2097152
> master key iters: 142375
> 
> One somewhat undesirable artifact is that the data fields are
> printed out in (apparently) random order. This will be addressed
> later by changing the way the block layer pretty-prints the
> image specific data.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/crypto.c   | 49 +
>  qapi/block-core.json |  6 +-
>  2 files changed, 54 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH v3 2/2] block: export LUKS specific data to qemu-img info

2016-06-14 Thread Daniel P. Berrange

The qemu-img info command has the ability to expose format
specific metadata about volumes. Wire up this facility for
the LUKS driver to report on cipher configuration and key
slot usage.

$ qemu-img info ~/VirtualMachines/demo.luks
image: /home/berrange/VirtualMachines/demo.luks
file format: luks
virtual size: 98M (102760448 bytes)
disk size: 100M
encrypted: yes
Format specific information:
ivgen alg: plain64
hash alg: sha1
cipher alg: aes-128
uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
cipher mode: xts
slots:
[0]:
active: true
iters: 572706
key offset: 4096
stripes: 4000
[1]:
active: false
key offset: 135168
[2]:
active: false
key offset: 266240
[3]:
active: false
key offset: 397312
[4]:
active: false
key offset: 528384
[5]:
active: false
key offset: 659456
[6]:
active: false
key offset: 790528
[7]:
active: false
key offset: 921600
payload offset: 2097152
master key iters: 142375

One somewhat undesirable artifact is that the data fields are
printed out in (apparently) random order. This will be addressed
later by changing the way the block layer pretty-prints the
image specific data.

Signed-off-by: Daniel P. Berrange 
---
 block/crypto.c   | 49 +
 qapi/block-core.json |  6 +-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 758e14e..6fe2521 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -565,6 +565,53 @@ static int block_crypto_create_luks(const char *filename,
filename, opts, errp);
 }
 
+static int block_crypto_get_info_luks(BlockDriverState *bs,
+  BlockDriverInfo *bdi)
+{
+BlockDriverInfo subbdi;
+int ret;
+
+ret = bdrv_get_info(bs->file->bs, );
+if (ret != 0) {
+return ret;
+}
+
+bdi->unallocated_blocks_are_zero = false;
+bdi->can_write_zeroes_with_unmap = false;
+bdi->cluster_size = subbdi.cluster_size;
+
+return 0;
+}
+
+static ImageInfoSpecific *
+block_crypto_get_specific_info_luks(BlockDriverState *bs)
+{
+BlockCrypto *crypto = bs->opaque;
+ImageInfoSpecific *spec_info;
+QCryptoBlockInfo *info;
+
+info = qcrypto_block_get_info(crypto->block, NULL);
+if (!info) {
+return NULL;
+}
+if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) {
+qapi_free_QCryptoBlockInfo(info);
+return NULL;
+}
+
+spec_info = g_new(ImageInfoSpecific, 1);
+spec_info->type = IMAGE_INFO_SPECIFIC_KIND_LUKS;
+spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1);
+*spec_info->u.luks.data = info->u.luks;
+
+/* Blank out pointers we've just stolen to avoid double free */
+memset(>u.luks, 0, sizeof(info->u.luks));
+
+qapi_free_QCryptoBlockInfo(info);
+
+return spec_info;
+}
+
 BlockDriver bdrv_crypto_luks = {
 .format_name= "luks",
 .instance_size  = sizeof(BlockCrypto),
@@ -578,6 +625,8 @@ BlockDriver bdrv_crypto_luks = {
 .bdrv_co_readv  = block_crypto_co_readv,
 .bdrv_co_writev = block_crypto_co_writev,
 .bdrv_getlength = block_crypto_getlength,
+.bdrv_get_info  = block_crypto_get_info_luks,
+.bdrv_get_specific_info = block_crypto_get_specific_info_luks,
 };
 
 static void block_crypto_init(void)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 98a20d2..35454e9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -85,7 +85,11 @@
 { 'union': 'ImageInfoSpecific',
   'data': {
   'qcow2': 'ImageInfoSpecificQCow2',
-  'vmdk': 'ImageInfoSpecificVmdk'
+  'vmdk': 'ImageInfoSpecificVmdk',
+  # If we need to add block driver specific parameters for
+  # LUKS in future, then we'll subclass QCryptoBlockInfoLUKS
+  # to define a ImageInfoSpecificLUKS
+  'luks': 'QCryptoBlockInfoLUKS'
   } }
 
 ##
-- 
2.5.5

Re: [Qemu-block] [PATCH] block-backend: allow flush on devices with open tray

2016-06-14 Thread Max Reitz

On 14.06.2016 17:54, John Snow wrote:
> 
> 
> On 06/14/2016 09:19 AM, Max Reitz wrote:
>> On 10.06.2016 23:59, John Snow wrote:
>>> If a device still has an attached BDS because the medium has not yet
>>> been removed, we will be unable to migrate to a new host because
>>> blk_flush will return an error for that backend.
>>>
>>> Replace the call to blk_is_available to blk_is_inserted to weaken
>>> the check and allow flushes from the backend to work, while still
>>> disallowing flushes from the frontend/device model to work.
>>>
>>> This fixes a regression present in 2.6.0 caused by the following commit:
>>> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
>>> block: Move some bdrv_*_all() functions to BB
>>>
>>> Signed-off-by: John Snow 
>>> ---
>>>  block/block-backend.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> I'm still not sure we shouldn't do the same for blk_{co,aio}_flush(). I
>> guess you exclude them here because you specifically want to fix the
>> issue mentioned in the commit message, but then we could just make
>> blk_flush_all() ignore an -ENOMEDIUM.
> 
> Yeah, I didn't investigate the full path. Just making the minimal fixes.
> Is there a concern that this may still leave certain pathways broken
> when the CDROM tray is open?
> 
> I don't know of any immediately without digging again.
> 
>>
>> I personally think we should make all blk_*flush() functions use
>> blk_is_inserted() instead of blk_is_available(). As we have discussed on
>> IRC, there are probably not that many cases a guest can flush a medium
>> in an open tray anyway (because the main use case are read-only
>> CD-ROMs), and even if so, that wouldn't change any data, so even if the
>> guest can actually flush something on an open tray, I don't think anyone
>> would complain.
>>
>> Max
>>
> 
> I have difficulty making pragmatic arguments when purity is at stake,
> but I've already wandered outside of my device model, so I will defer to
> your judgment.
> 
>>> diff --git a/block/block-backend.c b/block/block-backend.c
>>> index 34500e6..d1e875e 100644
>>> --- a/block/block-backend.c
>>> +++ b/block/block-backend.c
>>> @@ -1122,7 +1122,7 @@ int blk_co_flush(BlockBackend *blk)
>>>  
>>>  int blk_flush(BlockBackend *blk)
>>>  {
>>> -if (!blk_is_available(blk)) {
>>> +if (!blk_is_inserted(blk)) {
>>>  return -ENOMEDIUM;
>>>  }
>>>  
>>>
>>
>>
> 
> Is this a NACK unless I attempt to address the wider design issue?

I just don't see a point in using blk_is_inserted() here but
blk_is_available() in the other blk_*flush() functions. If
blk_is_inserted() is correct for blk_flush(), it should be correct for
blk_co_flush() and blk_aio_flush(), too.

Maybe I should emphasize that I decided between is_available() and
is_inserted() basically on what felt right to me. There's not really
that much research behind it, so changing it is completely fine.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v4 00/11] nbd: tighter protocol compliance

2016-06-14 Thread Paolo Bonzini



On 13/06/2016 18:49, Eric Blake wrote:
>>> >> 004/11:[] [--] 'nbd: Improve server handling of bogus commands'
>> > 
>> > Applied with some changes, see reply to individual patch.
> Not sure I agree with all of your comments on that patch regarding
> behavior on read errors, but further changes can be followup patches.
> 
> 
>> > I'll need a v5 of just patch 8 and patch 9; I'm queuing the rest and
>> > will send a pull request later this week.
> Do you have a git tree that I can rebase on top of?

Just wait tomorrow. :)

Paolo

[Qemu-block] [PATCH v3 0/2] Report format specific info for LUKS block driver

2016-06-14 Thread Daniel P. Berrange

This is a followup to:

  v1: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01723.html
  v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03642.html

The 'qemu-img info' tool has ability to print format specific
information, eg with qcow2 it reports two extra items:

  $ qemu-img info ~/VirtualMachines/demo.qcow2
  image: /home/berrange/VirtualMachines/demo.qcow2
  file format: qcow2
  virtual size: 3.0G (3221225472 bytes)
  disk size: 140K
  cluster_size: 65536
  Format specific information:
  compat: 0.10
  refcount bits: 16


This is not currently wired up for the LUKS driver. This patch
series adds that support so that we can report useful data about
the LUKS volume such as the crypto algorithm choices, key slot
usage and other volume metadata.

The first patch extends the crypto API to allow querying of the
format specific metadata

The second patches extends the block API to allow the LUKS driver
to report the format specific metadata.

$ qemu-img info ~/VirtualMachines/demo.luks
image: /home/berrange/VirtualMachines/demo.luks
file format: luks
virtual size: 98M (102760448 bytes)
disk size: 100M
encrypted: yes
Format specific information:
ivgen alg: plain64
hash alg: sha1
cipher alg: aes-128
uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
cipher mode: xts
slots:
[0]:
active: true
iters: 572706
key offset: 4096
stripes: 4000
[1]:
active: false
key offset: 135168
[2]:
active: false
key offset: 266240
[3]:
active: false
key offset: 397312
[4]:
active: false
key offset: 528384
[5]:
active: false
key offset: 659456
[6]:
active: false
key offset: 790528
[7]:
active: false
key offset: 921600
payload offset: 2097152
master key iters: 142375

Technically most of the code changes here are in the crypto
layer, rather than the block layer. I'm fine with both patches
going through the block maintainer tree, or can submit a both
patches myself as, for sake of simplicity of merge.

Changed in v3:

 - Do full struct copy instead of field-by-field copy (Max)
 - Simplify handling of linked list pointers (Max)
 - Use g_strndup with uuid to guarantee null termination (Max)
 - Misc typos (Max)

Changed in v2:

 - Drop patches related to creating a text output visitor to
   format the ImageInfoSpecific data. This will be continued
   in a separate patch series
 - Fix key offset to be in bytes instead of sectors
 - Drop the duplicated ImageInfoSpecificLUKS type and just
   directly use QCryptoBlockInfoLUKS type in block layer
 - Skip reporting stripes/iters if keyslot is inactive
 - Add missing QAPI schema docs



Daniel P. Berrange (2):
  crypto: add support for querying parameters for block encryption
  block: export LUKS specific data to qemu-img info

 block/crypto.c | 49 
 crypto/block-luks.c| 67 
 crypto/block.c | 17 +++
 crypto/blockpriv.h |  4 +++
 include/crypto/block.h | 16 +++
 qapi/block-core.json   |  6 +++-
 qapi/crypto.json   | 76 ++
 7 files changed, 234 insertions(+), 1 deletion(-)

-- 
2.5.5

Re: [Qemu-block] [PATCH v3 1/2] crypto: add support for querying parameters for block encryption

2016-06-14 Thread Max Reitz

On 14.06.2016 18:24, Daniel P. Berrange wrote:
> When creating new block encryption volumes, we accept a list of
> parameters to control the formatting process. It is useful to
> be able to query what those parameters were for existing block
> devices. Add a qcrypto_block_get_info() method which returns a
> QCryptoBlockInfo instance to report this data.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  crypto/block-luks.c| 67 
>  crypto/block.c | 17 +++
>  crypto/blockpriv.h |  4 +++
>  include/crypto/block.h | 16 +++
>  qapi/crypto.json   | 76 
> ++
>  5 files changed, 180 insertions(+)

Reviewed-by: Max Reitz 

(I'll be waiting whether Eric wants to comment)



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH v3 1/2] crypto: add support for querying parameters for block encryption

2016-06-14 Thread Daniel P. Berrange

When creating new block encryption volumes, we accept a list of
parameters to control the formatting process. It is useful to
be able to query what those parameters were for existing block
devices. Add a qcrypto_block_get_info() method which returns a
QCryptoBlockInfo instance to report this data.

Signed-off-by: Daniel P. Berrange 
---
 crypto/block-luks.c| 67 
 crypto/block.c | 17 +++
 crypto/blockpriv.h |  4 +++
 include/crypto/block.h | 16 +++
 qapi/crypto.json   | 76 ++
 5 files changed, 180 insertions(+)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 63649f1..6e940fb 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -201,6 +201,15 @@ QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSHeader) != 
592);
 
 struct QCryptoBlockLUKS {
 QCryptoBlockLUKSHeader header;
+
+/* Cache parsed versions of what's in header fields,
+ * as we can't rely on QCryptoBlock.cipher being
+ * non-NULL */
+QCryptoCipherAlgorithm cipher_alg;
+QCryptoCipherMode cipher_mode;
+QCryptoIVGenAlgorithm ivgen_alg;
+QCryptoHashAlgorithm ivgen_hash_alg;
+QCryptoHashAlgorithm hash_alg;
 };
 
 
@@ -835,6 +844,12 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 block->payload_offset = luks->header.payload_offset *
 QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
 
+luks->cipher_alg = cipheralg;
+luks->cipher_mode = ciphermode;
+luks->ivgen_alg = ivalg;
+luks->ivgen_hash_alg = ivhash;
+luks->hash_alg = hash;
+
 g_free(masterkey);
 g_free(password);
 
@@ -1250,6 +1265,12 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 goto error;
 }
 
+luks->cipher_alg = luks_opts.cipher_alg;
+luks->cipher_mode = luks_opts.cipher_mode;
+luks->ivgen_alg = luks_opts.ivgen_alg;
+luks->ivgen_hash_alg = luks_opts.ivgen_hash_alg;
+luks->hash_alg = luks_opts.hash_alg;
+
 memset(masterkey, 0, luks->header.key_bytes);
 g_free(masterkey);
 memset(slotkey, 0, luks->header.key_bytes);
@@ -1284,6 +1305,51 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 }
 
 
+static int qcrypto_block_luks_get_info(QCryptoBlock *block,
+   QCryptoBlockInfo *info,
+   Error **errp)
+{
+QCryptoBlockLUKS *luks = block->opaque;
+QCryptoBlockInfoLUKSSlot *slot;
+QCryptoBlockInfoLUKSSlotList *slots = NULL, **prev = >u.luks.slots;
+size_t i;
+
+info->u.luks.cipher_alg = luks->cipher_alg;
+info->u.luks.cipher_mode = luks->cipher_mode;
+info->u.luks.ivgen_alg = luks->ivgen_alg;
+if (info->u.luks.ivgen_alg == QCRYPTO_IVGEN_ALG_ESSIV) {
+info->u.luks.has_ivgen_hash_alg = true;
+info->u.luks.ivgen_hash_alg = luks->ivgen_hash_alg;
+}
+info->u.luks.hash_alg = luks->hash_alg;
+info->u.luks.payload_offset = block->payload_offset;
+info->u.luks.master_key_iters = luks->header.master_key_iterations;
+info->u.luks.uuid = g_strndup((const char *)luks->header.uuid,
+  sizeof(luks->header.uuid));
+
+for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
+slots = g_new0(QCryptoBlockInfoLUKSSlotList, 1);
+*prev = slots;
+
+slots->value = slot = g_new0(QCryptoBlockInfoLUKSSlot, 1);
+slot->active = luks->header.key_slots[i].active ==
+QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED;
+slot->key_offset = luks->header.key_slots[i].key_offset
+ * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
+if (slot->active) {
+slot->has_iters = true;
+slot->iters = luks->header.key_slots[i].iterations;
+slot->has_stripes = true;
+slot->stripes = luks->header.key_slots[i].stripes;
+}
+
+prev = >next;
+}
+
+return 0;
+}
+
+
 static void qcrypto_block_luks_cleanup(QCryptoBlock *block)
 {
 g_free(block->opaque);
@@ -1321,6 +1387,7 @@ qcrypto_block_luks_encrypt(QCryptoBlock *block,
 const QCryptoBlockDriver qcrypto_block_driver_luks = {
 .open = qcrypto_block_luks_open,
 .create = qcrypto_block_luks_create,
+.get_info = qcrypto_block_luks_get_info,
 .cleanup = qcrypto_block_luks_cleanup,
 .decrypt = qcrypto_block_luks_decrypt,
 .encrypt = qcrypto_block_luks_encrypt,
diff --git a/crypto/block.c b/crypto/block.c
index da60eba..be823ee 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -105,6 +105,23 @@ QCryptoBlock 
*qcrypto_block_create(QCryptoBlockCreateOptions *options,
 }
 
 
+QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block,
+ Error **errp)
+{
+QCryptoBlockInfo *info = g_new0(QCryptoBlockInfo, 1);
+
+info->format = block->format;
+
+if (block->driver->get_info &&
+block->driver->get_info(block, info, errp) < 0) {
+g_free(info);
+return NULL;
+}

Re: [Qemu-block] [PATCH] block-backend: allow flush on devices with open tray

2016-06-14 Thread John Snow



On 06/14/2016 09:19 AM, Max Reitz wrote:
> On 10.06.2016 23:59, John Snow wrote:
>> If a device still has an attached BDS because the medium has not yet
>> been removed, we will be unable to migrate to a new host because
>> blk_flush will return an error for that backend.
>>
>> Replace the call to blk_is_available to blk_is_inserted to weaken
>> the check and allow flushes from the backend to work, while still
>> disallowing flushes from the frontend/device model to work.
>>
>> This fixes a regression present in 2.6.0 caused by the following commit:
>> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
>> block: Move some bdrv_*_all() functions to BB
>>
>> Signed-off-by: John Snow 
>> ---
>>  block/block-backend.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> I'm still not sure we shouldn't do the same for blk_{co,aio}_flush(). I
> guess you exclude them here because you specifically want to fix the
> issue mentioned in the commit message, but then we could just make
> blk_flush_all() ignore an -ENOMEDIUM.

Yeah, I didn't investigate the full path. Just making the minimal fixes.
Is there a concern that this may still leave certain pathways broken
when the CDROM tray is open?

I don't know of any immediately without digging again.

> 
> I personally think we should make all blk_*flush() functions use
> blk_is_inserted() instead of blk_is_available(). As we have discussed on
> IRC, there are probably not that many cases a guest can flush a medium
> in an open tray anyway (because the main use case are read-only
> CD-ROMs), and even if so, that wouldn't change any data, so even if the
> guest can actually flush something on an open tray, I don't think anyone
> would complain.
> 
> Max
> 

I have difficulty making pragmatic arguments when purity is at stake,
but I've already wandered outside of my device model, so I will defer to
your judgment.

>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 34500e6..d1e875e 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -1122,7 +1122,7 @@ int blk_co_flush(BlockBackend *blk)
>>  
>>  int blk_flush(BlockBackend *blk)
>>  {
>> -if (!blk_is_available(blk)) {
>> +if (!blk_is_inserted(blk)) {
>>  return -ENOMEDIUM;
>>  }
>>  
>>
> 
> 

Is this a NACK unless I attempt to address the wider design issue?

Re: [Qemu-block] [PATCH v3 0/5] block/mirror: Fix (?) target backing BDS

2016-06-14 Thread Max Reitz

On 10.06.2016 20:57, Max Reitz wrote:
> Issue #1: If the target image does not have a backing BDS before mirror
> completion, qemu tries really hard to give it a backing BDS. If the
> source has a backing BDS, it will actually always "succeed".
> In some cases, the target is not supposed to have a backing BDS, though
> (absolute-paths: because of sync=full; existing: because the target
> image does not have a backing file; blockdev-mirror: because of an
> explicit "backing": ""). Then, this is pretty bad behavior.
> 
> This should generally not change the target's visible data, but it still
> is ugly.
> 
> Issue #2: Currently the backing chain of the target is basically opened
> using bdrv_open_backing_file() (except for sometimes™). This results in
> multiple BDSs for a single physical file, which is bad. In most use
> cases, this is only temporary, but it still is bad.
> 
> If we can reuse the existing backing chain of the source (which is with
> drive-mirror in "absolute-paths" mode), we should just do so.

Following encouragement from Kevin on IRC, and apparently his acceptance
of my commit message proposal for patch 3, I have applied the series to
my block branch with said commit message added to patch 3 and the
superfluous imports removed from patch 4 (thanks, Fam!).

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v2 2/2] block: export LUKS specific data to qemu-img info

2016-06-14 Thread Max Reitz

On 14.06.2016 12:56, Daniel P. Berrange wrote:
> The qemu-img info command has the ability to expose format
> specific metadata about volumes. Wire up this facility for
> the LUKS driver to report on cipher configuration and key
> slot usage.
> 
> $ qemu-img info ~/VirtualMachines/demo.luks
> image: /home/berrange/VirtualMachines/demo.luks
> file format: luks
> virtual size: 98M (102760448 bytes)
> disk size: 100M
> encrypted: yes
> Format specific information:
> ivgen alg: plain64
> hash alg: sha1
> cipher alg: aes-128
> uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
> cipher mode: xts
> slots:
> [0]:
> active: true
> iters: 572706
> key offset: 4096
> stripes: 4000
> [1]:
> active: false
> key offset: 135168
> [2]:
> active: false
> key offset: 266240
> [3]:
> active: false
> key offset: 397312
> [4]:
> active: false
> key offset: 528384
> [5]:
> active: false
> key offset: 659456
> [6]:
> active: false
> key offset: 790528
> [7]:
> active: false
> key offset: 921600
> payload offset: 2097152
> master key iters: 142375
> 
> One somewhat undesirable artifact is that the data fields are
> printed out in (apparently) random order. This will be addressed
> later by changing the way the block layer pretty-prints the
> image specific data.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/crypto.c   | 59 
> 
>  qapi/block-core.json |  7 ++-
>  2 files changed, 65 insertions(+), 1 deletion(-)
> 
> diff --git a/block/crypto.c b/block/crypto.c
> index 758e14e..823572f 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -565,6 +565,63 @@ static int block_crypto_create_luks(const char *filename,
> filename, opts, errp);
>  }
>  
> +static int block_crypto_get_info_luks(BlockDriverState *bs,
> +  BlockDriverInfo *bdi)
> +{
> +BlockDriverInfo subbdi;
> +int ret;
> +
> +ret = bdrv_get_info(bs->file->bs, );
> +if (ret != 0) {
> +return ret;
> +}
> +
> +bdi->unallocated_blocks_are_zero = false;
> +bdi->can_write_zeroes_with_unmap = false;
> +bdi->cluster_size = subbdi.cluster_size;
> +
> +return 0;
> +}
> +
> +static ImageInfoSpecific *
> +block_crypto_get_specific_info_luks(BlockDriverState *bs)
> +{
> +BlockCrypto *crypto = bs->opaque;
> +ImageInfoSpecific *spec_info;
> +QCryptoBlockInfo *info;
> +
> +info = qcrypto_block_get_info(crypto->block, NULL);
> +if (!info) {
> +return NULL;
> +}
> +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) {
> +qapi_free_QCryptoBlockInfo(info);
> +return NULL;
> +}
> +
> +spec_info = g_new(ImageInfoSpecific, 1);
> +spec_info->type =  IMAGE_INFO_SPECIFIC_KIND_LUKS;

One space too many.

> +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1);
> +spec_info->u.luks.data->cipher_alg = info->u.luks.cipher_alg;
> +spec_info->u.luks.data->cipher_mode = info->u.luks.cipher_mode;
> +spec_info->u.luks.data->ivgen_alg = info->u.luks.ivgen_alg;
> +spec_info->u.luks.data->has_ivgen_hash_alg =
> +info->u.luks.has_ivgen_hash_alg;
> +spec_info->u.luks.data->ivgen_hash_alg = info->u.luks.ivgen_hash_alg;
> +spec_info->u.luks.data->hash_alg = info->u.luks.hash_alg;
> +spec_info->u.luks.data->payload_offset = info->u.luks.payload_offset;
> +spec_info->u.luks.data->master_key_iters = info->u.luks.master_key_iters;
> +spec_info->u.luks.data->uuid = g_strdup(info->u.luks.uuid);
> +
> +/* Steal the pointer instead of bothering to copy */
> +spec_info->u.luks.data->slots = info->u.luks.slots;
> +info->u.luks.slots = NULL;

Why not just steal everything by *spec_info->u.luks.data = info->u.luks
and then making sure the qapi_free() call below won't free anything in
there with a memset(>u.luks, 0, sizeof(info->u.luks))?

> +
> +qapi_free_QCryptoBlockInfo(info);
> +
> +return spec_info;
> +}
> +
>  BlockDriver bdrv_crypto_luks = {
>  .format_name= "luks",
>  .instance_size  = sizeof(BlockCrypto),
> @@ -578,6 +635,8 @@ BlockDriver bdrv_crypto_luks = {
>  .bdrv_co_readv  = block_crypto_co_readv,
>  .bdrv_co_writev = block_crypto_co_writev,
>  .bdrv_getlength = block_crypto_getlength,
> +.bdrv_get_info  = block_crypto_get_info_luks,
> +.bdrv_get_specific_info = block_crypto_get_specific_info_luks,
>  };
>  
>  static void block_crypto_init(void)
> diff --git

Re: [Qemu-block] [Qemu-devel] [PATCH v7 08/15] block: Simplify block_set_io_throttle

2016-06-14 Thread Markus Armbruster

Eric Blake  writes:

> Now that we can support boxed commands, use it to greatly
> reduce the number of parameters (and likelihood of getting
> out of sync) when adjusting throttle parameters.
>
> Signed-off-by: Eric Blake 
>
> ---
> v7: new patch
> ---
>  qapi/block-core.json |  20 --
>  blockdev.c   | 111 
> +++
>  hmp.c|  45 +
>  3 files changed, 66 insertions(+), 110 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 98a20d2..26f7c0e 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1312,6 +1312,21 @@
>  # the device will be removed from its group and the rest of its
>  # members will not be affected. The 'group' parameter is ignored.
>  #
> +# See BlockIOThrottle for parameter descriptions.

This comment will be trivial as soon as we got used to the 'box' feature
:)

> +#
> +# Returns: Nothing on success
> +#  If @device is not a valid block device, DeviceNotFound
> +#
> +# Since: 1.1
> +##
> +{ 'command': 'block_set_io_throttle', 'box': true,
> +  'data': 'BlockIOThrottle' }
> +
> +##
> +# BlockIOThrottle
> +#
> +# The parameters for the block_set_io_throttle command.

This comment is prone to go stale.

> +#
>  # @device: The name of the device
>  #
>  # @bps: total throughput limit in bytes per second
> @@ -1378,12 +1393,9 @@
>  #
>  # @group: #optional throttle group name (Since 2.4)
>  #
> -# Returns: Nothing on success
> -#  If @device is not a valid block device, DeviceNotFound
> -#
>  # Since: 1.1
>  ##
> -{ 'command': 'block_set_io_throttle',
> +{ 'struct': 'BlockIOThrottle',
>'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
>  'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
>  '*bps_max': 'int', '*bps_rd_max': 'int',
> diff --git a/blockdev.c b/blockdev.c
> index cf5afa3..b8db496 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2625,49 +2625,17 @@ fail:
>  }
>
>  /* throttling disk I/O limits */
> -void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t 
> bps_rd,
> -   int64_t bps_wr,
> -   int64_t iops,
> -   int64_t iops_rd,
> -   int64_t iops_wr,
> -   bool has_bps_max,
> -   int64_t bps_max,
> -   bool has_bps_rd_max,
> -   int64_t bps_rd_max,
> -   bool has_bps_wr_max,
> -   int64_t bps_wr_max,
> -   bool has_iops_max,
> -   int64_t iops_max,
> -   bool has_iops_rd_max,
> -   int64_t iops_rd_max,
> -   bool has_iops_wr_max,
> -   int64_t iops_wr_max,
> -   bool has_bps_max_length,
> -   int64_t bps_max_length,
> -   bool has_bps_rd_max_length,
> -   int64_t bps_rd_max_length,
> -   bool has_bps_wr_max_length,
> -   int64_t bps_wr_max_length,
> -   bool has_iops_max_length,
> -   int64_t iops_max_length,
> -   bool has_iops_rd_max_length,
> -   int64_t iops_rd_max_length,
> -   bool has_iops_wr_max_length,
> -   int64_t iops_wr_max_length,
> -   bool has_iops_size,
> -   int64_t iops_size,
> -   bool has_group,
> -   const char *group, Error **errp)
> +void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error **errp)
>  {

This hunk together with the last one are worth a good chunk of the work
you had to do to get here :)

>  ThrottleConfig cfg;
>  BlockDriverState *bs;
>  BlockBackend *blk;
>  AioContext *aio_context;
>
> -blk = blk_by_name(device);
> +blk = blk_by_name(arg->device);
>  if (!blk) {
>  error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> -  "Device '%s' not found", device);
> +  "Device '%s' not found", arg->device);
>  return;
>  }
>
> @@ -2676,59 +2644,59 @@ void qmp_block_set_io_throttle(const char *device, 
> int64_t bps, int64_t bps_rd,
>
>  bs = blk_bs(blk);
>  if (!bs) {
> -error_setg(errp, "Device '%s' has no medium", device);
> +error_setg(errp, "Device '%s' has no medium", arg->device);
>  goto out;
>  }
>
>  throttle_config_init();
> -cfg.buckets[THROTTLE_BPS_TOTAL].avg = bps;
> -

Re: [Qemu-block] [PATCH v2 2/2] block: export LUKS specific data to qemu-img info

2016-06-14 Thread Max Reitz

On 14.06.2016 17:47, Daniel P. Berrange wrote:
> On Tue, Jun 14, 2016 at 05:38:36PM +0200, Max Reitz wrote:
>> On 14.06.2016 12:56, Daniel P. Berrange wrote:
>>> The qemu-img info command has the ability to expose format
>>> specific metadata about volumes. Wire up this facility for
>>> the LUKS driver to report on cipher configuration and key
>>> slot usage.
>>>
>>> $ qemu-img info ~/VirtualMachines/demo.luks
>>> image: /home/berrange/VirtualMachines/demo.luks
>>> file format: luks
>>> virtual size: 98M (102760448 bytes)
>>> disk size: 100M
>>> encrypted: yes
>>> Format specific information:
>>> ivgen alg: plain64
>>> hash alg: sha1
>>> cipher alg: aes-128
>>> uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
>>> cipher mode: xts
>>> slots:
>>> [0]:
>>> active: true
>>> iters: 572706
>>> key offset: 4096
>>> stripes: 4000
>>> [1]:
>>> active: false
>>> key offset: 135168
>>> [2]:
>>> active: false
>>> key offset: 266240
>>> [3]:
>>> active: false
>>> key offset: 397312
>>> [4]:
>>> active: false
>>> key offset: 528384
>>> [5]:
>>> active: false
>>> key offset: 659456
>>> [6]:
>>> active: false
>>> key offset: 790528
>>> [7]:
>>> active: false
>>> key offset: 921600
>>> payload offset: 2097152
>>> master key iters: 142375
>>>
>>> One somewhat undesirable artifact is that the data fields are
>>> printed out in (apparently) random order. This will be addressed
>>> later by changing the way the block layer pretty-prints the
>>> image specific data.
>>>
>>> Signed-off-by: Daniel P. Berrange 
>>> ---
>>>  block/crypto.c   | 59 
>>> 
>>>  qapi/block-core.json |  7 ++-
>>>  2 files changed, 65 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/crypto.c b/block/crypto.c
>>> index 758e14e..823572f 100644
>>> --- a/block/crypto.c
>>> +++ b/block/crypto.c
>>> @@ -565,6 +565,63 @@ static int block_crypto_create_luks(const char 
>>> *filename,
>>> filename, opts, errp);
>>>  }
>>>  
>>> +static int block_crypto_get_info_luks(BlockDriverState *bs,
>>> +  BlockDriverInfo *bdi)
>>> +{
>>> +BlockDriverInfo subbdi;
>>> +int ret;
>>> +
>>> +ret = bdrv_get_info(bs->file->bs, );
>>> +if (ret != 0) {
>>> +return ret;
>>> +}
>>> +
>>> +bdi->unallocated_blocks_are_zero = false;
>>> +bdi->can_write_zeroes_with_unmap = false;
>>> +bdi->cluster_size = subbdi.cluster_size;
>>> +
>>> +return 0;
>>> +}
>>> +
>>> +static ImageInfoSpecific *
>>> +block_crypto_get_specific_info_luks(BlockDriverState *bs)
>>> +{
>>> +BlockCrypto *crypto = bs->opaque;
>>> +ImageInfoSpecific *spec_info;
>>> +QCryptoBlockInfo *info;
>>> +
>>> +info = qcrypto_block_get_info(crypto->block, NULL);
>>> +if (!info) {
>>> +return NULL;
>>> +}
>>> +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) {
>>> +qapi_free_QCryptoBlockInfo(info);
>>> +return NULL;
>>> +}
>>> +
>>> +spec_info = g_new(ImageInfoSpecific, 1);
>>> +spec_info->type =  IMAGE_INFO_SPECIFIC_KIND_LUKS;
>>
>> One space too many.
>>
>>> +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1);
>>> +spec_info->u.luks.data->cipher_alg = info->u.luks.cipher_alg;
>>> +spec_info->u.luks.data->cipher_mode = info->u.luks.cipher_mode;
>>> +spec_info->u.luks.data->ivgen_alg = info->u.luks.ivgen_alg;
>>> +spec_info->u.luks.data->has_ivgen_hash_alg =
>>> +info->u.luks.has_ivgen_hash_alg;
>>> +spec_info->u.luks.data->ivgen_hash_alg = info->u.luks.ivgen_hash_alg;
>>> +spec_info->u.luks.data->hash_alg = info->u.luks.hash_alg;
>>> +spec_info->u.luks.data->payload_offset = info->u.luks.payload_offset;
>>> +spec_info->u.luks.data->master_key_iters = 
>>> info->u.luks.master_key_iters;
>>> +spec_info->u.luks.data->uuid = g_strdup(info->u.luks.uuid);
>>> +
>>> +/* Steal the pointer instead of bothering to copy */
>>> +spec_info->u.luks.data->slots = info->u.luks.slots;
>>> +info->u.luks.slots = NULL;
>>
>> Why not just steal everything by *spec_info->u.luks.data = info->u.luks
>> and then making sure the qapi_free() call below won't free anything in
>> there with a memset(>u.luks, 0, sizeof(info->u.luks))?
> 
> I wish we could, but info->u.luks is an embedded QCryptoBlockInfoLUKS,
> not merely a pointer to an independently allocated QCryptoBlockInfoLUKS :-(

Yes, but note the asterisk I put there. Leave the g_new() and just make it:

[Qemu-block] [PATCH 6/9] block: pass qiov into before_write notifier

2016-06-14 Thread Denis V. Lunev

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/io.c| 12 +++-
 include/block/block_int.h |  1 +
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2d832aa..d2ad09c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -368,12 +368,14 @@ static void tracked_request_end(BdrvTrackedRequest *req)
  */
 static void tracked_request_begin(BdrvTrackedRequest *req,
   BlockDriverState *bs,
+  QEMUIOVector *qiov,
   int64_t offset,
   unsigned int bytes,
   enum BdrvTrackedRequestType type)
 {
 *req = (BdrvTrackedRequest){
 .bs = bs,
+.qiov   = qiov,
 .offset = offset,
 .bytes  = bytes,
 .type   = type,
@@ -1073,7 +1075,7 @@ int coroutine_fn bdrv_co_preadv(BlockDriverState *bs,
 bytes = ROUND_UP(bytes, align);
 }
 
-tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_READ);
+tracked_request_begin(, bs, NULL, offset, bytes, BDRV_TRACKED_READ);
 ret = bdrv_aligned_preadv(bs, , offset, bytes, align,
   use_local_qiov ? _qiov : qiov,
   flags);
@@ -1391,7 +1393,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs,
  * Pad qiov with the read parts and be sure to have a tracked request not
  * only for bdrv_aligned_pwritev, but also for the reads of the RMW cycle.
  */
-tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_WRITE);
+tracked_request_begin(, bs, qiov, offset, bytes, BDRV_TRACKED_WRITE);
 
 if (!qiov) {
 ret = bdrv_co_do_zero_pwritev(bs, offset, bytes, flags, );
@@ -2098,7 +2100,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 return 0;
 }
 
-tracked_request_begin(, bs, 0, 0, BDRV_TRACKED_FLUSH);
+tracked_request_begin(, bs, NULL, 0, 0, BDRV_TRACKED_FLUSH);
 
 /* Write back all layers by calling one driver function */
 if (bs->drv->bdrv_co_flush) {
@@ -2228,7 +2230,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 return 0;
 }
 
-tracked_request_begin(, bs, sector_num, nb_sectors,
+tracked_request_begin(, bs, NULL, sector_num, nb_sectors,
   BDRV_TRACKED_DISCARD);
 bdrv_set_dirty(bs, sector_num, nb_sectors);
 
@@ -2331,7 +2333,7 @@ static int bdrv_co_do_ioctl(BlockDriverState *bs, int 
req, void *buf)
 };
 BlockAIOCB *acb;
 
-tracked_request_begin(_req, bs, 0, 0, BDRV_TRACKED_IOCTL);
+tracked_request_begin(_req, bs, NULL, 0, 0, BDRV_TRACKED_IOCTL);
 if (!drv || !drv->bdrv_aio_ioctl) {
 co.ret = -ENOTSUP;
 goto out;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 30a9717..72f463a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -69,6 +69,7 @@ enum BdrvTrackedRequestType {
 
 typedef struct BdrvTrackedRequest {
 BlockDriverState *bs;
+QEMUIOVector *qiov;
 int64_t offset;
 unsigned int bytes;
 enum BdrvTrackedRequestType type;
-- 
2.5.0

[Qemu-block] [PATCH 8/9] mirror: use synch scheme for drive mirror

2016-06-14 Thread Denis V. Lunev

Block commit of the active image to the backing store on a slow disk
could never end. For example with the guest with the following loop
inside
while true; do
dd bs=1k count=1 if=/dev/zero of=x
done
running above slow storage could not complete the operation with a
resonable amount of time:
virsh blockcommit rhel7 sda --active --shallow
virsh qemu-monitor-event
virsh qemu-monitor-command rhel7 \
'{"execute":"block-job-complete",\
  "arguments":{"device":"drive-scsi0-0-0-0"} }'
virsh qemu-monitor-event
Completion event is never received.

This problem could not be fixed easily with the current architecture. We
should either prohibit guest writes (making dirty bitmap dirty) or switch
to the sycnchronous scheme.

This patch implements the latter. It adds mirror_before_write_notify
callback. In this case all data written from the guest is synchnonously
written to the mirror target. Though the problem is solved partially.
We should switch from bdrv_dirty_bitmap to simple hbitmap. This will be
done in the next patch.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/mirror.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index 7471211..086256c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -58,6 +58,9 @@ typedef struct MirrorBlockJob {
 QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
 int buf_free_count;
 
+NotifierWithReturn before_write;
+CoQueue dependent_writes;
+
 unsigned long *in_flight_bitmap;
 int in_flight;
 int sectors_in_flight;
@@ -125,6 +128,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 g_free(op->buf);
 g_free(op);
 
+qemu_co_queue_restart_all(>dependent_writes);
 if (s->waiting_for_io) {
 qemu_coroutine_enter(s->common.co, NULL);
 }
@@ -511,6 +515,74 @@ static void mirror_exit(BlockJob *job, void *opaque)
 bdrv_unref(src);
 }
 
+static int coroutine_fn mirror_before_write_notify(
+NotifierWithReturn *notifier, void *opaque)
+{
+MirrorBlockJob *s = container_of(notifier, MirrorBlockJob, before_write);
+BdrvTrackedRequest *req = opaque;
+MirrorOp *op;
+int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
+int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
+int64_t end_sector = sector_num + nb_sectors;
+int64_t aligned_start, aligned_end;
+
+if (req->type != BDRV_TRACKED_DISCARD && req->type != BDRV_TRACKED_WRITE) {
+/* this is not discard and write, we do not care */
+return 0;
+}
+
+while (1) {
+bool waited = false;
+int64_t sn;
+
+for (sn = sector_num; sn < end_sector; sn += sectors_per_chunk) {
+int64_t chunk = sn / sectors_per_chunk;
+if (test_bit(chunk, s->in_flight_bitmap)) {
+trace_mirror_yield_in_flight(s, chunk, s->in_flight);
+qemu_co_queue_wait(>dependent_writes);
+waited = true;
+}
+}
+
+if (!waited) {
+break;
+}
+}
+
+aligned_start = QEMU_ALIGN_UP(sector_num, sectors_per_chunk);
+aligned_end = QEMU_ALIGN_DOWN(sector_num + nb_sectors, sectors_per_chunk);
+if (aligned_end > aligned_start) {
+bdrv_reset_dirty_bitmap(s->dirty_bitmap, aligned_start,
+aligned_end - aligned_start);
+}
+
+if (req->type == BDRV_TRACKED_DISCARD) {
+mirror_do_zero_or_discard(s, sector_num, nb_sectors, true);
+return 0;
+}
+
+s->in_flight++;
+s->sectors_in_flight += nb_sectors;
+
+/* Allocate a MirrorOp that is used as an AIO callback.  */
+op = g_new(MirrorOp, 1);
+op->s = s;
+op->sector_num = sector_num;
+op->nb_sectors = nb_sectors;
+op->buf = qemu_try_blockalign(blk_bs(s->target), req->qiov->size);
+if (op->buf == NULL) {
+g_free(op);
+return -ENOMEM;
+}
+qemu_iovec_init(>qiov, req->qiov->niov);
+qemu_iovec_clone(>qiov, req->qiov, op->buf);
+
+blk_aio_pwritev(s->target, req->offset, >qiov, 0,
+mirror_write_complete, op);
+return 0;
+}
+
 static int mirror_dirty_init(MirrorBlockJob *s)
 {
 int64_t sector_num, end;
@@ -764,6 +836,8 @@ immediate_exit:
 mirror_drain(s);
 }
 
+notifier_with_return_remove(>before_write);
+
 assert(s->in_flight == 0);
 qemu_vfree(s->buf);
 g_free(s->cow_bitmap);
@@ -905,6 +979,10 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
 return;
 }
 
+

Re: [Qemu-block] [PATCH v2 2/2] block: export LUKS specific data to qemu-img info

2016-06-14 Thread Daniel P. Berrange

On Tue, Jun 14, 2016 at 05:49:24PM +0200, Max Reitz wrote:
> On 14.06.2016 17:47, Daniel P. Berrange wrote:
> > On Tue, Jun 14, 2016 at 05:38:36PM +0200, Max Reitz wrote:
> >> On 14.06.2016 12:56, Daniel P. Berrange wrote:
> >>> The qemu-img info command has the ability to expose format
> >>> specific metadata about volumes. Wire up this facility for
> >>> the LUKS driver to report on cipher configuration and key
> >>> slot usage.
> >>>
> >>> $ qemu-img info ~/VirtualMachines/demo.luks
> >>> image: /home/berrange/VirtualMachines/demo.luks
> >>> file format: luks
> >>> virtual size: 98M (102760448 bytes)
> >>> disk size: 100M
> >>> encrypted: yes
> >>> Format specific information:
> >>> ivgen alg: plain64
> >>> hash alg: sha1
> >>> cipher alg: aes-128
> >>> uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
> >>> cipher mode: xts
> >>> slots:
> >>> [0]:
> >>> active: true
> >>> iters: 572706
> >>> key offset: 4096
> >>> stripes: 4000
> >>> [1]:
> >>> active: false
> >>> key offset: 135168
> >>> [2]:
> >>> active: false
> >>> key offset: 266240
> >>> [3]:
> >>> active: false
> >>> key offset: 397312
> >>> [4]:
> >>> active: false
> >>> key offset: 528384
> >>> [5]:
> >>> active: false
> >>> key offset: 659456
> >>> [6]:
> >>> active: false
> >>> key offset: 790528
> >>> [7]:
> >>> active: false
> >>> key offset: 921600
> >>> payload offset: 2097152
> >>> master key iters: 142375
> >>>
> >>> One somewhat undesirable artifact is that the data fields are
> >>> printed out in (apparently) random order. This will be addressed
> >>> later by changing the way the block layer pretty-prints the
> >>> image specific data.
> >>>
> >>> Signed-off-by: Daniel P. Berrange 
> >>> ---
> >>>  block/crypto.c   | 59 
> >>> 
> >>>  qapi/block-core.json |  7 ++-
> >>>  2 files changed, 65 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/block/crypto.c b/block/crypto.c
> >>> index 758e14e..823572f 100644
> >>> --- a/block/crypto.c
> >>> +++ b/block/crypto.c
> >>> @@ -565,6 +565,63 @@ static int block_crypto_create_luks(const char 
> >>> *filename,
> >>> filename, opts, errp);
> >>>  }
> >>>  
> >>> +static int block_crypto_get_info_luks(BlockDriverState *bs,
> >>> +  BlockDriverInfo *bdi)
> >>> +{
> >>> +BlockDriverInfo subbdi;
> >>> +int ret;
> >>> +
> >>> +ret = bdrv_get_info(bs->file->bs, );
> >>> +if (ret != 0) {
> >>> +return ret;
> >>> +}
> >>> +
> >>> +bdi->unallocated_blocks_are_zero = false;
> >>> +bdi->can_write_zeroes_with_unmap = false;
> >>> +bdi->cluster_size = subbdi.cluster_size;
> >>> +
> >>> +return 0;
> >>> +}
> >>> +
> >>> +static ImageInfoSpecific *
> >>> +block_crypto_get_specific_info_luks(BlockDriverState *bs)
> >>> +{
> >>> +BlockCrypto *crypto = bs->opaque;
> >>> +ImageInfoSpecific *spec_info;
> >>> +QCryptoBlockInfo *info;
> >>> +
> >>> +info = qcrypto_block_get_info(crypto->block, NULL);
> >>> +if (!info) {
> >>> +return NULL;
> >>> +}
> >>> +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) {
> >>> +qapi_free_QCryptoBlockInfo(info);
> >>> +return NULL;
> >>> +}
> >>> +
> >>> +spec_info = g_new(ImageInfoSpecific, 1);
> >>> +spec_info->type =  IMAGE_INFO_SPECIFIC_KIND_LUKS;
> >>
> >> One space too many.
> >>
> >>> +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1);
> >>> +spec_info->u.luks.data->cipher_alg = info->u.luks.cipher_alg;
> >>> +spec_info->u.luks.data->cipher_mode = info->u.luks.cipher_mode;
> >>> +spec_info->u.luks.data->ivgen_alg = info->u.luks.ivgen_alg;
> >>> +spec_info->u.luks.data->has_ivgen_hash_alg =
> >>> +info->u.luks.has_ivgen_hash_alg;
> >>> +spec_info->u.luks.data->ivgen_hash_alg = info->u.luks.ivgen_hash_alg;
> >>> +spec_info->u.luks.data->hash_alg = info->u.luks.hash_alg;
> >>> +spec_info->u.luks.data->payload_offset = info->u.luks.payload_offset;
> >>> +spec_info->u.luks.data->master_key_iters = 
> >>> info->u.luks.master_key_iters;
> >>> +spec_info->u.luks.data->uuid = g_strdup(info->u.luks.uuid);
> >>> +
> >>> +/* Steal the pointer instead of bothering to copy */
> >>> +spec_info->u.luks.data->slots = info->u.luks.slots;
> >>> +info->u.luks.slots = NULL;
> >>
> >> Why not just steal everything by *spec_info->u.luks.data = info->u.luks
> >> and then making sure the qapi_free() call below

Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Paolo Bonzini



On 14/06/2016 17:02, Alex Bligh wrote:
> 
> On 14 Jun 2016, at 14:32, Paolo Bonzini  wrote:
> 
>>
>> On 13/06/2016 23:41, Alex Bligh wrote:
>>> That's one of the reasons that there is a proposal to add
>>> STRUCTURED_READ to the spec (although I still haven't had time to
>>> implement that for qemu), so that we have a newer approach that allows
>>> for proper error handling without ambiguity on whether bogus bytes must
>>> be sent on a failed read.  But you'd have to convince me that ALL
>>> existing NBD server and client implementations expect to handle a read
>>> error without read payload, otherwise, I will stick with the notion that
>>> the current spec wording is correct, and that read errors CANNOT be
>>> gracefully recovered from unless BOTH sides transfer (possibly bogus)
>>> bytes along with the error message, and which is why BOTH sides of the
>>> protocol are warned that read errors usually result in a disconnection
>>> rather than clean continuation, without the addition of STRUCTURED_READ.
>>
>> I suspect that there are exactly two client implementations,
> 
> My understanding is that there are more than 2 client implementations.
> A quick google found at least one BSD client. I bet read error handling
> is a mess in all of them.

Found it, it is exactly the same as Linux and QEMU:

https://github.com/bitrig/bitrig/blob/418985278/sys/dev/nbd.c#L577

>> namely
>> Linux and QEMU's, and both do the right thing.
> 
> This depends what you mean by 'right'. Both appear to be non-compliant
> with the standard.

I mean "what makes sense".

> Note the standard is not defined by the client implementation, but
> by the protocol document.
> 
> IMHO the 'right thing' is what is in the spec. Servers can't send an
> error in any other way if they don't buffer the entire read first, as the
> read may error towards the end.
> 
> To illustrate the problem, look consider what qemu itself would do as
> a server if it can't buffer the entire read issued to it.

Return ENOMEM?

> The spec originally was not clear on how errors on reads should be
> handled, leading to any read causing a protocol drop. The spec is
> now clear. Unfortunately it is not possible to make a back compatible
> fix. Hence the real fix here is to implement structured replies,
> which is what Eric and I have been working on.

I agree that structured replies are better.  However, it looks like the
de facto status prior to structured replies is that the error is in the
spec, and this patch introduces a regression.

Paolo

Re: [Qemu-block] [PATCH] m25p80: fix test on blk_pread() return value

2016-06-14 Thread Cédric Le Goater

On 06/14/2016 10:38 AM, Kevin Wolf wrote:
> Am 14.06.2016 um 10:02 hat Cédric Le Goater geschrieben:
 #4  0x7fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, 
 req=, offset=30878208, 
 bytes=512, qiov=0x7fa7f47fee60, flags=0)
 at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243
 #5  0x7fa81c669ecb in bdrv_co_pwritev (bs=0x7fa81d4dd050, offset=8, 
 bytes=512, qiov=0x7fa80d5191c0, 
 flags=(BDRV_REQ_COPY_ON_READ | BDRV_REQ_ZERO_WRITE | 
 BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_SERIALISING | BDRV_REQ_FUA | unknown: 
 4278124256), flags@entry=(unknown: 0))
 at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1492
>>>
>>> That 'flags' value looks bogus...
>>>
 #6  0x7fa81c65e367 in blk_co_pwritev (blk=0x7fa81d4c5b60, 
 offset=30878208, bytes=256, qiov=0x7fa80d5191c0, 
 flags=(unknown: 0)) at 
 /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:788
 #7  0x7fa81c65e49b in blk_aio_write_entry (opaque=0x7fa7e849aca0)
 at 
 /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:977
 #8  0x7fa81c6c823a in coroutine_trampoline (i0=, 
 i1=)
 at 
 /home/legoater/work/qemu/qemu-ast2400-mainline.git/util/coroutine-ucontext.c:78
 #9  0x7fa818ea8f00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>
>>> and we don't get anything further in the backtrace beyond coroutines, to
>>> see who's sending the bad parameters.  I recently debugged a bogus flags
>>> in bdrv_aio_preadv, by hoisting an assert to occur before coroutines are
>>> used in blk_aio_prwv():
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02948.html
>>>
>>> I've just posted v2 of that patch (now a 2/2 series), but in v2 no
>>> longer kept the assert at that point.  But maybe the correct fix, and/or
>>> the hack for catching the bug prior to coroutines, will help you debug
>>> where the bad arguments are coming from.
>>
>> That does not fix the assert.
>>  
 #10 0x7fa80d5189d0 in ?? ()
 #11 0x in ?? ()
 (gdb) up 4
 #4  0x7fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, 
 req=, offset=30878208, 
 bytes=512, qiov=0x7fa7f47fee60, flags=0)
 at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243
 1243   assert(!qiov || bytes == qiov->size);
 (gdb) p *qiov 
 $1 = {iov = 0x7fa81da671d0, niov = 1, nalloc = 1, size = 256}
>>
>> So, it seems that the issue is coming from the fact that bdrv_co_pwritev()
>> does not handle alignments less than BDRV_SECTOR_SIZE :
>>
>>  /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
>>  uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
>>
>> It calls bdrv_aligned_pwritev() which does the assert : 
>>
>>  assert(!qiov || bytes == qiov->size);
> 
> Yes, but between these two places, there is code that should actually
> enforce the right alignment:
> 
> if ((offset + bytes) & (align - 1)) {
> ...
> }
> 
> You can see in your backtrace that bdrv_aligned_pwritev() gets a
> different qiov than bdrv_co_pwritev() (which is local_qiov in the latter
> function).
> 
> It's just unclear to me why this code extended bytes, but didn't add the
> tail_buf iovec to local_qiov.

The gdb backtrace is bogus. It does not make sense. May be a gdb issue
with multithread on jessie.

In the path tracking the tail bytes, we have : 

 if ((offset + bytes) & (align - 1)) {
...
tail_bytes = (offset + bytes) & (align - 1);
qemu_iovec_add(_qiov, tail_buf + tail_bytes, align - tail_bytes);

bytes = ROUND_UP(bytes, align);
}

This is where the issue is I think. The qiov holds 256 and bytes 512.

I have no idea how to fix that though.

Thanks,

C.

[Qemu-block] [PATCH 4/9] mirror: efficiently zero out target

2016-06-14 Thread Denis V. Lunev

With a bdrv_co_write_zeroes method on a target BDS zeroes will not be placed
into the wire. Thus the target could be very efficiently zeroed out. This
is should be done with the largest chunk possible.

This improves the performance of the live migration of the empty disk by
150 times if NBD supports write_zeroes.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/mirror.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index c7b3639..c2f8773 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -21,6 +21,7 @@
 #include "qemu/ratelimit.h"
 #include "qemu/bitmap.h"
 
+#define MIRROR_ZERO_CHUNK   (3u << (29 - BDRV_SECTOR_BITS))  /* 1.5 Gb */
 #define SLICE_TIME1ULL /* ns */
 #define MAX_IN_FLIGHT 16
 #define DEFAULT_MIRROR_BUF_SIZE   (10 << 20)
@@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s)
 
 end = s->bdev_length / BDRV_SECTOR_SIZE;
 
-if (base == NULL && !bdrv_has_zero_init(target_bs)) {
+if (base == NULL && !bdrv_has_zero_init(target_bs) &&
+target_bs->drv->bdrv_co_write_zeroes == NULL) {
 bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
 return 0;
 }
@@ -546,6 +548,34 @@ static int mirror_dirty_init(MirrorBlockJob *s)
 }
 sector_num += n;
 }
+
+if (base != NULL || bdrv_has_zero_init(target_bs)) {
+/* no need to zero out entire disk */
+return 0;
+}
+
+for (sector_num = 0; sector_num < end; ) {
+int nb_sectors = MIN(MIRROR_ZERO_CHUNK, end - sector_num);
+int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+
+if (now - last_pause_ns > SLICE_TIME) {
+last_pause_ns = now;
+block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0);
+}
+
+if (block_job_is_cancelled(>common)) {
+return -EINTR;
+}
+
+if (s->in_flight == MAX_IN_FLIGHT) {
+trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1);
+mirror_wait_for_io(s);
+continue;
+}
+
+mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
+sector_num += nb_sectors;
+}
 return 0;
 }
 
-- 
2.5.0

[Qemu-block] [PATCH 1/9] mirror: fix calling of blk_aio_pwritev/blk_aio_preadv

2016-06-14 Thread Denis V. Lunev

4th argument is flags rather than size. Fortunately flags occupies
5 less significant bits and they are always zero due to alignment.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/mirror.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 80fd3c7..3760e29 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -156,8 +156,7 @@ static void mirror_read_complete(void *opaque, int ret)
 mirror_iteration_done(op, ret);
 return;
 }
-blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, >qiov,
-op->nb_sectors * BDRV_SECTOR_SIZE,
+blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, >qiov, 0,
 mirror_write_complete, op);
 }
 
@@ -274,8 +273,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 s->sectors_in_flight += nb_sectors;
 trace_mirror_one_iteration(s, sector_num, nb_sectors);
 
-blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, >qiov,
-   nb_sectors * BDRV_SECTOR_SIZE,
+blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, >qiov, 0,
mirror_read_complete, op);
 return ret;
 }
-- 
2.5.0

Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Alex Bligh

> On 14 Jun 2016, at 16:11, Paolo Bonzini  wrote:
> 
>> To illustrate the problem, look consider what qemu itself would do as
>> a server if it can't buffer the entire read issued to it.
> 
> Return ENOMEM?

Well OK, qemu then 'works' on the basis it breaks another
part of the spec, which is coping with long reads.

> However, it looks like the
> de facto status prior to structured replies is that the error is in the
> spec, and this patch introduces a regression.

Well, I guess the patch makes it work the same as the
reference server implementation and the spec, which I'd
consider a fix. My view is that the error is in the
kernel client. I think Erik CC'd in nbd-general
re the comment that the spec was broken; I don't think
it is, and don't propose to change it. Wouter might or
might not feel differently.

It's been reasonably well known (I wrote about it
at least 3 years ago), that the current implementation
(reference + kernel) does not cope well with errors
on reads, so I'm guessing one is just trading one
set of brokenness for another. So I'm pretty relaxed
about what goes in qemu.

-- 
Alex Bligh

Re: [Qemu-block] [PATCH v2 1/2] crypto: add support for querying parameters for block encryption

2016-06-14 Thread Max Reitz

On 14.06.2016 12:56, Daniel P. Berrange wrote:
> When creating new block encryption volumes, we accept a list of
> parameters to control the formatting process. It is useful to
> be able to query what those parameters were for existing block
> devices. Add a qcrypto_block_get_info() method which returns a
> QCryptoBlockInfo instance to report this data.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  crypto/block-luks.c| 70 ++
>  crypto/block.c | 17 +++
>  crypto/blockpriv.h |  4 +++
>  include/crypto/block.h | 16 +++
>  qapi/crypto.json   | 76 
> ++
>  5 files changed, 183 insertions(+)
> 
> diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> index 63649f1..6a6ef07 100644
> --- a/crypto/block-luks.c
> +++ b/crypto/block-luks.c

[...]

> @@ -1284,6 +1305,54 @@ qcrypto_block_luks_create(QCryptoBlock *block,
>  }
>  
>  
> +static int qcrypto_block_luks_get_info(QCryptoBlock *block,
> +   QCryptoBlockInfo *info,
> +   Error **errp)
> +{
> +QCryptoBlockLUKS *luks = block->opaque;
> +QCryptoBlockInfoLUKSSlot *slot;
> +QCryptoBlockInfoLUKSSlotList *slots = NULL, *prev = NULL;

You could make prev a double pointer, initializing it to
>u.luks.slots... [1]

> +size_t i;
> +
> +info->u.luks.cipher_alg = luks->cipher_alg;
> +info->u.luks.cipher_mode = luks->cipher_mode;
> +info->u.luks.ivgen_alg = luks->ivgen_alg;
> +if (info->u.luks.ivgen_alg == QCRYPTO_IVGEN_ALG_ESSIV) {
> +info->u.luks.has_ivgen_hash_alg = true;
> +info->u.luks.ivgen_hash_alg = luks->ivgen_hash_alg;
> +}
> +info->u.luks.hash_alg = luks->hash_alg;
> +info->u.luks.payload_offset = block->payload_offset;
> +info->u.luks.master_key_iters = luks->header.master_key_iterations;
> +info->u.luks.uuid = g_strdup((const char *)luks->header.uuid);

Wouldn't g_strndup() with sizeof(luks->header.uuid) be a better choice?

> +
> +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
> +slots = g_new0(QCryptoBlockInfoLUKSSlotList, 1);
> +if (i == 0) {
> +info->u.luks.slots = slots;
> +} else {
> +prev->next = slots;
> +}

[1] ...then unconditionally use *prev = slots here...

> +
> +slots->value = slot = g_new0(QCryptoBlockInfoLUKSSlot, 1);
> +slot->active = luks->header.key_slots[i].active ==
> +QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED;
> +slot->key_offset = luks->header.key_slots[i].key_offset
> + * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
> +if (slot->active) {
> +slot->has_iters = true;
> +slot->iters = luks->header.key_slots[i].iterations;
> +slot->has_stripes = true;
> +slot->stripes = luks->header.key_slots[i].stripes;
> +}
> +
> +prev = slots;

[1] ...and prev = >next here.

> +}
> +
> +return 0;
> +}
> +
> +
>  static void qcrypto_block_luks_cleanup(QCryptoBlock *block)
>  {
>  g_free(block->opaque);

[...]

> diff --git a/include/crypto/block.h b/include/crypto/block.h
> index a21e11f..6256f64 100644
> --- a/include/crypto/block.h
> +++ b/include/crypto/block.h
> @@ -138,6 +138,22 @@ QCryptoBlock 
> *qcrypto_block_create(QCryptoBlockCreateOptions *options,
> void *opaque,
> Error **errp);
>  
> +
> +/**
> + * qcrypto_block_get_info:
> + * block: the block encryption object

I think this is missing an @.

Max

> + * @errp: pointer to a NULL-initialized error object
> + *
> + * Get information about the configuration options for the
> + * block encryption object. This includes details such as
> + * the cipher algorithms, modes, and initialization vector
> + * generators.
> + *
> + * Returns: a block encryption info object, or NULL on error
> + */
> +QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block,
> + Error **errp);
> +
>  /**
>   * @qcrypto_block_decrypt:
>   * @block: the block encryption object




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v2 2/2] block: export LUKS specific data to qemu-img info

2016-06-14 Thread Daniel P. Berrange

On Tue, Jun 14, 2016 at 05:38:36PM +0200, Max Reitz wrote:
> On 14.06.2016 12:56, Daniel P. Berrange wrote:
> > The qemu-img info command has the ability to expose format
> > specific metadata about volumes. Wire up this facility for
> > the LUKS driver to report on cipher configuration and key
> > slot usage.
> > 
> > $ qemu-img info ~/VirtualMachines/demo.luks
> > image: /home/berrange/VirtualMachines/demo.luks
> > file format: luks
> > virtual size: 98M (102760448 bytes)
> > disk size: 100M
> > encrypted: yes
> > Format specific information:
> > ivgen alg: plain64
> > hash alg: sha1
> > cipher alg: aes-128
> > uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
> > cipher mode: xts
> > slots:
> > [0]:
> > active: true
> > iters: 572706
> > key offset: 4096
> > stripes: 4000
> > [1]:
> > active: false
> > key offset: 135168
> > [2]:
> > active: false
> > key offset: 266240
> > [3]:
> > active: false
> > key offset: 397312
> > [4]:
> > active: false
> > key offset: 528384
> > [5]:
> > active: false
> > key offset: 659456
> > [6]:
> > active: false
> > key offset: 790528
> > [7]:
> > active: false
> > key offset: 921600
> > payload offset: 2097152
> > master key iters: 142375
> > 
> > One somewhat undesirable artifact is that the data fields are
> > printed out in (apparently) random order. This will be addressed
> > later by changing the way the block layer pretty-prints the
> > image specific data.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  block/crypto.c   | 59 
> > 
> >  qapi/block-core.json |  7 ++-
> >  2 files changed, 65 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/crypto.c b/block/crypto.c
> > index 758e14e..823572f 100644
> > --- a/block/crypto.c
> > +++ b/block/crypto.c
> > @@ -565,6 +565,63 @@ static int block_crypto_create_luks(const char 
> > *filename,
> > filename, opts, errp);
> >  }
> >  
> > +static int block_crypto_get_info_luks(BlockDriverState *bs,
> > +  BlockDriverInfo *bdi)
> > +{
> > +BlockDriverInfo subbdi;
> > +int ret;
> > +
> > +ret = bdrv_get_info(bs->file->bs, );
> > +if (ret != 0) {
> > +return ret;
> > +}
> > +
> > +bdi->unallocated_blocks_are_zero = false;
> > +bdi->can_write_zeroes_with_unmap = false;
> > +bdi->cluster_size = subbdi.cluster_size;
> > +
> > +return 0;
> > +}
> > +
> > +static ImageInfoSpecific *
> > +block_crypto_get_specific_info_luks(BlockDriverState *bs)
> > +{
> > +BlockCrypto *crypto = bs->opaque;
> > +ImageInfoSpecific *spec_info;
> > +QCryptoBlockInfo *info;
> > +
> > +info = qcrypto_block_get_info(crypto->block, NULL);
> > +if (!info) {
> > +return NULL;
> > +}
> > +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) {
> > +qapi_free_QCryptoBlockInfo(info);
> > +return NULL;
> > +}
> > +
> > +spec_info = g_new(ImageInfoSpecific, 1);
> > +spec_info->type =  IMAGE_INFO_SPECIFIC_KIND_LUKS;
> 
> One space too many.
> 
> > +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1);
> > +spec_info->u.luks.data->cipher_alg = info->u.luks.cipher_alg;
> > +spec_info->u.luks.data->cipher_mode = info->u.luks.cipher_mode;
> > +spec_info->u.luks.data->ivgen_alg = info->u.luks.ivgen_alg;
> > +spec_info->u.luks.data->has_ivgen_hash_alg =
> > +info->u.luks.has_ivgen_hash_alg;
> > +spec_info->u.luks.data->ivgen_hash_alg = info->u.luks.ivgen_hash_alg;
> > +spec_info->u.luks.data->hash_alg = info->u.luks.hash_alg;
> > +spec_info->u.luks.data->payload_offset = info->u.luks.payload_offset;
> > +spec_info->u.luks.data->master_key_iters = 
> > info->u.luks.master_key_iters;
> > +spec_info->u.luks.data->uuid = g_strdup(info->u.luks.uuid);
> > +
> > +/* Steal the pointer instead of bothering to copy */
> > +spec_info->u.luks.data->slots = info->u.luks.slots;
> > +info->u.luks.slots = NULL;
> 
> Why not just steal everything by *spec_info->u.luks.data = info->u.luks
> and then making sure the qapi_free() call below won't free anything in
> there with a memset(>u.luks, 0, sizeof(info->u.luks))?

I wish we could, but info->u.luks is an embedded QCryptoBlockInfoLUKS,
not merely a pointer to an independently allocated QCryptoBlockInfoLUKS :-(


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o-

[Qemu-block] [PATCH 0/9] major rework of drive-mirror

2016-06-14 Thread Denis V. Lunev

Block commit of the active image to the backing store on a slow disk
could never end. For example with the guest with the following loop
inside
while true; do
dd bs=1k count=1 if=/dev/zero of=x
done
running above slow storage could not complete the operation with a
resonable amount of time:
virsh blockcommit rhel7 sda --active --shallow
virsh qemu-monitor-event
virsh qemu-monitor-command rhel7 \
'{"execute":"block-job-complete",\
  "arguments":{"device":"drive-scsi0-0-0-0"} }'
virsh qemu-monitor-event
Completion event is never received.

This problem could not be fixed easily with the current architecture. We
should either prohibit guest writes (making dirty bitmap dirty) or switch
to the sycnchronous scheme.

This series switches driver mirror to synch scheme. Actually we can have
something more intelligent and switch to sync mirroring just after
the first pass over the bitmap. Though this could be done relatively
easily during discussion. The most difficult things are here.

The set also adds some performance improvements dealing with
known-to-be-zero areas.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 

Denis V. Lunev (9):
  mirror: fix calling of blk_aio_pwritev/blk_aio_preadv
  mirror: create mirror_dirty_init helper for mirror_run
  mirror: optimize dirty bitmap filling in mirror_run a bit
  mirror: efficiently zero out target
  mirror: improve performance of mirroring of empty disk
  block: pass qiov into before_write notifier
  mirror: allow to save buffer for QEMUIOVector in MirrorOp
  mirror: use synch scheme for drive mirror
  mirror: replace bdrv_dirty_bitmap with plain hbitmap

 block/io.c|  12 +-
 block/mirror.c| 290 +++---
 include/block/block_int.h |   1 +
 3 files changed, 229 insertions(+), 74 deletions(-)

-- 
2.5.0

[Qemu-block] [PATCH 9/9] mirror: replace bdrv_dirty_bitmap with plain hbitmap

2016-06-14 Thread Denis V. Lunev

We have replaced the mechanics of syncing new writes in the previous patch
and thus do not need to track dirty changes anymore.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/mirror.c | 58 ++
 1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 086256c..926fd13 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -52,7 +52,7 @@ typedef struct MirrorBlockJob {
 size_t buf_size;
 int64_t bdev_length;
 unsigned long *cow_bitmap;
-BdrvDirtyBitmap *dirty_bitmap;
+HBitmap *copy_bitmap;
 HBitmapIter hbi;
 uint8_t *buf;
 QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
@@ -141,7 +141,7 @@ static void mirror_write_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;
 
-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+hbitmap_set(s->copy_bitmap, op->sector_num, op->nb_sectors);
 action = mirror_error_action(s, false, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -157,7 +157,7 @@ static void mirror_read_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;
 
-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+hbitmap_set(s->copy_bitmap, op->sector_num, op->nb_sectors);
 action = mirror_error_action(s, true, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -328,9 +328,9 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 
 sector_num = hbitmap_iter_next(>hbi);
 if (sector_num < 0) {
-bdrv_dirty_iter_init(s->dirty_bitmap, >hbi);
+hbitmap_iter_init(>hbi, s->copy_bitmap, 0);
 sector_num = hbitmap_iter_next(>hbi);
-trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
+trace_mirror_restart_iter(s, hbitmap_count(s->copy_bitmap));
 assert(sector_num >= 0);
 }
 
@@ -347,7 +347,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
 int64_t next_chunk = next_sector / sectors_per_chunk;
 if (next_sector >= end ||
-!bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
+!hbitmap_get(s->copy_bitmap, next_sector)) {
 break;
 }
 if (test_bit(next_chunk, s->in_flight_bitmap)) {
@@ -357,7 +357,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 hbitmap_next = hbitmap_iter_next(>hbi);
 if (hbitmap_next > next_sector || hbitmap_next < 0) {
 /* The bitmap iterator's cache is stale, refresh it */
-bdrv_set_dirty_iter(>hbi, next_sector);
+hbitmap_iter_init(>hbi, s->copy_bitmap, next_sector);
 hbitmap_next = hbitmap_iter_next(>hbi);
 }
 assert(hbitmap_next == next_sector);
@@ -368,8 +368,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
  * calling bdrv_get_block_status_above could yield - if some blocks are
  * marked dirty in this window, we need to know.
  */
-bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num,
-nb_chunks * sectors_per_chunk);
+hbitmap_reset(s->copy_bitmap, sector_num, nb_chunks * sectors_per_chunk);
 bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
 while (nb_chunks > 0 && sector_num < end) {
 int ret;
@@ -553,8 +552,8 @@ static int coroutine_fn mirror_before_write_notify(
 aligned_start = QEMU_ALIGN_UP(sector_num, sectors_per_chunk);
 aligned_end = QEMU_ALIGN_DOWN(sector_num + nb_sectors, sectors_per_chunk);
 if (aligned_end > aligned_start) {
-bdrv_reset_dirty_bitmap(s->dirty_bitmap, aligned_start,
-aligned_end - aligned_start);
+hbitmap_reset(s->copy_bitmap, aligned_start,
+  aligned_end - aligned_start);
 }
 
 if (req->type == BDRV_TRACKED_DISCARD) {
@@ -596,7 +595,7 @@ static int mirror_dirty_init(MirrorBlockJob *s)
 
 if (base == NULL && !bdrv_has_zero_init(target_bs) &&
 target_bs->drv->bdrv_co_write_zeroes == NULL) {
-bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
+hbitmap_set(s->copy_bitmap, 0, end);
 return 0;
 }
 
@@ -625,7 +624,7 @@ static int mirror_dirty_init(MirrorBlockJob *s)
 
 assert(n > 0);
 if (ret == 1) {
-bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
+hbitmap_set(s->copy_bitmap,

[Qemu-block] [PATCH 5/9] mirror: improve performance of mirroring of empty disk

2016-06-14 Thread Denis V. Lunev

We should not take into account zero blocks for delay calculations.
They are not read and thus IO throttling is not required. In the
other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes
days.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/mirror.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index c2f8773..d8be80a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -363,7 +363,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
 while (nb_chunks > 0 && sector_num < end) {
 int ret;
-int io_sectors;
+int io_sectors, io_sectors_acct;
 BlockDriverState *file;
 enum MirrorMethod {
 MIRROR_METHOD_COPY,
@@ -399,12 +399,15 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 switch (mirror_method) {
 case MIRROR_METHOD_COPY:
 io_sectors = mirror_do_read(s, sector_num, io_sectors);
+io_sectors_acct = io_sectors;
 break;
 case MIRROR_METHOD_ZERO:
 mirror_do_zero_or_discard(s, sector_num, io_sectors, false);
+io_sectors_acct = 0;
 break;
 case MIRROR_METHOD_DISCARD:
 mirror_do_zero_or_discard(s, sector_num, io_sectors, true);
+io_sectors_acct = 0;
 break;
 default:
 abort();
@@ -412,7 +415,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 assert(io_sectors);
 sector_num += io_sectors;
 nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
-delay_ns += ratelimit_calculate_delay(>limit, io_sectors);
+delay_ns += ratelimit_calculate_delay(>limit, io_sectors_acct);
 }
 return delay_ns;
 }
-- 
2.5.0

Re: [Qemu-block] [Qemu-devel] [PATCH v7 09/15] block: Simplify drive-mirror

2016-06-14 Thread Markus Armbruster

Eric Blake  writes:

> Now that we can support boxed commands, use it to greatly
> reduce the number of parameters (and likelihood of getting
> out of sync) when adjusting drive-mirror parameters.
>
> Signed-off-by: Eric Blake 
>
> ---
> v7: new patch
> ---
>  qapi/block-core.json | 17 -
>  blockdev.c   | 72 
> ++--
>  hmp.c| 27 +---
>  3 files changed, 59 insertions(+), 57 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 26f7c0e..885a75a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1108,6 +1108,21 @@
>  #
>  # Start mirroring a block device's writes to a new destination.
>  #
> +# See DriveMirror for parameter descriptions
> +#
> +# Returns: nothing on success
> +#  If @device is not a valid block device, DeviceNotFound
> +#
> +# Since 1.3
> +##
> +{ 'command': 'drive-mirror', 'box': true,
> +  'data': 'DriveMirror' }
> +
> +##
> +# DriveMirror
> +#
> +# The parameters for the drive-mirror command.
> +#
>  # @device:  the name of the device whose writes should be mirrored.
>  #
>  # @target: the target of the new image. If the file exists, or if it
> @@ -1159,7 +1174,7 @@
   #
   # Returns: nothing on success
   #  If @device is not a valid block device, DeviceNotFound

You forgot to delete this part.

   #
>  #
>  # Since 1.3

Kind of (same in previous patch).

>  ##
> -{ 'command': 'drive-mirror',
> +{ 'struct': 'DriveMirror',
>'data': { 'device': 'str', 'target': 'str', '*format': 'str',
>  '*node-name': 'str', '*replaces': 'str',
>  'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> diff --git a/blockdev.c b/blockdev.c
> index b8db496..94850fd 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3457,19 +3457,7 @@ static void blockdev_mirror_common(BlockDriverState 
> *bs,
>   block_job_cb, bs, errp);
>  }
>
> -void qmp_drive_mirror(const char *device, const char *target,
> -  bool has_format, const char *format,
> -  bool has_node_name, const char *node_name,
> -  bool has_replaces, const char *replaces,
> -  enum MirrorSyncMode sync,
> -  bool has_mode, enum NewImageMode mode,
> -  bool has_speed, int64_t speed,
> -  bool has_granularity, uint32_t granularity,
> -  bool has_buf_size, int64_t buf_size,
> -  bool has_on_source_error, BlockdevOnError 
> on_source_error,
> -  bool has_on_target_error, BlockdevOnError 
> on_target_error,
> -  bool has_unmap, bool unmap,
> -  Error **errp)
> +void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>  {
>  BlockDriverState *bs;
>  BlockBackend *blk;
> @@ -3480,11 +3468,12 @@ void qmp_drive_mirror(const char *device, const char 
> *target,
>  int flags;
>  int64_t size;
>  int ret;
> +const char *format = arg->format;
>
> -blk = blk_by_name(device);
> +blk = blk_by_name(arg->device);
>  if (!blk) {
>  error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> -  "Device '%s' not found", device);
> +  "Device '%s' not found", arg->device);
>  return;
>  }
>
> @@ -3492,24 +3481,25 @@ void qmp_drive_mirror(const char *device, const char 
> *target,
>  aio_context_acquire(aio_context);
>
>  if (!blk_is_available(blk)) {
> -error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
> +error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, arg->device);
>  goto out;
>  }
>  bs = blk_bs(blk);
> -if (!has_mode) {
> -mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> +if (!arg->has_mode) {
> +arg->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
>  }
>
> -if (!has_format) {
> -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : 
> bs->drv->format_name;
> +if (!arg->has_format) {
> +format = (arg->mode == NEW_IMAGE_MODE_EXISTING
> +  ? NULL : bs->drv->format_name);
>  }
>
>  flags = bs->open_flags | BDRV_O_RDWR;
>  source = backing_bs(bs);
> -if (!source && sync == MIRROR_SYNC_MODE_TOP) {
> -sync = MIRROR_SYNC_MODE_FULL;
> +if (!source && arg->sync == MIRROR_SYNC_MODE_TOP) {
> +arg->sync = MIRROR_SYNC_MODE_FULL;
>  }
> -if (sync == MIRROR_SYNC_MODE_NONE) {
> +if (arg->sync == MIRROR_SYNC_MODE_NONE) {
>  source = bs;
>  }
>
> @@ -3519,18 +3509,18 @@ void qmp_drive_mirror(const char *device, const char 
> *target,
>  goto out;
>  }
>
> -if (has_replaces) {
> +if (arg->has_replaces) {
>  BlockDriverState *to_replace_bs;
>  AioContext *replace_aio_context;
>  int64_t replace_size;
>
> -if (!has_node_name) {
> +if

[Qemu-block] [PATCH 3/9] mirror: optimize dirty bitmap filling in mirror_run a bit

2016-06-14 Thread Denis V. Lunev

There is no need to scan allocation tables if we have mark_all_dirty flag
set. Just mark it all dirty.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/mirror.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 797659d..c7b3639 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -507,12 +507,16 @@ static int mirror_dirty_init(MirrorBlockJob *s)
 BlockDriverState *base = s->base;
 BlockDriverState *bs = blk_bs(s->common.blk);
 BlockDriverState *target_bs = blk_bs(s->target);
-bool mark_all_dirty = base == NULL && !bdrv_has_zero_init(target_bs);
 uint64_t last_pause_ns;
 int ret, n;
 
 end = s->bdev_length / BDRV_SECTOR_SIZE;
 
+if (base == NULL && !bdrv_has_zero_init(target_bs)) {
+bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
+return 0;
+}
+
 last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 
 /* First part, loop on the sectors and initialize the dirty bitmap.  */
@@ -537,7 +541,7 @@ static int mirror_dirty_init(MirrorBlockJob *s)
 }
 
 assert(n > 0);
-if (ret == 1 || mark_all_dirty) {
+if (ret == 1) {
 bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
 }
 sector_num += n;
-- 
2.5.0

Re: [Qemu-block] [Qemu-devel] [PATCH 5/5] block: Move request_alignment into BlockLimit

2016-06-14 Thread Kevin Wolf

Am 14.06.2016 um 16:47 hat Eric Blake geschrieben:
> On 06/14/2016 02:05 AM, Kevin Wolf wrote:
> 
>   static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>   {
>  +/* Inherit all limits except for request_alignment */
>  +int request_alignment = bs->bl.request_alignment;
>  +
>   bs->bl = bs->file->bs->bl;
>  +bs->bl.request_alignment = request_alignment;
> >>
> >> Any ideas on how to fix the test, then?  Have two blkdebug devices
> >> nested atop one another, since those are the devices where we can
> >> explicitly override alignment?
> > 
> > Interesting idea. Maybe that's a good option if it works.
> > 
> >> (normally, you'd _expect_ the chain to
> >> inherit the worst-case alignment of all BDS in the chain, so blkdebug is
> >> the way around it).
> > 
> > Actually, I think there are two cases.
> > 
> > The first one is that you get a request and want to know what to do with
> > it. Here you don't want to inherit the worst-case alignment. You'd
> > rather want to enforce alignment only when it is actually needed. For
> > example, if you have a cache=none backing file with 4k alignment, but a
> > cache=writeback overlay, and you don't actually access the backing file
> > with this request, it would be wasteful to align the request. You also
> > don't really know that a driver will issue a misaligned request (or any
> > request at all) to the lower layer just because it got called with one.
> > 
> > The second case is when you want to issue a request. For example, let's
> > take the qcow2 cache here, which has 64k cached and needs to update a
> > few bytes. Currently it always writes the full 64k (and with 1 MB
> > clusters a full megabyte), but what it really should do is consider the
> > worst-case alignment.
> > 
> > This is comparable to the difference between bdrv_opt_mem_align(), which
> > is used in qemu_blockalign() where we want to create a buffer that works
> > even in the worst case, and bdrv_min_mem_align(), which is used in
> > bdrv_qiov_is_aligned() in order to determine whether we must align an
> > existing request.
> > 
> >> That's the only thing left before I repost the series, so I may just
> >> post the last patch as RFC and play with it a bit more...
> > 
> > And in the light of the above, maybe the solution is as easy as changing
> > raw_refresh_limits() to set bs->bl.request_alignment = 1.
> 
> Or maybe split the main bdrv_refresh_limits() into two pieces: one that
> merges limits from one BDS into another (the limits that are worth
> merging, and in the correct direction: opt merges to MAX, max merges to
> MIN_NON_ZERO, request_alignment is not merged), the other that calls
> merge(bs, bs->file->bs); then have raw_refresh_limits() also use the
> common merge functionality rather than straight copy.  Testing that
> approach now...

So you don't agree with what I said above?

I think that block drivers should only set limits that they require for
themselves. The block layer bdrv_refresh_limits() function can then
merge things where needed; drivers shouldn't be involved there.

Kevin


pgpceb24nIaLM.pgp
Description: PGP signature

[Qemu-block] [PATCH 7/9] mirror: allow to save buffer for QEMUIOVector in MirrorOp

2016-06-14 Thread Denis V. Lunev

Properly cook MirrorOp initialization/deinitialization. The field is not
yet used actually.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/mirror.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index d8be80a..7471211 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -73,6 +73,7 @@ typedef struct MirrorOp {
 QEMUIOVector qiov;
 int64_t sector_num;
 int nb_sectors;
+void *buf;
 } MirrorOp;
 
 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
@@ -100,24 +101,28 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 s->in_flight--;
 s->sectors_in_flight -= op->nb_sectors;
 iov = op->qiov.iov;
-for (i = 0; i < op->qiov.niov; i++) {
-MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
-QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next);
-s->buf_free_count++;
-}
 
-sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-chunk_num = op->sector_num / sectors_per_chunk;
-nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
-bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
-if (ret >= 0) {
-if (s->cow_bitmap) {
-bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
+if (op->buf == NULL) {
+for (i = 0; i < op->qiov.niov; i++) {
+MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
+QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next);
+s->buf_free_count++;
+}
+
+sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+chunk_num = op->sector_num / sectors_per_chunk;
+nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
+bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
+if (ret >= 0) {
+if (s->cow_bitmap) {
+bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
+}
+s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
 }
-s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
 }
 
 qemu_iovec_destroy(>qiov);
+g_free(op->buf);
 g_free(op);
 
 if (s->waiting_for_io) {
@@ -255,6 +260,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 op->s = s;
 op->sector_num = sector_num;
 op->nb_sectors = nb_sectors;
+op->buf = NULL;
 
 /* Now make a QEMUIOVector taking enough granularity-sized chunks
  * from s->buf_free.
-- 
2.5.0

[Qemu-block] [PATCH 2/9] mirror: create mirror_dirty_init helper for mirror_run

2016-06-14 Thread Denis V. Lunev

The code inside the helper will be extended in the next patch. mirror_run
itself is overbloated at the moment.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy
CC: Stefan Hajnoczi 
CC: Fam Zheng 
CC: Kevin Wolf 
CC: Max Reitz 
CC: Jeff Cody 
CC: Eric Blake 
---
 block/mirror.c | 83 ++
 1 file changed, 49 insertions(+), 34 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 3760e29..797659d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -501,19 +501,62 @@ static void mirror_exit(BlockJob *job, void *opaque)
 bdrv_unref(src);
 }
 
+static int mirror_dirty_init(MirrorBlockJob *s)
+{
+int64_t sector_num, end;
+BlockDriverState *base = s->base;
+BlockDriverState *bs = blk_bs(s->common.blk);
+BlockDriverState *target_bs = blk_bs(s->target);
+bool mark_all_dirty = base == NULL && !bdrv_has_zero_init(target_bs);
+uint64_t last_pause_ns;
+int ret, n;
+
+end = s->bdev_length / BDRV_SECTOR_SIZE;
+
+last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+
+/* First part, loop on the sectors and initialize the dirty bitmap.  */
+for (sector_num = 0; sector_num < end; ) {
+/* Just to make sure we are not exceeding int limit. */
+int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
+ end - sector_num);
+int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+
+if (now - last_pause_ns > SLICE_TIME) {
+last_pause_ns = now;
+block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0);
+}
+
+if (block_job_is_cancelled(>common)) {
+return 0;
+}
+
+ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, );
+if (ret < 0) {
+return ret;
+}
+
+assert(n > 0);
+if (ret == 1 || mark_all_dirty) {
+bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
+}
+sector_num += n;
+}
+return 0;
+}
+
 static void coroutine_fn mirror_run(void *opaque)
 {
 MirrorBlockJob *s = opaque;
 MirrorExitData *data;
 BlockDriverState *bs = blk_bs(s->common.blk);
 BlockDriverState *target_bs = blk_bs(s->target);
-int64_t sector_num, end, length;
+int64_t length;
 uint64_t last_pause_ns;
 BlockDriverInfo bdi;
 char backing_filename[2]; /* we only need 2 characters because we are only
  checking for a NULL string */
 int ret = 0;
-int n;
 int target_cluster_size = BDRV_SECTOR_SIZE;
 
 if (block_job_is_cancelled(>common)) {
@@ -555,7 +598,6 @@ static void coroutine_fn mirror_run(void *opaque)
 s->target_cluster_sectors = target_cluster_size >> BDRV_SECTOR_BITS;
 s->max_iov = MIN(bs->bl.max_iov, target_bs->bl.max_iov);
 
-end = s->bdev_length / BDRV_SECTOR_SIZE;
 s->buf = qemu_try_blockalign(bs, s->buf_size);
 if (s->buf == NULL) {
 ret = -ENOMEM;
@@ -564,41 +606,14 @@ static void coroutine_fn mirror_run(void *opaque)
 
 mirror_free_init(s);
 
-last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 if (!s->is_none_mode) {
-/* First part, loop on the sectors and initialize the dirty bitmap.  */
-BlockDriverState *base = s->base;
-bool mark_all_dirty = s->base == NULL && 
!bdrv_has_zero_init(target_bs);
-
-for (sector_num = 0; sector_num < end; ) {
-/* Just to make sure we are not exceeding int limit. */
-int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
- end - sector_num);
-int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-
-if (now - last_pause_ns > SLICE_TIME) {
-last_pause_ns = now;
-block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0);
-}
-
-if (block_job_is_cancelled(>common)) {
-goto immediate_exit;
-}
-
-ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, 
);
-
-if (ret < 0) {
-goto immediate_exit;
-}
-
-assert(n > 0);
-if (ret == 1 || mark_all_dirty) {
-bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
-}
-sector_num += n;
+ret = mirror_dirty_init(s);
+if (ret < 0 || block_job_is_cancelled(>common)) {
+goto immediate_exit;
 }
 }
 
+last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 bdrv_dirty_iter_init(s->dirty_bitmap, >hbi);
 for (;;) {
 uint64_t delay_ns = 0;
-- 
2.5.0

Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Alex Bligh

On 14 Jun 2016, at 14:32, Paolo Bonzini  wrote:

> 
> On 13/06/2016 23:41, Alex Bligh wrote:
>> That's one of the reasons that there is a proposal to add
>> STRUCTURED_READ to the spec (although I still haven't had time to
>> implement that for qemu), so that we have a newer approach that allows
>> for proper error handling without ambiguity on whether bogus bytes must
>> be sent on a failed read.  But you'd have to convince me that ALL
>> existing NBD server and client implementations expect to handle a read
>> error without read payload, otherwise, I will stick with the notion that
>> the current spec wording is correct, and that read errors CANNOT be
>> gracefully recovered from unless BOTH sides transfer (possibly bogus)
>> bytes along with the error message, and which is why BOTH sides of the
>> protocol are warned that read errors usually result in a disconnection
>> rather than clean continuation, without the addition of STRUCTURED_READ.
> 
> I suspect that there are exactly two client implementations,

My understanding is that there are more than 2 client implementations.
A quick google found at least one BSD client. I bet read error handling
is a mess in all of them.

> namely
> Linux and QEMU's, and both do the right thing.

This depends what you mean by 'right'. Both appear to be non-compliant
with the standard.

Note the standard is not defined by the client implementation, but
by the protocol document.

IMHO the 'right thing' is what is in the spec. Servers can't send an
error in any other way if they don't buffer the entire read first, as the
read may error towards the end.

To illustrate the problem, look consider what qemu itself would do as
a server if it can't buffer the entire read issued to it.

> What servers do doesn't matter, if all the clients agree.

The spec originally was not clear on how errors on reads should be
handled, leading to any read causing a protocol drop. The spec is
now clear. Unfortunately it is not possible to make a back compatible
fix. Hence the real fix here is to implement structured replies,
which is what Eric and I have been working on.

-- 
Alex Bligh

Re: [Qemu-block] [Qemu-devel] [PATCH 5/5] block: Move request_alignment into BlockLimit

2016-06-14 Thread Eric Blake

On 06/14/2016 02:05 AM, Kevin Wolf wrote:

  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
  {
 +/* Inherit all limits except for request_alignment */
 +int request_alignment = bs->bl.request_alignment;
 +
  bs->bl = bs->file->bs->bl;
 +bs->bl.request_alignment = request_alignment;
>>
>> Any ideas on how to fix the test, then?  Have two blkdebug devices
>> nested atop one another, since those are the devices where we can
>> explicitly override alignment?
> 
> Interesting idea. Maybe that's a good option if it works.
> 
>> (normally, you'd _expect_ the chain to
>> inherit the worst-case alignment of all BDS in the chain, so blkdebug is
>> the way around it).
> 
> Actually, I think there are two cases.
> 
> The first one is that you get a request and want to know what to do with
> it. Here you don't want to inherit the worst-case alignment. You'd
> rather want to enforce alignment only when it is actually needed. For
> example, if you have a cache=none backing file with 4k alignment, but a
> cache=writeback overlay, and you don't actually access the backing file
> with this request, it would be wasteful to align the request. You also
> don't really know that a driver will issue a misaligned request (or any
> request at all) to the lower layer just because it got called with one.
> 
> The second case is when you want to issue a request. For example, let's
> take the qcow2 cache here, which has 64k cached and needs to update a
> few bytes. Currently it always writes the full 64k (and with 1 MB
> clusters a full megabyte), but what it really should do is consider the
> worst-case alignment.
> 
> This is comparable to the difference between bdrv_opt_mem_align(), which
> is used in qemu_blockalign() where we want to create a buffer that works
> even in the worst case, and bdrv_min_mem_align(), which is used in
> bdrv_qiov_is_aligned() in order to determine whether we must align an
> existing request.
> 
>> That's the only thing left before I repost the series, so I may just
>> post the last patch as RFC and play with it a bit more...
> 
> And in the light of the above, maybe the solution is as easy as changing
> raw_refresh_limits() to set bs->bl.request_alignment = 1.

Or maybe split the main bdrv_refresh_limits() into two pieces: one that
merges limits from one BDS into another (the limits that are worth
merging, and in the correct direction: opt merges to MAX, max merges to
MIN_NON_ZERO, request_alignment is not merged), the other that calls
merge(bs, bs->file->bs); then have raw_refresh_limits() also use the
common merge functionality rather than straight copy.  Testing that
approach now...

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH] backup: Fail early if cannot determine cluster size

2016-06-14 Thread Max Reitz

On 18.05.2016 07:53, Fam Zheng wrote:
> Otherwise the job is orphaned and block_job_cancel_sync in
> bdrv_close_all() when quiting will hang.
> 
> A simple reproducer is running blockdev-backup from null-co:// to
> null-co://.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Fam Zheng 
> ---
>  block/backup.c | 34 ++
>  1 file changed, 18 insertions(+), 16 deletions(-)

Sorry for having waited so long (I was thinking that maybe Jeff wanted
to take this patch), but now the patch no longer applies and I don't
feel comfortable with just fixing it up myself; git-backport-diff tells
me the required changes are "[0015] [FC]".

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [Qemu-devel] [PATCH v5 03/11] qom: support arbitrary non-scalar properties with -object

2016-06-14 Thread Daniel P. Berrange

On Thu, Jun 09, 2016 at 04:43:35PM +0200, Markus Armbruster wrote:
> "Daniel P. Berrange"  writes:
> 
> > The current -object command line syntax only allows for
> > creation of objects with scalar properties, or a list
> > with a fixed scalar element type. Objects which have
> > properties that are represented as structs in the QAPI
> > schema cannot be created using -object.
> >
> > This is a design limitation of the way the OptsVisitor
> > is written. It simply iterates over the QemuOpts values
> > as a flat list. The support for lists is enabled by
> > allowing the same key to be repeated in the opts string.
> >
> > It is not practical to extend the OptsVisitor to support
> > more complex data structures while also maintaining
> > the existing list handling behaviour that is relied upon
> > by other areas of QEMU.
> 
> It should be practical to create a new option input visitor that parses
> command line syntax straight into a QAPI visit, without the list magic,
> and with fewer or no restrictions.  Then we could start defining complex
> command line arguments as QAPI types, parse them straight into generated
> QAPI types, and dispense with the QDict wrangling.

Funnily enough I did actually start out trying to implement pretty
much exactly that. I found that in my new opts visitor, I was essentially
parsing the QemuOpts into QDict/QLists to handle the recursion from the
visitor. At this point I was starting to duplicate much code from QDict
and QmpInputVisitor, so it abandoned that approach and went for the more
general QemuOpts -> QDict crumping instead, since it the goal to be
achieved with simpler code overall.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-block] [PATCH v2] rbd:change error_setg() to error_setg_errno()

2016-06-14 Thread Max Reitz

On 09.05.2016 09:51, Vikhyat Umrao wrote:
> Ceph RBD block driver does not use error_setg_errno() where
> it is possible to use. This patch replaces error_setg()
> from error_setg_errno().
> 
> Signed-off-by: Vikhyat Umrao 
> ---
>  block/rbd.c | 38 +++---
>  1 file changed, 23 insertions(+), 15 deletions(-)

Thanks, applied to my block tree:

https://github.com/XanClic/qemu/commits/block

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v2 5/6] raw-posix: Implement .bdrv_co_preadv/pwritev

2016-06-14 Thread Eric Blake

On 06/14/2016 07:32 AM, Kevin Wolf wrote:
> The raw-posix block driver actually supports byte-aligned requests now
> on non-O_DIRECT images, like it already (and previously incorrectly)
> claimed in bs->request_alignment.
> 
> For some block drivers this means that a RMW cycle can be avoided when
> they write sub-sector metadata e.g. for cluster allocation.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/linux-aio.c |  7 ++-
>  block/raw-aio.h   |  3 +--
>  block/raw-posix.c | 43 +++
>  3 files changed, 26 insertions(+), 27 deletions(-)
> 

> -static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t 
> sector_num,
> - int nb_sectors, QEMUIOVector *qiov)
> +static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
> +  uint64_t bytes, QEMUIOVector *qiov,
> +  int flags)
>  {
> -return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_READ);
> +return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_READ);
>  }
>  
> -static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t 
> sector_num,
> -  int nb_sectors, QEMUIOVector *qiov)
> +static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
> +   uint64_t bytes, QEMUIOVector *qiov,
> +   int flags)
>  {
> -return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_WRITE);
> +assert(flags == 0);
> +return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
>  }

Looks odd to assert !flags on pwritev but not on preadv.  Minor enough
that you could fix on pull request.

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v2 4/6] raw-posix: Switch to bdrv_co_* interfaces

2016-06-14 Thread Eric Blake

On 06/14/2016 07:32 AM, Kevin Wolf wrote:
> In order to use the modern byte-based .bdrv_co_preadv/pwritev()
> interface, this patch switches raw-posix to coroutine-based interfaces
> as a first step. In terms of semantics and performance, it doesn't make
> a difference with the existing code whether we go from a coroutine to a
> callback-based interface already in block/io.c or only in linux-aio.c
> 
> As there have been concerns in the past that this change may be a step
> in the wrong direction with respect to a possible AIO fast path, the
> old callback-based interface for linux-aio is left around and can be
> reactivated when a fast path (e.g. directly from virtio-blk dataplane,
> bypassing the whole block layer) is implemented.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/linux-aio.c | 87 
> +--
>  block/raw-aio.h   |  4 +++
>  block/raw-posix.c | 59 +
>  3 files changed, 96 insertions(+), 54 deletions(-)

Reviewed-by: Eric Blake 


> @@ -1957,8 +1952,8 @@ BlockDriver bdrv_file = {
>  .bdrv_co_get_block_status = raw_co_get_block_status,
>  .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
>  
> -.bdrv_aio_readv = raw_aio_readv,
> -.bdrv_aio_writev = raw_aio_writev,
> +.bdrv_co_readv  = raw_co_readv,
> +.bdrv_co_writev = raw_co_writev,
>  .bdrv_aio_flush = raw_aio_flush,
>  .bdrv_aio_discard = raw_aio_discard,

The rest of this chunk doesn't try to align '='. Not worth a respin, but
something you may want to clean up on pull request.

>  .bdrv_refresh_limits = raw_refresh_limits,
> @@ -2405,8 +2400,8 @@ static BlockDriver bdrv_host_device = {
>  .create_opts = _create_opts,
>  .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>  
> -.bdrv_aio_readv  = raw_aio_readv,
> -.bdrv_aio_writev = raw_aio_writev,
> +.bdrv_co_readv  = raw_co_readv,
> +.bdrv_co_writev = raw_co_writev,
>  .bdrv_aio_flush  = raw_aio_flush,
>  .bdrv_aio_discard   = hdev_aio_discard,
>  .bdrv_refresh_limits = raw_refresh_limits,

and this hunk is also inconsistent, but in a different way.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v1 0/6] Report format specific info for LUKS block driver

2016-06-14 Thread Daniel P. Berrange

On Tue, Jun 14, 2016 at 03:56:30PM +0200, Max Reitz wrote:
> On 07.06.2016 12:11, Daniel P. Berrange wrote:
> > The 'qemu-img info' tool has ability to print format specific
> > information, eg with qcow2 it reports two extra items:
> > 
> >   $ qemu-img info ~/VirtualMachines/demo.qcow2
> >   image: /home/berrange/VirtualMachines/demo.qcow2
> >   file format: qcow2
> >   virtual size: 3.0G (3221225472 bytes)
> >   disk size: 140K
> >   cluster_size: 65536
> >   Format specific information:
> >   compat: 0.10
> >   refcount bits: 16
> > 
> > 
> > This is not currently wired up for the LUKS driver. This patch
> > series adds that support so that we can report useful data about
> > the LUKS volume such as the crypto algorithm choices, key slot
> > usage and other volume metadata.
> > 
> > The first patch extends the crypto API to allow querying of the
> > format specific metadata
> > 
> > The second patches extends the block API to allow the LUKS driver
> > to report the format specific metadata.
> > 
> > $ qemu-img info ~/VirtualMachines/demo.luks
> > image: /home/berrange/VirtualMachines/demo.luks
> > file format: luks
> > virtual size: 98M (102760448 bytes)
> > disk size: 100M
> > encrypted: yes
> > Format specific information:
> > cipher-alg: aes-128
> > cipher-mode: xts
> > ivgen-alg: plain64
> > hash-alg: sha1
> > payload-offset: 2097152
> > master-key-iters: 142375
> > uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
> > slots:
> > [0]:
> > active: true
> > iters: 572706
> > stripes: 4000
> > key-offset: 8
> > [1]:
> > active: false
> > iters: 0
> > stripes: 4000
> > key-offset: 264
> > [2]:
> > active: false
> > iters: 0
> > stripes: 4000
> > key-offset: 520
> > [3]:
> > active: false
> > iters: 0
> > stripes: 4000
> > key-offset: 776
> > [4]:
> > active: false
> > iters: 0
> > stripes: 4000
> > key-offset: 1032
> > [5]:
> > active: false
> > iters: 0
> > stripes: 4000
> > key-offset: 1288
> > [6]:
> > active: false
> > iters: 0
> > stripes: 4000
> > key-offset: 1544
> > [7]:
> > active: false
> > iters: 0
> > stripes: 4000
> > key-offset: 1800
> > 
> > The remaining 4 patches are improvements to QAPI and the core
> > block layer to fix a problem whereby struct fields are output
> > in (apparently) random ordering. This is because the QAPI type
> > is converted into a QObject for pretty-printing, thus throwing
> > away any struct field ordering information.
> > 
> > To address this I created a new TextOutputVisitor which can
> > directly pretty-print QAPI types. I then changed the code
> > generator to create qapi_stringify_TYPENAME() methods for
> > all QAPI types. Finally I changed the block layer over to
> > use this stringify approach instead.
> 
> General nagging: This new approach no longer replaces dashes with spaces
> in the key names. Is it worth doing something about this?

Opps, didn't notice that. Should be pretty easy to deal with that
in the visitor.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-block] [PATCH v2 3/6] block: Prepare bdrv_aligned_pwritev() for byte-aligned requests

2016-06-14 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index e75bce2..b261cc6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1249,11 +1249,9 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 bool waited;
 int ret;
 
-int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+int64_t start_sector = offset >> BDRV_SECTOR_BITS;
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
 
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
 assert(!(flags & ~BDRV_REQ_MASK));
@@ -1278,22 +1276,21 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 /* Do nothing, write notifier decided to fail this request */
 } else if (flags & BDRV_REQ_ZERO_WRITE) {
 bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO);
-ret = bdrv_co_do_pwrite_zeroes(bs, sector_num << BDRV_SECTOR_BITS,
-   nb_sectors << BDRV_SECTOR_BITS, flags);
+ret = bdrv_co_do_pwrite_zeroes(bs, offset, bytes, flags);
 } else {
 bdrv_debug_event(bs, BLKDBG_PWRITEV);
 ret = bdrv_driver_pwritev(bs, offset, bytes, qiov, flags);
 }
 bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
 
-bdrv_set_dirty(bs, sector_num, nb_sectors);
+bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
 
 if (bs->wr_highest_offset < offset + bytes) {
 bs->wr_highest_offset = offset + bytes;
 }
 
 if (ret >= 0) {
-bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);
+bs->total_sectors = MAX(bs->total_sectors, end_sector);
 }
 
 return ret;
-- 
1.8.3.1

[Qemu-block] [PATCH v2 1/6] block: Byte-based bdrv_co_do_copy_on_readv()

2016-06-14 Thread Kevin Wolf

In a first step to convert the common I/O path to work on bytes rather
than sectors, this converts the copy-on-read logic that is used by
bdrv_aligned_preadv().

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c| 63 +++
 block/mirror.c| 10 
 include/block/block.h | 10 +---
 trace-events  |  2 +-
 4 files changed, 52 insertions(+), 33 deletions(-)

diff --git a/block/io.c b/block/io.c
index 5b2017f..b6a2c80 100644
--- a/block/io.c
+++ b/block/io.c
@@ -404,12 +404,12 @@ static void mark_request_serialising(BdrvTrackedRequest 
*req, uint64_t align)
 }
 
 /**
- * Round a region to cluster boundaries
+ * Round a region to cluster boundaries (sector-based)
  */
-void bdrv_round_to_clusters(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-int64_t *cluster_sector_num,
-int *cluster_nb_sectors)
+void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors,
+int64_t *cluster_sector_num,
+int *cluster_nb_sectors)
 {
 BlockDriverInfo bdi;
 
@@ -424,6 +424,26 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 }
 }
 
+/**
+ * Round a region to cluster boundaries
+ */
+void bdrv_round_to_clusters(BlockDriverState *bs,
+int64_t offset, unsigned int bytes,
+int64_t *cluster_offset,
+unsigned int *cluster_bytes)
+{
+BlockDriverInfo bdi;
+
+if (bdrv_get_info(bs, ) < 0 || bdi.cluster_size == 0) {
+*cluster_offset = offset;
+*cluster_bytes = bytes;
+} else {
+int64_t c = bdi.cluster_size;
+*cluster_offset = QEMU_ALIGN_DOWN(offset, c);
+*cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes, c);
+}
+}
+
 static int bdrv_get_cluster_size(BlockDriverState *bs)
 {
 BlockDriverInfo bdi;
@@ -865,7 +885,7 @@ emulate_flags:
 }
 
 static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov)
 {
 /* Perform I/O through a temporary buffer so that users who scribble over
  * their read buffer while the operation is in progress do not end up
@@ -877,21 +897,20 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 BlockDriver *drv = bs->drv;
 struct iovec iov;
 QEMUIOVector bounce_qiov;
-int64_t cluster_sector_num;
-int cluster_nb_sectors;
+int64_t cluster_offset;
+unsigned int cluster_bytes;
 size_t skip_bytes;
 int ret;
 
 /* Cover entire cluster so no additional backing file I/O is required when
  * allocating cluster in the image file.
  */
-bdrv_round_to_clusters(bs, sector_num, nb_sectors,
-   _sector_num, _nb_sectors);
+bdrv_round_to_clusters(bs, offset, bytes, _offset, _bytes);
 
-trace_bdrv_co_do_copy_on_readv(bs, sector_num, nb_sectors,
-   cluster_sector_num, cluster_nb_sectors);
+trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
+   cluster_offset, cluster_bytes);
 
-iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
+iov.iov_len = cluster_bytes;
 iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
 if (bounce_buffer == NULL) {
 ret = -ENOMEM;
@@ -900,8 +919,7 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 
 qemu_iovec_init_external(_qiov, , 1);
 
-ret = bdrv_driver_preadv(bs, cluster_sector_num * BDRV_SECTOR_SIZE,
- cluster_nb_sectors * BDRV_SECTOR_SIZE,
+ret = bdrv_driver_preadv(bs, cluster_offset, cluster_bytes,
  _qiov, 0);
 if (ret < 0) {
 goto err;
@@ -909,16 +927,12 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 
 if (drv->bdrv_co_pwrite_zeroes &&
 buffer_is_zero(bounce_buffer, iov.iov_len)) {
-ret = bdrv_co_do_pwrite_zeroes(bs,
-   cluster_sector_num * BDRV_SECTOR_SIZE,
-   cluster_nb_sectors * BDRV_SECTOR_SIZE,
-   0);
+ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0);
 } else {
 /* This does not change the data on the disk, it is not necessary
  * to flush even in cache=writethrough mode.
  */
-ret = bdrv_driver_pwritev(bs, cluster_sector_num * BDRV_SECTOR_SIZE,
-  cluster_nb_sectors * BDRV_SECTOR_SIZE,
+ret =

Re: [Qemu-block] [PATCH] qemu-img bench: Fix uninitialised writethrough mode

2016-06-14 Thread Max Reitz

On 14.06.2016 11:33, Kevin Wolf wrote:
> If no -t option is specified, bool writethrough stayed uninitialised.
> Initialise it as false, which makes cache=writeback the default cache
> mode.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH v2 4/6] raw-posix: Switch to bdrv_co_* interfaces

2016-06-14 Thread Kevin Wolf

In order to use the modern byte-based .bdrv_co_preadv/pwritev()
interface, this patch switches raw-posix to coroutine-based interfaces
as a first step. In terms of semantics and performance, it doesn't make
a difference with the existing code whether we go from a coroutine to a
callback-based interface already in block/io.c or only in linux-aio.c

As there have been concerns in the past that this change may be a step
in the wrong direction with respect to a possible AIO fast path, the
old callback-based interface for linux-aio is left around and can be
reactivated when a fast path (e.g. directly from virtio-blk dataplane,
bypassing the whole block layer) is implemented.

Signed-off-by: Kevin Wolf 
---
 block/linux-aio.c | 87 +--
 block/raw-aio.h   |  4 +++
 block/raw-posix.c | 59 +
 3 files changed, 96 insertions(+), 54 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 294b2bf..74d9b33 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -11,8 +11,10 @@
 #include "qemu-common.h"
 #include "block/aio.h"
 #include "qemu/queue.h"
+#include "block/block.h"
 #include "block/raw-aio.h"
 #include "qemu/event_notifier.h"
+#include "qemu/coroutine.h"
 
 #include 
 
@@ -30,6 +32,7 @@
 
 struct qemu_laiocb {
 BlockAIOCB common;
+Coroutine *co;
 LinuxAioState *ctx;
 struct iocb iocb;
 ssize_t ret;
@@ -88,9 +91,14 @@ static void qemu_laio_process_completion(struct qemu_laiocb 
*laiocb)
 }
 }
 }
-laiocb->common.cb(laiocb->common.opaque, ret);
 
-qemu_aio_unref(laiocb);
+laiocb->ret = ret;
+if (laiocb->co) {
+qemu_coroutine_enter(laiocb->co, NULL);
+} else {
+laiocb->common.cb(laiocb->common.opaque, ret);
+qemu_aio_unref(laiocb);
+}
 }
 
 /* The completion BH fetches completed I/O requests and invokes their
@@ -232,22 +240,12 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState 
*s)
 }
 }
 
-BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
-int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
-BlockCompletionFunc *cb, void *opaque, int type)
+static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
+  int type)
 {
-struct qemu_laiocb *laiocb;
-struct iocb *iocbs;
-off_t offset = sector_num * 512;
-
-laiocb = qemu_aio_get(_aiocb_info, bs, cb, opaque);
-laiocb->nbytes = nb_sectors * 512;
-laiocb->ctx = s;
-laiocb->ret = -EINPROGRESS;
-laiocb->is_read = (type == QEMU_AIO_READ);
-laiocb->qiov = qiov;
-
-iocbs = >iocb;
+LinuxAioState *s = laiocb->ctx;
+struct iocb *iocbs = >iocb;
+QEMUIOVector *qiov = laiocb->qiov;
 
 switch (type) {
 case QEMU_AIO_WRITE:
@@ -260,7 +258,7 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState 
*s, int fd,
 default:
 fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
 __func__, type);
-goto out_free_aiocb;
+return -EIO;
 }
 io_set_eventfd(>iocb, event_notifier_get_fd(>e));
 
@@ -270,11 +268,56 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, 
LinuxAioState *s, int fd,
 (!s->io_q.plugged || s->io_q.n >= MAX_QUEUED_IO)) {
 ioq_submit(s);
 }
-return >common;
 
-out_free_aiocb:
-qemu_aio_unref(laiocb);
-return NULL;
+return 0;
+}
+
+int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
+int64_t sector_num, QEMUIOVector *qiov,
+int nb_sectors, int type)
+{
+off_t offset = sector_num * BDRV_SECTOR_SIZE;
+int ret;
+
+struct qemu_laiocb laiocb = {
+.co = qemu_coroutine_self(),
+.nbytes = nb_sectors * BDRV_SECTOR_SIZE,
+.ctx= s,
+.is_read= (type == QEMU_AIO_READ),
+.qiov   = qiov,
+};
+
+ret = laio_do_submit(fd, , offset, type);
+if (ret < 0) {
+return ret;
+}
+
+qemu_coroutine_yield();
+return laiocb.ret;
+}
+
+BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
+int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+BlockCompletionFunc *cb, void *opaque, int type)
+{
+struct qemu_laiocb *laiocb;
+off_t offset = sector_num * BDRV_SECTOR_SIZE;
+int ret;
+
+laiocb = qemu_aio_get(_aiocb_info, bs, cb, opaque);
+laiocb->nbytes = nb_sectors * BDRV_SECTOR_SIZE;
+laiocb->ctx = s;
+laiocb->ret = -EINPROGRESS;
+laiocb->is_read = (type == QEMU_AIO_READ);
+laiocb->qiov = qiov;
+
+ret = laio_do_submit(fd, laiocb, offset, type);
+if (ret < 0) {
+qemu_aio_unref(laiocb);
+return NULL;
+}
+
+return >common;
 }
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
diff --git a/block/raw-aio.h b/block/raw-aio.h
index

[Qemu-block] [PATCH v2 5/6] raw-posix: Implement .bdrv_co_preadv/pwritev

2016-06-14 Thread Kevin Wolf

The raw-posix block driver actually supports byte-aligned requests now
on non-O_DIRECT images, like it already (and previously incorrectly)
claimed in bs->request_alignment.

For some block drivers this means that a RMW cycle can be avoided when
they write sub-sector metadata e.g. for cluster allocation.

Signed-off-by: Kevin Wolf 
---
 block/linux-aio.c |  7 ++-
 block/raw-aio.h   |  3 +--
 block/raw-posix.c | 43 +++
 3 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 74d9b33..e468960 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -273,15 +273,12 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 }
 
 int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
-int64_t sector_num, QEMUIOVector *qiov,
-int nb_sectors, int type)
+uint64_t offset, QEMUIOVector *qiov, int type)
 {
-off_t offset = sector_num * BDRV_SECTOR_SIZE;
 int ret;
-
 struct qemu_laiocb laiocb = {
 .co = qemu_coroutine_self(),
-.nbytes = nb_sectors * BDRV_SECTOR_SIZE,
+.nbytes = qiov->size,
 .ctx= s,
 .is_read= (type == QEMU_AIO_READ),
 .qiov   = qiov,
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 03bbfba..a4cdbbf 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -40,8 +40,7 @@ typedef struct LinuxAioState LinuxAioState;
 LinuxAioState *laio_init(void);
 void laio_cleanup(LinuxAioState *s);
 int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
-int64_t sector_num, QEMUIOVector *qiov,
-int nb_sectors, int type);
+uint64_t offset, QEMUIOVector *qiov, int type);
 BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque, int type);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index cb98769..aacf132 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1325,8 +1325,8 @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, int 
fd,
 return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
-static int coroutine_fn raw_co_rw(BlockDriverState *bs, int64_t sector_num,
-  int nb_sectors, QEMUIOVector *qiov, int type)
+static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
+   uint64_t bytes, QEMUIOVector *qiov, int 
type)
 {
 BDRVRawState *s = bs->opaque;
 
@@ -1344,26 +1344,28 @@ static int coroutine_fn raw_co_rw(BlockDriverState *bs, 
int64_t sector_num,
 type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_AIO
 } else if (s->use_aio) {
-return laio_co_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
-  nb_sectors, type);
+assert(qiov->size == bytes);
+return laio_co_submit(bs, s->aio_ctx, s->fd, offset, qiov, type);
 #endif
 }
 }
 
-return paio_submit_co(bs, s->fd, sector_num * BDRV_SECTOR_SIZE, qiov,
-  nb_sectors * BDRV_SECTOR_SIZE, type);
+return paio_submit_co(bs, s->fd, offset, qiov, bytes, type);
 }
 
-static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors, QEMUIOVector *qiov)
+static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
+  uint64_t bytes, QEMUIOVector *qiov,
+  int flags)
 {
-return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_READ);
+return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_READ);
 }
 
-static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t sector_num,
-  int nb_sectors, QEMUIOVector *qiov)
+static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
+   uint64_t bytes, QEMUIOVector *qiov,
+   int flags)
 {
-return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_WRITE);
+assert(flags == 0);
+return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
 static void raw_aio_plug(BlockDriverState *bs)
@@ -1952,8 +1954,8 @@ BlockDriver bdrv_file = {
 .bdrv_co_get_block_status = raw_co_get_block_status,
 .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
 
-.bdrv_co_readv  = raw_co_readv,
-.bdrv_co_writev = raw_co_writev,
+.bdrv_co_preadv = raw_co_preadv,
+.bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_aio_flush = raw_aio_flush,
 .bdrv_aio_discard = raw_aio_discard,

Re: [Qemu-block] [Qemu-devel] [PATCH v5 02/11] qapi: allow QmpInputVisitor to auto-cast types

2016-06-14 Thread Daniel P. Berrange

On Thu, Jun 09, 2016 at 04:03:50PM +0200, Markus Armbruster wrote:
> "Daniel P. Berrange"  writes:
> 
> > Currently the QmpInputVisitor assumes that all scalar
> > values are directly represented as their final types.
> > ie it assumes an 'int' is using QInt, and a 'bool' is
> > using QBool.
> >
> > This extends it so that QString is optionally permitted
> > for any of the non-string scalar types. This behaviour
> > is turned on by requesting the 'autocast' flag in the
> > constructor.
> >
> > This makes it possible to use QmpInputVisitor with a
> > QDict produced from QemuOpts, where everything is in
> > string format.
> 
> We're still digging.
> 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  docs/qapi-code-gen.txt |   2 +-
> >  include/qapi/qmp-input-visitor.h   |   6 +-
> >  qapi/opts-visitor.c|   1 +
> >  qapi/qmp-input-visitor.c   |  89 ++--
> >  qmp.c  |   2 +-
> >  qom/qom-qobject.c  |   2 +-
> >  replay/replay-input.c  |   2 +-
> >  scripts/qapi-commands.py   |   2 +-
> >  tests/check-qnull.c|   2 +-
> >  tests/test-qmp-commands.c  |   2 +-
> >  tests/test-qmp-input-strict.c  |   2 +-
> >  tests/test-qmp-input-visitor.c | 115 
> > -
> >  tests/test-visitor-serialization.c |   2 +-
> >  util/qemu-sockets.c|   2 +-
> >  14 files changed, 201 insertions(+), 30 deletions(-)


> > diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
> > index 4cf1cf8..00e4b1a 100644
> > --- a/qapi/opts-visitor.c
> > +++ b/qapi/opts-visitor.c
> > @@ -347,6 +347,7 @@ opts_type_bool(Visitor *v, const char *name, bool *obj, 
> > Error **errp)
> >  }
> >  
> >  if (opt->str) {
> > +/* Keep these values in sync with same code in qmp-input-visitor.c 
> > */
> 
> Also with parse_option_bool().  That's the canonical place, in fact.

except parse_option_bool() doesn't allow "yes", "no", "y", "n" as valid
values - it only permits "on" and "off" :-(  I guess I could make the
parse_option_bool() method non-static, make it accept these missing values,
and then call it from all places so we have guaranteed consistency.

> 
> >  if (strcmp(opt->str, "on") == 0 ||
> >  strcmp(opt->str, "yes") == 0 ||
> >  strcmp(opt->str, "y") == 0) {
> > diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c
> > index aea90a1..92023b1 100644
> > --- a/qapi/qmp-input-visitor.c
> > +++ b/qapi/qmp-input-visitor.c
> > @@ -20,6 +20,7 @@
> >  #include "qemu-common.h"
> >  #include "qapi/qmp/types.h"
> >  #include "qapi/qmp/qerror.h"
> > +#include "qemu/cutils.h"
> >  
> >  #define QIV_STACK_SIZE 1024
> >  
> > @@ -45,6 +46,7 @@ struct QmpInputVisitor
> >  
> >  /* True to reject parse in visit_end_struct() if unvisited keys 
> > remain. */
> >  bool strict;
> > +bool autocast;
> >  };
> >  
> >  static QmpInputVisitor *to_qiv(Visitor *v)
> > @@ -254,15 +256,25 @@ static void qmp_input_type_int64(Visitor *v, const 
> > char *name, int64_t *obj,
> >   Error **errp)
> >  {
> >  QmpInputVisitor *qiv = to_qiv(v);
> > -QInt *qint = qobject_to_qint(qmp_input_get_object(qiv, name, true));
> > +QObject *qobj = qmp_input_get_object(qiv, name, true);
> > +QInt *qint;
> > +QString *qstr;
> >  
> > -if (!qint) {
> > -error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
> > -   "integer");
> > +qint = qobject_to_qint(qobj);
> > +if (qint) {
> > +*obj = qint_get_int(qint);
> >  return;
> >  }
> >  
> > -*obj = qint_get_int(qint);
> > +qstr = qobject_to_qstring(qobj);
> > +if (qstr && qstr->string && qiv->autocast) {
> > +if (qemu_strtoll(qstr->string, NULL, 10, obj) == 0) {
> 
> In the commit message, you mentioned intended use: "with a QDict
> produced from QemuOpts, where everything is in string format."  I figure
> you mean opts with empty opts->list->desc[].  For those, we accept any
> key and parse all values as strings.
> 
> The idea with such QemuOpts is to *delay* parsing until we know how to
> parse.  The delay should be transparent to the user.  In particular,
> values should be parsed the same whether the parsing is delayed or not.
> 
> The canonical QemuOpts value parser is qemu_opt_parse().  It parses
> integers with parse_option_number().  That function parses like
> stroull(qstr->string, ..., 0).  Two differences:
> 
> * stroll() vs. strtoull(): they differ in ERANGE behavior.  This might
>   be tolerable, but I haven't thought it through.
> 
> * Base 0 vs 10: different syntax and semantics.  Oops.

Sigh, I originally had my code using strtoull() directly as the
parse_option_number() method does, but had to change it to make
checkpatch.pl stfu thus creating incosistency. This is a great

[Qemu-block] [PATCH v2 2/6] block: Prepare bdrv_aligned_preadv() for byte-aligned requests

2016-06-14 Thread Kevin Wolf

This patch makes bdrv_aligned_preadv() ready to accept byte-aligned
requests. Note that this doesn't mean that such requests are actually
made. The caller still ensures that all requests are aligned to at least
512 bytes.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 block/io.c | 44 
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/block/io.c b/block/io.c
index b6a2c80..e75bce2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -963,11 +963,9 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 {
 int ret;
 
-int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
-
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert(is_power_of_2(align));
+assert((offset & (align - 1)) == 0);
+assert((bytes & (align - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
 assert(!(flags & ~BDRV_REQ_MASK));
@@ -987,9 +985,12 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 if (flags & BDRV_REQ_COPY_ON_READ) {
+int64_t start_sector = offset >> BDRV_SECTOR_BITS;
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+unsigned int nb_sectors = end_sector - start_sector;
 int pnum;
 
-ret = bdrv_is_allocated(bs, sector_num, nb_sectors, );
+ret = bdrv_is_allocated(bs, start_sector, nb_sectors, );
 if (ret < 0) {
 goto out;
 }
@@ -1005,28 +1006,24 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
 } else {
 /* Read zeros after EOF */
-int64_t total_sectors, max_nb_sectors;
+int64_t total_bytes, max_bytes;
 
-total_sectors = bdrv_nb_sectors(bs);
-if (total_sectors < 0) {
-ret = total_sectors;
+total_bytes = bdrv_getlength(bs);
+if (total_bytes < 0) {
+ret = total_bytes;
 goto out;
 }
 
-max_nb_sectors = ROUND_UP(MAX(0, total_sectors - sector_num),
-  align >> BDRV_SECTOR_BITS);
-if (nb_sectors < max_nb_sectors) {
+max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
+if (bytes < max_bytes) {
 ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
-} else if (max_nb_sectors > 0) {
+} else if (max_bytes > 0) {
 QEMUIOVector local_qiov;
 
 qemu_iovec_init(_qiov, qiov->niov);
-qemu_iovec_concat(_qiov, qiov, 0,
-  max_nb_sectors * BDRV_SECTOR_SIZE);
+qemu_iovec_concat(_qiov, qiov, 0, max_bytes);
 
-ret = bdrv_driver_preadv(bs, offset,
- max_nb_sectors * BDRV_SECTOR_SIZE,
- _qiov, 0);
+ret = bdrv_driver_preadv(bs, offset, max_bytes, _qiov, 0);
 
 qemu_iovec_destroy(_qiov);
 } else {
@@ -1034,11 +1031,10 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 /* Reading beyond end of file is supposed to produce zeroes */
-if (ret == 0 && total_sectors < sector_num + nb_sectors) {
-uint64_t offset = MAX(0, total_sectors - sector_num);
-uint64_t bytes = (sector_num + nb_sectors - offset) *
-  BDRV_SECTOR_SIZE;
-qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes);
+if (ret == 0 && total_bytes < offset + bytes) {
+uint64_t zero_offset = MAX(0, total_bytes - offset);
+uint64_t zero_bytes = offset + bytes - zero_offset;
+qemu_iovec_memset(qiov, zero_offset, 0, zero_bytes);
 }
 }
 
-- 
1.8.3.1

Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Paolo Bonzini



On 13/06/2016 23:41, Alex Bligh wrote:
> That's one of the reasons that there is a proposal to add
> STRUCTURED_READ to the spec (although I still haven't had time to
> implement that for qemu), so that we have a newer approach that allows
> for proper error handling without ambiguity on whether bogus bytes must
> be sent on a failed read.  But you'd have to convince me that ALL
> existing NBD server and client implementations expect to handle a read
> error without read payload, otherwise, I will stick with the notion that
> the current spec wording is correct, and that read errors CANNOT be
> gracefully recovered from unless BOTH sides transfer (possibly bogus)
> bytes along with the error message, and which is why BOTH sides of the
> protocol are warned that read errors usually result in a disconnection
> rather than clean continuation, without the addition of STRUCTURED_READ.

I suspect that there are exactly two client implementations, namely
Linux and QEMU's, and both do the right thing.

What servers do doesn't matter, if all the clients agree.

Paolo

[Qemu-block] [PATCH v2 6/6] block: Don't enforce 512 byte minimum alignment

2016-06-14 Thread Kevin Wolf

If block drivers say that they can do an alignment < 512 bytes, let's
just suppose they mean it. raw-posix used to be an offender with respect
to this, but it can actually deal with byte-aligned requests now.

The default is still 512 bytes for any drivers that only implement
sector-based interfaces, but it is 1 now for drivers that implement
.bdrv_co_preadv.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block.c| 2 +-
 block/io.c | 8 +++-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index f54bc25..3d850a2 100644
--- a/block.c
+++ b/block.c
@@ -937,7 +937,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild 
*file,
 goto fail_opts;
 }
 
-bs->request_alignment = 512;
+bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
 bs->zero_beyond_eof = true;
 bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
 
diff --git a/block/io.c b/block/io.c
index b261cc6..b3ff9be 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1052,8 +1052,7 @@ int coroutine_fn bdrv_co_preadv(BlockDriverState *bs,
 BlockDriver *drv = bs->drv;
 BdrvTrackedRequest req;
 
-/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
-uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+uint64_t align = bs->request_alignment;
 uint8_t *head_buf = NULL;
 uint8_t *tail_buf = NULL;
 QEMUIOVector local_qiov;
@@ -1305,7 +1304,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 uint8_t *buf = NULL;
 QEMUIOVector local_qiov;
 struct iovec iov;
-uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+uint64_t align = bs->request_alignment;
 unsigned int head_padding_bytes, tail_padding_bytes;
 int ret = 0;
 
@@ -1392,8 +1391,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs,
 BdrvRequestFlags flags)
 {
 BdrvTrackedRequest req;
-/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
-uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+uint64_t align = bs->request_alignment;
 uint8_t *head_buf = NULL;
 uint8_t *tail_buf = NULL;
 QEMUIOVector local_qiov;
-- 
1.8.3.1

[Qemu-block] [PATCH v2 0/6] block: Enable byte granularity I/O

2016-06-14 Thread Kevin Wolf

Previous series have already converted some block drivers to byte-based rather
than sector-based interfaces. However, the common I/O path as well as raw-posix
still enforced a minimum alignment of 512 bytes because some sector-based logic
was involved.

This patch series removes these limitations and a sub-sector request actually
ends up as a sub-sector syscall now.

v2:
- Updated trace-events for bdrv_co_do_copy_on_readv() [Eric]
- Added some assertions [Eric]
- Renamed laio_submit_co() -> laio_co_submit() and added coroutine_fn
  to its prototype [Stefan]
- linux-aio: Include block/block.h and use BDRV_SECTOR_SIZE instead
  of 512 [Eric]

Kevin Wolf (6):
  block: Byte-based bdrv_co_do_copy_on_readv()
  block: Prepare bdrv_aligned_preadv() for byte-aligned requests
  block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
  raw-posix: Switch to bdrv_co_* interfaces
  raw-posix: Implement .bdrv_co_preadv/pwritev
  block: Don't enforce 512 byte minimum alignment

 block.c   |   2 +-
 block/io.c| 128 ++
 block/linux-aio.c |  84 -
 block/mirror.c|  10 ++--
 block/raw-aio.h   |   3 ++
 block/raw-posix.c |  62 
 include/block/block.h |  10 ++--
 trace-events  |   2 +-
 8 files changed, 176 insertions(+), 125 deletions(-)

-- 
1.8.3.1

Re: [Qemu-block] [PATCH] block-backend: allow flush on devices with open tray

2016-06-14 Thread Max Reitz

On 10.06.2016 23:59, John Snow wrote:
> If a device still has an attached BDS because the medium has not yet
> been removed, we will be unable to migrate to a new host because
> blk_flush will return an error for that backend.
> 
> Replace the call to blk_is_available to blk_is_inserted to weaken
> the check and allow flushes from the backend to work, while still
> disallowing flushes from the frontend/device model to work.
> 
> This fixes a regression present in 2.6.0 caused by the following commit:
> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
> block: Move some bdrv_*_all() functions to BB
> 
> Signed-off-by: John Snow 
> ---
>  block/block-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I'm still not sure we shouldn't do the same for blk_{co,aio}_flush(). I
guess you exclude them here because you specifically want to fix the
issue mentioned in the commit message, but then we could just make
blk_flush_all() ignore an -ENOMEDIUM.

I personally think we should make all blk_*flush() functions use
blk_is_inserted() instead of blk_is_available(). As we have discussed on
IRC, there are probably not that many cases a guest can flush a medium
in an open tray anyway (because the main use case are read-only
CD-ROMs), and even if so, that wouldn't change any data, so even if the
guest can actually flush something on an open tray, I don't think anyone
would complain.

Max

> diff --git a/block/block-backend.c b/block/block-backend.c
> index 34500e6..d1e875e 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1122,7 +1122,7 @@ int blk_co_flush(BlockBackend *blk)
>  
>  int blk_flush(BlockBackend *blk)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return -ENOMEDIUM;
>  }
>  
> 

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 6/6] block: Don't enforce 512 byte minimum alignment

2016-06-14 Thread Kevin Wolf

Am 14.06.2016 um 14:09 hat Stefan Hajnoczi geschrieben:
> On Wed, Jun 08, 2016 at 04:10:11PM +0200, Kevin Wolf wrote:
> > diff --git a/block.c b/block.c
> > index f54bc25..3d850a2 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -937,7 +937,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
> > BdrvChild *file,
> >  goto fail_opts;
> >  }
> >  
> > -bs->request_alignment = 512;
> > +bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
> 
> What happens in the raw-posix.c AIO case?  There we should still use
> 512.

I'm only changing the default value here. raw-posix already overrides
bs->request_alignment as needed, see raw_probe_alignment().

Kevin


pgpmavX5LHSRs.pgp
Description: PGP signature

Re: [Qemu-block] [PATCH v3 3/5] block/null: Implement bdrv_refresh_filename()

2016-06-14 Thread Max Reitz

On 12.06.2016 06:08, Fam Zheng wrote:
> On Fri, 06/10 20:57, Max Reitz wrote:
>> Signed-off-by: Max Reitz 
> 
> The commit message could go a little more informative. Seems nothing special 
> in
> the added callback, aren't things supposed to just work without this patch?
> What is missing?

If you pass a filename, it works without this patch. If you don't, it
doesn't.

Compare (before this patch):

$ x86_64-softmmu/qemu-system-x86_64 \
  -drive if=none,file=null-co://,driver=raw -nodefaults -qmp stdio
[...]
{'execute':'query-block'}
[...] "filename": "null-co://", [...]

With:

$ x86_64-softmmu/qemu-system-x86_64 \
  -drive if=none,driver=raw,file.driver=null-co -nodefaults -qmp stdio
[...]
{'execute':'query-block'}
[...] "filename": "json:{\"driver\": \"raw\", \"file\": {\"driver\":
\"null-co\"}}", [...]


So the issue is that you can use the null-co/aio drivers without giving
a filename.

I wouldn't mind including this information in a v4, but I'm not sure it
alone warrants a v4.

> 
>> ---
>>  block/null.c | 20 
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/block/null.c b/block/null.c
>> index 396500b..b511010 100644
>> --- a/block/null.c
>> +++ b/block/null.c
>> @@ -12,6 +12,8 @@
>>  
>>  #include "qemu/osdep.h"
>>  #include "qapi/error.h"
>> +#include "qapi/qmp/qdict.h"
>> +#include "qapi/qmp/qstring.h"
>>  #include "block/block_int.h"
>>  
>>  #define NULL_OPT_LATENCY "latency-ns"
>> @@ -223,6 +225,20 @@ static int64_t coroutine_fn 
>> null_co_get_block_status(BlockDriverState *bs,
>>  }
>>  }
>>  
>> +static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
>> +{
>> +QINCREF(opts);
>> +qdict_del(opts, "filename");
> 
> Why is this qdict_del necessary?

It's not strictly necessary. But the null drivers ignore the filename,
so there's no harm in dropping it from the JSON filename (if we need to
construct one).

Max

>> +
>> +if (!qdict_size(opts)) {
>> +snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s://",
>> + bs->drv->format_name);
>> +}
>> +
>> +qdict_put(opts, "driver", qstring_from_str(bs->drv->format_name));
>> +bs->full_open_options = opts;
>> +}
>> +
>>  static BlockDriver bdrv_null_co = {
>>  .format_name= "null-co",
>>  .protocol_name  = "null-co",
>> @@ -238,6 +254,8 @@ static BlockDriver bdrv_null_co = {
>>  .bdrv_reopen_prepare= null_reopen_prepare,
>>  
>>  .bdrv_co_get_block_status   = null_co_get_block_status,
>> +
>> +.bdrv_refresh_filename  = null_refresh_filename,
>>  };
>>  
>>  static BlockDriver bdrv_null_aio = {
>> @@ -255,6 +273,8 @@ static BlockDriver bdrv_null_aio = {
>>  .bdrv_reopen_prepare= null_reopen_prepare,
>>  
>>  .bdrv_co_get_block_status   = null_co_get_block_status,
>> +
>> +.bdrv_refresh_filename  = null_refresh_filename,
>>  };
>>  
>>  static void bdrv_null_init(void)
>> -- 
>> 2.8.3
>>




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v3 4/5] iotests: Add test for post-mirror backing chains

2016-06-14 Thread Max Reitz

On 12.06.2016 06:17, Fam Zheng wrote:
> On Fri, 06/10 20:57, Max Reitz wrote:
>> +import os
>> +import stat
>> +import time
> 
> Unused import 'stat' and 'time'?

That happens when you copy old tests and don't check stuff like this...
Well, it won't hurt, but I guess now I may have enough reason to send a
v4 (if I get to it today).

Thanks!

Max

> 
>> +import iotests
>> +from iotests import qemu_img
> 
> 

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 5/6] raw-posix: Implement .bdrv_co_preadv/pwritev

2016-06-14 Thread Stefan Hajnoczi

On Wed, Jun 08, 2016 at 04:10:10PM +0200, Kevin Wolf wrote:
> The raw-posix block driver actually supports byte-aligned requests now
> on non-O_DIRECT images, like it already (and previously incorrectly)
> claimed in bs->request_alignment.
> 
> For some block drivers this means that a RMW cycle can be avoided when
> they write sub-sector metadata e.g. for cluster allocation.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/linux-aio.c |  6 ++
>  block/raw-aio.h   |  2 +-
>  block/raw-posix.c | 42 ++
>  3 files changed, 25 insertions(+), 25 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-block] [PATCH 2/6] block: Prepare bdrv_aligned_preadv() for byte-aligned requests

2016-06-14 Thread Stefan Hajnoczi

On Wed, Jun 08, 2016 at 04:10:07PM +0200, Kevin Wolf wrote:
> This patch makes bdrv_aligned_preadv() ready to accept byte-aligned
> requests. Note that this doesn't mean that such requests are actually
> made. The caller still ensures that all requests are aligned to at least
> 512 bytes.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/io.c | 41 +
>  1 file changed, 17 insertions(+), 24 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-block] [PATCH 3/6] block: Prepare bdrv_aligned_pwritev() for byte-aligned requests

2016-06-14 Thread Stefan Hajnoczi

On Wed, Jun 08, 2016 at 04:10:08PM +0200, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  block/io.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-block] [PATCH 0/6] block: Enable byte granularity I/O

2016-06-14 Thread Stefan Hajnoczi

On Mon, Jun 13, 2016 at 03:43:06PM +0200, Kevin Wolf wrote:
> Am 13.06.2016 um 15:27 hat Stefan Hajnoczi geschrieben:
> > On Wed, Jun 08, 2016 at 04:10:05PM +0200, Kevin Wolf wrote:
> > > Previous series have already converted some block drivers to byte-based 
> > > rather
> > > than sector-based interfaces. However, the common I/O path as well as 
> > > raw-posix
> > > still enforced a minimum alignment of 512 bytes because some sector-based 
> > > logic
> > > was involved.
> > > 
> > > This patch series removes these limitations and a sub-sector request 
> > > actually
> > > ends up as a sub-sector syscall now.
> > > 
> > > Kevin Wolf (6):
> > >   block: Byte-based bdrv_co_do_copy_on_readv()
> > >   block: Prepare bdrv_aligned_preadv() for byte-aligned requests
> > >   block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
> > >   raw-posix: Switch to bdrv_co_* interfaces
> > >   raw-posix: Implement .bdrv_co_preadv/pwritev
> > >   block: Don't enforce 512 byte minimum alignment
> > > 
> > >  block.c   |   2 +-
> > >  block/io.c| 125 
> > > +-
> > >  block/linux-aio.c |  83 -
> > >  block/mirror.c|  10 ++--
> > >  block/raw-aio.h   |   2 +
> > >  block/raw-posix.c |  61 
> > >  include/block/block.h |  10 ++--
> > >  7 files changed, 169 insertions(+), 124 deletions(-)
> > 
> > I've taken an initial look and it looks good.  Will review next revision
> > in depth so it can be merged after Eric's comments have been addressed.
> 
> Eric commented a lot, but only requested very few minor changes that
> wouldn't strictly require resending the series. If you think it's
> worthwhile to send a v2 for them, I can do that, but it shouldn't make a
> big difference for your review.

Done.  Looks very close.

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-block] [PATCH 1/6] block: Byte-based bdrv_co_do_copy_on_readv()

2016-06-14 Thread Stefan Hajnoczi

On Wed, Jun 08, 2016 at 04:10:06PM +0200, Kevin Wolf wrote:
> In a first step to convert the common I/O path to work on bytes rather
> than sectors, this converts the copy-on-read logic that is used by
> bdrv_aligned_preadv().
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/io.c| 63 
> +++
>  block/mirror.c| 10 
>  include/block/block.h | 10 +---
>  3 files changed, 51 insertions(+), 32 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

1 2 >

1 - 100 of 137 matches

Mail list logo