Re: [Qemu-block] [PATCH 8/9] mirror: use synch scheme for drive mirror
On 06/14/2016 09:25 AM, Denis V. Lunev wrote: > Block commit of the active image to the backing store on a slow disk > could never end. For example with the guest with the following loop > inside > while true; do > dd bs=1k count=1 if=/dev/zero of=x > done > running above slow storage could not complete the operation with a s/with/within/ > resonable amount of time: s/resonable/reasonable/ > virsh blockcommit rhel7 sda --active --shallow > virsh qemu-monitor-event > virsh qemu-monitor-command rhel7 \ > '{"execute":"block-job-complete",\ > "arguments":{"device":"drive-scsi0-0-0-0"} }' > virsh qemu-monitor-event > Completion event is never received. > > This problem could not be fixed easily with the current architecture. We > should either prohibit guest writes (making dirty bitmap dirty) or switch > to the sycnchronous scheme. s/sycnchronous/synchronous/ > > This patch implements the latter. It adds mirror_before_write_notify > callback. In this case all data written from the guest is synchnonously s/synchnonously/synchronously/ > written to the mirror target. Though the problem is solved partially. > We should switch from bdrv_dirty_bitmap to simple hbitmap. This will be > done in the next patch. > In other words, the mere act of mirroring a guest will now be guest-visible in that the guest is auto-throttled while waiting for the mirroring to be written out. It seems like you would want to be able to opt in or out of this scheme. Is it something that can be toggled mid-operation (try asynchronous, and switch to synchronous if a timeout elapses)? > Signed-off-by: Denis V. Lunev> Reviewed-by: Vladimir Sementsov-Ogievskiy > CC: Stefan Hajnoczi > CC: Fam Zheng > CC: Kevin Wolf > CC: Max Reitz > CC: Jeff Cody > CC: Eric Blake > --- > block/mirror.c | 78 > ++ > 1 file changed, 78 insertions(+) > I'll leave the actual idea to others to review, because there may be some ramifications that I'm not thinking of. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 7/9] mirror: allow to save buffer for QEMUIOVector in MirrorOp
On 06/14/2016 09:25 AM, Denis V. Lunev wrote: In the subject: 'allow to save buffer' is not idiomatic English; better would be 'allow saving the buffer' or simply 'save the buffer' > Properly cook MirrorOp initialization/deinitialization. The field is not > yet used actually. Another "what" but not "why" - expanding the commit message to mention "why" makes it easier to review. > > Signed-off-by: Denis V. Lunev> Reviewed-by: Vladimir Sementsov-Ogievskiy > CC: Stefan Hajnoczi > CC: Fam Zheng > CC: Kevin Wolf > CC: Max Reitz > CC: Jeff Cody > CC: Eric Blake > --- > block/mirror.c | 32 +++- > 1 file changed, 19 insertions(+), 13 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index d8be80a..7471211 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -73,6 +73,7 @@ typedef struct MirrorOp { > QEMUIOVector qiov; > int64_t sector_num; > int nb_sectors; > +void *buf; > } MirrorOp; > > static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read, > @@ -100,24 +101,28 @@ static void mirror_iteration_done(MirrorOp *op, int ret) > s->in_flight--; > s->sectors_in_flight -= op->nb_sectors; > iov = op->qiov.iov; > -for (i = 0; i < op->qiov.niov; i++) { > -MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base; > -QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next); > -s->buf_free_count++; > -} > > -sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; > -chunk_num = op->sector_num / sectors_per_chunk; > -nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk); > -bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks); > -if (ret >= 0) { > -if (s->cow_bitmap) { > -bitmap_set(s->cow_bitmap, chunk_num, nb_chunks); > +if (op->buf == NULL) { > +for (i = 0; i < op->qiov.niov; i++) { > +MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base; > +QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next); > +s->buf_free_count++; > +} > + > +sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; > +chunk_num = op->sector_num / sectors_per_chunk; > +nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk); > +bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks); I still think it might be smarter to fix bitmap operations to work on byte inputs (still sectors, or rather granularity chunks, under the hood, but no need to make users scale things just to have it scaled again). > +if (ret >= 0) { > +if (s->cow_bitmap) { > +bitmap_set(s->cow_bitmap, chunk_num, nb_chunks); > +} > +s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE; > } > -s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE; > } > > qemu_iovec_destroy(>qiov); > +g_free(op->buf); > g_free(op); > > if (s->waiting_for_io) { > @@ -255,6 +260,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t > sector_num, > op->s = s; > op->sector_num = sector_num; > op->nb_sectors = nb_sectors; > +op->buf = NULL; > > /* Now make a QEMUIOVector taking enough granularity-sized chunks > * from s->buf_free. > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 6/9] block: pass qiov into before_write notifier
On 06/14/2016 09:25 AM, Denis V. Lunev wrote: > Signed-off-by: Denis V. LunevThe commit message says what, but not why. It's useful to give reviewers a hint as to why a patch makes sense (such as a future patch being able to use the write notifier to make mirroring more efficient if it knows what is being mirrored). > Reviewed-by: Vladimir Sementsov-Ogievskiy > CC: Stefan Hajnoczi > CC: Fam Zheng > CC: Kevin Wolf > CC: Max Reitz > CC: Jeff Cody > CC: Eric Blake > --- > block/io.c| 12 +++- > include/block/block_int.h | 1 + > 2 files changed, 8 insertions(+), 5 deletions(-) > > @@ -2228,7 +2230,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, > int64_t sector_num, > return 0; > } > > -tracked_request_begin(, bs, sector_num, nb_sectors, > +tracked_request_begin(, bs, NULL, sector_num, nb_sectors, >BDRV_TRACKED_DISCARD); > bdrv_set_dirty(bs, sector_num, nb_sectors); > > @@ -2331,7 +2333,7 @@ static int bdrv_co_do_ioctl(BlockDriverState *bs, int > req, void *buf) > }; > BlockAIOCB *acb; > > -tracked_request_begin(_req, bs, 0, 0, BDRV_TRACKED_IOCTL); > +tracked_request_begin(_req, bs, NULL, 0, 0, BDRV_TRACKED_IOCTL); > if (!drv || !drv->bdrv_aio_ioctl) { > co.ret = -ENOTSUP; > goto out; > diff --git a/include/block/block_int.h b/include/block/block_int.h > index 30a9717..72f463a 100644 > --- a/include/block/block_int.h > +++ b/include/block/block_int.h > @@ -69,6 +69,7 @@ enum BdrvTrackedRequestType { > > typedef struct BdrvTrackedRequest { > BlockDriverState *bs; > +QEMUIOVector *qiov; > int64_t offset; > unsigned int bytes; I guess bytes and qiov->size are not always redundant, because you pass NULL for qiov for a zero or discard operation (an alternative would be to always pass a qiov, even for zero/discard, so that we only need a single size). But I've been pointing out our inconsistent use of qiov for zeroes in multiple places, so I don't think it's worth changing in this series, but in one of its own if we want to do it. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 5/9] mirror: improve performance of mirroring of empty disk
On 06/14/2016 09:25 AM, Denis V. Lunev wrote: > We should not take into account zero blocks for delay calculations. > They are not read and thus IO throttling is not required. In the > other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes > days. > > Signed-off-by: Denis V. Lunev> Reviewed-by: Vladimir Sementsov-Ogievskiy > CC: Stefan Hajnoczi > CC: Fam Zheng > CC: Kevin Wolf > CC: Max Reitz > CC: Jeff Cody > CC: Eric Blake > --- > block/mirror.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) Seems reasonable, but I'll let others more familiar with throttling give the final say. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 4/9] mirror: efficiently zero out target
On 06/14/2016 09:25 AM, Denis V. Lunev wrote: > With a bdrv_co_write_zeroes method on a target BDS zeroes will not be placed > into the wire. Thus the target could be very efficiently zeroed out. This > is should be done with the largest chunk possible. > > This improves the performance of the live migration of the empty disk by > 150 times if NBD supports write_zeroes. > > Signed-off-by: Denis V. Lunev> Reviewed-by: Vladimir Sementsov-Ogievskiy > CC: Stefan Hajnoczi > CC: Fam Zheng > CC: Kevin Wolf > CC: Max Reitz > CC: Jeff Cody > CC: Eric Blake > --- > block/mirror.c | 32 +++- > 1 file changed, 31 insertions(+), 1 deletion(-) > > diff --git a/block/mirror.c b/block/mirror.c > index c7b3639..c2f8773 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -21,6 +21,7 @@ > #include "qemu/ratelimit.h" > #include "qemu/bitmap.h" > > +#define MIRROR_ZERO_CHUNK (3u << (29 - BDRV_SECTOR_BITS)) /* 1.5 Gb */ Probably nicer to track this in bytes. And do you really want a hard-coded arbitrary limit, or is it better to live with MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)? > @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s) > > end = s->bdev_length / BDRV_SECTOR_SIZE; > > -if (base == NULL && !bdrv_has_zero_init(target_bs)) { > +if (base == NULL && !bdrv_has_zero_init(target_bs) && > +target_bs->drv->bdrv_co_write_zeroes == NULL) { Indentation is off, although if checkpatch.pl doesn't complain I guess it doesn't matter that much. Why should you care whether the target_bs->drv implements a callback? Can't you just rely on the normal bdrv_*() functions to do the dirty work of picking the most efficient implementation without you having to bypass the block layer? In fact, isn't that the whole goal of bdrv_make_zero() - why not call that instead of reimplementing it? Patch needs rebasing - we've redone this into bdrv_co_pwrite_zeroes and a byte interface, since upstream commit c1499a5e. > bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end); > return 0; > } > @@ -546,6 +548,34 @@ static int mirror_dirty_init(MirrorBlockJob *s) > } > sector_num += n; > } > + > +if (base != NULL || bdrv_has_zero_init(target_bs)) { You're now repeating the conditional that used to be 'bool mark_all_dirty' (well, this is !mark_all_dirty); is it worth keeping the simpler bool around? > +/* no need to zero out entire disk */ > +return 0; > +} > + > +for (sector_num = 0; sector_num < end; ) { > +int nb_sectors = MIN(MIRROR_ZERO_CHUNK, end - sector_num); Why limit yourself to 1.5G? It's either too small for what you can really do, or too large for what the device permits. See my above comment about MIN_NON_ZERO. > +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); > + > +if (now - last_pause_ns > SLICE_TIME) { > +last_pause_ns = now; > +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0); > +} > + > +if (block_job_is_cancelled(>common)) { > +return -EINTR; > +} > + > +if (s->in_flight == MAX_IN_FLIGHT) { > +trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1); > +mirror_wait_for_io(s); > +continue; > +} Hmm - I guess your mirror yield points are why you couldn't just directly use bdrv_make_zero(); but is that something where some code refactoring can share more code rather than duplicating it? > + > +mirror_do_zero_or_discard(s, sector_num, nb_sectors, false); > +sector_num += nb_sectors; > +} > return 0; > } > > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 3/9] mirror: optimize dirty bitmap filling in mirror_run a bit
On 06/14/2016 09:25 AM, Denis V. Lunev wrote: > There is no need to scan allocation tables if we have mark_all_dirty flag > set. Just mark it all dirty. > > Signed-off-by: Denis V. Lunev> Reviewed-by: Vladimir Sementsov-Ogievskiy > CC: Stefan Hajnoczi > CC: Fam Zheng > CC: Kevin Wolf > CC: Max Reitz > CC: Jeff Cody > CC: Eric Blake > --- > block/mirror.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 797659d..c7b3639 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -507,12 +507,16 @@ static int mirror_dirty_init(MirrorBlockJob *s) > BlockDriverState *base = s->base; > BlockDriverState *bs = blk_bs(s->common.blk); > BlockDriverState *target_bs = blk_bs(s->target); > -bool mark_all_dirty = base == NULL && !bdrv_has_zero_init(target_bs); > uint64_t last_pause_ns; > int ret, n; > > end = s->bdev_length / BDRV_SECTOR_SIZE; > > +if (base == NULL && !bdrv_has_zero_init(target_bs)) { > +bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end); Won't work as written. 'end' is 64 bits, but bdrv_set_dirty_bitmap() is limited to a 32-bit sector count. Might be first worth updating bdrv_set_dirty_bitmap() and friends to be byte-based rather than sector-based (but still tracking a sector per bit, obviously), as well as expand it to operate on 64-bit sizes rather than 32-bit. I'm also worried slightly that the existing code repeated things in a loop, and therefore had pause points every iteration and could thus remain responsive to an early cancel. But doing the entire operation in one chunk (assuming you fix bitmap code to handle a 64-bit size) may end up running for so long without interruption that you lose the benefits of an early interruption that you have by virtue of a 32-bit limit. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 2/9] mirror: create mirror_dirty_init helper for mirror_run
On 06/14/2016 09:25 AM, Denis V. Lunev wrote: > The code inside the helper will be extended in the next patch. mirror_run > itself is overbloated at the moment. > > Signed-off-by: Denis V. Lunev> Reviewed-by: Vladimir Sementsov-Ogievskiy > CC: Stefan Hajnoczi > CC: Fam Zheng > CC: Kevin Wolf > CC: Max Reitz > CC: Jeff Cody > CC: Eric Blake > --- > block/mirror.c | 83 > ++ > 1 file changed, 49 insertions(+), 34 deletions(-) > Looks like a nice split. Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH] backup: Fail early if cannot determine cluster size
On Tue, 06/14 16:49, Max Reitz wrote: > On 18.05.2016 07:53, Fam Zheng wrote: > > Otherwise the job is orphaned and block_job_cancel_sync in > > bdrv_close_all() when quiting will hang. > > > > A simple reproducer is running blockdev-backup from null-co:// to > > null-co://. > > > > Cc: qemu-sta...@nongnu.org > > Signed-off-by: Fam Zheng> > --- > > block/backup.c | 34 ++ > > 1 file changed, 18 insertions(+), 16 deletions(-) > > Sorry for having waited so long (I was thinking that maybe Jeff wanted > to take this patch), but now the patch no longer applies and I don't > feel comfortable with just fixing it up myself; git-backport-diff tells > me the required changes are "[0015] [FC]". Oh it turns out this patch is superseded by commit 91ab68837933232bcef99da7c968e6d41900419b Author: Kevin Wolf Date: Thu Apr 14 12:59:55 2016 +0200 backup: Don't leak BackupBlockJob in error path Signed-off-by: Kevin Wolf Reviewed-by: Max Reitz Reviewed-by: Alberto Garcia So let's just drop it. Fam
Re: [Qemu-block] [PATCH] backup: Fail early if cannot determine cluster size
On Tue, 06/14 16:49, Max Reitz wrote: > On 18.05.2016 07:53, Fam Zheng wrote: > > Otherwise the job is orphaned and block_job_cancel_sync in > > bdrv_close_all() when quiting will hang. > > > > A simple reproducer is running blockdev-backup from null-co:// to > > null-co://. > > > > Cc: qemu-sta...@nongnu.org > > Signed-off-by: Fam Zheng> > --- > > block/backup.c | 34 ++ > > 1 file changed, 18 insertions(+), 16 deletions(-) > > Sorry for having waited so long (I was thinking that maybe Jeff wanted > to take this patch), but now the patch no longer applies and I don't > feel comfortable with just fixing it up myself; git-backport-diff tells > me the required changes are "[0015] [FC]". No problem! I'll rebase it and submit again. Fam
Re: [Qemu-block] [PATCH v3 08/14] block/nbd: Accept SocketAddress
On 04/06/2016 12:28 PM, Max Reitz wrote: > Add a new option "address" to the NBD block driver which accepts a > SocketAddress. > > "path", "host" and "port" are still supported as legacy options and are > mapped to their corresponding SocketAddress representation. The back-compat work is pretty slick. > > Signed-off-by: Max Reitz> --- > block/nbd.c | 97 > ++- > tests/qemu-iotests/051.out| 4 +- > tests/qemu-iotests/051.pc.out | 4 +- > 3 files changed, 64 insertions(+), 41 deletions(-) > > diff --git a/block/nbd.c b/block/nbd.c > index 3adf302..9ae66ba 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -32,6 +32,8 @@ > #include "qemu/uri.h" > #include "block/block_int.h" > #include "qemu/module.h" > +#include "qapi-visit.h" > +#include "qapi/qmp-input-visitor.h" > #include "qapi/qmp/qdict.h" > #include "qapi/qmp/qjson.h" > #include "qapi/qmp/qint.h" > @@ -128,7 +130,9 @@ static bool nbd_has_filename_options_conflict(QDict > *options, Error **errp) > if (!strcmp(e->key, "host") > || !strcmp(e->key, "port") > || !strcmp(e->key, "path") > -|| !strcmp(e->key, "export")) > +|| !strcmp(e->key, "export") > +|| !strcmp(e->key, "address") > +|| !strncmp(e->key, "address.", 8)) May need a tweak if you break before '||' > { > error_setg(errp, "Option '%s' cannot be used with a file name", > e->key); > @@ -202,46 +206,66 @@ out: > g_free(file); > } > > +static bool nbd_process_legacy_socket_options(QDict *options, Error **errp) > +{ > +if (qdict_haskey(options, "path") && qdict_haskey(options, "host")) { > +error_setg(errp, "path and host may not be used at the same time"); > +return false; > +} else if (qdict_haskey(options, "path")) { > +if (qdict_haskey(options, "port")) { > +error_setg(errp, "port may not be used without host"); > +return false; > +} > + > +qdict_put(options, "address.type", qstring_from_str("unix")); > +qdict_change_key(options, "path", "address.data.path"); > +} else if (qdict_haskey(options, "host")) { > +qdict_put(options, "address.type", qstring_from_str("inet")); > +qdict_change_key(options, "host", "address.data.host"); > +if (!qdict_change_key(options, "port", "address.data.port")) { > +qdict_put(options, "address.data.port", > + qstring_from_str(stringify(NBD_DEFAULT_PORT))); > +} The rewrite from old to modern is rather nice. I wonder if we could then use a generated qapi_visit_SocketAddress instead of manually handling all the complication of SocketAddress proper. > +} > + > +return true; > +} > + > static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char > **export, > Error **errp) > { > -SocketAddress *saddr; > +SocketAddress *saddr = NULL; > +QDict *addr = NULL; > +QObject *crumpled_addr; > +QmpInputVisitor *iv; > +Error *local_err = NULL; > > -if (qdict_haskey(options, "path") == qdict_haskey(options, "host")) { > -if (qdict_haskey(options, "path")) { > -error_setg(errp, "path and host may not be used at the same > time"); > -} else { > -error_setg(errp, "one of path and host must be specified"); > -} > -return NULL; > +if (!nbd_process_legacy_socket_options(options, errp)) { > +goto fail; > } > -if (qdict_haskey(options, "port") && !qdict_haskey(options, "host")) { > -error_setg(errp, "port may not be used without host"); > -return NULL; > + > +qdict_extract_subqdict(options, , "address."); > +if (!qdict_size(addr)) { > +error_setg(errp, "NBD server address missing"); > +goto fail; > } > > -saddr = g_new0(SocketAddress, 1); > +crumpled_addr = qdict_crumple(addr, true, errp); > +if (!crumpled_addr) { > +goto fail; > +} > Ah, so you ARE depending on Dan's qdict_crumple(). > -if (qdict_haskey(options, "path")) { > -UnixSocketAddress *q_unix; > -saddr->type = SOCKET_ADDRESS_KIND_UNIX; > -q_unix = saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1); > -q_unix->path = g_strdup(qdict_get_str(options, "path")); > -qdict_del(options, "path"); > -} else { > -InetSocketAddress *inet; > -saddr->type = SOCKET_ADDRESS_KIND_INET; > -inet = saddr->u.inet.data = g_new0(InetSocketAddress, 1); > -inet->host = g_strdup(qdict_get_str(options, "host")); > -if (!qdict_get_try_str(options, "port")) { > -inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT); > -} else { > -inet->port = g_strdup(qdict_get_str(options, "port")); > -} > -
Re: [Qemu-block] [PATCH v3 07/14] block/nbd: "address" in nbd_refresh_filename()
On 04/06/2016 12:28 PM, Max Reitz wrote: > As of a future patch, the NBD block driver will accept a SocketAddress > structure for a new "address" option. In order to support this, > nbd_refresh_filename() needs some changes. > > The two TODOs introduced by this patch will be removed in the very next > one. They exist to explain that it is currently impossible for > nbd_refresh_filename() to emit an "address.*" option (which the NBD > block driver does not handle yet). The next patch will arm these code > paths, but it will also enable handling of these options. > > Signed-off-by: Max Reitz> --- > block/nbd.c | 84 > - > 1 file changed, 61 insertions(+), 23 deletions(-) > > diff --git a/block/nbd.c b/block/nbd.c > index 1736f68..3adf302 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -441,37 +441,75 @@ static void nbd_attach_aio_context(BlockDriverState *bs, > static void nbd_refresh_filename(BlockDriverState *bs, QDict *options) > { > QDict *opts = qdict_new(); > -const char *path = qdict_get_try_str(options, "path"); > -const char *host = qdict_get_try_str(options, "host"); > -const char *port = qdict_get_try_str(options, "port"); > +bool can_generate_filename = true; > +const char *path = NULL, *host = NULL, *port = NULL; > const char *export = qdict_get_try_str(options, "export"); > const char *tlscreds = qdict_get_try_str(options, "tls-creds"); > > -if (host && !port) { > -port = stringify(NBD_DEFAULT_PORT); > +if (qdict_get_try_str(options, "address.type")) { > +/* This path will only be possible as of a future patch; > + * TODO: Remove this note once it is */ > + > +const char *type = qdict_get_str(options, "address.type"); > + Oh, I'm so tempted to teach the QAPI generator how to make a discriminated union have a default 'type' value (thus making the discriminator optional), so that we don't need a layer of nesting behind 'address.'. > +if (!strcmp(type, "unix")) { > +path = qdict_get_str(options, "address.data.path"); > +} else if (!strcmp(type, "inet")) { > +host = qdict_get_str(options, "address.data.host"); > +port = qdict_get_str(options, "address.data.port"); It's especially annoying that because SocketAddress is not flat, we have to expose the 'data.' layer of nesting, even if we could avoid the 'address.' layer. > + > +can_generate_filename = !qdict_haskey(options, "address.data.to") > + && !qdict_haskey(options, > "address.data.ipv4") > + && !qdict_haskey(options, > "address.data.ipv6"); > +} else { > +can_generate_filename = false; > +} > +} else { > +path = qdict_get_try_str(options, "path"); > +host = qdict_get_try_str(options, "host"); > +port = qdict_get_try_str(options, "port"); > + > +if (host && !port) { > +port = stringify(NBD_DEFAULT_PORT); > +} > } Looks clean given the constraints of what you are able to use from QAPI. > + > +if (qdict_get_try_str(options, "address.type")) { > +/* This path will only be possible as of a future patch; > + * TODO: Remove this note once it is */ > + > +const QDictEntry *e; > +for (e = qdict_first(options); e; e = qdict_next(options, e)) { > +if (!strncmp(e->key, "address.", 8)) { > +qobject_incref(e->value); > +qdict_put_obj(opts, e->key, e->value); > +} > +} This part makes me wonder if we want Dan's qdict_crumple() working first. > } else { > -qdict_put(opts, "host", qstring_from_str(host)); > -qdict_put(opts, "port", qstring_from_str(port)); > +if (path) { > +qdict_put(opts, "path", qstring_from_str(path)); > +} else { > +qdict_put(opts, "host", qstring_from_str(host)); > +qdict_put(opts, "port", qstring_from_str(port)); > +} > } > if (export) { > qdict_put(opts, "export", qstring_from_str(export)); > At this point, I'll reserve giving R-b until I've seen the whole series (it may need rebasing anyways...) -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v3 06/14] block/nbd: Add nbd_has_filename_options_conflict()
On 04/06/2016 12:28 PM, Max Reitz wrote: > Right now, we have four possible options that conflict with specifying > an NBD filename, and a future patch will add another one ("address"). > This future option is a nested QDict that is flattened at this point, > requiring as to test each option whether its key has an "address." > prefix. Therefore, we will then need to iterate through all options. > > Adding this iteration logic now will simplify adding the new option > later. A nice side effect is that the user will not receive a long list > of five options which are not supposed to be specified with a filename, > but we can actually print the problematic option. > > Signed-off-by: Max Reitz> --- > block/nbd.c | 26 -- > 1 file changed, 20 insertions(+), 6 deletions(-) > > diff --git a/block/nbd.c b/block/nbd.c > index d12bcc6..1736f68 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -120,6 +120,25 @@ out: > return ret; > } > > +static bool nbd_has_filename_options_conflict(QDict *options, Error **errp) > +{ > +const QDictEntry *e; > + > +for (e = qdict_first(options); e; e = qdict_next(options, e)) { > +if (!strcmp(e->key, "host") > +|| !strcmp(e->key, "port") I know there are already instances of breaking before || in this file, but most of qemu breaks after, as in: if (!strcmp(e->key, "host") || !strcmp(e->key, "port") || ... But choice of formatting is trivial, so: Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 1/9] mirror: fix calling of blk_aio_pwritev/blk_aio_preadv
On 06/14/2016 09:25 AM, Denis V. Lunev wrote: > 4th argument is flags rather than size. Fortunately flags occupies > 5 less significant bits and they are always zero due to alignment. > > Signed-off-by: Denis V. Lunev> Reviewed-by: Vladimir Sementsov-Ogievskiy > CC: Stefan Hajnoczi > CC: Fam Zheng > CC: Kevin Wolf > CC: Max Reitz > CC: Jeff Cody > CC: Eric Blake > --- > block/mirror.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) Duplicate of this patch, already on Kevin's block tree: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03377.html -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v3 for-2.7 00/14] qapi: Allow blockdev-add for NBD
On 05/03/2016 09:23 AM, Kevin Wolf wrote: > Am 06.04.2016 um 20:28 hat Max Reitz geschrieben: >> Turns out NBD is not so simple to do if you do it right. Anyway, this >> series adds blockdev-add support for NBD clients. > > What the series does seems to make sense to me, though things would be a > bit nicer (especially on the command line) if SocketAddress had been a > flat union from the beginning. It's too late to change now, and I guess > the ugliness isn't worth duplicating it. I'm sorely tempted to make SocketAddress a flat union anyways (or even an alternate that can be flat or nested at will), if only for convenience. But that's a pipe dream - there's probably not enough time before 2.7 to actually achieve it. > > I'll leave the in-detail review to our NBD folks. > > Kevin > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v3 05/14] block/nbd: Use qdict_put()
On 04/06/2016 12:28 PM, Max Reitz wrote: > Instead of inlining this nice macro (i.e. resorting to > qdict_put_obj(..., QOBJECT(...))), use it. > > Signed-off-by: Max Reitz> --- > block/nbd.c | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v3 04/14] block/nbd: Default port in nbd_refresh_filename()
On 04/06/2016 12:28 PM, Max Reitz wrote: > Instead of not emitting the port in nbd_refresh_filename(), just set it > to the default if the user did not specify it. This makes the logic a > bit simpler. > > Signed-off-by: Max Reitz> --- > block/nbd.c | 18 +++--- > 1 file changed, 7 insertions(+), 11 deletions(-) > > diff --git a/block/nbd.c b/block/nbd.c > index 2112ec0..efa5d3d 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -433,6 +433,10 @@ static void nbd_refresh_filename(BlockDriverState *bs, > QDict *options) > const char *export = qdict_get_try_str(options, "export"); > const char *tlscreds = qdict_get_try_str(options, "tls-creds"); > > +if (host && !port) { > +port = stringify(NBD_DEFAULT_PORT); > +} It would be nice to store the port as a number rather than a string - but that's not your task (I've long thought that port should be a QAPI alternate type, with both string and number branches, but it's a big audit and code change to actually make it happen). Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands
On 14/06/2016 17:59, Alex Bligh wrote: > >> On 14 Jun 2016, at 16:11, Paolo Bonziniwrote: >> >>> To illustrate the problem, look consider what qemu itself would do as >>> a server if it can't buffer the entire read issued to it. >> >> Return ENOMEM? > > Well OK, qemu then 'works' on the basis it breaks another > part of the spec, which is coping with long reads. ENOMEM is a documented error code, and the limits extension will help with that as well. >> However, it looks like the >> de facto status prior to structured replies is that the error is in the >> spec, and this patch introduces a regression. > > Well, I guess the patch makes it work the same as the > reference server implementation and the spec, which I'd > consider a fix. My view is that the error is in the > kernel client. ... and QEMU and BSD. What good is a server that doesn't interoperate (albeit only in error cases) with any client? Paolo
Re: [Qemu-block] [PATCH v3 01/14] qdict: Add qdict_change_key()
On 04/06/2016 12:28 PM, Max Reitz wrote: > This is a shorthand function for changing a QDict's entry's key. > > Signed-off-by: Max Reitz> --- > include/qapi/qmp/qdict.h | 1 + > qobject/qdict.c | 23 +++ > 2 files changed, 24 insertions(+) Reviewed-by: Eric Blake However, it would be nice to enhance tests/check-qdict.c to cover this. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v3 02/14] block/nbd: Drop trailing "." in error messages
On 04/06/2016 12:28 PM, Max Reitz wrote: > Signed-off-by: Max Reitz> --- > block/nbd.c | 4 ++-- > tests/qemu-iotests/051.out| 4 ++-- > tests/qemu-iotests/051.pc.out | 4 ++-- > 3 files changed, 6 insertions(+), 6 deletions(-) Reviewed-by: Eric Blake Could go in via qemu-trivial, if desired. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 1/7] doc: move text describing --trace to specific .texi file
On 06/14/2016 03:49 PM, Eric Blake wrote: > On 06/14/2016 04:08 AM, Denis V. Lunev wrote: >> This text will be included to qemu-nbd/qemu-img mans in the next patches. >> >> Signed-off-by: Denis V. Lunev>> CC: Eric Blake >> CC: Paolo Bonzini >> CC: Stefan Hajnoczi >> CC: Kevin Wolf >> --- >> Makefile | 3 ++- >> qemu-option-trace.texi | 25 + >> qemu-options.hx| 27 +-- >> 3 files changed, 28 insertions(+), 27 deletions(-) >> create mode 100644 qemu-option-trace.texi > >> +++ b/qemu-options.hx >> @@ -3669,32 +3669,7 @@ HXCOMM This line is not accurate, as some sub-options >> are backend-specific but >> HXCOMM HX does not support conditional compilation of text. >> @item -trace [events=@var{file}][,file=@var{file}] >> @findex -trace > > As long as you are touching here, change the -trace line to match --help > output: > > -trace [[enable=]][,events=][,file=] Nevermind - I see you do that in 2/7. > > With that change: > > Reviewed-by: Eric Blake > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 5/7] trace: enable tracing in qemu-nbd
On 06/14/2016 04:08 AM, Denis V. Lunev wrote: > Please note, trace_init_backends() must be called in the final process, > i.e. after daemonization. This is necessary to keep tracing thread in > the proper process. > > Signed-off-by: Denis V. Lunev> CC: Eric Blake > CC: Paolo Bonzini > CC: Stefan Hajnoczi > CC: Kevin Wolf > --- > Makefile | 2 +- > qemu-nbd.c| 18 +- > qemu-nbd.texi | 3 +++ > 3 files changed, 21 insertions(+), 2 deletions(-) Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 7/7] trace: enable tracing in qemu-img
On 06/14/2016 04:08 AM, Denis V. Lunev wrote: > The command will work this way: > qemu-img --trace qcow2* create -f qcow2 1.img 64G > > Signed-off-by: Denis V. Lunev> Suggested by: Daniel P. Berrange > CC: Eric Blake > CC: Paolo Bonzini > CC: Stefan Hajnoczi > CC: Kevin Wolf > --- > Makefile | 2 +- > qemu-img.c| 18 +- > qemu-img.texi | 3 +++ > 3 files changed, 21 insertions(+), 2 deletions(-) > Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 1/7] doc: move text describing --trace to specific .texi file
On 06/14/2016 04:08 AM, Denis V. Lunev wrote: > This text will be included to qemu-nbd/qemu-img mans in the next patches. > > Signed-off-by: Denis V. Lunev> CC: Eric Blake > CC: Paolo Bonzini > CC: Stefan Hajnoczi > CC: Kevin Wolf > --- > Makefile | 3 ++- > qemu-option-trace.texi | 25 + > qemu-options.hx| 27 +-- > 3 files changed, 28 insertions(+), 27 deletions(-) > create mode 100644 qemu-option-trace.texi > +++ b/qemu-options.hx > @@ -3669,32 +3669,7 @@ HXCOMM This line is not accurate, as some sub-options > are backend-specific but > HXCOMM HX does not support conditional compilation of text. > @item -trace [events=@var{file}][,file=@var{file}] > @findex -trace As long as you are touching here, change the -trace line to match --help output: -trace [[enable=]][,events=][,file=] With that change: Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH v2 17/17] block: Move request_alignment into BlockLimit
It makes more sense to have ALL block size limit constraints in the same struct. Improve the documentation while at it. Signed-off-by: Eric Blake--- v2: drop hacks for save/restore of alignment, now that earlier patches handled setting up BlockLimits more sanely --- include/block/block_int.h | 22 +- block.c | 2 +- block/blkdebug.c | 4 ++-- block/bochs.c | 2 +- block/cloop.c | 2 +- block/dmg.c | 2 +- block/io.c| 12 ++-- block/iscsi.c | 2 +- block/qcow2.c | 2 +- block/raw-posix.c | 16 block/raw-win32.c | 6 +++--- block/vvfat.c | 2 +- 12 files changed, 39 insertions(+), 35 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index 88ef826..77f47d9 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -324,6 +324,12 @@ struct BlockDriver { }; struct BlockLimits { +/* Alignment requirement, in bytes, for offset/length of I/O + * requests. Must be a power of 2 less than INT_MAX. A value of 0 + * defaults to 1 for drivers with modern byte interfaces, and to + * 512 otherwise. */ +uint32_t request_alignment; + /* maximum number of bytes that can be discarded at once (since it * is signed, it must be < 2G, if set), should be multiple of * pdiscard_alignment, but need not be power of 2. May be 0 if no @@ -332,8 +338,8 @@ struct BlockLimits { /* optimal alignment for discard requests in bytes, must be power * of 2, less than max_discard if that is set, and multiple of - * bs->request_alignment. May be 0 if bs->request_alignment is - * good enough */ + * bl.request_alignment. May be 0 if bl.request_alignment is good + * enough */ uint32_t pdiscard_alignment; /* maximum number of bytes that can zeroized at once (since it is @@ -343,12 +349,12 @@ struct BlockLimits { /* optimal alignment for write zeroes requests in bytes, must be * power of 2, less than max_pwrite_zeroes if that is set, and - * multiple of bs->request_alignment. May be 0 if - * bs->request_alignment is good enough */ + * multiple of bl.request_alignment. May be 0 if + * bl.request_alignment is good enough */ uint32_t pwrite_zeroes_alignment; /* optimal transfer length in bytes (must be power of 2, and - * multiple of bs->request_alignment), or 0 if no preferred size */ + * multiple of bl.request_alignment), or 0 if no preferred size */ uint32_t opt_transfer; /* maximal transfer length in bytes (need not be power of 2, but @@ -356,10 +362,10 @@ struct BlockLimits { * For now, anything larger than INT_MAX is clamped down. */ uint32_t max_transfer; -/* memory alignment so that no bounce buffer is needed */ +/* memory alignment, in bytes so that no bounce buffer is needed */ size_t min_mem_alignment; -/* memory alignment for bounce buffer */ +/* memory alignment, in bytes, for bounce buffer */ size_t opt_mem_alignment; /* maximum number of iovec elements */ @@ -463,8 +469,6 @@ struct BlockDriverState { /* I/O Limits */ BlockLimits bl; -/* Alignment requirement for offset/length of I/O requests */ -unsigned int request_alignment; /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */ unsigned int supported_write_flags; /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA, diff --git a/block.c b/block.c index 61fe1df..d578df8 100644 --- a/block.c +++ b/block.c @@ -1016,7 +1016,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file, assert(bdrv_opt_mem_align(bs) != 0); assert(bdrv_min_mem_align(bs) != 0); -assert(is_power_of_2(bs->request_alignment) || bdrv_is_sg(bs)); +assert(is_power_of_2(bs->bl.request_alignment) || bdrv_is_sg(bs)); qemu_opts_del(opts); return 0; diff --git a/block/blkdebug.c b/block/blkdebug.c index 1589fa7..5e32643 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -384,7 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags, /* Set request alignment */ align = qemu_opt_get_size(opts, "align", - bs->request_alignment ?: BDRV_SECTOR_SIZE); + bs->bl.request_alignment ?: BDRV_SECTOR_SIZE); if (align > 0 && align < INT_MAX && is_power_of_2(align)) { s->align = align; } else { @@ -727,7 +727,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp) BDRVBlkdebugState *s = bs->opaque; if (s->align) { -bs->request_alignment = s->align; +bs->bl.request_alignment = s->align; } } diff --git a/block/bochs.c b/block/bochs.c index 182c50b..4194f1d 100644 --- a/block/bochs.c +++ b/block/bochs.c @@ -190,7 +190,7 @@ fail: static void
[Qemu-block] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based
Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_transfer_length and opt_transfer_length. Rename them (dropping the _length suffix) so that the compiler will help us catch the change in semantics across any rebased code, and improve the documentation. Use unsigned values, so that we don't have to worry about negative values and so that bit-twiddling is easier; however, we are still constrained by 2^31 of signed int in most APIs. Signed-off-by: Eric Blake--- v2: Hoist unrelated cleanups earlier, use name 'max_transfer' in more places, tweak iscsi calculations --- include/block/block_int.h | 11 +++ include/sysemu/block-backend.h | 2 +- block/block-backend.c | 10 +- block/io.c | 23 +++ block/iscsi.c | 20 block/nbd.c| 2 +- block/raw-posix.c | 3 ++- hw/block/virtio-blk.c | 9 + hw/scsi/scsi-generic.c | 11 ++- qemu-img.c | 8 10 files changed, 54 insertions(+), 45 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index 16c43e2..2b45d57 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -338,11 +338,14 @@ typedef struct BlockLimits { * power of 2, and less than max_pwrite_zeroes if that is set */ uint32_t pwrite_zeroes_alignment; -/* optimal transfer length in sectors */ -int opt_transfer_length; +/* optimal transfer length in bytes (must be power of 2, and + * multiple of bs->request_alignment), or 0 if no preferred size */ +uint32_t opt_transfer; -/* maximal transfer length in sectors */ -int max_transfer_length; +/* maximal transfer length in bytes (need not be power of 2, but + * should be multiple of opt_transfer), or 0 for no 32-bit limit. + * For now, anything larger than INT_MAX is clamped down. */ +uint32_t max_transfer; /* memory alignment so that no bounce buffer is needed */ size_t min_mem_alignment; diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index c04af8e..2469a1c 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -170,7 +170,7 @@ bool blk_is_available(BlockBackend *blk); void blk_lock_medium(BlockBackend *blk, bool locked); void blk_eject(BlockBackend *blk, bool eject_flag); int blk_get_flags(BlockBackend *blk); -int blk_get_max_transfer_length(BlockBackend *blk); +uint32_t blk_get_max_transfer(BlockBackend *blk); int blk_get_max_iov(BlockBackend *blk); void blk_set_guest_block_size(BlockBackend *blk, int align); void *blk_try_blockalign(BlockBackend *blk, size_t size); diff --git a/block/block-backend.c b/block/block-backend.c index 1fb070b..e042544 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1303,16 +1303,16 @@ int blk_get_flags(BlockBackend *blk) } } -/* Returns the maximum transfer length, in sectors; guaranteed nonzero */ -int blk_get_max_transfer_length(BlockBackend *blk) +/* Returns the maximum transfer length, in bytes; guaranteed nonzero */ +uint32_t blk_get_max_transfer(BlockBackend *blk) { BlockDriverState *bs = blk_bs(blk); -int max = 0; +uint32_t max = 0; if (bs) { -max = bs->bl.max_transfer_length; +max = bs->bl.max_transfer; } -return MIN_NON_ZERO(max, BDRV_REQUEST_MAX_SECTORS); +return MIN_NON_ZERO(max, INT_MAX); } int blk_get_max_iov(BlockBackend *blk) diff --git a/block/io.c b/block/io.c index c425ce8..5e38ab4 100644 --- a/block/io.c +++ b/block/io.c @@ -88,8 +88,8 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) error_propagate(errp, local_err); return; } -bs->bl.opt_transfer_length = bs->file->bs->bl.opt_transfer_length; -bs->bl.max_transfer_length = bs->file->bs->bl.max_transfer_length; +bs->bl.opt_transfer = bs->file->bs->bl.opt_transfer; +bs->bl.max_transfer = bs->file->bs->bl.max_transfer; bs->bl.min_mem_alignment = bs->file->bs->bl.min_mem_alignment; bs->bl.opt_mem_alignment = bs->file->bs->bl.opt_mem_alignment; bs->bl.max_iov = bs->file->bs->bl.max_iov; @@ -107,12 +107,10 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) error_propagate(errp, local_err); return; } -bs->bl.opt_transfer_length = -MAX(bs->bl.opt_transfer_length, -bs->backing->bs->bl.opt_transfer_length); -bs->bl.max_transfer_length = -MIN_NON_ZERO(bs->bl.max_transfer_length, - bs->backing->bs->bl.max_transfer_length); +bs->bl.opt_transfer = MAX(bs->bl.opt_transfer, + bs->backing->bs->bl.opt_transfer); +bs->bl.max_transfer =
[Qemu-block] [PATCH v2 15/17] block: Switch discard length bounds to byte-based
Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_discard and discard_alignment. Rename them, using 'pdiscard' as an aid to track which remaining discard interfaces need conversion, and so that the compiler will help us catch the change in semantics across any rebased code. In iscsi.c, sector_limits_lun2qemu() is no longer needed; and the BlockLimits type is now completely byte-based. Signed-off-by: Eric Blake--- v2: rebase nbd and iscsi limits across earlier improvements --- include/block/block_int.h | 21 +++-- block/io.c| 16 +--- block/iscsi.c | 19 ++- block/nbd.c | 2 +- qemu-img.c| 3 ++- 5 files changed, 33 insertions(+), 28 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index 2b45d57..0169019 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -324,18 +324,27 @@ struct BlockDriver { }; typedef struct BlockLimits { -/* maximum number of sectors that can be discarded at once */ -int max_discard; +/* maximum number of bytes that can be discarded at once (since it + * is signed, it must be < 2G, if set), should be multiple of + * pdiscard_alignment, but need not be power of 2. May be 0 if no + * inherent 32-bit limit */ +int32_t max_pdiscard; -/* optimal alignment for discard requests in sectors */ -int64_t discard_alignment; +/* optimal alignment for discard requests in bytes, must be power + * of 2, less than max_discard if that is set, and multiple of + * bs->request_alignment. May be 0 if bs->request_alignment is + * good enough */ +uint32_t pdiscard_alignment; /* maximum number of bytes that can zeroized at once (since it is - * signed, it must be < 2G, if set) */ + * signed, it must be < 2G, if set), should be multiple of + * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */ int32_t max_pwrite_zeroes; /* optimal alignment for write zeroes requests in bytes, must be - * power of 2, and less than max_pwrite_zeroes if that is set */ + * power of 2, less than max_pwrite_zeroes if that is set, and + * multiple of bs->request_alignment. May be 0 if + * bs->request_alignment is good enough */ uint32_t pwrite_zeroes_alignment; /* optimal transfer length in bytes (must be power of 2, and diff --git a/block/io.c b/block/io.c index 5e38ab4..0b5c40d 100644 --- a/block/io.c +++ b/block/io.c @@ -2352,19 +2352,21 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, BDRV_TRACKED_DISCARD); bdrv_set_dirty(bs, sector_num, nb_sectors); -max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS); +max_discard = MIN_NON_ZERO(bs->bl.max_pdiscard >> BDRV_SECTOR_BITS, + BDRV_REQUEST_MAX_SECTORS); while (nb_sectors > 0) { int ret; int num = nb_sectors; +int discard_alignment = bs->bl.pdiscard_alignment >> BDRV_SECTOR_BITS; /* align request */ -if (bs->bl.discard_alignment && -num >= bs->bl.discard_alignment && -sector_num % bs->bl.discard_alignment) { -if (num > bs->bl.discard_alignment) { -num = bs->bl.discard_alignment; +if (discard_alignment && +num >= discard_alignment && +sector_num % discard_alignment) { +if (num > discard_alignment) { +num = discard_alignment; } -num -= sector_num % bs->bl.discard_alignment; +num -= sector_num % discard_alignment; } /* limit request size */ diff --git a/block/iscsi.c b/block/iscsi.c index 739d5ca..af9babf 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1696,13 +1696,6 @@ static void iscsi_close(BlockDriverState *bs) memset(iscsilun, 0, sizeof(IscsiLun)); } -static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun) -{ -int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1); - -return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0; -} - static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp) { /* We don't actually refresh here, but just return data queried in @@ -1722,14 +1715,14 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp) } if (iscsilun->lbp.lbpu) { -if (iscsilun->bl.max_unmap < 0x) { -bs->bl.max_discard = -sector_limits_lun2qemu(iscsilun->bl.max_unmap, iscsilun); +if (iscsilun->bl.max_unmap < 0x / iscsilun->block_size) { +bs->bl.max_pdiscard = +iscsilun->bl.max_unmap * iscsilun->block_size; } -bs->bl.discard_alignment = -
[Qemu-block] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()
The raw block driver was blindly copying all limits from bs->file, even though: 1. the main bdrv_refresh_limits() already does this for many of gthe limits, and 2. blindly copying from the children can weaken any stricter limits that were already inherited from the backing dhain during the main bdrv_refresh_limits(). Also, the next patch is about to move .request_alignment into BlockLimits, and that is a limit that should NOT be copied from other layers in the BDS chain. Solve the issue by factoring out a new bdrv_merge_limits(), and using that function to properly merge limits when comparing two BlockDriverState objects. Signed-off-by: Eric Blake--- v2: new patch --- include/block/block.h | 1 + include/block/block_int.h | 4 ++-- include/qemu/typedefs.h | 1 + block/io.c| 31 +-- block/raw_bsd.c | 4 ++-- 5 files changed, 19 insertions(+), 22 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 733a8ec..c1d4648 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -262,6 +262,7 @@ int64_t bdrv_nb_sectors(BlockDriverState *bs); int64_t bdrv_getlength(BlockDriverState *bs); int64_t bdrv_get_allocated_file_size(BlockDriverState *bs); void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr); +void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src); void bdrv_refresh_limits(BlockDriverState *bs, Error **errp); int bdrv_commit(BlockDriverState *bs); int bdrv_change_backing_file(BlockDriverState *bs, diff --git a/include/block/block_int.h b/include/block/block_int.h index 0169019..88ef826 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -323,7 +323,7 @@ struct BlockDriver { QLIST_ENTRY(BlockDriver) list; }; -typedef struct BlockLimits { +struct BlockLimits { /* maximum number of bytes that can be discarded at once (since it * is signed, it must be < 2G, if set), should be multiple of * pdiscard_alignment, but need not be power of 2. May be 0 if no @@ -364,7 +364,7 @@ typedef struct BlockLimits { /* maximum number of iovec elements */ int max_iov; -} BlockLimits; +}; typedef struct BdrvOpBlocker BdrvOpBlocker; diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h index b113fcf..e6f72c2 100644 --- a/include/qemu/typedefs.h +++ b/include/qemu/typedefs.h @@ -14,6 +14,7 @@ typedef struct BdrvDirtyBitmap BdrvDirtyBitmap; typedef struct BlockBackend BlockBackend; typedef struct BlockBackendRootState BlockBackendRootState; typedef struct BlockDriverState BlockDriverState; +typedef struct BlockLimits BlockLimits; typedef struct BusClass BusClass; typedef struct BusState BusState; typedef struct CharDriverState CharDriverState; diff --git a/block/io.c b/block/io.c index 0b5c40d..c6c1f7b 100644 --- a/block/io.c +++ b/block/io.c @@ -67,6 +67,17 @@ static void bdrv_parent_drained_end(BlockDriverState *bs) } } +void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src) +{ +dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer); +dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer); +dst->opt_mem_alignment = MAX(dst->opt_mem_alignment, + src->opt_mem_alignment); +dst->min_mem_alignment = MAX(dst->min_mem_alignment, + src->min_mem_alignment); +dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov); +} + void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) { BlockDriver *drv = bs->drv; @@ -88,11 +99,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) error_propagate(errp, local_err); return; } -bs->bl.opt_transfer = bs->file->bs->bl.opt_transfer; -bs->bl.max_transfer = bs->file->bs->bl.max_transfer; -bs->bl.min_mem_alignment = bs->file->bs->bl.min_mem_alignment; -bs->bl.opt_mem_alignment = bs->file->bs->bl.opt_mem_alignment; -bs->bl.max_iov = bs->file->bs->bl.max_iov; +bdrv_merge_limits(>bl, >file->bs->bl); } else { bs->bl.min_mem_alignment = 512; bs->bl.opt_mem_alignment = getpagesize(); @@ -107,19 +114,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) error_propagate(errp, local_err); return; } -bs->bl.opt_transfer = MAX(bs->bl.opt_transfer, - bs->backing->bs->bl.opt_transfer); -bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, - bs->backing->bs->bl.max_transfer); -bs->bl.opt_mem_alignment = -MAX(bs->bl.opt_mem_alignment, -bs->backing->bs->bl.opt_mem_alignment); -bs->bl.min_mem_alignment = -MAX(bs->bl.min_mem_alignment, -bs->backing->bs->bl.min_mem_alignment); -bs->bl.max_iov = -MIN(bs->bl.max_iov, -
[Qemu-block] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. qemu-iotests 77 is particularly sensitive to the fact that we can specify an artificial alignment override in blkdebug, and that override must continue to work even when limits are refreshed on an already open device. A later patch will be altering when bs->request_alignment defaults are set, so fall back to sector alignment if we have not inherited anything yet. Signed-off-by: Eric Blake--- v2: new patch --- block/blkdebug.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index 20d25bd..1589fa7 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -37,6 +37,7 @@ typedef struct BDRVBlkdebugState { int state; int new_state; +int align; QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX]; QSIMPLEQ_HEAD(, BlkdebugRule) active_rules; @@ -382,9 +383,10 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags, } /* Set request alignment */ -align = qemu_opt_get_size(opts, "align", bs->request_alignment); -if (align > 0 && align < INT_MAX && !(align & (align - 1))) { -bs->request_alignment = align; +align = qemu_opt_get_size(opts, "align", + bs->request_alignment ?: BDRV_SECTOR_SIZE); +if (align > 0 && align < INT_MAX && is_power_of_2(align)) { +s->align = align; } else { error_setg(errp, "Invalid alignment"); ret = -EINVAL; @@ -720,6 +722,15 @@ static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options) bs->full_open_options = opts; } +static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp) +{ +BDRVBlkdebugState *s = bs->opaque; + +if (s->align) { +bs->request_alignment = s->align; +} +} + static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, Error **errp) { @@ -738,6 +749,7 @@ static BlockDriver bdrv_blkdebug = { .bdrv_getlength = blkdebug_getlength, .bdrv_truncate = blkdebug_truncate, .bdrv_refresh_filename = blkdebug_refresh_filename, +.bdrv_refresh_limits= blkdebug_refresh_limits, .bdrv_aio_readv = blkdebug_aio_readv, .bdrv_aio_writev= blkdebug_aio_writev, -- 2.5.5
[Qemu-block] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Now that all drivers have been updated to supply an override of request_alignment during their .bdrv_refresh_limits(), as needed, the block layer itself can defer setting the default alignment until part of the overall bdrv_refresh_limits(). Signed-off-by: Eric Blake--- block.c| 1 - block/io.c | 3 +++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index b350794..61fe1df 100644 --- a/block.c +++ b/block.c @@ -937,7 +937,6 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file, goto fail_opts; } -bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512; bs->read_only = !(bs->open_flags & BDRV_O_RDWR); if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { diff --git a/block/io.c b/block/io.c index 383154f..c425ce8 100644 --- a/block/io.c +++ b/block/io.c @@ -78,6 +78,9 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) return; } +/* Default alignment based on whether driver has byte interface */ +bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512; + /* Take some limits from the children as a default */ if (bs->file) { bdrv_refresh_limits(bs->file->bs, _err); -- 2.5.5
[Qemu-block] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv()
If the amount of data to read ends exactly on the total size of the bs, then we were wasting time creating a local qiov to read the data in preparation for what would normally be appending zeroes beyond the end, even though this corner case has nothing further to do. Signed-off-by: Eric BlakeReviewed-by: Kevin Wolf --- v2: no change, add R-b --- block/io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/io.c b/block/io.c index 9191772..383154f 100644 --- a/block/io.c +++ b/block/io.c @@ -1024,7 +1024,7 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs, } max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); -if (bytes < max_bytes) { +if (bytes <= max_bytes) { ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0); } else if (max_bytes > 0) { QEMUIOVector local_qiov; -- 2.5.5
[Qemu-block] [PATCH v2 12/17] block: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Add a .bdrv_refresh_limits() to all four of our legacy devices that will always be sector-only (bochs, cloop, dmg, vvfat), in spite of their recent conversion to expose a byte interface. Signed-off-by: Eric Blake--- v2: new patch --- block/bochs.c | 7 ++- block/cloop.c | 7 ++- block/dmg.c | 7 ++- block/vvfat.c | 7 ++- 4 files changed, 24 insertions(+), 4 deletions(-) diff --git a/block/bochs.c b/block/bochs.c index 6c8d0f3..182c50b 100644 --- a/block/bochs.c +++ b/block/bochs.c @@ -105,7 +105,6 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags, int ret; bs->read_only = 1; // no write support yet -bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */ ret = bdrv_pread(bs->file->bs, 0, , sizeof(bochs)); if (ret < 0) { @@ -189,6 +188,11 @@ fail: return ret; } +static void bochs_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */ +} + static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num) { BDRVBochsState *s = bs->opaque; @@ -283,6 +287,7 @@ static BlockDriver bdrv_bochs = { .instance_size = sizeof(BDRVBochsState), .bdrv_probe= bochs_probe, .bdrv_open = bochs_open, +.bdrv_refresh_limits = bochs_refresh_limits, .bdrv_co_preadv = bochs_co_preadv, .bdrv_close= bochs_close, }; diff --git a/block/cloop.c b/block/cloop.c index ea5a92b..d574003 100644 --- a/block/cloop.c +++ b/block/cloop.c @@ -67,7 +67,6 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags, int ret; bs->read_only = 1; -bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */ /* read header */ ret = bdrv_pread(bs->file->bs, 128, >block_size, 4); @@ -199,6 +198,11 @@ fail: return ret; } +static void cloop_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */ +} + static inline int cloop_read_block(BlockDriverState *bs, int block_num) { BDRVCloopState *s = bs->opaque; @@ -280,6 +284,7 @@ static BlockDriver bdrv_cloop = { .instance_size = sizeof(BDRVCloopState), .bdrv_probe = cloop_probe, .bdrv_open = cloop_open, +.bdrv_refresh_limits = cloop_refresh_limits, .bdrv_co_preadv = cloop_co_preadv, .bdrv_close = cloop_close, }; diff --git a/block/dmg.c b/block/dmg.c index 06eb513..1e53cd8 100644 --- a/block/dmg.c +++ b/block/dmg.c @@ -439,7 +439,6 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags, int ret; bs->read_only = 1; -bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */ s->n_chunks = 0; s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL; @@ -547,6 +546,11 @@ fail: return ret; } +static void dmg_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */ +} + static inline int is_sector_in_chunk(BDRVDMGState* s, uint32_t chunk_num, uint64_t sector_num) { @@ -720,6 +724,7 @@ static BlockDriver bdrv_dmg = { .instance_size = sizeof(BDRVDMGState), .bdrv_probe = dmg_probe, .bdrv_open = dmg_open, +.bdrv_refresh_limits = dmg_refresh_limits, .bdrv_co_preadv = dmg_co_preadv, .bdrv_close = dmg_close, }; diff --git a/block/vvfat.c b/block/vvfat.c index 6d2e21c..08b1aa3 100644 --- a/block/vvfat.c +++ b/block/vvfat.c @@ -1180,7 +1180,6 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags, bs->read_only = 0; } -bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */ bs->total_sectors = cyls * heads * secs; if (init_directories(s, dirname, heads, secs, errp)) { @@ -1212,6 +1211,11 @@ fail: return ret; } +static void vvfat_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */ +} + static inline void vvfat_close_current_file(BDRVVVFATState *s) { if(s->current_mapping) { @@ -3049,6 +3053,7 @@ static BlockDriver bdrv_vvfat = { .bdrv_parse_filename= vvfat_parse_filename, .bdrv_file_open = vvfat_open, +.bdrv_refresh_limits= vvfat_refresh_limits, .bdrv_close = vvfat_close, .bdrv_co_preadv = vvfat_co_preadv, -- 2.5.5
[Qemu-block] [PATCH v2 10/17] qcow2: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. Signed-off-by: Eric Blake--- v2: new patch --- block/qcow2.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index c40baca..393afa9 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -975,9 +975,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags, } bs->encrypted = 1; - -/* Encryption works on a sector granularity */ -bs->request_alignment = BDRV_SECTOR_SIZE; } s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */ @@ -1196,6 +1193,10 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVQcow2State *s = bs->opaque; +if (bs->encrypted) { +/* Encryption works on a sector granularity */ +bs->request_alignment = BDRV_SECTOR_SIZE; +} bs->bl.pwrite_zeroes_alignment = s->cluster_size; } -- 2.5.5
[Qemu-block] [PATCH v2 11/17] raw-win32: Set request_alignment during .bdrv_refresh_limits()
We want to eventually stick request_alignment alongside other BlockLimits, but first, we must ensure it is populated at the same time as all other limits, rather than being a special case that is set only when a block is first opened. In this case, raw_probe_alignment() already did what we needed, so just fix its signature and wire it in correctly. Signed-off-by: Eric Blake--- v2: new patch --- block/raw-win32.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/raw-win32.c b/block/raw-win32.c index fd23891..88382d9 100644 --- a/block/raw-win32.c +++ b/block/raw-win32.c @@ -222,7 +222,7 @@ static void raw_attach_aio_context(BlockDriverState *bs, } } -static void raw_probe_alignment(BlockDriverState *bs) +static void raw_probe_alignment(BlockDriverState *bs, Error **errp) { BDRVRawState *s = bs->opaque; DWORD sectorsPerCluster, freeClusters, totalClusters, count; @@ -365,7 +365,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags, win32_aio_attach_aio_context(s->aio, bdrv_get_aio_context(bs)); } -raw_probe_alignment(bs); ret = 0; fail: qemu_opts_del(opts); @@ -550,6 +549,7 @@ BlockDriver bdrv_file = { .bdrv_needs_filename = true, .bdrv_parse_filename = raw_parse_filename, .bdrv_file_open = raw_open, +.bdrv_refresh_limits = raw_probe_alignment, .bdrv_close = raw_close, .bdrv_create= raw_create, .bdrv_has_zero_init = bdrv_has_zero_init_1, -- 2.5.5
[Qemu-block] [PATCH v2 06/17] iscsi: Advertise realistic limits to block layer
The function sector_limits_lun2qemu() returns a value in units of the block layer's 512-byte sector, and can be as large as 0x4000, which is much larger than the block layer's inherent limit of BDRV_REQUEST_MAX_SECTORS. The block layer already handles '0' as a synonym to the inherent limit, and it is nicer to return this value than it is to calculate an arbitrary maximum, for two reasons: we want to ensure that the block layer continues to special-case '0' as 'no limit beyond the inherent limits'; and we want to be able to someday expand the block layer to allow 64-bit limits, where auditing for uses of BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't artificially constraining iscsi to old block layer limits. Signed-off-by: Eric Blake--- v2: new patch --- block/iscsi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/iscsi.c b/block/iscsi.c index 7e78ade..4290e41 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1697,7 +1697,9 @@ static void iscsi_close(BlockDriverState *bs) static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun) { -return MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1); +int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1); + +return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0; } static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp) -- 2.5.5
[Qemu-block] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev()
For symmetry with bdrv_aligned_preadv(), assert that the caller really has aligned things properly. This requires adding an align parameter, which is used now only in the new asserts, but will come in handy in a later patch that adds auto-fragmentation to the max transfer size, since that value need not always be a multiple of the alignment, and therefore must be rounded down. Signed-off-by: Eric Blake--- v2: new patch --- block/io.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/block/io.c b/block/io.c index 42e63f7..056a101 100644 --- a/block/io.c +++ b/block/io.c @@ -1242,7 +1242,7 @@ fail: */ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs, BdrvTrackedRequest *req, int64_t offset, unsigned int bytes, -QEMUIOVector *qiov, int flags) +int64_t align, QEMUIOVector *qiov, int flags) { BlockDriver *drv = bs->drv; bool waited; @@ -1251,6 +1251,9 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs, int64_t start_sector = offset >> BDRV_SECTOR_BITS; int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE); +assert(is_power_of_2(align)); +assert((offset & (align - 1)) == 0); +assert((bytes & (align - 1)) == 0); assert(!qiov || bytes == qiov->size); assert((bs->open_flags & BDRV_O_NO_IO) == 0); assert(!(flags & ~BDRV_REQ_MASK)); @@ -1337,7 +1340,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs, memset(buf + head_padding_bytes, 0, zero_bytes); ret = bdrv_aligned_pwritev(bs, req, offset & ~(align - 1), align, - _qiov, + align, _qiov, flags & ~BDRV_REQ_ZERO_WRITE); if (ret < 0) { goto fail; @@ -1350,7 +1353,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs, if (bytes >= align) { /* Write the aligned part in the middle. */ uint64_t aligned_bytes = bytes & ~(align - 1); -ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes, +ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes, align, NULL, flags); if (ret < 0) { goto fail; @@ -1374,7 +1377,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs, bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_AFTER_TAIL); memset(buf, 0, bytes); -ret = bdrv_aligned_pwritev(bs, req, offset, align, +ret = bdrv_aligned_pwritev(bs, req, offset, align, align, _qiov, flags & ~BDRV_REQ_ZERO_WRITE); } fail: @@ -1499,7 +1502,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs, bytes = ROUND_UP(bytes, align); } -ret = bdrv_aligned_pwritev(bs, , offset, bytes, +ret = bdrv_aligned_pwritev(bs, , offset, bytes, align, use_local_qiov ? _qiov : qiov, flags); -- 2.5.5
[Qemu-block] [PATCH v2 04/17] nbd: Allow larger requests
The NBD layer was breaking up request at a limit of 2040 sectors (just under 1M) to cater to old qemu-nbd. But the server limit was raised to 32M in commit 2d8214885 to match the kernel, more than three years ago; and the upstream NBD Protocol is proposing documentation that without any explicit communication to state otherwise, a client should be able to safely assume that a 32M transaction will work. It is time to rely on the larger sizing, and any downstream distro that cares about maximum interoperability to older qemu-nbd servers can just tweak the value of #define NBD_MAX_SECTORS. Signed-off-by: Eric Blake--- v2: new patch --- include/block/nbd.h | 1 + block/nbd-client.c | 4 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index b86a976..36dde24 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -76,6 +76,7 @@ enum { /* Maximum size of a single READ/WRITE data buffer */ #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024) +#define NBD_MAX_SECTORS (NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE) ssize_t nbd_wr_syncv(QIOChannel *ioc, struct iovec *iov, diff --git a/block/nbd-client.c b/block/nbd-client.c index 4d13444..420bce8 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -269,10 +269,6 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num, return -reply.error; } -/* qemu-nbd has a limit of slightly less than 1M per request. Try to - * remain aligned to 4K. */ -#define NBD_MAX_SECTORS 2040 - int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov) { -- 2.5.5
[Qemu-block] [PATCH v2 00/17] Byte-based block limits
BlockLimits is currently an ugly mix of byte limits vs. sector limits. Unify it. Fix some bugs I found in bdrv_aligned_preadv() while at it. Prequisite: Kevin's ongoing work to migrate bdrv_aligned_preadv() to be byte-based (commit 3de06b2 on his vmstate branch at the time of this email, but that gets rebased): https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02832.html Trivial contextual conflict in nbd.h with the pull request Paolo will soon be posting (both series add a #define near the same line; resolution is to add both): https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg0.html Also available as a tag at: git fetch git://repo.or.cz/qemu/ericb.git nbd-limits-v2 Since v1: - drop things already done in Kevin's work - rebase - split out lots of cleanup work to bdrv_refresh_limits() so that qemu-iotests does not gain any problems on 77 001/17:[down] 'block: Tighter assertions on bdrv_aligned_pwritev()' 002/17:[down] 'block: Document supported flags during bdrv_aligned_preadv()' 003/17:[down] 'block: Fix harmless off-by-one in bdrv_aligned_preadv()' 004/17:[down] 'nbd: Allow larger requests' 005/17:[down] 'nbd: Advertise realistic limits to block layer' 006/17:[down] 'iscsi: Advertise realistic limits to block layer' 007/17:[down] 'block: Give nonzero result to blk_get_max_transfer_length()' 008/17:[down] 'blkdebug: Set request_alignment during .bdrv_refresh_limits()' 009/17:[down] 'iscsi: Set request_alignment during .bdrv_refresh_limits()' 010/17:[down] 'qcow2: Set request_alignment during .bdrv_refresh_limits()' 011/17:[down] 'raw-win32: Set request_alignment during .bdrv_refresh_limits()' 012/17:[down] 'block: Set request_alignment during .bdrv_refresh_limits()' 013/17:[down] 'block: Set default request_alignment during bdrv_refresh_limits()' 014/17:[0061] [FC] 'block: Switch transfer length bounds to byte-based' 015/17:[0008] [FC] 'block: Switch discard length bounds to byte-based' 016/17:[down] 'block: Split bdrv_merge_limits() from bdrv_refresh_limits()' 017/17:[0044] [FC] 'block: Move request_alignment into BlockLimit' Eric Blake (17): block: Tighter assertions on bdrv_aligned_pwritev() block: Document supported flags during bdrv_aligned_preadv() block: Fix harmless off-by-one in bdrv_aligned_preadv() nbd: Allow larger requests nbd: Advertise realistic limits to block layer iscsi: Advertise realistic limits to block layer block: Give nonzero result to blk_get_max_transfer_length() blkdebug: Set request_alignment during .bdrv_refresh_limits() iscsi: Set request_alignment during .bdrv_refresh_limits() qcow2: Set request_alignment during .bdrv_refresh_limits() raw-win32: Set request_alignment during .bdrv_refresh_limits() block: Set request_alignment during .bdrv_refresh_limits() block: Set default request_alignment during bdrv_refresh_limits() block: Switch transfer length bounds to byte-based block: Switch discard length bounds to byte-based block: Split bdrv_merge_limits() from bdrv_refresh_limits() block: Move request_alignment into BlockLimit include/block/block.h | 1 + include/block/block_int.h | 48 ++--- include/block/nbd.h| 1 + include/qemu/typedefs.h| 1 + include/sysemu/block-backend.h | 2 +- block.c| 3 +- block/blkdebug.c | 18 ++-- block/block-backend.c | 9 ++-- block/bochs.c | 7 ++- block/cloop.c | 7 ++- block/dmg.c| 7 ++- block/io.c | 96 +++--- block/iscsi.c | 40 +- block/nbd-client.c | 4 -- block/nbd.c| 4 +- block/qcow2.c | 7 +-- block/raw-posix.c | 19 + block/raw-win32.c | 10 ++--- block/raw_bsd.c| 4 +- block/vvfat.c | 7 ++- hw/block/virtio-blk.c | 10 ++--- hw/scsi/scsi-generic.c | 15 --- qemu-img.c | 9 ++-- 23 files changed, 195 insertions(+), 134 deletions(-) -- 2.5.5
[Qemu-block] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv()
We don't pass any flags on to drivers to handle. Tighten an assert to explain why we pass 0 to bdrv_driver_preadv(), and add some comments on things to be aware of if we want to turn on per-BDS BDRV_REQ_FUA support during reads in the future. Also, document that we may want to consider using unmap during copy-on-read operations where the read is all zeroes. Signed-off-by: Eric Blake--- v2: retitle from 'Honor flags during bdrv_aligned_preadv()', and change scope of patch --- block/io.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/block/io.c b/block/io.c index 056a101..9191772 100644 --- a/block/io.c +++ b/block/io.c @@ -933,6 +933,9 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs, if (drv->bdrv_co_pwrite_zeroes && buffer_is_zero(bounce_buffer, iov.iov_len)) { +/* FIXME: Should we (perhaps conditionally) be setting + * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy + * that still correctly reads as zero? */ ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0); } else { /* This does not change the data on the disk, it is not necessary @@ -975,7 +978,12 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs, assert((bytes & (align - 1)) == 0); assert(!qiov || bytes == qiov->size); assert((bs->open_flags & BDRV_O_NO_IO) == 0); -assert(!(flags & ~BDRV_REQ_MASK)); + +/* TODO: We would need a per-BDS .supported_read_flags and + * potential fallback support, if we ever implement any read flags + * to pass through to drivers. For now, there aren't any + * passthrough flags. */ +assert(!(flags & ~(BDRV_REQ_NO_SERIALISING | BDRV_REQ_COPY_ON_READ))); /* Handle Copy on Read and associated serialisation */ if (flags & BDRV_REQ_COPY_ON_READ) { -- 2.5.5
Re: [Qemu-block] [Qemu-devel] [PATCH] macio: Use blk_drain instead of blk_drain_all
On 06/13/2016 09:21 PM, Fam Zheng wrote: > On Mon, 06/13 17:33, John Snow wrote: >> >> >> On 06/12/2016 02:56 AM, Fam Zheng wrote: >>> We only care about the associated backend, so blk_drain is more >>> appropriate here. >>> >>> Signed-off-by: Fam Zheng>>> --- >>> hw/ide/macio.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c >>> index 78c10a0..a8c7321 100644 >>> --- a/hw/ide/macio.c >>> +++ b/hw/ide/macio.c >>> @@ -400,7 +400,7 @@ static void pmac_ide_flush(DBDMA_io *io) >>> IDEState *s = idebus_active_if(>bus); >>> >>> if (s->bus->dma->aiocb) { >>> -blk_drain_all(); >>> +blk_drain(s->blk); >>> } >>> } >>> >>> >> >> Reviewed-by: John Snow >> >> Shall I take this through my tree? > > I think that MAINTAINERS says so. :) > > Fam > For now, then: Thanks, applied to my IDE tree: https://github.com/jnsnow/qemu/commits/ide https://github.com/jnsnow/qemu.git --js
Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands
On 06/13/2016 06:19 AM, Paolo Bonzini wrote: > > > On 12/05/2016 00:39, Eric Blake wrote: >> We have a few bugs in how we handle invalid client commands: >> >> - A client can send an NBD_CMD_DISC where from + len overflows, >> convincing us to reply with an error and stay connected, even >> though the protocol requires us to silently disconnect. Fix by >> hoisting the special case sooner. >> > It's simpler to always set req->complete. Putting everything together: > > diff --git a/nbd/server.c b/nbd/server.c > @@ -1213,12 +1218,9 @@ static void nbd_trip(void *opaque) > LOG("invalid request type (%" PRIu32 ") received", request.type); > reply.error = EINVAL; > error_reply: > -/* We must disconnect after replying with an error to > - * NBD_CMD_READ, since we choose not to send bogus filler > - * data; likewise after NBD_CMD_WRITE if we did not read the > - * payload. */ > -if (nbd_co_send_reply(req, , 0) < 0 || command == NBD_CMD_READ > || > -(command == NBD_CMD_WRITE && !req->complete)) { > +/* We must disconnect after NBD_CMD_WRITE if we did not > + * read the payload. */ > +if (nbd_co_send_reply(req, , 0) < 0 || !req->complete)) { This doesn't even compile (too many ')'). I assume you'll fix that before your actual pull request goes out. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/3] coccinelle: Clean up error checks and return value variables
On Tue, Jun 14, 2016 at 11:03:20AM +0200, Markus Armbruster wrote: [...] > > * Manual fixups > > With the commit message of 3/3 amended, series > Reviewed-by: Markus Armbruster> > My other suggested touch ups are optional. If you don't object, I'll do > them, and take the result through my error-next branch. I'm OK with the changes you suggested. Thanks! -- Eduardo
Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/3] coccinelle: Clean up error checks and return value variables
Markus Armbrusterwrites: > Eduardo Habkost writes: > >> Changes v1 -> v2: >> * The Coccinelle scripts were simplified by using "when" >> constraints to detect when a variable is not used elsewhere >> inside the function. >> * Added script to remove unnecessary variables for function >> return value. >> * Coccinelle scripts added to scripts/coccinelle. >> >> Changes v2 -> v3 on patch 2/3: >> * Remove unused metavariable from script >> * Do changes only if errp is not touched before the error_setg() >>call (so we are sure *errp is not set and error_setg() won't >>abort) >> * Changes dropped from v2: >>* block.c:bdrv_create_co_entry() >>* block.c:bdrv_create_file() >>* blockdev.c:qmp_blockdev_mirror() >> >> Changes v2 -> v3 on patch 3/3: >> * Not a RFC anymore >> * Used --keep-comments option >> * Instead of using: >> - VAR = E; >> + return E; >> use: >> - VAR = >> + return >> E >> This makes Coccinelle keep the existing formatting on some >> cases. > > Neat trick. > >> * Manual fixups > > With the commit message of 3/3 amended, series > Reviewed-by: Markus Armbruster > > My other suggested touch ups are optional. If you don't object, I'll do > them, and take the result through my error-next branch. Applied to error-next, thanks!
Re: [Qemu-block] [PATCH v2] rbd:change error_setg() to error_setg_errno()
On Tue, Jun 14, 2016 at 8:12 PM, Max Reitzwrote: > On 09.05.2016 09:51, Vikhyat Umrao wrote: > > Ceph RBD block driver does not use error_setg_errno() where > > it is possible to use. This patch replaces error_setg() > > from error_setg_errno(). > > > > Signed-off-by: Vikhyat Umrao > > --- > > block/rbd.c | 38 +++--- > > 1 file changed, 23 insertions(+), 15 deletions(-) > > Thanks, applied to my block tree: > > https://github.com/XanClic/qemu/commits/block > > Thanks Max. > Max > >
Re: [Qemu-block] [PATCH v3 2/2] block: export LUKS specific data to qemu-img info
On 14.06.2016 18:24, Daniel P. Berrange wrote: > The qemu-img info command has the ability to expose format > specific metadata about volumes. Wire up this facility for > the LUKS driver to report on cipher configuration and key > slot usage. > > $ qemu-img info ~/VirtualMachines/demo.luks > image: /home/berrange/VirtualMachines/demo.luks > file format: luks > virtual size: 98M (102760448 bytes) > disk size: 100M > encrypted: yes > Format specific information: > ivgen alg: plain64 > hash alg: sha1 > cipher alg: aes-128 > uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 > cipher mode: xts > slots: > [0]: > active: true > iters: 572706 > key offset: 4096 > stripes: 4000 > [1]: > active: false > key offset: 135168 > [2]: > active: false > key offset: 266240 > [3]: > active: false > key offset: 397312 > [4]: > active: false > key offset: 528384 > [5]: > active: false > key offset: 659456 > [6]: > active: false > key offset: 790528 > [7]: > active: false > key offset: 921600 > payload offset: 2097152 > master key iters: 142375 > > One somewhat undesirable artifact is that the data fields are > printed out in (apparently) random order. This will be addressed > later by changing the way the block layer pretty-prints the > image specific data. > > Signed-off-by: Daniel P. Berrange> --- > block/crypto.c | 49 + > qapi/block-core.json | 6 +- > 2 files changed, 54 insertions(+), 1 deletion(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH v3 2/2] block: export LUKS specific data to qemu-img info
The qemu-img info command has the ability to expose format specific metadata about volumes. Wire up this facility for the LUKS driver to report on cipher configuration and key slot usage. $ qemu-img info ~/VirtualMachines/demo.luks image: /home/berrange/VirtualMachines/demo.luks file format: luks virtual size: 98M (102760448 bytes) disk size: 100M encrypted: yes Format specific information: ivgen alg: plain64 hash alg: sha1 cipher alg: aes-128 uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 cipher mode: xts slots: [0]: active: true iters: 572706 key offset: 4096 stripes: 4000 [1]: active: false key offset: 135168 [2]: active: false key offset: 266240 [3]: active: false key offset: 397312 [4]: active: false key offset: 528384 [5]: active: false key offset: 659456 [6]: active: false key offset: 790528 [7]: active: false key offset: 921600 payload offset: 2097152 master key iters: 142375 One somewhat undesirable artifact is that the data fields are printed out in (apparently) random order. This will be addressed later by changing the way the block layer pretty-prints the image specific data. Signed-off-by: Daniel P. Berrange--- block/crypto.c | 49 + qapi/block-core.json | 6 +- 2 files changed, 54 insertions(+), 1 deletion(-) diff --git a/block/crypto.c b/block/crypto.c index 758e14e..6fe2521 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -565,6 +565,53 @@ static int block_crypto_create_luks(const char *filename, filename, opts, errp); } +static int block_crypto_get_info_luks(BlockDriverState *bs, + BlockDriverInfo *bdi) +{ +BlockDriverInfo subbdi; +int ret; + +ret = bdrv_get_info(bs->file->bs, ); +if (ret != 0) { +return ret; +} + +bdi->unallocated_blocks_are_zero = false; +bdi->can_write_zeroes_with_unmap = false; +bdi->cluster_size = subbdi.cluster_size; + +return 0; +} + +static ImageInfoSpecific * +block_crypto_get_specific_info_luks(BlockDriverState *bs) +{ +BlockCrypto *crypto = bs->opaque; +ImageInfoSpecific *spec_info; +QCryptoBlockInfo *info; + +info = qcrypto_block_get_info(crypto->block, NULL); +if (!info) { +return NULL; +} +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) { +qapi_free_QCryptoBlockInfo(info); +return NULL; +} + +spec_info = g_new(ImageInfoSpecific, 1); +spec_info->type = IMAGE_INFO_SPECIFIC_KIND_LUKS; +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1); +*spec_info->u.luks.data = info->u.luks; + +/* Blank out pointers we've just stolen to avoid double free */ +memset(>u.luks, 0, sizeof(info->u.luks)); + +qapi_free_QCryptoBlockInfo(info); + +return spec_info; +} + BlockDriver bdrv_crypto_luks = { .format_name= "luks", .instance_size = sizeof(BlockCrypto), @@ -578,6 +625,8 @@ BlockDriver bdrv_crypto_luks = { .bdrv_co_readv = block_crypto_co_readv, .bdrv_co_writev = block_crypto_co_writev, .bdrv_getlength = block_crypto_getlength, +.bdrv_get_info = block_crypto_get_info_luks, +.bdrv_get_specific_info = block_crypto_get_specific_info_luks, }; static void block_crypto_init(void) diff --git a/qapi/block-core.json b/qapi/block-core.json index 98a20d2..35454e9 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -85,7 +85,11 @@ { 'union': 'ImageInfoSpecific', 'data': { 'qcow2': 'ImageInfoSpecificQCow2', - 'vmdk': 'ImageInfoSpecificVmdk' + 'vmdk': 'ImageInfoSpecificVmdk', + # If we need to add block driver specific parameters for + # LUKS in future, then we'll subclass QCryptoBlockInfoLUKS + # to define a ImageInfoSpecificLUKS + 'luks': 'QCryptoBlockInfoLUKS' } } ## -- 2.5.5
Re: [Qemu-block] [PATCH] block-backend: allow flush on devices with open tray
On 14.06.2016 17:54, John Snow wrote: > > > On 06/14/2016 09:19 AM, Max Reitz wrote: >> On 10.06.2016 23:59, John Snow wrote: >>> If a device still has an attached BDS because the medium has not yet >>> been removed, we will be unable to migrate to a new host because >>> blk_flush will return an error for that backend. >>> >>> Replace the call to blk_is_available to blk_is_inserted to weaken >>> the check and allow flushes from the backend to work, while still >>> disallowing flushes from the frontend/device model to work. >>> >>> This fixes a regression present in 2.6.0 caused by the following commit: >>> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf >>> block: Move some bdrv_*_all() functions to BB >>> >>> Signed-off-by: John Snow>>> --- >>> block/block-backend.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> I'm still not sure we shouldn't do the same for blk_{co,aio}_flush(). I >> guess you exclude them here because you specifically want to fix the >> issue mentioned in the commit message, but then we could just make >> blk_flush_all() ignore an -ENOMEDIUM. > > Yeah, I didn't investigate the full path. Just making the minimal fixes. > Is there a concern that this may still leave certain pathways broken > when the CDROM tray is open? > > I don't know of any immediately without digging again. > >> >> I personally think we should make all blk_*flush() functions use >> blk_is_inserted() instead of blk_is_available(). As we have discussed on >> IRC, there are probably not that many cases a guest can flush a medium >> in an open tray anyway (because the main use case are read-only >> CD-ROMs), and even if so, that wouldn't change any data, so even if the >> guest can actually flush something on an open tray, I don't think anyone >> would complain. >> >> Max >> > > I have difficulty making pragmatic arguments when purity is at stake, > but I've already wandered outside of my device model, so I will defer to > your judgment. > >>> diff --git a/block/block-backend.c b/block/block-backend.c >>> index 34500e6..d1e875e 100644 >>> --- a/block/block-backend.c >>> +++ b/block/block-backend.c >>> @@ -1122,7 +1122,7 @@ int blk_co_flush(BlockBackend *blk) >>> >>> int blk_flush(BlockBackend *blk) >>> { >>> -if (!blk_is_available(blk)) { >>> +if (!blk_is_inserted(blk)) { >>> return -ENOMEDIUM; >>> } >>> >>> >> >> > > Is this a NACK unless I attempt to address the wider design issue? I just don't see a point in using blk_is_inserted() here but blk_is_available() in the other blk_*flush() functions. If blk_is_inserted() is correct for blk_flush(), it should be correct for blk_co_flush() and blk_aio_flush(), too. Maybe I should emphasize that I decided between is_available() and is_inserted() basically on what felt right to me. There's not really that much research behind it, so changing it is completely fine. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v4 00/11] nbd: tighter protocol compliance
On 13/06/2016 18:49, Eric Blake wrote: >>> >> 004/11:[] [--] 'nbd: Improve server handling of bogus commands' >> > >> > Applied with some changes, see reply to individual patch. > Not sure I agree with all of your comments on that patch regarding > behavior on read errors, but further changes can be followup patches. > > >> > I'll need a v5 of just patch 8 and patch 9; I'm queuing the rest and >> > will send a pull request later this week. > Do you have a git tree that I can rebase on top of? Just wait tomorrow. :) Paolo
[Qemu-block] [PATCH v3 0/2] Report format specific info for LUKS block driver
This is a followup to: v1: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01723.html v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03642.html The 'qemu-img info' tool has ability to print format specific information, eg with qcow2 it reports two extra items: $ qemu-img info ~/VirtualMachines/demo.qcow2 image: /home/berrange/VirtualMachines/demo.qcow2 file format: qcow2 virtual size: 3.0G (3221225472 bytes) disk size: 140K cluster_size: 65536 Format specific information: compat: 0.10 refcount bits: 16 This is not currently wired up for the LUKS driver. This patch series adds that support so that we can report useful data about the LUKS volume such as the crypto algorithm choices, key slot usage and other volume metadata. The first patch extends the crypto API to allow querying of the format specific metadata The second patches extends the block API to allow the LUKS driver to report the format specific metadata. $ qemu-img info ~/VirtualMachines/demo.luks image: /home/berrange/VirtualMachines/demo.luks file format: luks virtual size: 98M (102760448 bytes) disk size: 100M encrypted: yes Format specific information: ivgen alg: plain64 hash alg: sha1 cipher alg: aes-128 uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 cipher mode: xts slots: [0]: active: true iters: 572706 key offset: 4096 stripes: 4000 [1]: active: false key offset: 135168 [2]: active: false key offset: 266240 [3]: active: false key offset: 397312 [4]: active: false key offset: 528384 [5]: active: false key offset: 659456 [6]: active: false key offset: 790528 [7]: active: false key offset: 921600 payload offset: 2097152 master key iters: 142375 Technically most of the code changes here are in the crypto layer, rather than the block layer. I'm fine with both patches going through the block maintainer tree, or can submit a both patches myself as, for sake of simplicity of merge. Changed in v3: - Do full struct copy instead of field-by-field copy (Max) - Simplify handling of linked list pointers (Max) - Use g_strndup with uuid to guarantee null termination (Max) - Misc typos (Max) Changed in v2: - Drop patches related to creating a text output visitor to format the ImageInfoSpecific data. This will be continued in a separate patch series - Fix key offset to be in bytes instead of sectors - Drop the duplicated ImageInfoSpecificLUKS type and just directly use QCryptoBlockInfoLUKS type in block layer - Skip reporting stripes/iters if keyslot is inactive - Add missing QAPI schema docs Daniel P. Berrange (2): crypto: add support for querying parameters for block encryption block: export LUKS specific data to qemu-img info block/crypto.c | 49 crypto/block-luks.c| 67 crypto/block.c | 17 +++ crypto/blockpriv.h | 4 +++ include/crypto/block.h | 16 +++ qapi/block-core.json | 6 +++- qapi/crypto.json | 76 ++ 7 files changed, 234 insertions(+), 1 deletion(-) -- 2.5.5
Re: [Qemu-block] [PATCH v3 1/2] crypto: add support for querying parameters for block encryption
On 14.06.2016 18:24, Daniel P. Berrange wrote: > When creating new block encryption volumes, we accept a list of > parameters to control the formatting process. It is useful to > be able to query what those parameters were for existing block > devices. Add a qcrypto_block_get_info() method which returns a > QCryptoBlockInfo instance to report this data. > > Signed-off-by: Daniel P. Berrange> --- > crypto/block-luks.c| 67 > crypto/block.c | 17 +++ > crypto/blockpriv.h | 4 +++ > include/crypto/block.h | 16 +++ > qapi/crypto.json | 76 > ++ > 5 files changed, 180 insertions(+) Reviewed-by: Max Reitz (I'll be waiting whether Eric wants to comment) signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH v3 1/2] crypto: add support for querying parameters for block encryption
When creating new block encryption volumes, we accept a list of parameters to control the formatting process. It is useful to be able to query what those parameters were for existing block devices. Add a qcrypto_block_get_info() method which returns a QCryptoBlockInfo instance to report this data. Signed-off-by: Daniel P. Berrange--- crypto/block-luks.c| 67 crypto/block.c | 17 +++ crypto/blockpriv.h | 4 +++ include/crypto/block.h | 16 +++ qapi/crypto.json | 76 ++ 5 files changed, 180 insertions(+) diff --git a/crypto/block-luks.c b/crypto/block-luks.c index 63649f1..6e940fb 100644 --- a/crypto/block-luks.c +++ b/crypto/block-luks.c @@ -201,6 +201,15 @@ QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSHeader) != 592); struct QCryptoBlockLUKS { QCryptoBlockLUKSHeader header; + +/* Cache parsed versions of what's in header fields, + * as we can't rely on QCryptoBlock.cipher being + * non-NULL */ +QCryptoCipherAlgorithm cipher_alg; +QCryptoCipherMode cipher_mode; +QCryptoIVGenAlgorithm ivgen_alg; +QCryptoHashAlgorithm ivgen_hash_alg; +QCryptoHashAlgorithm hash_alg; }; @@ -835,6 +844,12 @@ qcrypto_block_luks_open(QCryptoBlock *block, block->payload_offset = luks->header.payload_offset * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; +luks->cipher_alg = cipheralg; +luks->cipher_mode = ciphermode; +luks->ivgen_alg = ivalg; +luks->ivgen_hash_alg = ivhash; +luks->hash_alg = hash; + g_free(masterkey); g_free(password); @@ -1250,6 +1265,12 @@ qcrypto_block_luks_create(QCryptoBlock *block, goto error; } +luks->cipher_alg = luks_opts.cipher_alg; +luks->cipher_mode = luks_opts.cipher_mode; +luks->ivgen_alg = luks_opts.ivgen_alg; +luks->ivgen_hash_alg = luks_opts.ivgen_hash_alg; +luks->hash_alg = luks_opts.hash_alg; + memset(masterkey, 0, luks->header.key_bytes); g_free(masterkey); memset(slotkey, 0, luks->header.key_bytes); @@ -1284,6 +1305,51 @@ qcrypto_block_luks_create(QCryptoBlock *block, } +static int qcrypto_block_luks_get_info(QCryptoBlock *block, + QCryptoBlockInfo *info, + Error **errp) +{ +QCryptoBlockLUKS *luks = block->opaque; +QCryptoBlockInfoLUKSSlot *slot; +QCryptoBlockInfoLUKSSlotList *slots = NULL, **prev = >u.luks.slots; +size_t i; + +info->u.luks.cipher_alg = luks->cipher_alg; +info->u.luks.cipher_mode = luks->cipher_mode; +info->u.luks.ivgen_alg = luks->ivgen_alg; +if (info->u.luks.ivgen_alg == QCRYPTO_IVGEN_ALG_ESSIV) { +info->u.luks.has_ivgen_hash_alg = true; +info->u.luks.ivgen_hash_alg = luks->ivgen_hash_alg; +} +info->u.luks.hash_alg = luks->hash_alg; +info->u.luks.payload_offset = block->payload_offset; +info->u.luks.master_key_iters = luks->header.master_key_iterations; +info->u.luks.uuid = g_strndup((const char *)luks->header.uuid, + sizeof(luks->header.uuid)); + +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) { +slots = g_new0(QCryptoBlockInfoLUKSSlotList, 1); +*prev = slots; + +slots->value = slot = g_new0(QCryptoBlockInfoLUKSSlot, 1); +slot->active = luks->header.key_slots[i].active == +QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED; +slot->key_offset = luks->header.key_slots[i].key_offset + * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; +if (slot->active) { +slot->has_iters = true; +slot->iters = luks->header.key_slots[i].iterations; +slot->has_stripes = true; +slot->stripes = luks->header.key_slots[i].stripes; +} + +prev = >next; +} + +return 0; +} + + static void qcrypto_block_luks_cleanup(QCryptoBlock *block) { g_free(block->opaque); @@ -1321,6 +1387,7 @@ qcrypto_block_luks_encrypt(QCryptoBlock *block, const QCryptoBlockDriver qcrypto_block_driver_luks = { .open = qcrypto_block_luks_open, .create = qcrypto_block_luks_create, +.get_info = qcrypto_block_luks_get_info, .cleanup = qcrypto_block_luks_cleanup, .decrypt = qcrypto_block_luks_decrypt, .encrypt = qcrypto_block_luks_encrypt, diff --git a/crypto/block.c b/crypto/block.c index da60eba..be823ee 100644 --- a/crypto/block.c +++ b/crypto/block.c @@ -105,6 +105,23 @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions *options, } +QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block, + Error **errp) +{ +QCryptoBlockInfo *info = g_new0(QCryptoBlockInfo, 1); + +info->format = block->format; + +if (block->driver->get_info && +block->driver->get_info(block, info, errp) < 0) { +g_free(info); +return NULL; +}
Re: [Qemu-block] [PATCH] block-backend: allow flush on devices with open tray
On 06/14/2016 09:19 AM, Max Reitz wrote: > On 10.06.2016 23:59, John Snow wrote: >> If a device still has an attached BDS because the medium has not yet >> been removed, we will be unable to migrate to a new host because >> blk_flush will return an error for that backend. >> >> Replace the call to blk_is_available to blk_is_inserted to weaken >> the check and allow flushes from the backend to work, while still >> disallowing flushes from the frontend/device model to work. >> >> This fixes a regression present in 2.6.0 caused by the following commit: >> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf >> block: Move some bdrv_*_all() functions to BB >> >> Signed-off-by: John Snow>> --- >> block/block-backend.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) > > I'm still not sure we shouldn't do the same for blk_{co,aio}_flush(). I > guess you exclude them here because you specifically want to fix the > issue mentioned in the commit message, but then we could just make > blk_flush_all() ignore an -ENOMEDIUM. Yeah, I didn't investigate the full path. Just making the minimal fixes. Is there a concern that this may still leave certain pathways broken when the CDROM tray is open? I don't know of any immediately without digging again. > > I personally think we should make all blk_*flush() functions use > blk_is_inserted() instead of blk_is_available(). As we have discussed on > IRC, there are probably not that many cases a guest can flush a medium > in an open tray anyway (because the main use case are read-only > CD-ROMs), and even if so, that wouldn't change any data, so even if the > guest can actually flush something on an open tray, I don't think anyone > would complain. > > Max > I have difficulty making pragmatic arguments when purity is at stake, but I've already wandered outside of my device model, so I will defer to your judgment. >> diff --git a/block/block-backend.c b/block/block-backend.c >> index 34500e6..d1e875e 100644 >> --- a/block/block-backend.c >> +++ b/block/block-backend.c >> @@ -1122,7 +1122,7 @@ int blk_co_flush(BlockBackend *blk) >> >> int blk_flush(BlockBackend *blk) >> { >> -if (!blk_is_available(blk)) { >> +if (!blk_is_inserted(blk)) { >> return -ENOMEDIUM; >> } >> >> > > Is this a NACK unless I attempt to address the wider design issue?
Re: [Qemu-block] [PATCH v3 0/5] block/mirror: Fix (?) target backing BDS
On 10.06.2016 20:57, Max Reitz wrote: > Issue #1: If the target image does not have a backing BDS before mirror > completion, qemu tries really hard to give it a backing BDS. If the > source has a backing BDS, it will actually always "succeed". > In some cases, the target is not supposed to have a backing BDS, though > (absolute-paths: because of sync=full; existing: because the target > image does not have a backing file; blockdev-mirror: because of an > explicit "backing": ""). Then, this is pretty bad behavior. > > This should generally not change the target's visible data, but it still > is ugly. > > Issue #2: Currently the backing chain of the target is basically opened > using bdrv_open_backing_file() (except for sometimesâ„¢). This results in > multiple BDSs for a single physical file, which is bad. In most use > cases, this is only temporary, but it still is bad. > > If we can reuse the existing backing chain of the source (which is with > drive-mirror in "absolute-paths" mode), we should just do so. Following encouragement from Kevin on IRC, and apparently his acceptance of my commit message proposal for patch 3, I have applied the series to my block branch with said commit message added to patch 3 and the superfluous imports removed from patch 4 (thanks, Fam!). Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v2 2/2] block: export LUKS specific data to qemu-img info
On 14.06.2016 12:56, Daniel P. Berrange wrote: > The qemu-img info command has the ability to expose format > specific metadata about volumes. Wire up this facility for > the LUKS driver to report on cipher configuration and key > slot usage. > > $ qemu-img info ~/VirtualMachines/demo.luks > image: /home/berrange/VirtualMachines/demo.luks > file format: luks > virtual size: 98M (102760448 bytes) > disk size: 100M > encrypted: yes > Format specific information: > ivgen alg: plain64 > hash alg: sha1 > cipher alg: aes-128 > uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 > cipher mode: xts > slots: > [0]: > active: true > iters: 572706 > key offset: 4096 > stripes: 4000 > [1]: > active: false > key offset: 135168 > [2]: > active: false > key offset: 266240 > [3]: > active: false > key offset: 397312 > [4]: > active: false > key offset: 528384 > [5]: > active: false > key offset: 659456 > [6]: > active: false > key offset: 790528 > [7]: > active: false > key offset: 921600 > payload offset: 2097152 > master key iters: 142375 > > One somewhat undesirable artifact is that the data fields are > printed out in (apparently) random order. This will be addressed > later by changing the way the block layer pretty-prints the > image specific data. > > Signed-off-by: Daniel P. Berrange> --- > block/crypto.c | 59 > > qapi/block-core.json | 7 ++- > 2 files changed, 65 insertions(+), 1 deletion(-) > > diff --git a/block/crypto.c b/block/crypto.c > index 758e14e..823572f 100644 > --- a/block/crypto.c > +++ b/block/crypto.c > @@ -565,6 +565,63 @@ static int block_crypto_create_luks(const char *filename, > filename, opts, errp); > } > > +static int block_crypto_get_info_luks(BlockDriverState *bs, > + BlockDriverInfo *bdi) > +{ > +BlockDriverInfo subbdi; > +int ret; > + > +ret = bdrv_get_info(bs->file->bs, ); > +if (ret != 0) { > +return ret; > +} > + > +bdi->unallocated_blocks_are_zero = false; > +bdi->can_write_zeroes_with_unmap = false; > +bdi->cluster_size = subbdi.cluster_size; > + > +return 0; > +} > + > +static ImageInfoSpecific * > +block_crypto_get_specific_info_luks(BlockDriverState *bs) > +{ > +BlockCrypto *crypto = bs->opaque; > +ImageInfoSpecific *spec_info; > +QCryptoBlockInfo *info; > + > +info = qcrypto_block_get_info(crypto->block, NULL); > +if (!info) { > +return NULL; > +} > +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) { > +qapi_free_QCryptoBlockInfo(info); > +return NULL; > +} > + > +spec_info = g_new(ImageInfoSpecific, 1); > +spec_info->type = IMAGE_INFO_SPECIFIC_KIND_LUKS; One space too many. > +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1); > +spec_info->u.luks.data->cipher_alg = info->u.luks.cipher_alg; > +spec_info->u.luks.data->cipher_mode = info->u.luks.cipher_mode; > +spec_info->u.luks.data->ivgen_alg = info->u.luks.ivgen_alg; > +spec_info->u.luks.data->has_ivgen_hash_alg = > +info->u.luks.has_ivgen_hash_alg; > +spec_info->u.luks.data->ivgen_hash_alg = info->u.luks.ivgen_hash_alg; > +spec_info->u.luks.data->hash_alg = info->u.luks.hash_alg; > +spec_info->u.luks.data->payload_offset = info->u.luks.payload_offset; > +spec_info->u.luks.data->master_key_iters = info->u.luks.master_key_iters; > +spec_info->u.luks.data->uuid = g_strdup(info->u.luks.uuid); > + > +/* Steal the pointer instead of bothering to copy */ > +spec_info->u.luks.data->slots = info->u.luks.slots; > +info->u.luks.slots = NULL; Why not just steal everything by *spec_info->u.luks.data = info->u.luks and then making sure the qapi_free() call below won't free anything in there with a memset(>u.luks, 0, sizeof(info->u.luks))? > + > +qapi_free_QCryptoBlockInfo(info); > + > +return spec_info; > +} > + > BlockDriver bdrv_crypto_luks = { > .format_name= "luks", > .instance_size = sizeof(BlockCrypto), > @@ -578,6 +635,8 @@ BlockDriver bdrv_crypto_luks = { > .bdrv_co_readv = block_crypto_co_readv, > .bdrv_co_writev = block_crypto_co_writev, > .bdrv_getlength = block_crypto_getlength, > +.bdrv_get_info = block_crypto_get_info_luks, > +.bdrv_get_specific_info = block_crypto_get_specific_info_luks, > }; > > static void block_crypto_init(void) > diff --git
Re: [Qemu-block] [Qemu-devel] [PATCH v7 08/15] block: Simplify block_set_io_throttle
Eric Blakewrites: > Now that we can support boxed commands, use it to greatly > reduce the number of parameters (and likelihood of getting > out of sync) when adjusting throttle parameters. > > Signed-off-by: Eric Blake > > --- > v7: new patch > --- > qapi/block-core.json | 20 -- > blockdev.c | 111 > +++ > hmp.c| 45 + > 3 files changed, 66 insertions(+), 110 deletions(-) > > diff --git a/qapi/block-core.json b/qapi/block-core.json > index 98a20d2..26f7c0e 100644 > --- a/qapi/block-core.json > +++ b/qapi/block-core.json > @@ -1312,6 +1312,21 @@ > # the device will be removed from its group and the rest of its > # members will not be affected. The 'group' parameter is ignored. > # > +# See BlockIOThrottle for parameter descriptions. This comment will be trivial as soon as we got used to the 'box' feature :) > +# > +# Returns: Nothing on success > +# If @device is not a valid block device, DeviceNotFound > +# > +# Since: 1.1 > +## > +{ 'command': 'block_set_io_throttle', 'box': true, > + 'data': 'BlockIOThrottle' } > + > +## > +# BlockIOThrottle > +# > +# The parameters for the block_set_io_throttle command. This comment is prone to go stale. > +# > # @device: The name of the device > # > # @bps: total throughput limit in bytes per second > @@ -1378,12 +1393,9 @@ > # > # @group: #optional throttle group name (Since 2.4) > # > -# Returns: Nothing on success > -# If @device is not a valid block device, DeviceNotFound > -# > # Since: 1.1 > ## > -{ 'command': 'block_set_io_throttle', > +{ 'struct': 'BlockIOThrottle', >'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int', > 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int', > '*bps_max': 'int', '*bps_rd_max': 'int', > diff --git a/blockdev.c b/blockdev.c > index cf5afa3..b8db496 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -2625,49 +2625,17 @@ fail: > } > > /* throttling disk I/O limits */ > -void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t > bps_rd, > - int64_t bps_wr, > - int64_t iops, > - int64_t iops_rd, > - int64_t iops_wr, > - bool has_bps_max, > - int64_t bps_max, > - bool has_bps_rd_max, > - int64_t bps_rd_max, > - bool has_bps_wr_max, > - int64_t bps_wr_max, > - bool has_iops_max, > - int64_t iops_max, > - bool has_iops_rd_max, > - int64_t iops_rd_max, > - bool has_iops_wr_max, > - int64_t iops_wr_max, > - bool has_bps_max_length, > - int64_t bps_max_length, > - bool has_bps_rd_max_length, > - int64_t bps_rd_max_length, > - bool has_bps_wr_max_length, > - int64_t bps_wr_max_length, > - bool has_iops_max_length, > - int64_t iops_max_length, > - bool has_iops_rd_max_length, > - int64_t iops_rd_max_length, > - bool has_iops_wr_max_length, > - int64_t iops_wr_max_length, > - bool has_iops_size, > - int64_t iops_size, > - bool has_group, > - const char *group, Error **errp) > +void qmp_block_set_io_throttle(BlockIOThrottle *arg, Error **errp) > { This hunk together with the last one are worth a good chunk of the work you had to do to get here :) > ThrottleConfig cfg; > BlockDriverState *bs; > BlockBackend *blk; > AioContext *aio_context; > > -blk = blk_by_name(device); > +blk = blk_by_name(arg->device); > if (!blk) { > error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, > - "Device '%s' not found", device); > + "Device '%s' not found", arg->device); > return; > } > > @@ -2676,59 +2644,59 @@ void qmp_block_set_io_throttle(const char *device, > int64_t bps, int64_t bps_rd, > > bs = blk_bs(blk); > if (!bs) { > -error_setg(errp, "Device '%s' has no medium", device); > +error_setg(errp, "Device '%s' has no medium", arg->device); > goto out; > } > > throttle_config_init(); > -cfg.buckets[THROTTLE_BPS_TOTAL].avg = bps; > -
Re: [Qemu-block] [PATCH v2 2/2] block: export LUKS specific data to qemu-img info
On 14.06.2016 17:47, Daniel P. Berrange wrote: > On Tue, Jun 14, 2016 at 05:38:36PM +0200, Max Reitz wrote: >> On 14.06.2016 12:56, Daniel P. Berrange wrote: >>> The qemu-img info command has the ability to expose format >>> specific metadata about volumes. Wire up this facility for >>> the LUKS driver to report on cipher configuration and key >>> slot usage. >>> >>> $ qemu-img info ~/VirtualMachines/demo.luks >>> image: /home/berrange/VirtualMachines/demo.luks >>> file format: luks >>> virtual size: 98M (102760448 bytes) >>> disk size: 100M >>> encrypted: yes >>> Format specific information: >>> ivgen alg: plain64 >>> hash alg: sha1 >>> cipher alg: aes-128 >>> uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 >>> cipher mode: xts >>> slots: >>> [0]: >>> active: true >>> iters: 572706 >>> key offset: 4096 >>> stripes: 4000 >>> [1]: >>> active: false >>> key offset: 135168 >>> [2]: >>> active: false >>> key offset: 266240 >>> [3]: >>> active: false >>> key offset: 397312 >>> [4]: >>> active: false >>> key offset: 528384 >>> [5]: >>> active: false >>> key offset: 659456 >>> [6]: >>> active: false >>> key offset: 790528 >>> [7]: >>> active: false >>> key offset: 921600 >>> payload offset: 2097152 >>> master key iters: 142375 >>> >>> One somewhat undesirable artifact is that the data fields are >>> printed out in (apparently) random order. This will be addressed >>> later by changing the way the block layer pretty-prints the >>> image specific data. >>> >>> Signed-off-by: Daniel P. Berrange>>> --- >>> block/crypto.c | 59 >>> >>> qapi/block-core.json | 7 ++- >>> 2 files changed, 65 insertions(+), 1 deletion(-) >>> >>> diff --git a/block/crypto.c b/block/crypto.c >>> index 758e14e..823572f 100644 >>> --- a/block/crypto.c >>> +++ b/block/crypto.c >>> @@ -565,6 +565,63 @@ static int block_crypto_create_luks(const char >>> *filename, >>> filename, opts, errp); >>> } >>> >>> +static int block_crypto_get_info_luks(BlockDriverState *bs, >>> + BlockDriverInfo *bdi) >>> +{ >>> +BlockDriverInfo subbdi; >>> +int ret; >>> + >>> +ret = bdrv_get_info(bs->file->bs, ); >>> +if (ret != 0) { >>> +return ret; >>> +} >>> + >>> +bdi->unallocated_blocks_are_zero = false; >>> +bdi->can_write_zeroes_with_unmap = false; >>> +bdi->cluster_size = subbdi.cluster_size; >>> + >>> +return 0; >>> +} >>> + >>> +static ImageInfoSpecific * >>> +block_crypto_get_specific_info_luks(BlockDriverState *bs) >>> +{ >>> +BlockCrypto *crypto = bs->opaque; >>> +ImageInfoSpecific *spec_info; >>> +QCryptoBlockInfo *info; >>> + >>> +info = qcrypto_block_get_info(crypto->block, NULL); >>> +if (!info) { >>> +return NULL; >>> +} >>> +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) { >>> +qapi_free_QCryptoBlockInfo(info); >>> +return NULL; >>> +} >>> + >>> +spec_info = g_new(ImageInfoSpecific, 1); >>> +spec_info->type = IMAGE_INFO_SPECIFIC_KIND_LUKS; >> >> One space too many. >> >>> +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1); >>> +spec_info->u.luks.data->cipher_alg = info->u.luks.cipher_alg; >>> +spec_info->u.luks.data->cipher_mode = info->u.luks.cipher_mode; >>> +spec_info->u.luks.data->ivgen_alg = info->u.luks.ivgen_alg; >>> +spec_info->u.luks.data->has_ivgen_hash_alg = >>> +info->u.luks.has_ivgen_hash_alg; >>> +spec_info->u.luks.data->ivgen_hash_alg = info->u.luks.ivgen_hash_alg; >>> +spec_info->u.luks.data->hash_alg = info->u.luks.hash_alg; >>> +spec_info->u.luks.data->payload_offset = info->u.luks.payload_offset; >>> +spec_info->u.luks.data->master_key_iters = >>> info->u.luks.master_key_iters; >>> +spec_info->u.luks.data->uuid = g_strdup(info->u.luks.uuid); >>> + >>> +/* Steal the pointer instead of bothering to copy */ >>> +spec_info->u.luks.data->slots = info->u.luks.slots; >>> +info->u.luks.slots = NULL; >> >> Why not just steal everything by *spec_info->u.luks.data = info->u.luks >> and then making sure the qapi_free() call below won't free anything in >> there with a memset(>u.luks, 0, sizeof(info->u.luks))? > > I wish we could, but info->u.luks is an embedded QCryptoBlockInfoLUKS, > not merely a pointer to an independently allocated QCryptoBlockInfoLUKS :-( Yes, but note the asterisk I put there. Leave the g_new() and just make it:
[Qemu-block] [PATCH 6/9] block: pass qiov into before_write notifier
Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/io.c| 12 +++- include/block/block_int.h | 1 + 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/block/io.c b/block/io.c index 2d832aa..d2ad09c 100644 --- a/block/io.c +++ b/block/io.c @@ -368,12 +368,14 @@ static void tracked_request_end(BdrvTrackedRequest *req) */ static void tracked_request_begin(BdrvTrackedRequest *req, BlockDriverState *bs, + QEMUIOVector *qiov, int64_t offset, unsigned int bytes, enum BdrvTrackedRequestType type) { *req = (BdrvTrackedRequest){ .bs = bs, +.qiov = qiov, .offset = offset, .bytes = bytes, .type = type, @@ -1073,7 +1075,7 @@ int coroutine_fn bdrv_co_preadv(BlockDriverState *bs, bytes = ROUND_UP(bytes, align); } -tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_READ); +tracked_request_begin(, bs, NULL, offset, bytes, BDRV_TRACKED_READ); ret = bdrv_aligned_preadv(bs, , offset, bytes, align, use_local_qiov ? _qiov : qiov, flags); @@ -1391,7 +1393,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs, * Pad qiov with the read parts and be sure to have a tracked request not * only for bdrv_aligned_pwritev, but also for the reads of the RMW cycle. */ -tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_WRITE); +tracked_request_begin(, bs, qiov, offset, bytes, BDRV_TRACKED_WRITE); if (!qiov) { ret = bdrv_co_do_zero_pwritev(bs, offset, bytes, flags, ); @@ -2098,7 +2100,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) return 0; } -tracked_request_begin(, bs, 0, 0, BDRV_TRACKED_FLUSH); +tracked_request_begin(, bs, NULL, 0, 0, BDRV_TRACKED_FLUSH); /* Write back all layers by calling one driver function */ if (bs->drv->bdrv_co_flush) { @@ -2228,7 +2230,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } -tracked_request_begin(, bs, sector_num, nb_sectors, +tracked_request_begin(, bs, NULL, sector_num, nb_sectors, BDRV_TRACKED_DISCARD); bdrv_set_dirty(bs, sector_num, nb_sectors); @@ -2331,7 +2333,7 @@ static int bdrv_co_do_ioctl(BlockDriverState *bs, int req, void *buf) }; BlockAIOCB *acb; -tracked_request_begin(_req, bs, 0, 0, BDRV_TRACKED_IOCTL); +tracked_request_begin(_req, bs, NULL, 0, 0, BDRV_TRACKED_IOCTL); if (!drv || !drv->bdrv_aio_ioctl) { co.ret = -ENOTSUP; goto out; diff --git a/include/block/block_int.h b/include/block/block_int.h index 30a9717..72f463a 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -69,6 +69,7 @@ enum BdrvTrackedRequestType { typedef struct BdrvTrackedRequest { BlockDriverState *bs; +QEMUIOVector *qiov; int64_t offset; unsigned int bytes; enum BdrvTrackedRequestType type; -- 2.5.0
[Qemu-block] [PATCH 8/9] mirror: use synch scheme for drive mirror
Block commit of the active image to the backing store on a slow disk could never end. For example with the guest with the following loop inside while true; do dd bs=1k count=1 if=/dev/zero of=x done running above slow storage could not complete the operation with a resonable amount of time: virsh blockcommit rhel7 sda --active --shallow virsh qemu-monitor-event virsh qemu-monitor-command rhel7 \ '{"execute":"block-job-complete",\ "arguments":{"device":"drive-scsi0-0-0-0"} }' virsh qemu-monitor-event Completion event is never received. This problem could not be fixed easily with the current architecture. We should either prohibit guest writes (making dirty bitmap dirty) or switch to the sycnchronous scheme. This patch implements the latter. It adds mirror_before_write_notify callback. In this case all data written from the guest is synchnonously written to the mirror target. Though the problem is solved partially. We should switch from bdrv_dirty_bitmap to simple hbitmap. This will be done in the next patch. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/mirror.c | 78 ++ 1 file changed, 78 insertions(+) diff --git a/block/mirror.c b/block/mirror.c index 7471211..086256c 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -58,6 +58,9 @@ typedef struct MirrorBlockJob { QSIMPLEQ_HEAD(, MirrorBuffer) buf_free; int buf_free_count; +NotifierWithReturn before_write; +CoQueue dependent_writes; + unsigned long *in_flight_bitmap; int in_flight; int sectors_in_flight; @@ -125,6 +128,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret) g_free(op->buf); g_free(op); +qemu_co_queue_restart_all(>dependent_writes); if (s->waiting_for_io) { qemu_coroutine_enter(s->common.co, NULL); } @@ -511,6 +515,74 @@ static void mirror_exit(BlockJob *job, void *opaque) bdrv_unref(src); } +static int coroutine_fn mirror_before_write_notify( +NotifierWithReturn *notifier, void *opaque) +{ +MirrorBlockJob *s = container_of(notifier, MirrorBlockJob, before_write); +BdrvTrackedRequest *req = opaque; +MirrorOp *op; +int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; +int64_t sector_num = req->offset >> BDRV_SECTOR_BITS; +int nb_sectors = req->bytes >> BDRV_SECTOR_BITS; +int64_t end_sector = sector_num + nb_sectors; +int64_t aligned_start, aligned_end; + +if (req->type != BDRV_TRACKED_DISCARD && req->type != BDRV_TRACKED_WRITE) { +/* this is not discard and write, we do not care */ +return 0; +} + +while (1) { +bool waited = false; +int64_t sn; + +for (sn = sector_num; sn < end_sector; sn += sectors_per_chunk) { +int64_t chunk = sn / sectors_per_chunk; +if (test_bit(chunk, s->in_flight_bitmap)) { +trace_mirror_yield_in_flight(s, chunk, s->in_flight); +qemu_co_queue_wait(>dependent_writes); +waited = true; +} +} + +if (!waited) { +break; +} +} + +aligned_start = QEMU_ALIGN_UP(sector_num, sectors_per_chunk); +aligned_end = QEMU_ALIGN_DOWN(sector_num + nb_sectors, sectors_per_chunk); +if (aligned_end > aligned_start) { +bdrv_reset_dirty_bitmap(s->dirty_bitmap, aligned_start, +aligned_end - aligned_start); +} + +if (req->type == BDRV_TRACKED_DISCARD) { +mirror_do_zero_or_discard(s, sector_num, nb_sectors, true); +return 0; +} + +s->in_flight++; +s->sectors_in_flight += nb_sectors; + +/* Allocate a MirrorOp that is used as an AIO callback. */ +op = g_new(MirrorOp, 1); +op->s = s; +op->sector_num = sector_num; +op->nb_sectors = nb_sectors; +op->buf = qemu_try_blockalign(blk_bs(s->target), req->qiov->size); +if (op->buf == NULL) { +g_free(op); +return -ENOMEM; +} +qemu_iovec_init(>qiov, req->qiov->niov); +qemu_iovec_clone(>qiov, req->qiov, op->buf); + +blk_aio_pwritev(s->target, req->offset, >qiov, 0, +mirror_write_complete, op); +return 0; +} + static int mirror_dirty_init(MirrorBlockJob *s) { int64_t sector_num, end; @@ -764,6 +836,8 @@ immediate_exit: mirror_drain(s); } +notifier_with_return_remove(>before_write); + assert(s->in_flight == 0); qemu_vfree(s->buf); g_free(s->cow_bitmap); @@ -905,6 +979,10 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target, return; } +
Re: [Qemu-block] [PATCH v2 2/2] block: export LUKS specific data to qemu-img info
On Tue, Jun 14, 2016 at 05:49:24PM +0200, Max Reitz wrote: > On 14.06.2016 17:47, Daniel P. Berrange wrote: > > On Tue, Jun 14, 2016 at 05:38:36PM +0200, Max Reitz wrote: > >> On 14.06.2016 12:56, Daniel P. Berrange wrote: > >>> The qemu-img info command has the ability to expose format > >>> specific metadata about volumes. Wire up this facility for > >>> the LUKS driver to report on cipher configuration and key > >>> slot usage. > >>> > >>> $ qemu-img info ~/VirtualMachines/demo.luks > >>> image: /home/berrange/VirtualMachines/demo.luks > >>> file format: luks > >>> virtual size: 98M (102760448 bytes) > >>> disk size: 100M > >>> encrypted: yes > >>> Format specific information: > >>> ivgen alg: plain64 > >>> hash alg: sha1 > >>> cipher alg: aes-128 > >>> uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 > >>> cipher mode: xts > >>> slots: > >>> [0]: > >>> active: true > >>> iters: 572706 > >>> key offset: 4096 > >>> stripes: 4000 > >>> [1]: > >>> active: false > >>> key offset: 135168 > >>> [2]: > >>> active: false > >>> key offset: 266240 > >>> [3]: > >>> active: false > >>> key offset: 397312 > >>> [4]: > >>> active: false > >>> key offset: 528384 > >>> [5]: > >>> active: false > >>> key offset: 659456 > >>> [6]: > >>> active: false > >>> key offset: 790528 > >>> [7]: > >>> active: false > >>> key offset: 921600 > >>> payload offset: 2097152 > >>> master key iters: 142375 > >>> > >>> One somewhat undesirable artifact is that the data fields are > >>> printed out in (apparently) random order. This will be addressed > >>> later by changing the way the block layer pretty-prints the > >>> image specific data. > >>> > >>> Signed-off-by: Daniel P. Berrange> >>> --- > >>> block/crypto.c | 59 > >>> > >>> qapi/block-core.json | 7 ++- > >>> 2 files changed, 65 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/block/crypto.c b/block/crypto.c > >>> index 758e14e..823572f 100644 > >>> --- a/block/crypto.c > >>> +++ b/block/crypto.c > >>> @@ -565,6 +565,63 @@ static int block_crypto_create_luks(const char > >>> *filename, > >>> filename, opts, errp); > >>> } > >>> > >>> +static int block_crypto_get_info_luks(BlockDriverState *bs, > >>> + BlockDriverInfo *bdi) > >>> +{ > >>> +BlockDriverInfo subbdi; > >>> +int ret; > >>> + > >>> +ret = bdrv_get_info(bs->file->bs, ); > >>> +if (ret != 0) { > >>> +return ret; > >>> +} > >>> + > >>> +bdi->unallocated_blocks_are_zero = false; > >>> +bdi->can_write_zeroes_with_unmap = false; > >>> +bdi->cluster_size = subbdi.cluster_size; > >>> + > >>> +return 0; > >>> +} > >>> + > >>> +static ImageInfoSpecific * > >>> +block_crypto_get_specific_info_luks(BlockDriverState *bs) > >>> +{ > >>> +BlockCrypto *crypto = bs->opaque; > >>> +ImageInfoSpecific *spec_info; > >>> +QCryptoBlockInfo *info; > >>> + > >>> +info = qcrypto_block_get_info(crypto->block, NULL); > >>> +if (!info) { > >>> +return NULL; > >>> +} > >>> +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) { > >>> +qapi_free_QCryptoBlockInfo(info); > >>> +return NULL; > >>> +} > >>> + > >>> +spec_info = g_new(ImageInfoSpecific, 1); > >>> +spec_info->type = IMAGE_INFO_SPECIFIC_KIND_LUKS; > >> > >> One space too many. > >> > >>> +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1); > >>> +spec_info->u.luks.data->cipher_alg = info->u.luks.cipher_alg; > >>> +spec_info->u.luks.data->cipher_mode = info->u.luks.cipher_mode; > >>> +spec_info->u.luks.data->ivgen_alg = info->u.luks.ivgen_alg; > >>> +spec_info->u.luks.data->has_ivgen_hash_alg = > >>> +info->u.luks.has_ivgen_hash_alg; > >>> +spec_info->u.luks.data->ivgen_hash_alg = info->u.luks.ivgen_hash_alg; > >>> +spec_info->u.luks.data->hash_alg = info->u.luks.hash_alg; > >>> +spec_info->u.luks.data->payload_offset = info->u.luks.payload_offset; > >>> +spec_info->u.luks.data->master_key_iters = > >>> info->u.luks.master_key_iters; > >>> +spec_info->u.luks.data->uuid = g_strdup(info->u.luks.uuid); > >>> + > >>> +/* Steal the pointer instead of bothering to copy */ > >>> +spec_info->u.luks.data->slots = info->u.luks.slots; > >>> +info->u.luks.slots = NULL; > >> > >> Why not just steal everything by *spec_info->u.luks.data = info->u.luks > >> and then making sure the qapi_free() call below
Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands
On 14/06/2016 17:02, Alex Bligh wrote: > > On 14 Jun 2016, at 14:32, Paolo Bonziniwrote: > >> >> On 13/06/2016 23:41, Alex Bligh wrote: >>> That's one of the reasons that there is a proposal to add >>> STRUCTURED_READ to the spec (although I still haven't had time to >>> implement that for qemu), so that we have a newer approach that allows >>> for proper error handling without ambiguity on whether bogus bytes must >>> be sent on a failed read. But you'd have to convince me that ALL >>> existing NBD server and client implementations expect to handle a read >>> error without read payload, otherwise, I will stick with the notion that >>> the current spec wording is correct, and that read errors CANNOT be >>> gracefully recovered from unless BOTH sides transfer (possibly bogus) >>> bytes along with the error message, and which is why BOTH sides of the >>> protocol are warned that read errors usually result in a disconnection >>> rather than clean continuation, without the addition of STRUCTURED_READ. >> >> I suspect that there are exactly two client implementations, > > My understanding is that there are more than 2 client implementations. > A quick google found at least one BSD client. I bet read error handling > is a mess in all of them. Found it, it is exactly the same as Linux and QEMU: https://github.com/bitrig/bitrig/blob/418985278/sys/dev/nbd.c#L577 >> namely >> Linux and QEMU's, and both do the right thing. > > This depends what you mean by 'right'. Both appear to be non-compliant > with the standard. I mean "what makes sense". > Note the standard is not defined by the client implementation, but > by the protocol document. > > IMHO the 'right thing' is what is in the spec. Servers can't send an > error in any other way if they don't buffer the entire read first, as the > read may error towards the end. > > To illustrate the problem, look consider what qemu itself would do as > a server if it can't buffer the entire read issued to it. Return ENOMEM? > The spec originally was not clear on how errors on reads should be > handled, leading to any read causing a protocol drop. The spec is > now clear. Unfortunately it is not possible to make a back compatible > fix. Hence the real fix here is to implement structured replies, > which is what Eric and I have been working on. I agree that structured replies are better. However, it looks like the de facto status prior to structured replies is that the error is in the spec, and this patch introduces a regression. Paolo
Re: [Qemu-block] [PATCH] m25p80: fix test on blk_pread() return value
On 06/14/2016 10:38 AM, Kevin Wolf wrote: > Am 14.06.2016 um 10:02 hat Cédric Le Goater geschrieben: #4 0x7fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=, offset=30878208, bytes=512, qiov=0x7fa7f47fee60, flags=0) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 #5 0x7fa81c669ecb in bdrv_co_pwritev (bs=0x7fa81d4dd050, offset=8, bytes=512, qiov=0x7fa80d5191c0, flags=(BDRV_REQ_COPY_ON_READ | BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_SERIALISING | BDRV_REQ_FUA | unknown: 4278124256), flags@entry=(unknown: 0)) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1492 >>> >>> That 'flags' value looks bogus... >>> #6 0x7fa81c65e367 in blk_co_pwritev (blk=0x7fa81d4c5b60, offset=30878208, bytes=256, qiov=0x7fa80d5191c0, flags=(unknown: 0)) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:788 #7 0x7fa81c65e49b in blk_aio_write_entry (opaque=0x7fa7e849aca0) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:977 #8 0x7fa81c6c823a in coroutine_trampoline (i0=, i1=) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/util/coroutine-ucontext.c:78 #9 0x7fa818ea8f00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 >>> >>> and we don't get anything further in the backtrace beyond coroutines, to >>> see who's sending the bad parameters. I recently debugged a bogus flags >>> in bdrv_aio_preadv, by hoisting an assert to occur before coroutines are >>> used in blk_aio_prwv(): >>> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02948.html >>> >>> I've just posted v2 of that patch (now a 2/2 series), but in v2 no >>> longer kept the assert at that point. But maybe the correct fix, and/or >>> the hack for catching the bug prior to coroutines, will help you debug >>> where the bad arguments are coming from. >> >> That does not fix the assert. >> #10 0x7fa80d5189d0 in ?? () #11 0x in ?? () (gdb) up 4 #4 0x7fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=, offset=30878208, bytes=512, qiov=0x7fa7f47fee60, flags=0) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 1243 assert(!qiov || bytes == qiov->size); (gdb) p *qiov $1 = {iov = 0x7fa81da671d0, niov = 1, nalloc = 1, size = 256} >> >> So, it seems that the issue is coming from the fact that bdrv_co_pwritev() >> does not handle alignments less than BDRV_SECTOR_SIZE : >> >> /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */ >> uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment); >> >> It calls bdrv_aligned_pwritev() which does the assert : >> >> assert(!qiov || bytes == qiov->size); > > Yes, but between these two places, there is code that should actually > enforce the right alignment: > > if ((offset + bytes) & (align - 1)) { > ... > } > > You can see in your backtrace that bdrv_aligned_pwritev() gets a > different qiov than bdrv_co_pwritev() (which is local_qiov in the latter > function). > > It's just unclear to me why this code extended bytes, but didn't add the > tail_buf iovec to local_qiov. The gdb backtrace is bogus. It does not make sense. May be a gdb issue with multithread on jessie. In the path tracking the tail bytes, we have : if ((offset + bytes) & (align - 1)) { ... tail_bytes = (offset + bytes) & (align - 1); qemu_iovec_add(_qiov, tail_buf + tail_bytes, align - tail_bytes); bytes = ROUND_UP(bytes, align); } This is where the issue is I think. The qiov holds 256 and bytes 512. I have no idea how to fix that though. Thanks, C.
[Qemu-block] [PATCH 4/9] mirror: efficiently zero out target
With a bdrv_co_write_zeroes method on a target BDS zeroes will not be placed into the wire. Thus the target could be very efficiently zeroed out. This is should be done with the largest chunk possible. This improves the performance of the live migration of the empty disk by 150 times if NBD supports write_zeroes. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/mirror.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/block/mirror.c b/block/mirror.c index c7b3639..c2f8773 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -21,6 +21,7 @@ #include "qemu/ratelimit.h" #include "qemu/bitmap.h" +#define MIRROR_ZERO_CHUNK (3u << (29 - BDRV_SECTOR_BITS)) /* 1.5 Gb */ #define SLICE_TIME1ULL /* ns */ #define MAX_IN_FLIGHT 16 #define DEFAULT_MIRROR_BUF_SIZE (10 << 20) @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s) end = s->bdev_length / BDRV_SECTOR_SIZE; -if (base == NULL && !bdrv_has_zero_init(target_bs)) { +if (base == NULL && !bdrv_has_zero_init(target_bs) && +target_bs->drv->bdrv_co_write_zeroes == NULL) { bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end); return 0; } @@ -546,6 +548,34 @@ static int mirror_dirty_init(MirrorBlockJob *s) } sector_num += n; } + +if (base != NULL || bdrv_has_zero_init(target_bs)) { +/* no need to zero out entire disk */ +return 0; +} + +for (sector_num = 0; sector_num < end; ) { +int nb_sectors = MIN(MIRROR_ZERO_CHUNK, end - sector_num); +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); + +if (now - last_pause_ns > SLICE_TIME) { +last_pause_ns = now; +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0); +} + +if (block_job_is_cancelled(>common)) { +return -EINTR; +} + +if (s->in_flight == MAX_IN_FLIGHT) { +trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1); +mirror_wait_for_io(s); +continue; +} + +mirror_do_zero_or_discard(s, sector_num, nb_sectors, false); +sector_num += nb_sectors; +} return 0; } -- 2.5.0
[Qemu-block] [PATCH 1/9] mirror: fix calling of blk_aio_pwritev/blk_aio_preadv
4th argument is flags rather than size. Fortunately flags occupies 5 less significant bits and they are always zero due to alignment. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/mirror.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 80fd3c7..3760e29 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -156,8 +156,7 @@ static void mirror_read_complete(void *opaque, int ret) mirror_iteration_done(op, ret); return; } -blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, >qiov, -op->nb_sectors * BDRV_SECTOR_SIZE, +blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, >qiov, 0, mirror_write_complete, op); } @@ -274,8 +273,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num, s->sectors_in_flight += nb_sectors; trace_mirror_one_iteration(s, sector_num, nb_sectors); -blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, >qiov, - nb_sectors * BDRV_SECTOR_SIZE, +blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, >qiov, 0, mirror_read_complete, op); return ret; } -- 2.5.0
Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands
> On 14 Jun 2016, at 16:11, Paolo Bonziniwrote: > >> To illustrate the problem, look consider what qemu itself would do as >> a server if it can't buffer the entire read issued to it. > > Return ENOMEM? Well OK, qemu then 'works' on the basis it breaks another part of the spec, which is coping with long reads. > However, it looks like the > de facto status prior to structured replies is that the error is in the > spec, and this patch introduces a regression. Well, I guess the patch makes it work the same as the reference server implementation and the spec, which I'd consider a fix. My view is that the error is in the kernel client. I think Erik CC'd in nbd-general re the comment that the spec was broken; I don't think it is, and don't propose to change it. Wouter might or might not feel differently. It's been reasonably well known (I wrote about it at least 3 years ago), that the current implementation (reference + kernel) does not cope well with errors on reads, so I'm guessing one is just trading one set of brokenness for another. So I'm pretty relaxed about what goes in qemu. -- Alex Bligh
Re: [Qemu-block] [PATCH v2 1/2] crypto: add support for querying parameters for block encryption
On 14.06.2016 12:56, Daniel P. Berrange wrote: > When creating new block encryption volumes, we accept a list of > parameters to control the formatting process. It is useful to > be able to query what those parameters were for existing block > devices. Add a qcrypto_block_get_info() method which returns a > QCryptoBlockInfo instance to report this data. > > Signed-off-by: Daniel P. Berrange> --- > crypto/block-luks.c| 70 ++ > crypto/block.c | 17 +++ > crypto/blockpriv.h | 4 +++ > include/crypto/block.h | 16 +++ > qapi/crypto.json | 76 > ++ > 5 files changed, 183 insertions(+) > > diff --git a/crypto/block-luks.c b/crypto/block-luks.c > index 63649f1..6a6ef07 100644 > --- a/crypto/block-luks.c > +++ b/crypto/block-luks.c [...] > @@ -1284,6 +1305,54 @@ qcrypto_block_luks_create(QCryptoBlock *block, > } > > > +static int qcrypto_block_luks_get_info(QCryptoBlock *block, > + QCryptoBlockInfo *info, > + Error **errp) > +{ > +QCryptoBlockLUKS *luks = block->opaque; > +QCryptoBlockInfoLUKSSlot *slot; > +QCryptoBlockInfoLUKSSlotList *slots = NULL, *prev = NULL; You could make prev a double pointer, initializing it to >u.luks.slots... [1] > +size_t i; > + > +info->u.luks.cipher_alg = luks->cipher_alg; > +info->u.luks.cipher_mode = luks->cipher_mode; > +info->u.luks.ivgen_alg = luks->ivgen_alg; > +if (info->u.luks.ivgen_alg == QCRYPTO_IVGEN_ALG_ESSIV) { > +info->u.luks.has_ivgen_hash_alg = true; > +info->u.luks.ivgen_hash_alg = luks->ivgen_hash_alg; > +} > +info->u.luks.hash_alg = luks->hash_alg; > +info->u.luks.payload_offset = block->payload_offset; > +info->u.luks.master_key_iters = luks->header.master_key_iterations; > +info->u.luks.uuid = g_strdup((const char *)luks->header.uuid); Wouldn't g_strndup() with sizeof(luks->header.uuid) be a better choice? > + > +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) { > +slots = g_new0(QCryptoBlockInfoLUKSSlotList, 1); > +if (i == 0) { > +info->u.luks.slots = slots; > +} else { > +prev->next = slots; > +} [1] ...then unconditionally use *prev = slots here... > + > +slots->value = slot = g_new0(QCryptoBlockInfoLUKSSlot, 1); > +slot->active = luks->header.key_slots[i].active == > +QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED; > +slot->key_offset = luks->header.key_slots[i].key_offset > + * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE; > +if (slot->active) { > +slot->has_iters = true; > +slot->iters = luks->header.key_slots[i].iterations; > +slot->has_stripes = true; > +slot->stripes = luks->header.key_slots[i].stripes; > +} > + > +prev = slots; [1] ...and prev = >next here. > +} > + > +return 0; > +} > + > + > static void qcrypto_block_luks_cleanup(QCryptoBlock *block) > { > g_free(block->opaque); [...] > diff --git a/include/crypto/block.h b/include/crypto/block.h > index a21e11f..6256f64 100644 > --- a/include/crypto/block.h > +++ b/include/crypto/block.h > @@ -138,6 +138,22 @@ QCryptoBlock > *qcrypto_block_create(QCryptoBlockCreateOptions *options, > void *opaque, > Error **errp); > > + > +/** > + * qcrypto_block_get_info: > + * block: the block encryption object I think this is missing an @. Max > + * @errp: pointer to a NULL-initialized error object > + * > + * Get information about the configuration options for the > + * block encryption object. This includes details such as > + * the cipher algorithms, modes, and initialization vector > + * generators. > + * > + * Returns: a block encryption info object, or NULL on error > + */ > +QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block, > + Error **errp); > + > /** > * @qcrypto_block_decrypt: > * @block: the block encryption object signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v2 2/2] block: export LUKS specific data to qemu-img info
On Tue, Jun 14, 2016 at 05:38:36PM +0200, Max Reitz wrote: > On 14.06.2016 12:56, Daniel P. Berrange wrote: > > The qemu-img info command has the ability to expose format > > specific metadata about volumes. Wire up this facility for > > the LUKS driver to report on cipher configuration and key > > slot usage. > > > > $ qemu-img info ~/VirtualMachines/demo.luks > > image: /home/berrange/VirtualMachines/demo.luks > > file format: luks > > virtual size: 98M (102760448 bytes) > > disk size: 100M > > encrypted: yes > > Format specific information: > > ivgen alg: plain64 > > hash alg: sha1 > > cipher alg: aes-128 > > uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 > > cipher mode: xts > > slots: > > [0]: > > active: true > > iters: 572706 > > key offset: 4096 > > stripes: 4000 > > [1]: > > active: false > > key offset: 135168 > > [2]: > > active: false > > key offset: 266240 > > [3]: > > active: false > > key offset: 397312 > > [4]: > > active: false > > key offset: 528384 > > [5]: > > active: false > > key offset: 659456 > > [6]: > > active: false > > key offset: 790528 > > [7]: > > active: false > > key offset: 921600 > > payload offset: 2097152 > > master key iters: 142375 > > > > One somewhat undesirable artifact is that the data fields are > > printed out in (apparently) random order. This will be addressed > > later by changing the way the block layer pretty-prints the > > image specific data. > > > > Signed-off-by: Daniel P. Berrange> > --- > > block/crypto.c | 59 > > > > qapi/block-core.json | 7 ++- > > 2 files changed, 65 insertions(+), 1 deletion(-) > > > > diff --git a/block/crypto.c b/block/crypto.c > > index 758e14e..823572f 100644 > > --- a/block/crypto.c > > +++ b/block/crypto.c > > @@ -565,6 +565,63 @@ static int block_crypto_create_luks(const char > > *filename, > > filename, opts, errp); > > } > > > > +static int block_crypto_get_info_luks(BlockDriverState *bs, > > + BlockDriverInfo *bdi) > > +{ > > +BlockDriverInfo subbdi; > > +int ret; > > + > > +ret = bdrv_get_info(bs->file->bs, ); > > +if (ret != 0) { > > +return ret; > > +} > > + > > +bdi->unallocated_blocks_are_zero = false; > > +bdi->can_write_zeroes_with_unmap = false; > > +bdi->cluster_size = subbdi.cluster_size; > > + > > +return 0; > > +} > > + > > +static ImageInfoSpecific * > > +block_crypto_get_specific_info_luks(BlockDriverState *bs) > > +{ > > +BlockCrypto *crypto = bs->opaque; > > +ImageInfoSpecific *spec_info; > > +QCryptoBlockInfo *info; > > + > > +info = qcrypto_block_get_info(crypto->block, NULL); > > +if (!info) { > > +return NULL; > > +} > > +if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) { > > +qapi_free_QCryptoBlockInfo(info); > > +return NULL; > > +} > > + > > +spec_info = g_new(ImageInfoSpecific, 1); > > +spec_info->type = IMAGE_INFO_SPECIFIC_KIND_LUKS; > > One space too many. > > > +spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1); > > +spec_info->u.luks.data->cipher_alg = info->u.luks.cipher_alg; > > +spec_info->u.luks.data->cipher_mode = info->u.luks.cipher_mode; > > +spec_info->u.luks.data->ivgen_alg = info->u.luks.ivgen_alg; > > +spec_info->u.luks.data->has_ivgen_hash_alg = > > +info->u.luks.has_ivgen_hash_alg; > > +spec_info->u.luks.data->ivgen_hash_alg = info->u.luks.ivgen_hash_alg; > > +spec_info->u.luks.data->hash_alg = info->u.luks.hash_alg; > > +spec_info->u.luks.data->payload_offset = info->u.luks.payload_offset; > > +spec_info->u.luks.data->master_key_iters = > > info->u.luks.master_key_iters; > > +spec_info->u.luks.data->uuid = g_strdup(info->u.luks.uuid); > > + > > +/* Steal the pointer instead of bothering to copy */ > > +spec_info->u.luks.data->slots = info->u.luks.slots; > > +info->u.luks.slots = NULL; > > Why not just steal everything by *spec_info->u.luks.data = info->u.luks > and then making sure the qapi_free() call below won't free anything in > there with a memset(>u.luks, 0, sizeof(info->u.luks))? I wish we could, but info->u.luks is an embedded QCryptoBlockInfoLUKS, not merely a pointer to an independently allocated QCryptoBlockInfoLUKS :-( Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o-
[Qemu-block] [PATCH 0/9] major rework of drive-mirror
Block commit of the active image to the backing store on a slow disk could never end. For example with the guest with the following loop inside while true; do dd bs=1k count=1 if=/dev/zero of=x done running above slow storage could not complete the operation with a resonable amount of time: virsh blockcommit rhel7 sda --active --shallow virsh qemu-monitor-event virsh qemu-monitor-command rhel7 \ '{"execute":"block-job-complete",\ "arguments":{"device":"drive-scsi0-0-0-0"} }' virsh qemu-monitor-event Completion event is never received. This problem could not be fixed easily with the current architecture. We should either prohibit guest writes (making dirty bitmap dirty) or switch to the sycnchronous scheme. This series switches driver mirror to synch scheme. Actually we can have something more intelligent and switch to sync mirroring just after the first pass over the bitmap. Though this could be done relatively easily during discussion. The most difficult things are here. The set also adds some performance improvements dealing with known-to-be-zero areas. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake Denis V. Lunev (9): mirror: fix calling of blk_aio_pwritev/blk_aio_preadv mirror: create mirror_dirty_init helper for mirror_run mirror: optimize dirty bitmap filling in mirror_run a bit mirror: efficiently zero out target mirror: improve performance of mirroring of empty disk block: pass qiov into before_write notifier mirror: allow to save buffer for QEMUIOVector in MirrorOp mirror: use synch scheme for drive mirror mirror: replace bdrv_dirty_bitmap with plain hbitmap block/io.c| 12 +- block/mirror.c| 290 +++--- include/block/block_int.h | 1 + 3 files changed, 229 insertions(+), 74 deletions(-) -- 2.5.0
[Qemu-block] [PATCH 9/9] mirror: replace bdrv_dirty_bitmap with plain hbitmap
We have replaced the mechanics of syncing new writes in the previous patch and thus do not need to track dirty changes anymore. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/mirror.c | 58 ++ 1 file changed, 38 insertions(+), 20 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 086256c..926fd13 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -52,7 +52,7 @@ typedef struct MirrorBlockJob { size_t buf_size; int64_t bdev_length; unsigned long *cow_bitmap; -BdrvDirtyBitmap *dirty_bitmap; +HBitmap *copy_bitmap; HBitmapIter hbi; uint8_t *buf; QSIMPLEQ_HEAD(, MirrorBuffer) buf_free; @@ -141,7 +141,7 @@ static void mirror_write_complete(void *opaque, int ret) if (ret < 0) { BlockErrorAction action; -bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors); +hbitmap_set(s->copy_bitmap, op->sector_num, op->nb_sectors); action = mirror_error_action(s, false, -ret); if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) { s->ret = ret; @@ -157,7 +157,7 @@ static void mirror_read_complete(void *opaque, int ret) if (ret < 0) { BlockErrorAction action; -bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors); +hbitmap_set(s->copy_bitmap, op->sector_num, op->nb_sectors); action = mirror_error_action(s, true, -ret); if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) { s->ret = ret; @@ -328,9 +328,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) sector_num = hbitmap_iter_next(>hbi); if (sector_num < 0) { -bdrv_dirty_iter_init(s->dirty_bitmap, >hbi); +hbitmap_iter_init(>hbi, s->copy_bitmap, 0); sector_num = hbitmap_iter_next(>hbi); -trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap)); +trace_mirror_restart_iter(s, hbitmap_count(s->copy_bitmap)); assert(sector_num >= 0); } @@ -347,7 +347,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk; int64_t next_chunk = next_sector / sectors_per_chunk; if (next_sector >= end || -!bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) { +!hbitmap_get(s->copy_bitmap, next_sector)) { break; } if (test_bit(next_chunk, s->in_flight_bitmap)) { @@ -357,7 +357,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) hbitmap_next = hbitmap_iter_next(>hbi); if (hbitmap_next > next_sector || hbitmap_next < 0) { /* The bitmap iterator's cache is stale, refresh it */ -bdrv_set_dirty_iter(>hbi, next_sector); +hbitmap_iter_init(>hbi, s->copy_bitmap, next_sector); hbitmap_next = hbitmap_iter_next(>hbi); } assert(hbitmap_next == next_sector); @@ -368,8 +368,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) * calling bdrv_get_block_status_above could yield - if some blocks are * marked dirty in this window, we need to know. */ -bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, -nb_chunks * sectors_per_chunk); +hbitmap_reset(s->copy_bitmap, sector_num, nb_chunks * sectors_per_chunk); bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks); while (nb_chunks > 0 && sector_num < end) { int ret; @@ -553,8 +552,8 @@ static int coroutine_fn mirror_before_write_notify( aligned_start = QEMU_ALIGN_UP(sector_num, sectors_per_chunk); aligned_end = QEMU_ALIGN_DOWN(sector_num + nb_sectors, sectors_per_chunk); if (aligned_end > aligned_start) { -bdrv_reset_dirty_bitmap(s->dirty_bitmap, aligned_start, -aligned_end - aligned_start); +hbitmap_reset(s->copy_bitmap, aligned_start, + aligned_end - aligned_start); } if (req->type == BDRV_TRACKED_DISCARD) { @@ -596,7 +595,7 @@ static int mirror_dirty_init(MirrorBlockJob *s) if (base == NULL && !bdrv_has_zero_init(target_bs) && target_bs->drv->bdrv_co_write_zeroes == NULL) { -bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end); +hbitmap_set(s->copy_bitmap, 0, end); return 0; } @@ -625,7 +624,7 @@ static int mirror_dirty_init(MirrorBlockJob *s) assert(n > 0); if (ret == 1) { -bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n); +hbitmap_set(s->copy_bitmap,
[Qemu-block] [PATCH 5/9] mirror: improve performance of mirroring of empty disk
We should not take into account zero blocks for delay calculations. They are not read and thus IO throttling is not required. In the other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes days. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/mirror.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index c2f8773..d8be80a 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -363,7 +363,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks); while (nb_chunks > 0 && sector_num < end) { int ret; -int io_sectors; +int io_sectors, io_sectors_acct; BlockDriverState *file; enum MirrorMethod { MIRROR_METHOD_COPY, @@ -399,12 +399,15 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) switch (mirror_method) { case MIRROR_METHOD_COPY: io_sectors = mirror_do_read(s, sector_num, io_sectors); +io_sectors_acct = io_sectors; break; case MIRROR_METHOD_ZERO: mirror_do_zero_or_discard(s, sector_num, io_sectors, false); +io_sectors_acct = 0; break; case MIRROR_METHOD_DISCARD: mirror_do_zero_or_discard(s, sector_num, io_sectors, true); +io_sectors_acct = 0; break; default: abort(); @@ -412,7 +415,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) assert(io_sectors); sector_num += io_sectors; nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk); -delay_ns += ratelimit_calculate_delay(>limit, io_sectors); +delay_ns += ratelimit_calculate_delay(>limit, io_sectors_acct); } return delay_ns; } -- 2.5.0
Re: [Qemu-block] [Qemu-devel] [PATCH v7 09/15] block: Simplify drive-mirror
Eric Blakewrites: > Now that we can support boxed commands, use it to greatly > reduce the number of parameters (and likelihood of getting > out of sync) when adjusting drive-mirror parameters. > > Signed-off-by: Eric Blake > > --- > v7: new patch > --- > qapi/block-core.json | 17 - > blockdev.c | 72 > ++-- > hmp.c| 27 +--- > 3 files changed, 59 insertions(+), 57 deletions(-) > > diff --git a/qapi/block-core.json b/qapi/block-core.json > index 26f7c0e..885a75a 100644 > --- a/qapi/block-core.json > +++ b/qapi/block-core.json > @@ -1108,6 +1108,21 @@ > # > # Start mirroring a block device's writes to a new destination. > # > +# See DriveMirror for parameter descriptions > +# > +# Returns: nothing on success > +# If @device is not a valid block device, DeviceNotFound > +# > +# Since 1.3 > +## > +{ 'command': 'drive-mirror', 'box': true, > + 'data': 'DriveMirror' } > + > +## > +# DriveMirror > +# > +# The parameters for the drive-mirror command. > +# > # @device: the name of the device whose writes should be mirrored. > # > # @target: the target of the new image. If the file exists, or if it > @@ -1159,7 +1174,7 @@ # # Returns: nothing on success # If @device is not a valid block device, DeviceNotFound You forgot to delete this part. # > # > # Since 1.3 Kind of (same in previous patch). > ## > -{ 'command': 'drive-mirror', > +{ 'struct': 'DriveMirror', >'data': { 'device': 'str', 'target': 'str', '*format': 'str', > '*node-name': 'str', '*replaces': 'str', > 'sync': 'MirrorSyncMode', '*mode': 'NewImageMode', > diff --git a/blockdev.c b/blockdev.c > index b8db496..94850fd 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -3457,19 +3457,7 @@ static void blockdev_mirror_common(BlockDriverState > *bs, > block_job_cb, bs, errp); > } > > -void qmp_drive_mirror(const char *device, const char *target, > - bool has_format, const char *format, > - bool has_node_name, const char *node_name, > - bool has_replaces, const char *replaces, > - enum MirrorSyncMode sync, > - bool has_mode, enum NewImageMode mode, > - bool has_speed, int64_t speed, > - bool has_granularity, uint32_t granularity, > - bool has_buf_size, int64_t buf_size, > - bool has_on_source_error, BlockdevOnError > on_source_error, > - bool has_on_target_error, BlockdevOnError > on_target_error, > - bool has_unmap, bool unmap, > - Error **errp) > +void qmp_drive_mirror(DriveMirror *arg, Error **errp) > { > BlockDriverState *bs; > BlockBackend *blk; > @@ -3480,11 +3468,12 @@ void qmp_drive_mirror(const char *device, const char > *target, > int flags; > int64_t size; > int ret; > +const char *format = arg->format; > > -blk = blk_by_name(device); > +blk = blk_by_name(arg->device); > if (!blk) { > error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, > - "Device '%s' not found", device); > + "Device '%s' not found", arg->device); > return; > } > > @@ -3492,24 +3481,25 @@ void qmp_drive_mirror(const char *device, const char > *target, > aio_context_acquire(aio_context); > > if (!blk_is_available(blk)) { > -error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device); > +error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, arg->device); > goto out; > } > bs = blk_bs(blk); > -if (!has_mode) { > -mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; > +if (!arg->has_mode) { > +arg->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; > } > > -if (!has_format) { > -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : > bs->drv->format_name; > +if (!arg->has_format) { > +format = (arg->mode == NEW_IMAGE_MODE_EXISTING > + ? NULL : bs->drv->format_name); > } > > flags = bs->open_flags | BDRV_O_RDWR; > source = backing_bs(bs); > -if (!source && sync == MIRROR_SYNC_MODE_TOP) { > -sync = MIRROR_SYNC_MODE_FULL; > +if (!source && arg->sync == MIRROR_SYNC_MODE_TOP) { > +arg->sync = MIRROR_SYNC_MODE_FULL; > } > -if (sync == MIRROR_SYNC_MODE_NONE) { > +if (arg->sync == MIRROR_SYNC_MODE_NONE) { > source = bs; > } > > @@ -3519,18 +3509,18 @@ void qmp_drive_mirror(const char *device, const char > *target, > goto out; > } > > -if (has_replaces) { > +if (arg->has_replaces) { > BlockDriverState *to_replace_bs; > AioContext *replace_aio_context; > int64_t replace_size; > > -if (!has_node_name) { > +if
[Qemu-block] [PATCH 3/9] mirror: optimize dirty bitmap filling in mirror_run a bit
There is no need to scan allocation tables if we have mark_all_dirty flag set. Just mark it all dirty. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/mirror.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 797659d..c7b3639 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -507,12 +507,16 @@ static int mirror_dirty_init(MirrorBlockJob *s) BlockDriverState *base = s->base; BlockDriverState *bs = blk_bs(s->common.blk); BlockDriverState *target_bs = blk_bs(s->target); -bool mark_all_dirty = base == NULL && !bdrv_has_zero_init(target_bs); uint64_t last_pause_ns; int ret, n; end = s->bdev_length / BDRV_SECTOR_SIZE; +if (base == NULL && !bdrv_has_zero_init(target_bs)) { +bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end); +return 0; +} + last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); /* First part, loop on the sectors and initialize the dirty bitmap. */ @@ -537,7 +541,7 @@ static int mirror_dirty_init(MirrorBlockJob *s) } assert(n > 0); -if (ret == 1 || mark_all_dirty) { +if (ret == 1) { bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n); } sector_num += n; -- 2.5.0
Re: [Qemu-block] [Qemu-devel] [PATCH 5/5] block: Move request_alignment into BlockLimit
Am 14.06.2016 um 16:47 hat Eric Blake geschrieben: > On 06/14/2016 02:05 AM, Kevin Wolf wrote: > > static void raw_refresh_limits(BlockDriverState *bs, Error **errp) > { > +/* Inherit all limits except for request_alignment */ > +int request_alignment = bs->bl.request_alignment; > + > bs->bl = bs->file->bs->bl; > +bs->bl.request_alignment = request_alignment; > >> > >> Any ideas on how to fix the test, then? Have two blkdebug devices > >> nested atop one another, since those are the devices where we can > >> explicitly override alignment? > > > > Interesting idea. Maybe that's a good option if it works. > > > >> (normally, you'd _expect_ the chain to > >> inherit the worst-case alignment of all BDS in the chain, so blkdebug is > >> the way around it). > > > > Actually, I think there are two cases. > > > > The first one is that you get a request and want to know what to do with > > it. Here you don't want to inherit the worst-case alignment. You'd > > rather want to enforce alignment only when it is actually needed. For > > example, if you have a cache=none backing file with 4k alignment, but a > > cache=writeback overlay, and you don't actually access the backing file > > with this request, it would be wasteful to align the request. You also > > don't really know that a driver will issue a misaligned request (or any > > request at all) to the lower layer just because it got called with one. > > > > The second case is when you want to issue a request. For example, let's > > take the qcow2 cache here, which has 64k cached and needs to update a > > few bytes. Currently it always writes the full 64k (and with 1 MB > > clusters a full megabyte), but what it really should do is consider the > > worst-case alignment. > > > > This is comparable to the difference between bdrv_opt_mem_align(), which > > is used in qemu_blockalign() where we want to create a buffer that works > > even in the worst case, and bdrv_min_mem_align(), which is used in > > bdrv_qiov_is_aligned() in order to determine whether we must align an > > existing request. > > > >> That's the only thing left before I repost the series, so I may just > >> post the last patch as RFC and play with it a bit more... > > > > And in the light of the above, maybe the solution is as easy as changing > > raw_refresh_limits() to set bs->bl.request_alignment = 1. > > Or maybe split the main bdrv_refresh_limits() into two pieces: one that > merges limits from one BDS into another (the limits that are worth > merging, and in the correct direction: opt merges to MAX, max merges to > MIN_NON_ZERO, request_alignment is not merged), the other that calls > merge(bs, bs->file->bs); then have raw_refresh_limits() also use the > common merge functionality rather than straight copy. Testing that > approach now... So you don't agree with what I said above? I think that block drivers should only set limits that they require for themselves. The block layer bdrv_refresh_limits() function can then merge things where needed; drivers shouldn't be involved there. Kevin pgpceb24nIaLM.pgp Description: PGP signature
[Qemu-block] [PATCH 7/9] mirror: allow to save buffer for QEMUIOVector in MirrorOp
Properly cook MirrorOp initialization/deinitialization. The field is not yet used actually. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/mirror.c | 32 +++- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index d8be80a..7471211 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -73,6 +73,7 @@ typedef struct MirrorOp { QEMUIOVector qiov; int64_t sector_num; int nb_sectors; +void *buf; } MirrorOp; static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read, @@ -100,24 +101,28 @@ static void mirror_iteration_done(MirrorOp *op, int ret) s->in_flight--; s->sectors_in_flight -= op->nb_sectors; iov = op->qiov.iov; -for (i = 0; i < op->qiov.niov; i++) { -MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base; -QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next); -s->buf_free_count++; -} -sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; -chunk_num = op->sector_num / sectors_per_chunk; -nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk); -bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks); -if (ret >= 0) { -if (s->cow_bitmap) { -bitmap_set(s->cow_bitmap, chunk_num, nb_chunks); +if (op->buf == NULL) { +for (i = 0; i < op->qiov.niov; i++) { +MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base; +QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next); +s->buf_free_count++; +} + +sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; +chunk_num = op->sector_num / sectors_per_chunk; +nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk); +bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks); +if (ret >= 0) { +if (s->cow_bitmap) { +bitmap_set(s->cow_bitmap, chunk_num, nb_chunks); +} +s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE; } -s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE; } qemu_iovec_destroy(>qiov); +g_free(op->buf); g_free(op); if (s->waiting_for_io) { @@ -255,6 +260,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num, op->s = s; op->sector_num = sector_num; op->nb_sectors = nb_sectors; +op->buf = NULL; /* Now make a QEMUIOVector taking enough granularity-sized chunks * from s->buf_free. -- 2.5.0
[Qemu-block] [PATCH 2/9] mirror: create mirror_dirty_init helper for mirror_run
The code inside the helper will be extended in the next patch. mirror_run itself is overbloated at the moment. Signed-off-by: Denis V. LunevReviewed-by: Vladimir Sementsov-Ogievskiy CC: Stefan Hajnoczi CC: Fam Zheng CC: Kevin Wolf CC: Max Reitz CC: Jeff Cody CC: Eric Blake --- block/mirror.c | 83 ++ 1 file changed, 49 insertions(+), 34 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 3760e29..797659d 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -501,19 +501,62 @@ static void mirror_exit(BlockJob *job, void *opaque) bdrv_unref(src); } +static int mirror_dirty_init(MirrorBlockJob *s) +{ +int64_t sector_num, end; +BlockDriverState *base = s->base; +BlockDriverState *bs = blk_bs(s->common.blk); +BlockDriverState *target_bs = blk_bs(s->target); +bool mark_all_dirty = base == NULL && !bdrv_has_zero_init(target_bs); +uint64_t last_pause_ns; +int ret, n; + +end = s->bdev_length / BDRV_SECTOR_SIZE; + +last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); + +/* First part, loop on the sectors and initialize the dirty bitmap. */ +for (sector_num = 0; sector_num < end; ) { +/* Just to make sure we are not exceeding int limit. */ +int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS, + end - sector_num); +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); + +if (now - last_pause_ns > SLICE_TIME) { +last_pause_ns = now; +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0); +} + +if (block_job_is_cancelled(>common)) { +return 0; +} + +ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, ); +if (ret < 0) { +return ret; +} + +assert(n > 0); +if (ret == 1 || mark_all_dirty) { +bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n); +} +sector_num += n; +} +return 0; +} + static void coroutine_fn mirror_run(void *opaque) { MirrorBlockJob *s = opaque; MirrorExitData *data; BlockDriverState *bs = blk_bs(s->common.blk); BlockDriverState *target_bs = blk_bs(s->target); -int64_t sector_num, end, length; +int64_t length; uint64_t last_pause_ns; BlockDriverInfo bdi; char backing_filename[2]; /* we only need 2 characters because we are only checking for a NULL string */ int ret = 0; -int n; int target_cluster_size = BDRV_SECTOR_SIZE; if (block_job_is_cancelled(>common)) { @@ -555,7 +598,6 @@ static void coroutine_fn mirror_run(void *opaque) s->target_cluster_sectors = target_cluster_size >> BDRV_SECTOR_BITS; s->max_iov = MIN(bs->bl.max_iov, target_bs->bl.max_iov); -end = s->bdev_length / BDRV_SECTOR_SIZE; s->buf = qemu_try_blockalign(bs, s->buf_size); if (s->buf == NULL) { ret = -ENOMEM; @@ -564,41 +606,14 @@ static void coroutine_fn mirror_run(void *opaque) mirror_free_init(s); -last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); if (!s->is_none_mode) { -/* First part, loop on the sectors and initialize the dirty bitmap. */ -BlockDriverState *base = s->base; -bool mark_all_dirty = s->base == NULL && !bdrv_has_zero_init(target_bs); - -for (sector_num = 0; sector_num < end; ) { -/* Just to make sure we are not exceeding int limit. */ -int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS, - end - sector_num); -int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); - -if (now - last_pause_ns > SLICE_TIME) { -last_pause_ns = now; -block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0); -} - -if (block_job_is_cancelled(>common)) { -goto immediate_exit; -} - -ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, ); - -if (ret < 0) { -goto immediate_exit; -} - -assert(n > 0); -if (ret == 1 || mark_all_dirty) { -bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n); -} -sector_num += n; +ret = mirror_dirty_init(s); +if (ret < 0 || block_job_is_cancelled(>common)) { +goto immediate_exit; } } +last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); bdrv_dirty_iter_init(s->dirty_bitmap, >hbi); for (;;) { uint64_t delay_ns = 0; -- 2.5.0
Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands
On 14 Jun 2016, at 14:32, Paolo Bonziniwrote: > > On 13/06/2016 23:41, Alex Bligh wrote: >> That's one of the reasons that there is a proposal to add >> STRUCTURED_READ to the spec (although I still haven't had time to >> implement that for qemu), so that we have a newer approach that allows >> for proper error handling without ambiguity on whether bogus bytes must >> be sent on a failed read. But you'd have to convince me that ALL >> existing NBD server and client implementations expect to handle a read >> error without read payload, otherwise, I will stick with the notion that >> the current spec wording is correct, and that read errors CANNOT be >> gracefully recovered from unless BOTH sides transfer (possibly bogus) >> bytes along with the error message, and which is why BOTH sides of the >> protocol are warned that read errors usually result in a disconnection >> rather than clean continuation, without the addition of STRUCTURED_READ. > > I suspect that there are exactly two client implementations, My understanding is that there are more than 2 client implementations. A quick google found at least one BSD client. I bet read error handling is a mess in all of them. > namely > Linux and QEMU's, and both do the right thing. This depends what you mean by 'right'. Both appear to be non-compliant with the standard. Note the standard is not defined by the client implementation, but by the protocol document. IMHO the 'right thing' is what is in the spec. Servers can't send an error in any other way if they don't buffer the entire read first, as the read may error towards the end. To illustrate the problem, look consider what qemu itself would do as a server if it can't buffer the entire read issued to it. > What servers do doesn't matter, if all the clients agree. The spec originally was not clear on how errors on reads should be handled, leading to any read causing a protocol drop. The spec is now clear. Unfortunately it is not possible to make a back compatible fix. Hence the real fix here is to implement structured replies, which is what Eric and I have been working on. -- Alex Bligh
Re: [Qemu-block] [Qemu-devel] [PATCH 5/5] block: Move request_alignment into BlockLimit
On 06/14/2016 02:05 AM, Kevin Wolf wrote: static void raw_refresh_limits(BlockDriverState *bs, Error **errp) { +/* Inherit all limits except for request_alignment */ +int request_alignment = bs->bl.request_alignment; + bs->bl = bs->file->bs->bl; +bs->bl.request_alignment = request_alignment; >> >> Any ideas on how to fix the test, then? Have two blkdebug devices >> nested atop one another, since those are the devices where we can >> explicitly override alignment? > > Interesting idea. Maybe that's a good option if it works. > >> (normally, you'd _expect_ the chain to >> inherit the worst-case alignment of all BDS in the chain, so blkdebug is >> the way around it). > > Actually, I think there are two cases. > > The first one is that you get a request and want to know what to do with > it. Here you don't want to inherit the worst-case alignment. You'd > rather want to enforce alignment only when it is actually needed. For > example, if you have a cache=none backing file with 4k alignment, but a > cache=writeback overlay, and you don't actually access the backing file > with this request, it would be wasteful to align the request. You also > don't really know that a driver will issue a misaligned request (or any > request at all) to the lower layer just because it got called with one. > > The second case is when you want to issue a request. For example, let's > take the qcow2 cache here, which has 64k cached and needs to update a > few bytes. Currently it always writes the full 64k (and with 1 MB > clusters a full megabyte), but what it really should do is consider the > worst-case alignment. > > This is comparable to the difference between bdrv_opt_mem_align(), which > is used in qemu_blockalign() where we want to create a buffer that works > even in the worst case, and bdrv_min_mem_align(), which is used in > bdrv_qiov_is_aligned() in order to determine whether we must align an > existing request. > >> That's the only thing left before I repost the series, so I may just >> post the last patch as RFC and play with it a bit more... > > And in the light of the above, maybe the solution is as easy as changing > raw_refresh_limits() to set bs->bl.request_alignment = 1. Or maybe split the main bdrv_refresh_limits() into two pieces: one that merges limits from one BDS into another (the limits that are worth merging, and in the correct direction: opt merges to MAX, max merges to MIN_NON_ZERO, request_alignment is not merged), the other that calls merge(bs, bs->file->bs); then have raw_refresh_limits() also use the common merge functionality rather than straight copy. Testing that approach now... -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH] backup: Fail early if cannot determine cluster size
On 18.05.2016 07:53, Fam Zheng wrote: > Otherwise the job is orphaned and block_job_cancel_sync in > bdrv_close_all() when quiting will hang. > > A simple reproducer is running blockdev-backup from null-co:// to > null-co://. > > Cc: qemu-sta...@nongnu.org > Signed-off-by: Fam Zheng> --- > block/backup.c | 34 ++ > 1 file changed, 18 insertions(+), 16 deletions(-) Sorry for having waited so long (I was thinking that maybe Jeff wanted to take this patch), but now the patch no longer applies and I don't feel comfortable with just fixing it up myself; git-backport-diff tells me the required changes are "[0015] [FC]". Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH v5 03/11] qom: support arbitrary non-scalar properties with -object
On Thu, Jun 09, 2016 at 04:43:35PM +0200, Markus Armbruster wrote: > "Daniel P. Berrange"writes: > > > The current -object command line syntax only allows for > > creation of objects with scalar properties, or a list > > with a fixed scalar element type. Objects which have > > properties that are represented as structs in the QAPI > > schema cannot be created using -object. > > > > This is a design limitation of the way the OptsVisitor > > is written. It simply iterates over the QemuOpts values > > as a flat list. The support for lists is enabled by > > allowing the same key to be repeated in the opts string. > > > > It is not practical to extend the OptsVisitor to support > > more complex data structures while also maintaining > > the existing list handling behaviour that is relied upon > > by other areas of QEMU. > > It should be practical to create a new option input visitor that parses > command line syntax straight into a QAPI visit, without the list magic, > and with fewer or no restrictions. Then we could start defining complex > command line arguments as QAPI types, parse them straight into generated > QAPI types, and dispense with the QDict wrangling. Funnily enough I did actually start out trying to implement pretty much exactly that. I found that in my new opts visitor, I was essentially parsing the QemuOpts into QDict/QLists to handle the recursion from the visitor. At this point I was starting to duplicate much code from QDict and QmpInputVisitor, so it abandoned that approach and went for the more general QemuOpts -> QDict crumping instead, since it the goal to be achieved with simpler code overall. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-block] [PATCH v2] rbd:change error_setg() to error_setg_errno()
On 09.05.2016 09:51, Vikhyat Umrao wrote: > Ceph RBD block driver does not use error_setg_errno() where > it is possible to use. This patch replaces error_setg() > from error_setg_errno(). > > Signed-off-by: Vikhyat Umrao> --- > block/rbd.c | 38 +++--- > 1 file changed, 23 insertions(+), 15 deletions(-) Thanks, applied to my block tree: https://github.com/XanClic/qemu/commits/block Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v2 5/6] raw-posix: Implement .bdrv_co_preadv/pwritev
On 06/14/2016 07:32 AM, Kevin Wolf wrote: > The raw-posix block driver actually supports byte-aligned requests now > on non-O_DIRECT images, like it already (and previously incorrectly) > claimed in bs->request_alignment. > > For some block drivers this means that a RMW cycle can be avoided when > they write sub-sector metadata e.g. for cluster allocation. > > Signed-off-by: Kevin Wolf> --- > block/linux-aio.c | 7 ++- > block/raw-aio.h | 3 +-- > block/raw-posix.c | 43 +++ > 3 files changed, 26 insertions(+), 27 deletions(-) > > -static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t > sector_num, > - int nb_sectors, QEMUIOVector *qiov) > +static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset, > + uint64_t bytes, QEMUIOVector *qiov, > + int flags) > { > -return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_READ); > +return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_READ); > } > > -static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t > sector_num, > - int nb_sectors, QEMUIOVector *qiov) > +static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset, > + uint64_t bytes, QEMUIOVector *qiov, > + int flags) > { > -return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_WRITE); > +assert(flags == 0); > +return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE); > } Looks odd to assert !flags on pwritev but not on preadv. Minor enough that you could fix on pull request. Reviewed-by: Eric Blake -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v2 4/6] raw-posix: Switch to bdrv_co_* interfaces
On 06/14/2016 07:32 AM, Kevin Wolf wrote: > In order to use the modern byte-based .bdrv_co_preadv/pwritev() > interface, this patch switches raw-posix to coroutine-based interfaces > as a first step. In terms of semantics and performance, it doesn't make > a difference with the existing code whether we go from a coroutine to a > callback-based interface already in block/io.c or only in linux-aio.c > > As there have been concerns in the past that this change may be a step > in the wrong direction with respect to a possible AIO fast path, the > old callback-based interface for linux-aio is left around and can be > reactivated when a fast path (e.g. directly from virtio-blk dataplane, > bypassing the whole block layer) is implemented. > > Signed-off-by: Kevin Wolf> --- > block/linux-aio.c | 87 > +-- > block/raw-aio.h | 4 +++ > block/raw-posix.c | 59 + > 3 files changed, 96 insertions(+), 54 deletions(-) Reviewed-by: Eric Blake > @@ -1957,8 +1952,8 @@ BlockDriver bdrv_file = { > .bdrv_co_get_block_status = raw_co_get_block_status, > .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes, > > -.bdrv_aio_readv = raw_aio_readv, > -.bdrv_aio_writev = raw_aio_writev, > +.bdrv_co_readv = raw_co_readv, > +.bdrv_co_writev = raw_co_writev, > .bdrv_aio_flush = raw_aio_flush, > .bdrv_aio_discard = raw_aio_discard, The rest of this chunk doesn't try to align '='. Not worth a respin, but something you may want to clean up on pull request. > .bdrv_refresh_limits = raw_refresh_limits, > @@ -2405,8 +2400,8 @@ static BlockDriver bdrv_host_device = { > .create_opts = _create_opts, > .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes, > > -.bdrv_aio_readv = raw_aio_readv, > -.bdrv_aio_writev = raw_aio_writev, > +.bdrv_co_readv = raw_co_readv, > +.bdrv_co_writev = raw_co_writev, > .bdrv_aio_flush = raw_aio_flush, > .bdrv_aio_discard = hdev_aio_discard, > .bdrv_refresh_limits = raw_refresh_limits, and this hunk is also inconsistent, but in a different way. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v1 0/6] Report format specific info for LUKS block driver
On Tue, Jun 14, 2016 at 03:56:30PM +0200, Max Reitz wrote: > On 07.06.2016 12:11, Daniel P. Berrange wrote: > > The 'qemu-img info' tool has ability to print format specific > > information, eg with qcow2 it reports two extra items: > > > > $ qemu-img info ~/VirtualMachines/demo.qcow2 > > image: /home/berrange/VirtualMachines/demo.qcow2 > > file format: qcow2 > > virtual size: 3.0G (3221225472 bytes) > > disk size: 140K > > cluster_size: 65536 > > Format specific information: > > compat: 0.10 > > refcount bits: 16 > > > > > > This is not currently wired up for the LUKS driver. This patch > > series adds that support so that we can report useful data about > > the LUKS volume such as the crypto algorithm choices, key slot > > usage and other volume metadata. > > > > The first patch extends the crypto API to allow querying of the > > format specific metadata > > > > The second patches extends the block API to allow the LUKS driver > > to report the format specific metadata. > > > > $ qemu-img info ~/VirtualMachines/demo.luks > > image: /home/berrange/VirtualMachines/demo.luks > > file format: luks > > virtual size: 98M (102760448 bytes) > > disk size: 100M > > encrypted: yes > > Format specific information: > > cipher-alg: aes-128 > > cipher-mode: xts > > ivgen-alg: plain64 > > hash-alg: sha1 > > payload-offset: 2097152 > > master-key-iters: 142375 > > uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 > > slots: > > [0]: > > active: true > > iters: 572706 > > stripes: 4000 > > key-offset: 8 > > [1]: > > active: false > > iters: 0 > > stripes: 4000 > > key-offset: 264 > > [2]: > > active: false > > iters: 0 > > stripes: 4000 > > key-offset: 520 > > [3]: > > active: false > > iters: 0 > > stripes: 4000 > > key-offset: 776 > > [4]: > > active: false > > iters: 0 > > stripes: 4000 > > key-offset: 1032 > > [5]: > > active: false > > iters: 0 > > stripes: 4000 > > key-offset: 1288 > > [6]: > > active: false > > iters: 0 > > stripes: 4000 > > key-offset: 1544 > > [7]: > > active: false > > iters: 0 > > stripes: 4000 > > key-offset: 1800 > > > > The remaining 4 patches are improvements to QAPI and the core > > block layer to fix a problem whereby struct fields are output > > in (apparently) random ordering. This is because the QAPI type > > is converted into a QObject for pretty-printing, thus throwing > > away any struct field ordering information. > > > > To address this I created a new TextOutputVisitor which can > > directly pretty-print QAPI types. I then changed the code > > generator to create qapi_stringify_TYPENAME() methods for > > all QAPI types. Finally I changed the block layer over to > > use this stringify approach instead. > > General nagging: This new approach no longer replaces dashes with spaces > in the key names. Is it worth doing something about this? Opps, didn't notice that. Should be pretty easy to deal with that in the visitor. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[Qemu-block] [PATCH v2 3/6] block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
Signed-off-by: Kevin WolfReviewed-by: Eric Blake Reviewed-by: Stefan Hajnoczi --- block/io.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/block/io.c b/block/io.c index e75bce2..b261cc6 100644 --- a/block/io.c +++ b/block/io.c @@ -1249,11 +1249,9 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs, bool waited; int ret; -int64_t sector_num = offset >> BDRV_SECTOR_BITS; -unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS; +int64_t start_sector = offset >> BDRV_SECTOR_BITS; +int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE); -assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0); -assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0); assert(!qiov || bytes == qiov->size); assert((bs->open_flags & BDRV_O_NO_IO) == 0); assert(!(flags & ~BDRV_REQ_MASK)); @@ -1278,22 +1276,21 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs, /* Do nothing, write notifier decided to fail this request */ } else if (flags & BDRV_REQ_ZERO_WRITE) { bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO); -ret = bdrv_co_do_pwrite_zeroes(bs, sector_num << BDRV_SECTOR_BITS, - nb_sectors << BDRV_SECTOR_BITS, flags); +ret = bdrv_co_do_pwrite_zeroes(bs, offset, bytes, flags); } else { bdrv_debug_event(bs, BLKDBG_PWRITEV); ret = bdrv_driver_pwritev(bs, offset, bytes, qiov, flags); } bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE); -bdrv_set_dirty(bs, sector_num, nb_sectors); +bdrv_set_dirty(bs, start_sector, end_sector - start_sector); if (bs->wr_highest_offset < offset + bytes) { bs->wr_highest_offset = offset + bytes; } if (ret >= 0) { -bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors); +bs->total_sectors = MAX(bs->total_sectors, end_sector); } return ret; -- 1.8.3.1
[Qemu-block] [PATCH v2 1/6] block: Byte-based bdrv_co_do_copy_on_readv()
In a first step to convert the common I/O path to work on bytes rather than sectors, this converts the copy-on-read logic that is used by bdrv_aligned_preadv(). Signed-off-by: Kevin WolfReviewed-by: Eric Blake Reviewed-by: Stefan Hajnoczi --- block/io.c| 63 +++ block/mirror.c| 10 include/block/block.h | 10 +--- trace-events | 2 +- 4 files changed, 52 insertions(+), 33 deletions(-) diff --git a/block/io.c b/block/io.c index 5b2017f..b6a2c80 100644 --- a/block/io.c +++ b/block/io.c @@ -404,12 +404,12 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align) } /** - * Round a region to cluster boundaries + * Round a region to cluster boundaries (sector-based) */ -void bdrv_round_to_clusters(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, -int64_t *cluster_sector_num, -int *cluster_nb_sectors) +void bdrv_round_sectors_to_clusters(BlockDriverState *bs, +int64_t sector_num, int nb_sectors, +int64_t *cluster_sector_num, +int *cluster_nb_sectors) { BlockDriverInfo bdi; @@ -424,6 +424,26 @@ void bdrv_round_to_clusters(BlockDriverState *bs, } } +/** + * Round a region to cluster boundaries + */ +void bdrv_round_to_clusters(BlockDriverState *bs, +int64_t offset, unsigned int bytes, +int64_t *cluster_offset, +unsigned int *cluster_bytes) +{ +BlockDriverInfo bdi; + +if (bdrv_get_info(bs, ) < 0 || bdi.cluster_size == 0) { +*cluster_offset = offset; +*cluster_bytes = bytes; +} else { +int64_t c = bdi.cluster_size; +*cluster_offset = QEMU_ALIGN_DOWN(offset, c); +*cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes, c); +} +} + static int bdrv_get_cluster_size(BlockDriverState *bs) { BlockDriverInfo bdi; @@ -865,7 +885,7 @@ emulate_flags: } static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs, -int64_t sector_num, int nb_sectors, QEMUIOVector *qiov) +int64_t offset, unsigned int bytes, QEMUIOVector *qiov) { /* Perform I/O through a temporary buffer so that users who scribble over * their read buffer while the operation is in progress do not end up @@ -877,21 +897,20 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs, BlockDriver *drv = bs->drv; struct iovec iov; QEMUIOVector bounce_qiov; -int64_t cluster_sector_num; -int cluster_nb_sectors; +int64_t cluster_offset; +unsigned int cluster_bytes; size_t skip_bytes; int ret; /* Cover entire cluster so no additional backing file I/O is required when * allocating cluster in the image file. */ -bdrv_round_to_clusters(bs, sector_num, nb_sectors, - _sector_num, _nb_sectors); +bdrv_round_to_clusters(bs, offset, bytes, _offset, _bytes); -trace_bdrv_co_do_copy_on_readv(bs, sector_num, nb_sectors, - cluster_sector_num, cluster_nb_sectors); +trace_bdrv_co_do_copy_on_readv(bs, offset, bytes, + cluster_offset, cluster_bytes); -iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE; +iov.iov_len = cluster_bytes; iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len); if (bounce_buffer == NULL) { ret = -ENOMEM; @@ -900,8 +919,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs, qemu_iovec_init_external(_qiov, , 1); -ret = bdrv_driver_preadv(bs, cluster_sector_num * BDRV_SECTOR_SIZE, - cluster_nb_sectors * BDRV_SECTOR_SIZE, +ret = bdrv_driver_preadv(bs, cluster_offset, cluster_bytes, _qiov, 0); if (ret < 0) { goto err; @@ -909,16 +927,12 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs, if (drv->bdrv_co_pwrite_zeroes && buffer_is_zero(bounce_buffer, iov.iov_len)) { -ret = bdrv_co_do_pwrite_zeroes(bs, - cluster_sector_num * BDRV_SECTOR_SIZE, - cluster_nb_sectors * BDRV_SECTOR_SIZE, - 0); +ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0); } else { /* This does not change the data on the disk, it is not necessary * to flush even in cache=writethrough mode. */ -ret = bdrv_driver_pwritev(bs, cluster_sector_num * BDRV_SECTOR_SIZE, - cluster_nb_sectors * BDRV_SECTOR_SIZE, +ret =
Re: [Qemu-block] [PATCH] qemu-img bench: Fix uninitialised writethrough mode
On 14.06.2016 11:33, Kevin Wolf wrote: > If no -t option is specified, bool writethrough stayed uninitialised. > Initialise it as false, which makes cache=writeback the default cache > mode. > > Signed-off-by: Kevin Wolf> --- > qemu-img.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH v2 4/6] raw-posix: Switch to bdrv_co_* interfaces
In order to use the modern byte-based .bdrv_co_preadv/pwritev() interface, this patch switches raw-posix to coroutine-based interfaces as a first step. In terms of semantics and performance, it doesn't make a difference with the existing code whether we go from a coroutine to a callback-based interface already in block/io.c or only in linux-aio.c As there have been concerns in the past that this change may be a step in the wrong direction with respect to a possible AIO fast path, the old callback-based interface for linux-aio is left around and can be reactivated when a fast path (e.g. directly from virtio-blk dataplane, bypassing the whole block layer) is implemented. Signed-off-by: Kevin Wolf--- block/linux-aio.c | 87 +-- block/raw-aio.h | 4 +++ block/raw-posix.c | 59 + 3 files changed, 96 insertions(+), 54 deletions(-) diff --git a/block/linux-aio.c b/block/linux-aio.c index 294b2bf..74d9b33 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -11,8 +11,10 @@ #include "qemu-common.h" #include "block/aio.h" #include "qemu/queue.h" +#include "block/block.h" #include "block/raw-aio.h" #include "qemu/event_notifier.h" +#include "qemu/coroutine.h" #include @@ -30,6 +32,7 @@ struct qemu_laiocb { BlockAIOCB common; +Coroutine *co; LinuxAioState *ctx; struct iocb iocb; ssize_t ret; @@ -88,9 +91,14 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb) } } } -laiocb->common.cb(laiocb->common.opaque, ret); -qemu_aio_unref(laiocb); +laiocb->ret = ret; +if (laiocb->co) { +qemu_coroutine_enter(laiocb->co, NULL); +} else { +laiocb->common.cb(laiocb->common.opaque, ret); +qemu_aio_unref(laiocb); +} } /* The completion BH fetches completed I/O requests and invokes their @@ -232,22 +240,12 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s) } } -BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd, -int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, -BlockCompletionFunc *cb, void *opaque, int type) +static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset, + int type) { -struct qemu_laiocb *laiocb; -struct iocb *iocbs; -off_t offset = sector_num * 512; - -laiocb = qemu_aio_get(_aiocb_info, bs, cb, opaque); -laiocb->nbytes = nb_sectors * 512; -laiocb->ctx = s; -laiocb->ret = -EINPROGRESS; -laiocb->is_read = (type == QEMU_AIO_READ); -laiocb->qiov = qiov; - -iocbs = >iocb; +LinuxAioState *s = laiocb->ctx; +struct iocb *iocbs = >iocb; +QEMUIOVector *qiov = laiocb->qiov; switch (type) { case QEMU_AIO_WRITE: @@ -260,7 +258,7 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd, default: fprintf(stderr, "%s: invalid AIO request type 0x%x.\n", __func__, type); -goto out_free_aiocb; +return -EIO; } io_set_eventfd(>iocb, event_notifier_get_fd(>e)); @@ -270,11 +268,56 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd, (!s->io_q.plugged || s->io_q.n >= MAX_QUEUED_IO)) { ioq_submit(s); } -return >common; -out_free_aiocb: -qemu_aio_unref(laiocb); -return NULL; +return 0; +} + +int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd, +int64_t sector_num, QEMUIOVector *qiov, +int nb_sectors, int type) +{ +off_t offset = sector_num * BDRV_SECTOR_SIZE; +int ret; + +struct qemu_laiocb laiocb = { +.co = qemu_coroutine_self(), +.nbytes = nb_sectors * BDRV_SECTOR_SIZE, +.ctx= s, +.is_read= (type == QEMU_AIO_READ), +.qiov = qiov, +}; + +ret = laio_do_submit(fd, , offset, type); +if (ret < 0) { +return ret; +} + +qemu_coroutine_yield(); +return laiocb.ret; +} + +BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd, +int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, +BlockCompletionFunc *cb, void *opaque, int type) +{ +struct qemu_laiocb *laiocb; +off_t offset = sector_num * BDRV_SECTOR_SIZE; +int ret; + +laiocb = qemu_aio_get(_aiocb_info, bs, cb, opaque); +laiocb->nbytes = nb_sectors * BDRV_SECTOR_SIZE; +laiocb->ctx = s; +laiocb->ret = -EINPROGRESS; +laiocb->is_read = (type == QEMU_AIO_READ); +laiocb->qiov = qiov; + +ret = laio_do_submit(fd, laiocb, offset, type); +if (ret < 0) { +qemu_aio_unref(laiocb); +return NULL; +} + +return >common; } void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context) diff --git a/block/raw-aio.h b/block/raw-aio.h index
[Qemu-block] [PATCH v2 5/6] raw-posix: Implement .bdrv_co_preadv/pwritev
The raw-posix block driver actually supports byte-aligned requests now on non-O_DIRECT images, like it already (and previously incorrectly) claimed in bs->request_alignment. For some block drivers this means that a RMW cycle can be avoided when they write sub-sector metadata e.g. for cluster allocation. Signed-off-by: Kevin Wolf--- block/linux-aio.c | 7 ++- block/raw-aio.h | 3 +-- block/raw-posix.c | 43 +++ 3 files changed, 26 insertions(+), 27 deletions(-) diff --git a/block/linux-aio.c b/block/linux-aio.c index 74d9b33..e468960 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -273,15 +273,12 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset, } int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd, -int64_t sector_num, QEMUIOVector *qiov, -int nb_sectors, int type) +uint64_t offset, QEMUIOVector *qiov, int type) { -off_t offset = sector_num * BDRV_SECTOR_SIZE; int ret; - struct qemu_laiocb laiocb = { .co = qemu_coroutine_self(), -.nbytes = nb_sectors * BDRV_SECTOR_SIZE, +.nbytes = qiov->size, .ctx= s, .is_read= (type == QEMU_AIO_READ), .qiov = qiov, diff --git a/block/raw-aio.h b/block/raw-aio.h index 03bbfba..a4cdbbf 100644 --- a/block/raw-aio.h +++ b/block/raw-aio.h @@ -40,8 +40,7 @@ typedef struct LinuxAioState LinuxAioState; LinuxAioState *laio_init(void); void laio_cleanup(LinuxAioState *s); int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd, -int64_t sector_num, QEMUIOVector *qiov, -int nb_sectors, int type); +uint64_t offset, QEMUIOVector *qiov, int type); BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd, int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, BlockCompletionFunc *cb, void *opaque, int type); diff --git a/block/raw-posix.c b/block/raw-posix.c index cb98769..aacf132 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1325,8 +1325,8 @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, int fd, return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque); } -static int coroutine_fn raw_co_rw(BlockDriverState *bs, int64_t sector_num, - int nb_sectors, QEMUIOVector *qiov, int type) +static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, + uint64_t bytes, QEMUIOVector *qiov, int type) { BDRVRawState *s = bs->opaque; @@ -1344,26 +1344,28 @@ static int coroutine_fn raw_co_rw(BlockDriverState *bs, int64_t sector_num, type |= QEMU_AIO_MISALIGNED; #ifdef CONFIG_LINUX_AIO } else if (s->use_aio) { -return laio_co_submit(bs, s->aio_ctx, s->fd, sector_num, qiov, - nb_sectors, type); +assert(qiov->size == bytes); +return laio_co_submit(bs, s->aio_ctx, s->fd, offset, qiov, type); #endif } } -return paio_submit_co(bs, s->fd, sector_num * BDRV_SECTOR_SIZE, qiov, - nb_sectors * BDRV_SECTOR_SIZE, type); +return paio_submit_co(bs, s->fd, offset, qiov, bytes, type); } -static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num, - int nb_sectors, QEMUIOVector *qiov) +static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset, + uint64_t bytes, QEMUIOVector *qiov, + int flags) { -return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_READ); +return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_READ); } -static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t sector_num, - int nb_sectors, QEMUIOVector *qiov) +static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset, + uint64_t bytes, QEMUIOVector *qiov, + int flags) { -return raw_co_rw(bs, sector_num, nb_sectors, qiov, QEMU_AIO_WRITE); +assert(flags == 0); +return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE); } static void raw_aio_plug(BlockDriverState *bs) @@ -1952,8 +1954,8 @@ BlockDriver bdrv_file = { .bdrv_co_get_block_status = raw_co_get_block_status, .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes, -.bdrv_co_readv = raw_co_readv, -.bdrv_co_writev = raw_co_writev, +.bdrv_co_preadv = raw_co_preadv, +.bdrv_co_pwritev= raw_co_pwritev, .bdrv_aio_flush = raw_aio_flush, .bdrv_aio_discard = raw_aio_discard,
Re: [Qemu-block] [Qemu-devel] [PATCH v5 02/11] qapi: allow QmpInputVisitor to auto-cast types
On Thu, Jun 09, 2016 at 04:03:50PM +0200, Markus Armbruster wrote: > "Daniel P. Berrange"writes: > > > Currently the QmpInputVisitor assumes that all scalar > > values are directly represented as their final types. > > ie it assumes an 'int' is using QInt, and a 'bool' is > > using QBool. > > > > This extends it so that QString is optionally permitted > > for any of the non-string scalar types. This behaviour > > is turned on by requesting the 'autocast' flag in the > > constructor. > > > > This makes it possible to use QmpInputVisitor with a > > QDict produced from QemuOpts, where everything is in > > string format. > > We're still digging. > > > Signed-off-by: Daniel P. Berrange > > --- > > docs/qapi-code-gen.txt | 2 +- > > include/qapi/qmp-input-visitor.h | 6 +- > > qapi/opts-visitor.c| 1 + > > qapi/qmp-input-visitor.c | 89 ++-- > > qmp.c | 2 +- > > qom/qom-qobject.c | 2 +- > > replay/replay-input.c | 2 +- > > scripts/qapi-commands.py | 2 +- > > tests/check-qnull.c| 2 +- > > tests/test-qmp-commands.c | 2 +- > > tests/test-qmp-input-strict.c | 2 +- > > tests/test-qmp-input-visitor.c | 115 > > - > > tests/test-visitor-serialization.c | 2 +- > > util/qemu-sockets.c| 2 +- > > 14 files changed, 201 insertions(+), 30 deletions(-) > > diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c > > index 4cf1cf8..00e4b1a 100644 > > --- a/qapi/opts-visitor.c > > +++ b/qapi/opts-visitor.c > > @@ -347,6 +347,7 @@ opts_type_bool(Visitor *v, const char *name, bool *obj, > > Error **errp) > > } > > > > if (opt->str) { > > +/* Keep these values in sync with same code in qmp-input-visitor.c > > */ > > Also with parse_option_bool(). That's the canonical place, in fact. except parse_option_bool() doesn't allow "yes", "no", "y", "n" as valid values - it only permits "on" and "off" :-( I guess I could make the parse_option_bool() method non-static, make it accept these missing values, and then call it from all places so we have guaranteed consistency. > > > if (strcmp(opt->str, "on") == 0 || > > strcmp(opt->str, "yes") == 0 || > > strcmp(opt->str, "y") == 0) { > > diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c > > index aea90a1..92023b1 100644 > > --- a/qapi/qmp-input-visitor.c > > +++ b/qapi/qmp-input-visitor.c > > @@ -20,6 +20,7 @@ > > #include "qemu-common.h" > > #include "qapi/qmp/types.h" > > #include "qapi/qmp/qerror.h" > > +#include "qemu/cutils.h" > > > > #define QIV_STACK_SIZE 1024 > > > > @@ -45,6 +46,7 @@ struct QmpInputVisitor > > > > /* True to reject parse in visit_end_struct() if unvisited keys > > remain. */ > > bool strict; > > +bool autocast; > > }; > > > > static QmpInputVisitor *to_qiv(Visitor *v) > > @@ -254,15 +256,25 @@ static void qmp_input_type_int64(Visitor *v, const > > char *name, int64_t *obj, > > Error **errp) > > { > > QmpInputVisitor *qiv = to_qiv(v); > > -QInt *qint = qobject_to_qint(qmp_input_get_object(qiv, name, true)); > > +QObject *qobj = qmp_input_get_object(qiv, name, true); > > +QInt *qint; > > +QString *qstr; > > > > -if (!qint) { > > -error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null", > > - "integer"); > > +qint = qobject_to_qint(qobj); > > +if (qint) { > > +*obj = qint_get_int(qint); > > return; > > } > > > > -*obj = qint_get_int(qint); > > +qstr = qobject_to_qstring(qobj); > > +if (qstr && qstr->string && qiv->autocast) { > > +if (qemu_strtoll(qstr->string, NULL, 10, obj) == 0) { > > In the commit message, you mentioned intended use: "with a QDict > produced from QemuOpts, where everything is in string format." I figure > you mean opts with empty opts->list->desc[]. For those, we accept any > key and parse all values as strings. > > The idea with such QemuOpts is to *delay* parsing until we know how to > parse. The delay should be transparent to the user. In particular, > values should be parsed the same whether the parsing is delayed or not. > > The canonical QemuOpts value parser is qemu_opt_parse(). It parses > integers with parse_option_number(). That function parses like > stroull(qstr->string, ..., 0). Two differences: > > * stroll() vs. strtoull(): they differ in ERANGE behavior. This might > be tolerable, but I haven't thought it through. > > * Base 0 vs 10: different syntax and semantics. Oops. Sigh, I originally had my code using strtoull() directly as the parse_option_number() method does, but had to change it to make checkpatch.pl stfu thus creating incosistency. This is a great
[Qemu-block] [PATCH v2 2/6] block: Prepare bdrv_aligned_preadv() for byte-aligned requests
This patch makes bdrv_aligned_preadv() ready to accept byte-aligned requests. Note that this doesn't mean that such requests are actually made. The caller still ensures that all requests are aligned to at least 512 bytes. Signed-off-by: Kevin WolfReviewed-by: Eric Blake Reviewed-by: Stefan Hajnoczi --- block/io.c | 44 1 file changed, 20 insertions(+), 24 deletions(-) diff --git a/block/io.c b/block/io.c index b6a2c80..e75bce2 100644 --- a/block/io.c +++ b/block/io.c @@ -963,11 +963,9 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs, { int ret; -int64_t sector_num = offset >> BDRV_SECTOR_BITS; -unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS; - -assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0); -assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0); +assert(is_power_of_2(align)); +assert((offset & (align - 1)) == 0); +assert((bytes & (align - 1)) == 0); assert(!qiov || bytes == qiov->size); assert((bs->open_flags & BDRV_O_NO_IO) == 0); assert(!(flags & ~BDRV_REQ_MASK)); @@ -987,9 +985,12 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs, } if (flags & BDRV_REQ_COPY_ON_READ) { +int64_t start_sector = offset >> BDRV_SECTOR_BITS; +int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE); +unsigned int nb_sectors = end_sector - start_sector; int pnum; -ret = bdrv_is_allocated(bs, sector_num, nb_sectors, ); +ret = bdrv_is_allocated(bs, start_sector, nb_sectors, ); if (ret < 0) { goto out; } @@ -1005,28 +1006,24 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs, ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0); } else { /* Read zeros after EOF */ -int64_t total_sectors, max_nb_sectors; +int64_t total_bytes, max_bytes; -total_sectors = bdrv_nb_sectors(bs); -if (total_sectors < 0) { -ret = total_sectors; +total_bytes = bdrv_getlength(bs); +if (total_bytes < 0) { +ret = total_bytes; goto out; } -max_nb_sectors = ROUND_UP(MAX(0, total_sectors - sector_num), - align >> BDRV_SECTOR_BITS); -if (nb_sectors < max_nb_sectors) { +max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); +if (bytes < max_bytes) { ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0); -} else if (max_nb_sectors > 0) { +} else if (max_bytes > 0) { QEMUIOVector local_qiov; qemu_iovec_init(_qiov, qiov->niov); -qemu_iovec_concat(_qiov, qiov, 0, - max_nb_sectors * BDRV_SECTOR_SIZE); +qemu_iovec_concat(_qiov, qiov, 0, max_bytes); -ret = bdrv_driver_preadv(bs, offset, - max_nb_sectors * BDRV_SECTOR_SIZE, - _qiov, 0); +ret = bdrv_driver_preadv(bs, offset, max_bytes, _qiov, 0); qemu_iovec_destroy(_qiov); } else { @@ -1034,11 +1031,10 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs, } /* Reading beyond end of file is supposed to produce zeroes */ -if (ret == 0 && total_sectors < sector_num + nb_sectors) { -uint64_t offset = MAX(0, total_sectors - sector_num); -uint64_t bytes = (sector_num + nb_sectors - offset) * - BDRV_SECTOR_SIZE; -qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes); +if (ret == 0 && total_bytes < offset + bytes) { +uint64_t zero_offset = MAX(0, total_bytes - offset); +uint64_t zero_bytes = offset + bytes - zero_offset; +qemu_iovec_memset(qiov, zero_offset, 0, zero_bytes); } } -- 1.8.3.1
Re: [Qemu-block] [PATCH v4 04/11] nbd: Improve server handling of bogus commands
On 13/06/2016 23:41, Alex Bligh wrote: > That's one of the reasons that there is a proposal to add > STRUCTURED_READ to the spec (although I still haven't had time to > implement that for qemu), so that we have a newer approach that allows > for proper error handling without ambiguity on whether bogus bytes must > be sent on a failed read. But you'd have to convince me that ALL > existing NBD server and client implementations expect to handle a read > error without read payload, otherwise, I will stick with the notion that > the current spec wording is correct, and that read errors CANNOT be > gracefully recovered from unless BOTH sides transfer (possibly bogus) > bytes along with the error message, and which is why BOTH sides of the > protocol are warned that read errors usually result in a disconnection > rather than clean continuation, without the addition of STRUCTURED_READ. I suspect that there are exactly two client implementations, namely Linux and QEMU's, and both do the right thing. What servers do doesn't matter, if all the clients agree. Paolo
[Qemu-block] [PATCH v2 6/6] block: Don't enforce 512 byte minimum alignment
If block drivers say that they can do an alignment < 512 bytes, let's just suppose they mean it. raw-posix used to be an offender with respect to this, but it can actually deal with byte-aligned requests now. The default is still 512 bytes for any drivers that only implement sector-based interfaces, but it is 1 now for drivers that implement .bdrv_co_preadv. Signed-off-by: Kevin WolfReviewed-by: Eric Blake --- block.c| 2 +- block/io.c | 8 +++- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/block.c b/block.c index f54bc25..3d850a2 100644 --- a/block.c +++ b/block.c @@ -937,7 +937,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file, goto fail_opts; } -bs->request_alignment = 512; +bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512; bs->zero_beyond_eof = true; bs->read_only = !(bs->open_flags & BDRV_O_RDWR); diff --git a/block/io.c b/block/io.c index b261cc6..b3ff9be 100644 --- a/block/io.c +++ b/block/io.c @@ -1052,8 +1052,7 @@ int coroutine_fn bdrv_co_preadv(BlockDriverState *bs, BlockDriver *drv = bs->drv; BdrvTrackedRequest req; -/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */ -uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment); +uint64_t align = bs->request_alignment; uint8_t *head_buf = NULL; uint8_t *tail_buf = NULL; QEMUIOVector local_qiov; @@ -1305,7 +1304,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs, uint8_t *buf = NULL; QEMUIOVector local_qiov; struct iovec iov; -uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment); +uint64_t align = bs->request_alignment; unsigned int head_padding_bytes, tail_padding_bytes; int ret = 0; @@ -1392,8 +1391,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs, BdrvRequestFlags flags) { BdrvTrackedRequest req; -/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */ -uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment); +uint64_t align = bs->request_alignment; uint8_t *head_buf = NULL; uint8_t *tail_buf = NULL; QEMUIOVector local_qiov; -- 1.8.3.1
[Qemu-block] [PATCH v2 0/6] block: Enable byte granularity I/O
Previous series have already converted some block drivers to byte-based rather than sector-based interfaces. However, the common I/O path as well as raw-posix still enforced a minimum alignment of 512 bytes because some sector-based logic was involved. This patch series removes these limitations and a sub-sector request actually ends up as a sub-sector syscall now. v2: - Updated trace-events for bdrv_co_do_copy_on_readv() [Eric] - Added some assertions [Eric] - Renamed laio_submit_co() -> laio_co_submit() and added coroutine_fn to its prototype [Stefan] - linux-aio: Include block/block.h and use BDRV_SECTOR_SIZE instead of 512 [Eric] Kevin Wolf (6): block: Byte-based bdrv_co_do_copy_on_readv() block: Prepare bdrv_aligned_preadv() for byte-aligned requests block: Prepare bdrv_aligned_pwritev() for byte-aligned requests raw-posix: Switch to bdrv_co_* interfaces raw-posix: Implement .bdrv_co_preadv/pwritev block: Don't enforce 512 byte minimum alignment block.c | 2 +- block/io.c| 128 ++ block/linux-aio.c | 84 - block/mirror.c| 10 ++-- block/raw-aio.h | 3 ++ block/raw-posix.c | 62 include/block/block.h | 10 ++-- trace-events | 2 +- 8 files changed, 176 insertions(+), 125 deletions(-) -- 1.8.3.1
Re: [Qemu-block] [PATCH] block-backend: allow flush on devices with open tray
On 10.06.2016 23:59, John Snow wrote: > If a device still has an attached BDS because the medium has not yet > been removed, we will be unable to migrate to a new host because > blk_flush will return an error for that backend. > > Replace the call to blk_is_available to blk_is_inserted to weaken > the check and allow flushes from the backend to work, while still > disallowing flushes from the frontend/device model to work. > > This fixes a regression present in 2.6.0 caused by the following commit: > fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf > block: Move some bdrv_*_all() functions to BB > > Signed-off-by: John Snow> --- > block/block-backend.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) I'm still not sure we shouldn't do the same for blk_{co,aio}_flush(). I guess you exclude them here because you specifically want to fix the issue mentioned in the commit message, but then we could just make blk_flush_all() ignore an -ENOMEDIUM. I personally think we should make all blk_*flush() functions use blk_is_inserted() instead of blk_is_available(). As we have discussed on IRC, there are probably not that many cases a guest can flush a medium in an open tray anyway (because the main use case are read-only CD-ROMs), and even if so, that wouldn't change any data, so even if the guest can actually flush something on an open tray, I don't think anyone would complain. Max > diff --git a/block/block-backend.c b/block/block-backend.c > index 34500e6..d1e875e 100644 > --- a/block/block-backend.c > +++ b/block/block-backend.c > @@ -1122,7 +1122,7 @@ int blk_co_flush(BlockBackend *blk) > > int blk_flush(BlockBackend *blk) > { > -if (!blk_is_available(blk)) { > +if (!blk_is_inserted(blk)) { > return -ENOMEDIUM; > } > > signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 6/6] block: Don't enforce 512 byte minimum alignment
Am 14.06.2016 um 14:09 hat Stefan Hajnoczi geschrieben: > On Wed, Jun 08, 2016 at 04:10:11PM +0200, Kevin Wolf wrote: > > diff --git a/block.c b/block.c > > index f54bc25..3d850a2 100644 > > --- a/block.c > > +++ b/block.c > > @@ -937,7 +937,7 @@ static int bdrv_open_common(BlockDriverState *bs, > > BdrvChild *file, > > goto fail_opts; > > } > > > > -bs->request_alignment = 512; > > +bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512; > > What happens in the raw-posix.c AIO case? There we should still use > 512. I'm only changing the default value here. raw-posix already overrides bs->request_alignment as needed, see raw_probe_alignment(). Kevin pgpmavX5LHSRs.pgp Description: PGP signature
Re: [Qemu-block] [PATCH v3 3/5] block/null: Implement bdrv_refresh_filename()
On 12.06.2016 06:08, Fam Zheng wrote: > On Fri, 06/10 20:57, Max Reitz wrote: >> Signed-off-by: Max Reitz> > The commit message could go a little more informative. Seems nothing special > in > the added callback, aren't things supposed to just work without this patch? > What is missing? If you pass a filename, it works without this patch. If you don't, it doesn't. Compare (before this patch): $ x86_64-softmmu/qemu-system-x86_64 \ -drive if=none,file=null-co://,driver=raw -nodefaults -qmp stdio [...] {'execute':'query-block'} [...] "filename": "null-co://", [...] With: $ x86_64-softmmu/qemu-system-x86_64 \ -drive if=none,driver=raw,file.driver=null-co -nodefaults -qmp stdio [...] {'execute':'query-block'} [...] "filename": "json:{\"driver\": \"raw\", \"file\": {\"driver\": \"null-co\"}}", [...] So the issue is that you can use the null-co/aio drivers without giving a filename. I wouldn't mind including this information in a v4, but I'm not sure it alone warrants a v4. > >> --- >> block/null.c | 20 >> 1 file changed, 20 insertions(+) >> >> diff --git a/block/null.c b/block/null.c >> index 396500b..b511010 100644 >> --- a/block/null.c >> +++ b/block/null.c >> @@ -12,6 +12,8 @@ >> >> #include "qemu/osdep.h" >> #include "qapi/error.h" >> +#include "qapi/qmp/qdict.h" >> +#include "qapi/qmp/qstring.h" >> #include "block/block_int.h" >> >> #define NULL_OPT_LATENCY "latency-ns" >> @@ -223,6 +225,20 @@ static int64_t coroutine_fn >> null_co_get_block_status(BlockDriverState *bs, >> } >> } >> >> +static void null_refresh_filename(BlockDriverState *bs, QDict *opts) >> +{ >> +QINCREF(opts); >> +qdict_del(opts, "filename"); > > Why is this qdict_del necessary? It's not strictly necessary. But the null drivers ignore the filename, so there's no harm in dropping it from the JSON filename (if we need to construct one). Max >> + >> +if (!qdict_size(opts)) { >> +snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s://", >> + bs->drv->format_name); >> +} >> + >> +qdict_put(opts, "driver", qstring_from_str(bs->drv->format_name)); >> +bs->full_open_options = opts; >> +} >> + >> static BlockDriver bdrv_null_co = { >> .format_name= "null-co", >> .protocol_name = "null-co", >> @@ -238,6 +254,8 @@ static BlockDriver bdrv_null_co = { >> .bdrv_reopen_prepare= null_reopen_prepare, >> >> .bdrv_co_get_block_status = null_co_get_block_status, >> + >> +.bdrv_refresh_filename = null_refresh_filename, >> }; >> >> static BlockDriver bdrv_null_aio = { >> @@ -255,6 +273,8 @@ static BlockDriver bdrv_null_aio = { >> .bdrv_reopen_prepare= null_reopen_prepare, >> >> .bdrv_co_get_block_status = null_co_get_block_status, >> + >> +.bdrv_refresh_filename = null_refresh_filename, >> }; >> >> static void bdrv_null_init(void) >> -- >> 2.8.3 >> signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH v3 4/5] iotests: Add test for post-mirror backing chains
On 12.06.2016 06:17, Fam Zheng wrote: > On Fri, 06/10 20:57, Max Reitz wrote: >> +import os >> +import stat >> +import time > > Unused import 'stat' and 'time'? That happens when you copy old tests and don't check stuff like this... Well, it won't hurt, but I guess now I may have enough reason to send a v4 (if I get to it today). Thanks! Max > >> +import iotests >> +from iotests import qemu_img > > signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 5/6] raw-posix: Implement .bdrv_co_preadv/pwritev
On Wed, Jun 08, 2016 at 04:10:10PM +0200, Kevin Wolf wrote: > The raw-posix block driver actually supports byte-aligned requests now > on non-O_DIRECT images, like it already (and previously incorrectly) > claimed in bs->request_alignment. > > For some block drivers this means that a RMW cycle can be avoided when > they write sub-sector metadata e.g. for cluster allocation. > > Signed-off-by: Kevin Wolf> --- > block/linux-aio.c | 6 ++ > block/raw-aio.h | 2 +- > block/raw-posix.c | 42 ++ > 3 files changed, 25 insertions(+), 25 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [Qemu-block] [PATCH 2/6] block: Prepare bdrv_aligned_preadv() for byte-aligned requests
On Wed, Jun 08, 2016 at 04:10:07PM +0200, Kevin Wolf wrote: > This patch makes bdrv_aligned_preadv() ready to accept byte-aligned > requests. Note that this doesn't mean that such requests are actually > made. The caller still ensures that all requests are aligned to at least > 512 bytes. > > Signed-off-by: Kevin Wolf> --- > block/io.c | 41 + > 1 file changed, 17 insertions(+), 24 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [Qemu-block] [PATCH 3/6] block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
On Wed, Jun 08, 2016 at 04:10:08PM +0200, Kevin Wolf wrote: > Signed-off-by: Kevin Wolf> --- > block/io.c | 13 + > 1 file changed, 5 insertions(+), 8 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [Qemu-block] [PATCH 0/6] block: Enable byte granularity I/O
On Mon, Jun 13, 2016 at 03:43:06PM +0200, Kevin Wolf wrote: > Am 13.06.2016 um 15:27 hat Stefan Hajnoczi geschrieben: > > On Wed, Jun 08, 2016 at 04:10:05PM +0200, Kevin Wolf wrote: > > > Previous series have already converted some block drivers to byte-based > > > rather > > > than sector-based interfaces. However, the common I/O path as well as > > > raw-posix > > > still enforced a minimum alignment of 512 bytes because some sector-based > > > logic > > > was involved. > > > > > > This patch series removes these limitations and a sub-sector request > > > actually > > > ends up as a sub-sector syscall now. > > > > > > Kevin Wolf (6): > > > block: Byte-based bdrv_co_do_copy_on_readv() > > > block: Prepare bdrv_aligned_preadv() for byte-aligned requests > > > block: Prepare bdrv_aligned_pwritev() for byte-aligned requests > > > raw-posix: Switch to bdrv_co_* interfaces > > > raw-posix: Implement .bdrv_co_preadv/pwritev > > > block: Don't enforce 512 byte minimum alignment > > > > > > block.c | 2 +- > > > block/io.c| 125 > > > +- > > > block/linux-aio.c | 83 - > > > block/mirror.c| 10 ++-- > > > block/raw-aio.h | 2 + > > > block/raw-posix.c | 61 > > > include/block/block.h | 10 ++-- > > > 7 files changed, 169 insertions(+), 124 deletions(-) > > > > I've taken an initial look and it looks good. Will review next revision > > in depth so it can be merged after Eric's comments have been addressed. > > Eric commented a lot, but only requested very few minor changes that > wouldn't strictly require resending the series. If you think it's > worthwhile to send a v2 for them, I can do that, but it shouldn't make a > big difference for your review. Done. Looks very close. Stefan signature.asc Description: PGP signature
Re: [Qemu-block] [PATCH 1/6] block: Byte-based bdrv_co_do_copy_on_readv()
On Wed, Jun 08, 2016 at 04:10:06PM +0200, Kevin Wolf wrote: > In a first step to convert the common I/O path to work on bytes rather > than sectors, this converts the copy-on-read logic that is used by > bdrv_aligned_preadv(). > > Signed-off-by: Kevin Wolf> --- > block/io.c| 63 > +++ > block/mirror.c| 10 > include/block/block.h | 10 +--- > 3 files changed, 51 insertions(+), 32 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature