Re: [Qemu-block] [PATCH] block: posix: Always allocate the first block

2019-08-24 Thread Nir Soffer
On Fri, Aug 23, 2019 at 8:53 PM Max Reitz  wrote:

> On 23.08.19 18:48, Nir Soffer wrote:
> > On Fri, Aug 23, 2019 at 4:58 PM Max Reitz  > > wrote:
>
> [...]
>
> > If you have a format layer that truncates the image to a fixed size
> and
> > does not write anything into the first block itself (say because it
> uses
> > a footer), then (with O_DIRECT) allocate_first_block() will fail
> > (silently, because while it does return an error value, it is never
> > checked and there is no comment that explains why we don’t check it)
> >
> >
> > The motivation is that this is an optimization for the special case of
> using
> > empty image, so it does not worth failing image creation.
> > I will add a comment about that.
>
> Thanks!
>
> [...]
>
> > Thanks for the example.
> >
> > I will need time to play with blockdev and understand the flows when
> image
> > are created. Do you think is would be useful to fix now only image
> creation
> > via qemu-img, and handle blockdev later?
>
> Well, it isn’t about blockdev, it’s simply about the fact that this
> function doesn’t work for O_DIRECT files.  I showed how to reproduce the
> issue without blockdev (namely block_resize).  Sure, that is an edge
> case, but it is a completely valid case.
>
> Also, it seems to me the fix is rather simple.  Just something like:
>
> static int allocate_first_block(int fd, int64_t max_size)
> {
> int write_size = MIN(max_size, MAX_BLOCKSIZE);
> void *buf;
> ssize_t n;
>
> /* Round down to power of two */
> assert(write_size > 0);
> write_size = 1 << (31 - clz32(write_size));
>
> buf = qemu_memalign(MAX(getpagesize(), write_size), write_size);
> memset(buf, 0, write_size);
>
> do {
> n = pwrite(fd, buf, write_size, 0);
> } while (n < 0 && errno == EINTR);
>
> qemu_vfree(buf);
>
> return n < 0 ? -errno : 0;
> }
>
> Wouldn’t that work?
>

Sure, it should work.

But I think we can make this simpler, always writing MIN(max_size,
MAX_BLOCKSIZE).

vdsm is enforcing now 4k alignment, and there is no way to create images
with unaligned
size. Maybe qemu should adapt this rule?

Nir


[Qemu-block] [PATCH v2 2/2] nbd: Tolerate more errors to structured reply request

2019-08-24 Thread Eric Blake
A server may have a reason to reject a request for structured replies,
beyond just not recognizing them as a valid request; similarly, it may
have a reason for rejecting a request for a meta context.  It doesn't
hurt us to continue talking to such a server; otherwise 'qemu-nbd
--list' of such a server fails to display all available details about
the export.

Encountered when temporarily tweaking nbdkit to reply with
NBD_REP_ERR_POLICY.  Present since structured reply support was first
added (commit d795299b reused starttls handling, but starttls is
different in that we can't fall back to other behavior on any error).

Signed-off-by: Eric Blake 
---
 nbd/client.c | 63 +---
 nbd/trace-events |  2 +-
 2 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index a9d8d32feff7..b9dc829175f9 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2018 Red Hat, Inc.
+ *  Copyright (C) 2016-2019 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device Client Side
@@ -142,17 +142,18 @@ static int nbd_receive_option_reply(QIOChannel *ioc, 
uint32_t opt,
 return 0;
 }

-/* If reply represents success, return 1 without further action.
- * If reply represents an error, consume the optional payload of
- * the packet on ioc.  Then return 0 for unsupported (so the client
- * can fall back to other approaches), or -1 with errp set for other
- * errors.
+/*
+ * If reply represents success, return 1 without further action.  If
+ * reply represents an error, consume the optional payload of the
+ * packet on ioc.  Then return 0 for unsupported (so the client can
+ * fall back to other approaches), where @strict determines if only
+ * ERR_UNSUP or all errors fit that category, or -1 with errp set for
+ * other errors.
  */
 static int nbd_handle_reply_err(QIOChannel *ioc, NBDOptionReply *reply,
-Error **errp)
+bool strict, Error **errp)
 {
-char *msg = NULL;
-int result = -1;
+g_autofree char *msg = NULL;

 if (!(reply->type & (1 << 31))) {
 return 1;
@@ -163,26 +164,28 @@ static int nbd_handle_reply_err(QIOChannel *ioc, 
NBDOptionReply *reply,
 error_setg(errp, "server error %" PRIu32
" (%s) message is too long",
reply->type, nbd_rep_lookup(reply->type));
-goto cleanup;
+goto err;
 }
 msg = g_malloc(reply->length + 1);
 if (nbd_read(ioc, msg, reply->length, NULL, errp) < 0) {
 error_prepend(errp, "Failed to read option error %" PRIu32
   " (%s) message: ",
   reply->type, nbd_rep_lookup(reply->type));
-goto cleanup;
+goto err;
 }
 msg[reply->length] = '\0';
 trace_nbd_server_error_msg(reply->type,
nbd_reply_type_lookup(reply->type), msg);
 }

+if (reply->type == NBD_REP_ERR_UNSUP || !strict) {
+trace_nbd_reply_err_ignored(reply->option,
+nbd_opt_lookup(reply->option),
+reply->type, nbd_rep_lookup(reply->type));
+return 0;
+}
+
 switch (reply->type) {
-case NBD_REP_ERR_UNSUP:
-trace_nbd_reply_err_unsup(reply->option, 
nbd_opt_lookup(reply->option));
-result = 0;
-goto cleanup;
-
 case NBD_REP_ERR_POLICY:
 error_setg(errp, "Denied by server for option %" PRIu32 " (%s)",
reply->option, nbd_opt_lookup(reply->option));
@@ -227,12 +230,9 @@ static int nbd_handle_reply_err(QIOChannel *ioc, 
NBDOptionReply *reply,
 error_append_hint(errp, "server reported: %s\n", msg);
 }

- cleanup:
-g_free(msg);
-if (result < 0) {
-nbd_send_opt_abort(ioc);
-}
-return result;
+ err:
+nbd_send_opt_abort(ioc);
+return -1;
 }

 /* nbd_receive_list:
@@ -257,7 +257,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, 
char **description,
 if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
 return -1;
 }
-error = nbd_handle_reply_err(ioc, &reply, errp);
+error = nbd_handle_reply_err(ioc, &reply, true, errp);
 if (error <= 0) {
 return error;
 }
@@ -363,7 +363,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,
 if (nbd_receive_option_reply(ioc, opt, &reply, errp) < 0) {
 return -1;
 }
-error = nbd_handle_reply_err(ioc, &reply, errp);
+error = nbd_handle_reply_err(ioc, &reply, true, errp);
 if (error <= 0) {
 return error;
 }
@@ -538,12 +538,15 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
 }
 }

-/* nbd_request_simple_option: Send an option request, and parse the reply
+/*
+ * nbd_request_simple_option: S

[Qemu-block] [PATCH v2 1/2] nbd: Use g_autofree in a few places

2019-08-24 Thread Eric Blake
Thanks to our recent move to use glib's g_autofree, I can join the
bandwagon.  Getting rid of gotos is fun ;)

There are probably more places where we could register cleanup
functions and get rid of more gotos; this patch just focuses on the
labels that existed merely to call g_free.

Signed-off-by: Eric Blake 
---
 block/nbd.c  | 11 ---
 nbd/client.c | 22 +++---
 nbd/server.c | 12 
 3 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index beed46fb3414..c4c91a158602 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1374,7 +1374,7 @@ static bool nbd_has_filename_options_conflict(QDict 
*options, Error **errp)
 static void nbd_parse_filename(const char *filename, QDict *options,
Error **errp)
 {
-char *file;
+g_autofree char *file = NULL;
 char *export_name;
 const char *host_spec;
 const char *unixpath;
@@ -1396,7 +1396,7 @@ static void nbd_parse_filename(const char *filename, 
QDict *options,
 export_name = strstr(file, EN_OPTSTR);
 if (export_name) {
 if (export_name[strlen(EN_OPTSTR)] == 0) {
-goto out;
+return;
 }
 export_name[0] = 0; /* truncate 'file' */
 export_name += strlen(EN_OPTSTR);
@@ -1407,11 +1407,11 @@ static void nbd_parse_filename(const char *filename, 
QDict *options,
 /* extract the host_spec - fail if it's not nbd:... */
 if (!strstart(file, "nbd:", &host_spec)) {
 error_setg(errp, "File name string for NBD must start with 'nbd:'");
-goto out;
+return;
 }

 if (!*host_spec) {
-goto out;
+return;
 }

 /* are we a UNIX or TCP socket? */
@@ -1431,9 +1431,6 @@ static void nbd_parse_filename(const char *filename, 
QDict *options,
 out_inet:
 qapi_free_InetSocketAddress(addr);
 }
-
-out:
-g_free(file);
 }

 static bool nbd_process_legacy_socket_options(QDict *output_options,
diff --git a/nbd/client.c b/nbd/client.c
index 49bf9906f94b..a9d8d32feff7 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -247,12 +247,11 @@ static int nbd_handle_reply_err(QIOChannel *ioc, 
NBDOptionReply *reply,
 static int nbd_receive_list(QIOChannel *ioc, char **name, char **description,
 Error **errp)
 {
-int ret = -1;
 NBDOptionReply reply;
 uint32_t len;
 uint32_t namelen;
-char *local_name = NULL;
-char *local_desc = NULL;
+g_autofree char *local_name = NULL;
+g_autofree char *local_desc = NULL;
 int error;

 if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, &reply, errp) < 0) {
@@ -298,7 +297,7 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, 
char **description,
 local_name = g_malloc(namelen + 1);
 if (nbd_read(ioc, local_name, namelen, "export name", errp) < 0) {
 nbd_send_opt_abort(ioc);
-goto out;
+return -1;
 }
 local_name[namelen] = '\0';
 len -= namelen;
@@ -306,24 +305,17 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, 
char **description,
 local_desc = g_malloc(len + 1);
 if (nbd_read(ioc, local_desc, len, "export description", errp) < 0) {
 nbd_send_opt_abort(ioc);
-goto out;
+return -1;
 }
 local_desc[len] = '\0';
 }

 trace_nbd_receive_list(local_name, local_desc ?: "");
-*name = local_name;
-local_name = NULL;
+*name = g_steal_pointer(&local_name);
 if (description) {
-*description = local_desc;
-local_desc = NULL;
+*description = g_steal_pointer(&local_desc);
 }
-ret = 1;
-
- out:
-g_free(local_name);
-g_free(local_desc);
-return ret;
+return 1;
 }


diff --git a/nbd/server.c b/nbd/server.c
index 0fb41c6c50ea..74d205812fee 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -206,7 +206,7 @@ static int GCC_FMT_ATTR(4, 0)
 nbd_negotiate_send_rep_verr(NBDClient *client, uint32_t type,
 Error **errp, const char *fmt, va_list va)
 {
-char *msg;
+g_autofree char *msg = NULL;
 int ret;
 size_t len;

@@ -216,18 +216,14 @@ nbd_negotiate_send_rep_verr(NBDClient *client, uint32_t 
type,
 trace_nbd_negotiate_send_rep_err(msg);
 ret = nbd_negotiate_send_rep_len(client, type, len, errp);
 if (ret < 0) {
-goto out;
+return ret;
 }
 if (nbd_write(client->ioc, msg, len, errp) < 0) {
 error_prepend(errp, "write failed (error message): ");
-ret = -EIO;
-} else {
-ret = 0;
+return -EIO;
 }

-out:
-g_free(msg);
-return ret;
+return 0;
 }

 /* Send an error reply.
-- 
2.21.0




Re: [Qemu-block] [Qemu-devel] [PATCH v2 00/12] block: qiov_offset parameter for io

2019-08-24 Thread Vladimir Sementsov-Ogievskiy
23.08.2019 19:21, Stefan Hajnoczi wrote:
> On Thu, Aug 22, 2019 at 05:59:43PM +, Vladimir Sementsov-Ogievskiy wrote:
>> 22.08.2019 20:39, Vladimir Sementsov-Ogievskiy wrote:
>>> 22.08.2019 20:24, Vladimir Sementsov-Ogievskiy wrote:
 22.08.2019 18:50, Stefan Hajnoczi wrote:
> On Tue, Jun 04, 2019 at 07:15:02PM +0300, Vladimir Sementsov-Ogievskiy 
> wrote:
>> Hi all!
>>
>> Here is new parameter qiov_offset for io path, to avoid
>> a lot of places with same pattern of creating local_qiov or hd_qiov
>> variables.
>>
>> These series also includes my
>> "[Qemu-devel] [PATCH 0/2] block/io: refactor padding"
>> with some changes [described in 01 and 03 emails]
>>
>> Vladimir Sementsov-Ogievskiy (12):
>>     util/iov: introduce qemu_iovec_init_extended
>>     util/iov: improve qemu_iovec_is_zero
>>     block/io: refactor padding
>>     block: define .*_part io handlers in BlockDriver
>>     block/io: bdrv_co_do_copy_on_readv: use and support qiov_offset
>>     block/io: bdrv_co_do_copy_on_readv: lazy allocation
>>     block/io: bdrv_aligned_preadv: use and support qiov_offset
>>     block/io: bdrv_aligned_pwritev: use and support qiov_offset
>>     block/io: introduce bdrv_co_p{read,write}v_part
>>     block/qcow2: refactor qcow2_co_preadv to use buffer-based io
>>     block/qcow2: implement .bdrv_co_preadv_part
>>     block/qcow2: implement .bdrv_co_pwritev(_compressed)_part
>>
>>    block/qcow2.h |   1 +
>>    include/block/block_int.h |  21 ++
>>    include/qemu/iov.h    |  10 +-
>>    block/backup.c    |   2 +-
>>    block/io.c    | 532 ++
>>    block/qcow2-cluster.c |  14 +-
>>    block/qcow2.c | 131 +-
>>    qemu-img.c    |   4 +-
>>    util/iov.c    | 153 +--
>>    9 files changed, 559 insertions(+), 309 deletions(-)
>
> qemu-iotests 077 fails after I apply this series (including your
> uninitialized variable fix).  I'm afraid I can't include it in the block
> pull request, sorry!
>
> Stefan
>

 Hmm, 77 don't work on master for me:
 077  fail   [20:23:57] [20:23:59]    output 
 mismatch (see 077.out.bad)
 --- /work/src/qemu/up-block-drop-hd-qiov/tests/qemu-iotests/077.out 
 2019-04-22 15:06:56.162045432 +0300
 +++ /work/src/qemu/up-block-drop-hd-qiov/tests/qemu-iotests/077.out.bad 
 2019-08-22 20:23:59.124122307 +0300
 @@ -1,7 +1,15 @@
    QA output created by 077
 +==117186==WARNING: ASan doesn't fully support makecontext/swapcontext 
 functions and may produce false positives in some cases!
 +==117186==WARNING: ASan is ignoring requested __asan_handle_no_return: 
 stack top: 0x7ffefc071000; bottom 0x7fad7277b000; size: 0x0051898f6000 
 (350200225792)
 +False positive error reports may follow
 +For details see 
 http://code.google.com/p/address-sanitizer/issues/detail?id=189
    Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728

    == Some concurrent requests involving RMW ==
 +==117197==WARNING: ASan doesn't fully support makecontext/swapcontext 
 functions and may produce false positives in some cases!
 +==117197==WARNING: ASan is ignoring requested __asan_handle_no_return: 
 stack top: 0x7fffa4fcc000; bottom 0x7fa93a2c1000; size: 0x00566ad0b000 
 (371159248896)
 +False positive error reports may follow
 +For details see 
 http://code.google.com/p/address-sanitizer/issues/detail?id=189
    wrote XXX/XXX bytes at offset XXX
    XXX bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    wrote XXX/XXX bytes at offset XXX
 @@ -66,6 +74,10 @@
    XXX bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

    == Verify image content ==
 +==117219==WARNING: ASan doesn't fully support makecontext/swapcontext 
 functions and may produce false positives in some cases!
 +==117219==WARNING: ASan is ignoring requested __asan_handle_no_return: 
 stack top: 0x7ffc722de000; bottom 0x7f0848232000; size: 0x00f42a0ac000 
 (1048677367808)
 +False positive error reports may follow
 +For details see 
 http://code.google.com/p/address-sanitizer/issues/detail?id=189
    read 512/512 bytes at offset 0
    512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    read 512/512 bytes at offset 512
 @@ -156,5 +168,9 @@
    1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    read 2048/2048 bytes at offset 71680
    2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +==117229==WARNING: ASan doesn't fully support makecontext/swapcontext 
 functions and may produce false positives in some cases!
 +==117229==WARNING: ASan is ignoring requested __asan_handle_no_return

[Qemu-block] [PATCH] block: fix permission update in bdrv_replace_node

2019-08-24 Thread Vladimir Sementsov-Ogievskiy
It's wrong to OR shared permissions. It may lead to crash on further
permission updates.
Also, no needs to consider previously calculated permissions, as at
this point we already bind all new parents and bdrv_get_cumulative_perm
result is enough. So fix the bug by just set permissions by
bdrv_get_cumulative_perm result.

Bug was introduced in long ago 234ac1a9025, in 2.9.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all!

I found this bug during my work around backup-top filter. It happens that
on filter removing, bdrv_replace_node() breaks permissions in graph which
lead to bdrv_set_backing_hd(new backing: NULL) on
assert(tighten_restrictions == false).

 block.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 874a29a983..5944124845 100644
--- a/block.c
+++ b/block.c
@@ -4165,7 +4165,6 @@ void bdrv_replace_node(BlockDriverState *from, 
BlockDriverState *to,
 {
 BdrvChild *c, *next;
 GSList *list = NULL, *p;
-uint64_t old_perm, old_shared;
 uint64_t perm = 0, shared = BLK_PERM_ALL;
 int ret;
 
@@ -4211,8 +4210,8 @@ void bdrv_replace_node(BlockDriverState *from, 
BlockDriverState *to,
 bdrv_unref(from);
 }
 
-bdrv_get_cumulative_perm(to, &old_perm, &old_shared);
-bdrv_set_perm(to, old_perm | perm, old_shared | shared);
+bdrv_get_cumulative_perm(to, &perm, &shared);
+bdrv_set_perm(to, perm, shared);
 
 out:
 g_slist_free(list);
-- 
2.18.0




Re: [Qemu-block] [PATCH v4 01/10] add device_legacy_reset function to prepare for reset api change

2019-08-24 Thread David Gibson
On Wed, Aug 21, 2019 at 06:33:32PM +0200, Damien Hedde wrote:
> Provide a temporary device_legacy_reset function doing what
> device_reset does to prepare for the transition with Resettable
> API.
> 
> All occurrence of device_reset in the code tree are also replaced
> by device_legacy_reset.
> 
> The new resettable API has different prototype and semantics
> (resetting child buses as well as the specified device). Subsequent
> commits will make the changeover for each call site individually; once
> that is complete device_legacy_reset() will be removed.
> 
> Signed-off-by: Damien Hedde 
> Reviewed-by: Peter Maydell 

ppc parts
Acked-by: David Gibson 

> ---
> Cc: Gerd Hoffmann 
> Cc: Paolo Bonzini 
> Cc: "Daniel P. Berrangé" 
> Cc: Eduardo Habkost 
> Cc: Richard Henderson 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Cc: John Snow 
> Cc: "Cédric Le Goater" 
> Cc: Collin Walling 
> Cc: Cornelia Huck 
> Cc: David Hildenbrand 
> Cc: Halil Pasic 
> Cc: Christian Borntraeger 
> Cc: Dmitry Fleytman 
> Cc: Fam Zheng 
> Cc: qemu-block@nongnu.org
> Cc: qemu-...@nongnu.org
> Cc: qemu-s3...@nongnu.org
> Cc: qemu-...@nongnu.org
> ---
>  hw/audio/intel-hda.c | 2 +-
>  hw/core/qdev.c   | 6 +++---
>  hw/hyperv/hyperv.c   | 2 +-
>  hw/i386/pc.c | 2 +-
>  hw/ide/microdrive.c  | 8 
>  hw/intc/spapr_xive.c | 2 +-
>  hw/ppc/pnv_psi.c | 2 +-
>  hw/ppc/spapr_pci.c   | 2 +-
>  hw/ppc/spapr_vio.c   | 2 +-
>  hw/s390x/s390-pci-inst.c | 2 +-
>  hw/scsi/vmw_pvscsi.c | 2 +-
>  hw/sd/omap_mmc.c | 2 +-
>  hw/sd/pl181.c| 2 +-
>  include/hw/qdev-core.h   | 4 ++--
>  14 files changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
> index 6ecd383540..27b71c57cf 100644
> --- a/hw/audio/intel-hda.c
> +++ b/hw/audio/intel-hda.c
> @@ -1087,7 +1087,7 @@ static void intel_hda_reset(DeviceState *dev)
>  QTAILQ_FOREACH(kid, &d->codecs.qbus.children, sibling) {
>  DeviceState *qdev = kid->child;
>  cdev = HDA_CODEC_DEVICE(qdev);
> -device_reset(DEVICE(cdev));
> +device_legacy_reset(DEVICE(cdev));
>  d->state_sts |= (1 << cdev->cad);
>  }
>  intel_hda_update_irq(d);
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 60d66c2f39..a95d4fa87d 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -257,7 +257,7 @@ HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
>  
>  static int qdev_reset_one(DeviceState *dev, void *opaque)
>  {
> -device_reset(dev);
> +device_legacy_reset(dev);
>  
>  return 0;
>  }
> @@ -865,7 +865,7 @@ static void device_set_realized(Object *obj, bool value, 
> Error **errp)
>  }
>  }
>  if (dev->hotplugged) {
> -device_reset(dev);
> +device_legacy_reset(dev);
>  }
>  dev->pending_deleted_event = false;
>  
> @@ -1087,7 +1087,7 @@ void device_class_set_parent_unrealize(DeviceClass *dc,
>  dc->unrealize = dev_unrealize;
>  }
>  
> -void device_reset(DeviceState *dev)
> +void device_legacy_reset(DeviceState *dev)
>  {
>  DeviceClass *klass = DEVICE_GET_CLASS(dev);
>  
> diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> index 6ebf31c310..cd9db3cb5c 100644
> --- a/hw/hyperv/hyperv.c
> +++ b/hw/hyperv/hyperv.c
> @@ -140,7 +140,7 @@ void hyperv_synic_reset(CPUState *cs)
>  SynICState *synic = get_synic(cs);
>  
>  if (synic) {
> -device_reset(DEVICE(synic));
> +device_legacy_reset(DEVICE(synic));
>  }
>  }
>  
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3ab4bcb3ca..f33a8de42f 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2826,7 +2826,7 @@ static void pc_machine_reset(MachineState *machine)
>  cpu = X86_CPU(cs);
>  
>  if (cpu->apic_state) {
> -device_reset(cpu->apic_state);
> +device_legacy_reset(cpu->apic_state);
>  }
>  }
>  }
> diff --git a/hw/ide/microdrive.c b/hw/ide/microdrive.c
> index b0272ea14b..6b30e36ed8 100644
> --- a/hw/ide/microdrive.c
> +++ b/hw/ide/microdrive.c
> @@ -173,7 +173,7 @@ static void md_attr_write(PCMCIACardState *card, uint32_t 
> at, uint8_t value)
>  case 0x00:   /* Configuration Option Register */
>  s->opt = value & 0xcf;
>  if (value & OPT_SRESET) {
> -device_reset(DEVICE(s));
> +device_legacy_reset(DEVICE(s));
>  }
>  md_interrupt_update(s);
>  break;
> @@ -316,7 +316,7 @@ static void md_common_write(PCMCIACardState *card, 
> uint32_t at, uint16_t value)
>  case 0xe:/* Device Control */
>  s->ctrl = value;
>  if (value & CTRL_SRST) {
> -device_reset(DEVICE(s));
> +device_legacy_reset(DEVICE(s));
>  }
>  md_interrupt_update(s);
>  break;
> @@ -541,7 +541,7 @@ static int dscm1_attach(PCMCIACardState *card)
>  md->attr_base = pcc->cis[0x74] | (pcc->cis[0x76] <<

Re: [Qemu-block] [PATCH 1/1] protocol: Add NBD_CMD_FLAG_FAST_ZERO

2019-08-24 Thread Wouter Verhelst
On Fri, Aug 23, 2019 at 01:58:44PM -0500, Eric Blake wrote:
> On 8/23/19 1:48 PM, Wouter Verhelst wrote:
> > On Fri, Aug 23, 2019 at 09:34:26AM -0500, Eric Blake wrote:
> >> +- bit 4, `NBD_CMD_FLAG_FAST_ZERO`; valid during
> >> +  `NBD_CMD_WRITE_ZEROES`. If set, but the server cannot perform the
> >> +  write zeroes any faster than it would for an equivalent
> >> +  `NBD_CMD_WRITE`,
> > 
> > One way of fulfilling the letter of this requirement but not its spirit
> > could be to have background writes; that is, the server makes a note
> > that the zeroed region should contain zeroes, makes an error-free reply
> > to the client, and then starts updating things in the background (with
> > proper layering so that an NBD_CMD_READ would see zeroes).
> 
> For writes, this should still be viable IF the server can also cancel
> that background write of zeroes in favor of a foreground request for
> actual data to be written to the same offset.  In other words, as long
> as the behavior to the client is "as if" there is no duplicated I/O
> cost, the zero appears fast, even if it kicked off a long-running async
> process to actually accomplish it.

That's kind of what I was thinking of, yeah.

A background write would cause disk I/O, which *will* cause any write
that happen concurrently with it to slow down. If we need to write
several orders of magnitude of zeroes, then the "fast zero" will
actually cause the following writes to slow down, which could impact
performance.

The cancelling should indeed happen (otherwise ordering of writes will
be wrong, as per the spec), but that doesn't negate the performance
impact.

> > This could negatively impact performance after that command to the
> > effect that syncing the device would be slower rather than faster, if
> > not done right.
> 
> Oh. I see - for flush requests, you're worried about the cost of the
> flush forcing the I/O for the background zero to complete before flush
> can return.
> 
> Perhaps that merely means that a client using fast zero requests as a
> means of probing whether it can do a bulk pre-zero pass even though it
> will be rewriting part of that image with data later SHOULD NOT attempt
> to flush the disk until all other interesting write requests are also
> ready to queue.  In the 'qemu-img convert' case which spawned this
> discussion, that's certainly the case (qemu-img does not call flush
> after the pre-zeroing, but only after all data is copied - and then it
> really DOES want to wait for any remaining backgrounded zeroing to land
> on the disk along with any normal writes when it does its final flush).

Not what I meant, but also a good point, thanks :)

> > Do we want to keep that in consideration?
> 
> Ideas on how best to add what I mentioned above into the specification?

Perhaps clarify that the "fast zero" flag is meant to *improve*
performance, and that it therefore should either be implemented in a way
that does in fact improve performance, or not at all?

-- 
 Home is where you have to wash the dishes.
  -- #debian-devel, Freenode, 2004-09-22