Re: [Qemu-block] Combining synchronous and asynchronous IO

2019-03-15 Thread Kevin Wolf
Am 15.03.2019 um 16:33 hat Sergio Lopez geschrieben:
> 
> Stefan Hajnoczi writes:
> 
> > On Thu, Mar 14, 2019 at 06:31:34PM +0100, Sergio Lopez wrote:
> >> Our current AIO path does a great job of offloading the work from the
> >> VM, and combined with IOThreads it provides good performance in most
> >> scenarios. But it also comes with costs: a longer execution path and
> >> the need for scheduler intervention at various points.
> >> 
> >> There's one particular workload that suffers from this cost, and that's
> >> when you have just 1 or 2 cores on the guest issuing synchronous
> >> requests. This happens to be a pretty common workload for some DBs and,
> >> in a general sense, on small VMs.
> >> 
> >> I did a quick'n'dirty implementation on top of virtio-blk to get some
> >> numbers. They come from a VM with 4 CPUs running on an idle server,
> >> with a secondary virtio-blk disk backed by a null_blk device with a
> >> simulated latency of 30us.
> >
> > Can you describe the implementation in more detail?  Does "synchronous"
> > mean that hw/block/virtio_blk.c makes a blocking preadv()/pwritev() call
> > instead of calling blk_aio_preadv/pwritev()?  If so, then you are also
> > bypassing the QEMU block layer (coroutines, request tracking, etc) and
> > that might explain some of the latency.
> 
> The first implementation, the one I used for getting these numbers, just
> calls preadv()/pwritev() from virtio_blk.c, as you correctly guessed. I
> know it's unfair, but I wanted to look at the best possible scenario
> first, and then measure the cost of the other layers.
> 
> I'm now working on writing non-coroutine counterparts for
> blk_co_[preadv|pwritev], so we can have SIO without bypassing the block
> layer.

Maybe try to keep the change local to file-posix.c? I think you would
only have to modify raw_thread_pool_submit() so that it doesn't go
through the thread pool, but just calls func directly.
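
Something along these lines (completely untested sketch, just to
illustrate the idea; want_sync_io() is a made-up placeholder for whatever
heuristic decides that a request should be handled synchronously):

static int coroutine_fn raw_thread_pool_submit(BlockDriverState *bs,
                                               ThreadPoolFunc func, void *arg)
{
    if (want_sync_io(bs)) {
        /* Run the blocking helper (e.g. handle_aiocb_rw) directly in the
         * current coroutine instead of bouncing through a worker thread
         * and the completion BH. */
        return func(arg);
    }

    ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
    return thread_pool_submit_co(pool, func, arg);
}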

I don't think avoiding coroutines is possible without bypassing the block
layer altogether because everything is really expecting to be run in
coroutine context.

Kevin



Re: [Qemu-block] Combining synchronous and asynchronous IO

2019-03-15 Thread Sergio Lopez


Stefan Hajnoczi writes:

> On Thu, Mar 14, 2019 at 06:31:34PM +0100, Sergio Lopez wrote:
>> Our current AIO path does a great job of offloading the work from the
>> VM, and combined with IOThreads it provides good performance in most
>> scenarios. But it also comes with costs: a longer execution path and
>> the need for scheduler intervention at various points.
>> 
>> There's one particular workload that suffers from this cost, and that's
>> when you have just 1 or 2 cores on the guest issuing synchronous
>> requests. This happens to be a pretty common workload for some DBs and,
>> in a general sense, on small VMs.
>> 
>> I did a quick'n'dirty implementation on top of virtio-blk to get some
>> numbers. They come from a VM with 4 CPUs running on an idle server,
>> with a secondary virtio-blk disk backed by a null_blk device with a
>> simulated latency of 30us.
>
> Can you describe the implementation in more detail?  Does "synchronous"
> mean that hw/block/virtio_blk.c makes a blocking preadv()/pwritev() call
> instead of calling blk_aio_preadv/pwritev()?  If so, then you are also
> bypassing the QEMU block layer (coroutines, request tracking, etc) and
> that might explain some of the latency.

The first implementation, the one I used for getting these numbers, just
calls preadv()/pwritev() from virtio_blk.c, as you correctly guessed. I
know it's unfair, but I wanted to look at the best possible scenario
first, and then measure the cost of the other layers.

I'm now working on writing non-coroutine counterparts for
blk_co_[preadv|pwritev], so we can have SIO without bypassing the block
layer.

> It's important for this discussion that we understand what you tried
> out.  "Synchronous" can mean different things.  Since iothread is in
> play the code path is still asynchronous from the vcpu thread's
> perspective (thanks ioeventfd!).  The guest CPU is not stuck during I/O
> (good for quality of service) - however SIO+iothread may need to be
> woken up and scheduled on a host CPU (bad for latency).

I've tried SIO with ioeventfd=off to make it fully synchronous, but the
performance is significantly worse. I'm not sure if this is due to cache
pollution, or simply that with ioeventfd the guest CPU is able to move on
early and be ready to process the IRQ when it's signalled. Or maybe both.

>>  - Average latency (us)
>> 
>> |--------|--------------|--------------|
>> |        | AIO+iothread | SIO+iothread |
>> |--------|--------------|--------------|
>> | 1 job  |      70      |      55      |
>> | 2 jobs |      83      |      82      |
>> | 4 jobs |      90      |     159      |
>> |--------|--------------|--------------|
>
> BTW recently I've found that the latency distribution can contain
> important clues that a simple average doesn't show (e.g. multiple peaks,
> outliers, etc).  If you continue to investigate this it might be
> interesting to plot the distribution.

Interesting, noted.

>> In this case the intuition matches the reality, and synchronous IO wins
>> when there's just 1 job issuing the requests, while it loses hard when
>> there are 4.
>
> Have you looked at the overhead of AIO+event loop?  ppoll()/epoll(),
> read()ing the eventfd to clear it, and Linux AIO io_submit().

Not for a while, and that reminds me that I wanted to check whether we
could improve the poll-max-ns heuristics.

> I had some small patches that try to reorder/optimize these operations
> but never got around to benchmarking and publishing them.  They do not
> reduce latency as low as SIO but they shave off a handful of
> microseconds.
>
> Resuming this work might be useful.  Let me know if you'd like me to dig
> out the old patches.

I would definitely like to take a look at those patches.

>> 
>> While my first thought was to implement this as a tunable, it turns out
>> we have a hint about the nature of the workload in the number of
>> requests in the VQ. So I updated the code to use SIO if there's just 1
>> request and AIO otherwise, with these results:
>
> Nice, avoiding tunables is good.  That way it can automatically adjust
> depending on the current workload and we don't need to educate users on
> tweaking a tunable.
>
>> 
>> |--------|--------------|--------------|------------------|
>> |        | AIO+iothread | SIO+iothread | AIO+SIO+iothread |
>> |--------|--------------|--------------|------------------|
>> | 1 job  |      70      |      55      |        55        |
>> | 2 jobs |      83      |      82      |        78        |
>> | 4 jobs |      90      |     159      |        90        |
>> |--------|--------------|--------------|------------------|
>> 
>> This data makes me think this is something worth pursuing, but I'd like
>> to hear your opinion on it.
>
> I think it's definitely worth experimenting with more.  One thing to
> consider: the iothread is a shared resource when multiple devices are
> assigned to a single iothread.  In that case we probably do not want SIO
> since it would block the other emulated devices from processing
> requests.

Good point.

> On a related note, there is a summer internship project to implement
> support for the new io_uring API (successor to Linux AIO):
> https://wiki.qemu.org/Google_Summer_of_Code_2019#io_uring_AIO_engine

Re: [Qemu-block] [PATCH] vpc: unlock Coroutine lock to make IO submit Concurrently

2019-03-15 Thread Kevin Wolf
Am 15.03.2019 um 15:04 hat Zhengui li geschrieben:
> Concurrent IO becomes serial IO because of the QEMU coroutine lock,
> which reduces IO performance severely.
> 
> So unlock the coroutine lock before bdrv_co_pwritev and
> bdrv_co_preadv to fix it.
> 
> Signed-off-by: Zhengui li 

Thanks, applied to the block-next branch for 4.1.

Kevin



Re: [Qemu-block] [PATCH] vpc: unlock Coroutine lock to make IO submit Concurrently

2019-03-15 Thread Paolo Bonzini
On 15/03/19 15:04, Zhengui li wrote:
> Concurrent IO becomes serial IO because of the QEMU coroutine lock,
> which reduces IO performance severely.
> 
> So unlock the coroutine lock before bdrv_co_pwritev and
> bdrv_co_preadv to fix it.
> 
> Signed-off-by: Zhengui li 
> ---
>  block/vpc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/vpc.c b/block/vpc.c
> index 52ab717..1133855 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -639,8 +639,10 @@ vpc_co_preadv(BlockDriverState *bs, uint64_t offset, 
> uint64_t bytes,
>  qemu_iovec_reset(&local_qiov);
>  qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
>  
> +qemu_co_mutex_unlock(&s->lock);
>  ret = bdrv_co_preadv(bs->file, image_offset, n_bytes,
>   &local_qiov, 0);
> +qemu_co_mutex_lock(&s->lock);
>  if (ret < 0) {
>  goto fail;
>  }
> @@ -697,8 +699,10 @@ vpc_co_pwritev(BlockDriverState *bs, uint64_t offset, 
> uint64_t bytes,
>  qemu_iovec_reset(&local_qiov);
>  qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
>  
> +qemu_co_mutex_unlock(&s->lock);
>  ret = bdrv_co_pwritev(bs->file, image_offset, n_bytes,
>&local_qiov, 0);
> +qemu_co_mutex_lock(&s->lock);
>  if (ret < 0) {
>  goto fail;
>  }
> 

This should be okay, because vpc.c is somewhat simple-minded and it
doesn't recycle unused blocks in the middle of the file.

Reviewed-by: Paolo Bonzini 

Paolo



Re: [Qemu-block] Combining synchronous and asynchronous IO

2019-03-15 Thread Stefan Hajnoczi
On Thu, Mar 14, 2019 at 06:31:34PM +0100, Sergio Lopez wrote:
> Our current AIO path does a great job of offloading the work from the
> VM, and combined with IOThreads it provides good performance in most
> scenarios. But it also comes with costs: a longer execution path and
> the need for scheduler intervention at various points.
> 
> There's one particular workload that suffers from this cost, and that's
> when you have just 1 or 2 cores on the guest issuing synchronous
> requests. This happens to be a pretty common workload for some DBs and,
> in a general sense, on small VMs.
> 
> I did a quick'n'dirty implementation on top of virtio-blk to get some
> numbers. They come from a VM with 4 CPUs running on an idle server,
> with a secondary virtio-blk disk backed by a null_blk device with a
> simulated latency of 30us.

Can you describe the implementation in more detail?  Does "synchronous"
mean that hw/block/virtio_blk.c makes a blocking preadv()/pwritev() call
instead of calling blk_aio_preadv/pwritev()?  If so, then you are also
bypassing the QEMU block layer (coroutines, request tracking, etc) and
that might explain some of the latency.

It's important for this discussion that we understand what you tried
out.  "Synchronous" can mean different things.  Since iothread is in
play the code path is still asynchronous from the vcpu thread's
perspective (thanks ioeventfd!).  The guest CPU is not stuck during I/O
(good for quality of service) - however SIO+iothread may need to be
woken up and scheduled on a host CPU (bad for latency).

>  - Average latency (us)
> 
> |--------|--------------|--------------|
> |        | AIO+iothread | SIO+iothread |
> |--------|--------------|--------------|
> | 1 job  |      70      |      55      |
> | 2 jobs |      83      |      82      |
> | 4 jobs |      90      |     159      |
> |--------|--------------|--------------|

BTW recently I've found that the latency distribution can contain
important clues that a simple average doesn't show (e.g. multiple peaks,
outliers, etc).  If you continue to investigate this it might be
interesting to plot the distribution.

> In this case the intuition matches the reality, and synchronous IO wins
> when there's just 1 job issuing the requests, while it loses hard when
> there are 4.

Have you looked at the overhead of AIO+event loop?  ppoll()/epoll(),
read()ing the eventfd to clear it, and Linux AIO io_submit().
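
Per completion, the event loop does roughly the following (simplified
sketch of the sequence, not actual QEMU code):

    ppoll(fds, nfds, timeout, NULL);        /* wait until the Linux AIO eventfd fires */
    read(aio_eventfd, &cnt, sizeof(cnt));   /* clear the eventfd */
    io_getevents(aio_ctx, 1, MAX_EVENTS, events, NULL);  /* reap completed requests */
    /* ... run completion callbacks, io_submit() any newly queued requests ... */

Each of those is a syscall (plus a possible wakeup) that the synchronous
path avoids.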

I had some small patches that try to reorder/optimize these operations
but never got around to benchmarking and publishing them.  They do not
reduce latency as low as SIO but they shave off a handful of
microseconds.

Resuming this work might be useful.  Let me know if you'd like me to dig
out the old patches.

> 
> While my first thought was to implement this as a tunable, it turns out
> we have a hint about the nature of the workload in the number of
> requests in the VQ. So I updated the code to use SIO if there's just 1
> request and AIO otherwise, with these results:

Nice, avoiding tunables is good.  That way it can automatically adjust
depending on the current workload and we don't need to educate users on
tweaking a tunable.
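
For the record, the dispatch you describe boils down to something like
this (an illustrative sketch only, not your actual patch;
virtio_blk_flush_mrb() and sio_submit_one() are hypothetical names, the
latter standing in for the blocking preadv()/pwritev() path):

static void virtio_blk_flush_mrb(VirtIOBlock *s, MultiReqBuffer *mrb)
{
    if (mrb->num_reqs == 1) {
        /* A single request in this batch: take the synchronous path */
        sio_submit_one(s, mrb->reqs[0]);
        mrb->num_reqs = 0;
    } else if (mrb->num_reqs > 0) {
        /* Multiple requests queued: keep the existing AIO submission path */
        virtio_blk_submit_multireq(s->blk, mrb);
    }
}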

> 
> |--------|--------------|--------------|------------------|
> |        | AIO+iothread | SIO+iothread | AIO+SIO+iothread |
> |--------|--------------|--------------|------------------|
> | 1 job  |      70      |      55      |        55        |
> | 2 jobs |      83      |      82      |        78        |
> | 4 jobs |      90      |     159      |        90        |
> |--------|--------------|--------------|------------------|
> 
> This data makes me think this is something worth pursuing, but I'd like
> to hear your opinion on it.

I think it's definitely worth experimenting with more.  One thing to
consider: the iothread is a shared resource when multiple devices are
assigned to a single iothread.  In that case we probably do not want SIO
since it would block the other emulated devices from processing
requests.

On a related note, there is a summer internship project to implement
support for the new io_uring API (successor to Linux AIO):
https://wiki.qemu.org/Google_Summer_of_Code_2019#io_uring_AIO_engine

So please *don't* implement io_uring support right now ;-).

Stefan




Re: [Qemu-block] [Qemu-devel] [PATCH] block: Silence Coverity in bdrv_drop_intermediate()

2019-03-15 Thread Markus Armbruster
Kevin Wolf  writes:

> Coverity doesn't like that the return value of bdrv_check_update_perm()
> stays unused only in this place (CID 1399710).
>
> Even if checking local_err should be equivalent to checking ret < 0,
> let's switch to using the return value to be more consistent (and in
> case of a bug somewhere down the call chain, forgetting to assign errp
> is more likely than returning 0 for an error case).
>
> Signed-off-by: Kevin Wolf 
> ---
>  block.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index ed9253c786..0a93ee9ac8 100644
> --- a/block.c
> +++ b/block.c
> @@ -4350,11 +4350,10 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
> BlockDriverState *base,
>  QLIST_FOREACH_SAFE(c, &top->parents, next_parent, next) {
>  /* Check whether we are allowed to switch c from top to base */
>  GSList *ignore_children = g_slist_prepend(NULL, c);
> -bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
> -   ignore_children, &local_err);
> +ret = bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
> + ignore_children, &local_err);
>  g_slist_free(ignore_children);
> -if (local_err) {
> -ret = -EPERM;
> +if (ret < 0) {
>  error_report_err(local_err);
>  goto exit;
>  }

Reviewed-by: Markus Armbruster 



Re: [Qemu-block] [Qemu-devel] [PATCH] xen-block: Replace qdict_put_obj() by qdict_put() where appropriate

2019-03-15 Thread Markus Armbruster
Anthony PERARD  writes:

> On Thu, Mar 14, 2019 at 08:04:00PM +0100, Markus Armbruster wrote:
>> Kevin Wolf  writes:
>> 
>> > Am 13.03.2019 um 18:44 hat Markus Armbruster geschrieben:
>> >> Patch created mechanically by rerunning:
>> >> 
>> >> $ spatch --sp-file scripts/coccinelle/qobject.cocci \
>> >>  --macro-file scripts/cocci-macro-file.h \
>> >>  --dir hw/block --in-place
>> >> 
>> >> Signed-off-by: Markus Armbruster 
>> >
>> > Reviewed-by: Kevin Wolf 
>> 
>> Thanks!
>> 
>> > Which tree should this go through? The Xen one?
>> 
>> Fine with me.  I could also include it in a "miscellaneous cleanup" pull
>> request along with other cleanup patches I got in flight.
>
> Markus, I don't have any other Xen patches, so could you include this
> one in your pull request?

Sure!



Re: [Qemu-block] [Qemu-devel] [PATCH] block: Silence Coverity in bdrv_drop_intermediate()

2019-03-15 Thread Peter Maydell
On Fri, 15 Mar 2019 at 14:27, Kevin Wolf  wrote:
>
> Coverity doesn't like that the return value of bdrv_check_update_perm()
> stays unused only in this place (CID 1399710).
>
> Even if checking local_err should be equivalent to checking ret < 0,
> let's switch to using the return value to be more consistent (and in
> case of a bug somewhere down the call chain, forgetting to assign errp
> is more likely than returning 0 for an error case).
>
> Signed-off-by: Kevin Wolf 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [Qemu-block] [PATCH] block: Silence Coverity in bdrv_drop_intermediate()

2019-03-15 Thread Alberto Garcia
On Fri 15 Mar 2019 12:18:43 PM CET, Kevin Wolf  wrote:
> Coverity doesn't like that the return value of bdrv_check_update_perm()
> stays unused only in this place (CID 1399710).
>
> Even if checking local_err should be equivalent to checking ret < 0,
> let's switch to using the return value to be more consistent (and in
> case of a bug somewhere down the call chain, forgetting to assign errp
> is more likely than returning 0 for an error case).
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



[Qemu-block] [PATCH] vpc: unlock Coroutine lock to make IO submit Concurrently

2019-03-15 Thread Zhengui li
Concurrent IO becomes serial IO because of the QEMU coroutine lock,
which reduces IO performance severely.

So unlock the coroutine lock before bdrv_co_pwritev and
bdrv_co_preadv to fix it.

Signed-off-by: Zhengui li 
---
 block/vpc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/vpc.c b/block/vpc.c
index 52ab717..1133855 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -639,8 +639,10 @@ vpc_co_preadv(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 qemu_iovec_reset(&local_qiov);
 qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
 
+qemu_co_mutex_unlock(&s->lock);
 ret = bdrv_co_preadv(bs->file, image_offset, n_bytes,
  &local_qiov, 0);
+qemu_co_mutex_lock(&s->lock);
 if (ret < 0) {
 goto fail;
 }
@@ -697,8 +699,10 @@ vpc_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 qemu_iovec_reset(&local_qiov);
 qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
 
+qemu_co_mutex_unlock(&s->lock);
 ret = bdrv_co_pwritev(bs->file, image_offset, n_bytes,
   &local_qiov, 0);
+qemu_co_mutex_lock(&s->lock);
 if (ret < 0) {
 goto fail;
 }
-- 
2.7.2.windows.1





Re: [Qemu-block] [Qemu-devel] [PULL 34/54] commit: Support multiple roots above top node

2019-03-15 Thread Kevin Wolf
Am 14.03.2019 um 18:06 hat Peter Maydell geschrieben:
> On Fri, 6 Oct 2017 at 17:20, Kevin Wolf  wrote:
> >
> > This changes the commit block job to support operation in a graph where
> > there is more than a single active layer that references the top node.
> >
> > This involves inserting the commit filter node not only on the path
> > between the given active node and the top node, but between the top node
> > and all of its parents.
> >
> > On completion, bdrv_drop_intermediate() must consider all parents for
> > updating the backing file link. These parents may be backing files
> > themselves and as such read-only; reopen them temporarily if necessary.
> > Previously this was achieved by the bdrv_reopen() calls in the commit
> > block job that made overlay_bs read-write for the whole duration of the
> > block job, even though write access is only needed on completion.
> >
> > Now that we consider all parents, overlay_bs is meaningless. It is left
> > in place in this commit, but we'll remove it soon.
> >
> > Signed-off-by: Kevin Wolf 
> 
> Hi -- a recent change to the block layer has caused Coverity
> to flag up a possible issue with this older commit: CID 1399710:
> 
> > @@ -3492,42 +3504,42 @@ int bdrv_drop_intermediate(BlockDriverState 
> > *active, BlockDriverState *top,
> >  goto exit;
> >  }
> >
> > -new_top_bs = bdrv_find_overlay(active, top);
> > -
> > -if (new_top_bs == NULL) {
> > -/* we could not find the image above 'top', this is an error */
> > -goto exit;
> > -}
> > -
> > -/* special case of new_top_bs->backing->bs already pointing to base - 
> > nothing
> > - * to do, no intermediate images */
> > -if (backing_bs(new_top_bs) == base) {
> > -ret = 0;
> > -goto exit;
> > -}
> > -
> >  /* Make sure that base is in the backing chain of top */
> >  if (!bdrv_chain_contains(top, base)) {
> >  goto exit;
> >  }
> >
> >  /* success - we can delete the intermediate states, and link top->base 
> > */
> > -if (new_top_bs->backing->role->update_filename) {
> > -backing_file_str = backing_file_str ? backing_file_str : 
> > base->filename;
> > -ret = 
> > new_top_bs->backing->role->update_filename(new_top_bs->backing,
> > - base, 
> > backing_file_str,
> > - &local_err);
> > -if (ret < 0) {
> > -bdrv_set_backing_hd(new_top_bs, top, &error_abort);
> > +backing_file_str = backing_file_str ? backing_file_str : 
> > base->filename;
> > +
> > +QLIST_FOREACH_SAFE(c, &top->parents, next_parent, next) {
> > +/* Check whether we are allowed to switch c from top to base */
> > +GSList *ignore_children = g_slist_prepend(NULL, c);
> > +bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
> > +   ignore_children, &local_err);
> 
> Here we don't check the return value from bdrv_check_update_perm(),
> which we do in all four other places that we call it. I think this
> is probably a false positive in that the function will only
> return ret < 0 if it also returns a failure via local_err, but
> it might be worth being consistent just to placate Coverity.

I agree, this shouldn't be a problem in theory. I've sent a patch anyway
to make things more consistent.

Kevin

> > +if (local_err) {
> > +ret = -EPERM;
> > +error_report_err(local_err);
> >  goto exit;
> >  }
> > -}
> 
> Happy to just mark the issue as a false-positive in the Coverity
> UI if you think that's the best resolution.
> 
> thanks
> -- PMM



Re: [Qemu-block] [PATCH] iotests: 153: Wait for an answer to QMP commands

2019-03-15 Thread Kevin Wolf
Am 15.03.2019 um 12:46 hat Sergio Lopez geschrieben:
> There are various actions in this test that must be executed
> sequentially, as the result of each one depends on the state left by the
> previous one.
> 
> If the last argument of _send_qemu_cmd() is an empty string, it just
> sends the QMP commands without waiting for an answer. While unlikely, it
> may happen that the next action in the test gets invoked before QEMU
> processes the QMP request.
> 
> This issue seems to be easier to reproduce on servers with limited
> resources or highly loaded.
> 
> With this change, we wait for an answer on all _send_qemu_cmd() calls.
> 
> Signed-off-by: Sergio Lopez 

Thanks, applied to the block branch.

Kevin



[Qemu-block] [PATCH] block: Silence Coverity in bdrv_drop_intermediate()

2019-03-15 Thread Kevin Wolf
Coverity doesn't like that the return value of bdrv_check_update_perm()
stays unused only in this place (CID 1399710).

Even if checking local_err should be equivalent to checking ret < 0,
let's switch to using the return value to be more consistent (and in
case of a bug somewhere down the call chain, forgetting to assign errp
is more likely than returning 0 for an error case).

Signed-off-by: Kevin Wolf 
---
 block.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index ed9253c786..0a93ee9ac8 100644
--- a/block.c
+++ b/block.c
@@ -4350,11 +4350,10 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
BlockDriverState *base,
 QLIST_FOREACH_SAFE(c, &top->parents, next_parent, next) {
 /* Check whether we are allowed to switch c from top to base */
 GSList *ignore_children = g_slist_prepend(NULL, c);
-bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
-   ignore_children, &local_err);
+ret = bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
+ ignore_children, &local_err);
 g_slist_free(ignore_children);
-if (local_err) {
-ret = -EPERM;
+if (ret < 0) {
 error_report_err(local_err);
 goto exit;
 }
-- 
2.20.1




Re: [Qemu-block] [Qemu-devel] [PATCH] xen-block: Replace qdict_put_obj() by qdict_put() where appropriate

2019-03-15 Thread Anthony PERARD
On Thu, Mar 14, 2019 at 08:04:00PM +0100, Markus Armbruster wrote:
> Kevin Wolf  writes:
> 
> > Am 13.03.2019 um 18:44 hat Markus Armbruster geschrieben:
> >> Patch created mechanically by rerunning:
> >> 
> >> $ spatch --sp-file scripts/coccinelle/qobject.cocci \
> >>  --macro-file scripts/cocci-macro-file.h \
> >>  --dir hw/block --in-place
> >> 
> >> Signed-off-by: Markus Armbruster 
> >
> > Reviewed-by: Kevin Wolf 
> 
> Thanks!
> 
> > Which tree should this go through? The Xen one?
> 
> Fine with me.  I could also include it in a "miscellaneous cleanup" pull
> request along with other cleanup patches I got in flight.

Markus, I don't have any other Xen patches, so could you include this
one in your pull request?

Thanks,

-- 
Anthony PERARD



[Qemu-block] [PATCH] iotests: 153: Wait for an answer to QMP commands

2019-03-15 Thread Sergio Lopez
There are various actions in this test that must be executed
sequentially, as the result of each one depends on the state left by the
previous one.

If the last argument of _send_qemu_cmd() is an empty string, it just
sends the QMP commands without waiting for an answer. While unlikely, it
may happen that the next action in the test gets invoked before QEMU
processes the QMP request.

This issue seems to be easier to reproduce on servers with limited
resources or highly loaded.

With this change, we wait for an answer on all _send_qemu_cmd() calls.

Signed-off-by: Sergio Lopez 
---
 tests/qemu-iotests/153 | 12 ++--
 tests/qemu-iotests/153.out |  6 ++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/153 b/tests/qemu-iotests/153
index c989c2495f..08ad8a6730 100755
--- a/tests/qemu-iotests/153
+++ b/tests/qemu-iotests/153
@@ -155,7 +155,7 @@ for opts1 in "" "read-only=on" 
"read-only=on,force-share=on"; do
 _img_info -U | grep 'file format'
 fi
 done
-_send_qemu_cmd $h "{ 'execute': 'quit', }" ""
+_send_qemu_cmd $h "{ 'execute': 'quit' }" ''
 echo
 echo "Round done"
 _cleanup_qemu
@@ -219,7 +219,7 @@ echo "Adding drive"
 _send_qemu_cmd $QEMU_HANDLE \
 "{ 'execute': 'human-monitor-command',
'arguments': { 'command-line': 'drive_add 0 
if=none,id=d0,file=${TEST_IMG}' } }" \
-""
+'return'
 
 _run_cmd $QEMU_IO "${TEST_IMG}" -c 'write 0 512'
 
@@ -230,7 +230,7 @@ echo "== Closing an image should unlock it =="
 _send_qemu_cmd $QEMU_HANDLE \
 "{ 'execute': 'human-monitor-command',
'arguments': { 'command-line': 'drive_del d0' } }" \
-""
+'return'
 
 _run_cmd $QEMU_IO "${TEST_IMG}" -c 'write 0 512'
 
@@ -239,7 +239,7 @@ for d in d0 d1; do
 _send_qemu_cmd $QEMU_HANDLE \
 "{ 'execute': 'human-monitor-command',
'arguments': { 'command-line': 'drive_add 0 
if=none,id=$d,file=${TEST_IMG},readonly=on' } }" \
-""
+'return'
 done
 
 _run_cmd $QEMU_IMG info "${TEST_IMG}"
@@ -247,7 +247,7 @@ _run_cmd $QEMU_IMG info "${TEST_IMG}"
 _send_qemu_cmd $QEMU_HANDLE \
 "{ 'execute': 'human-monitor-command',
'arguments': { 'command-line': 'drive_del d0' } }" \
-""
+'return'
 
 _run_cmd $QEMU_IO "${TEST_IMG}" -c 'write 0 512'
 
@@ -255,7 +255,7 @@ echo "Closing the other"
 _send_qemu_cmd $QEMU_HANDLE \
 "{ 'execute': 'human-monitor-command',
'arguments': { 'command-line': 'drive_del d1' } }" \
-""
+'return'
 
 _run_cmd $QEMU_IO "${TEST_IMG}" -c 'write 0 512'
 
diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out
index 884254868c..9747ce3c41 100644
--- a/tests/qemu-iotests/153.out
+++ b/tests/qemu-iotests/153.out
@@ -417,6 +417,7 @@ Is another process using the image [TEST_DIR/t.qcow2]?
 _qemu_img_wrapper commit -b TEST_DIR/t.qcow2.b TEST_DIR/t.qcow2.c
 {"return": {}}
 Adding drive
+{"return": "OKrn"}
 
 _qemu_io_wrapper TEST_DIR/t.qcow2 -c write 0 512
 can't open device TEST_DIR/t.qcow2: Failed to get "write" lock
@@ -425,16 +426,21 @@ Creating overlay with qemu-img when the guest is running 
should be allowed
 
 _qemu_img_wrapper create -f qcow2 -b TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.overlay
 == Closing an image should unlock it ==
+{"return": ""}
 
 _qemu_io_wrapper TEST_DIR/t.qcow2 -c write 0 512
 Adding two and closing one
+{"return": "OKrn"}
+{"return": "OKrn"}
 
 _qemu_img_wrapper info TEST_DIR/t.qcow2
+{"return": ""}
 
 _qemu_io_wrapper TEST_DIR/t.qcow2 -c write 0 512
 can't open device TEST_DIR/t.qcow2: Failed to get "write" lock
 Is another process using the image [TEST_DIR/t.qcow2]?
 Closing the other
+{"return": ""}
 
 _qemu_io_wrapper TEST_DIR/t.qcow2 -c write 0 512
 
-- 
2.20.1




Re: [Qemu-block] [PULL 13/28] file-posix: Prepare permission code for fd switching

2019-03-15 Thread Kevin Wolf
Am 14.03.2019 um 18:27 hat Peter Maydell geschrieben:
> On Tue, 12 Mar 2019 at 17:30, Kevin Wolf  wrote:
> >
> > In order to be able to dynamically reopen the file read-only or
> > read-write, depending on the users that are attached, we need to be able
> > to switch to a different file descriptor during the permission change.
> >
> > This interacts with reopen, which also creates a new file descriptor and
> > performs permission changes internally. In this case, the permission
> > change code must reuse the reopen file descriptor instead of creating a
> > third one.
> >
> > In turn, reopen can drop its code to copy file locks to the new file
> > descriptor because that is now done when applying the new permissions.
> 
> Hi -- Coverity doesn't like this function (CID 1399712).
> I think this may be a false positive, but could you confirm?
> 
> > @@ -2696,12 +2695,78 @@ static QemuOptsList raw_create_opts = {
> >  static int raw_check_perm(BlockDriverState *bs, uint64_t perm, uint64_t 
> > shared,
> >Error **errp)
> >  {
> > -return raw_handle_perm_lock(bs, RAW_PL_PREPARE, perm, shared, errp);
> > +BDRVRawState *s = bs->opaque;
> > +BDRVRawReopenState *rs = NULL;
> > +int open_flags;
> > +int ret;
> > +
> > +if (s->perm_change_fd) {
> > +/*
> > + * In the context of reopen, this function may be called several 
> > times
> > + * (directly and recursively while change permissions of the 
> > parent).
> > + * This is even true for children that don't inherit from the 
> > original
> > + * reopen node, so s->reopen_state is not set.
> > + *
> > + * Ignore all but the first call.
> > + */
> > +return 0;
> > +}
> > +
> > +if (s->reopen_state) {
> > +/* We already have a new file descriptor to set permissions for */
> > +assert(s->reopen_state->perm == perm);
> > +assert(s->reopen_state->shared_perm == shared);
> > +rs = s->reopen_state->opaque;
> > +s->perm_change_fd = rs->fd;
> > +} else {
> > +/* We may need a new fd if auto-read-only switches the mode */
> > +ret = raw_reconfigure_getfd(bs, bs->open_flags, &open_flags,
> > +false, errp);
> 
> Coverity says that raw_reconfigure_getfd() returns an fd in 'ret' here...
> 
> > +if (ret < 0) {
> > +return ret;
> > +} else if (ret != s->fd) {
> > +s->perm_change_fd = ret;
> > +}
> > +}
> > +
> > +/* Prepare permissions on old fd to avoid conflicts between old and 
> > new,
> > + * but keep everything locked that new will need. */
> > +ret = raw_handle_perm_lock(bs, RAW_PL_PREPARE, perm, shared, errp);
> 
> ...but this call overwrites that fd, so we might never close it.
> 
> I think the answer is that either:
>  * ret == s->fd and we'll close s->fd later
>  * or we save ret into s->perm_change_fd
> 
> and Coverity just isn't clever enough to realise that if
> ret == s->fd then we haven't lost the handle.
> 
> Is that right? If so I'll mark it as a false-positive in the UI.

raw_reconfigure_getfd() returns a file descriptor that works for the
given parameters. This can be the existing one (the ret == s->fd case) or
a new one. We only own the reference and need to store it if it's a new
one.
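
In code terms, the rule is just (restating the hunk above with the
ownership made explicit; illustrative, not a new patch):

    ret = raw_reconfigure_getfd(bs, bs->open_flags, &open_flags, false, errp);
    if (ret < 0) {
        return ret;
    }
    if (ret != s->fd) {
        /* A new fd was opened: we own that reference and must remember it
         * so it gets closed later in the permission change commit/abort
         * path. */
        s->perm_change_fd = ret;
    }
    /* ret == s->fd: no new reference was taken, nothing to store or leak */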

So yes, I think it is a false positive.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v3 12/12] docs/interop/firmware.json: Prefer -machine to if=pflash

2019-03-15 Thread Michal Privoznik

On 3/14/19 8:01 PM, Markus Armbruster wrote:
> Michal Privoznik  writes:
> 
>> On 3/8/19 2:14 PM, Markus Armbruster wrote:
>>> The previous commit added a way to configure firmware with -blockdev
>>> rather than -drive if=pflash.  Document it as the preferred way.
>>> 
>>> Signed-off-by: Markus Armbruster 
>>> ---
>>>  docs/interop/firmware.json | 20 ++--
>>>  1 file changed, 14 insertions(+), 6 deletions(-)
>>> 
>>> diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
>>> index 28f9bc1591..ff8c2ce5f2 100644
>>> --- a/docs/interop/firmware.json
>>> +++ b/docs/interop/firmware.json
>>> @@ -212,9 +212,13 @@
>>>  #
>>>  # @executable: Identifies the firmware executable. The firmware
>>>  #  executable may be shared by multiple virtual machine
>>> -#  definitions. The corresponding QEMU command line option
>>> -#  is "-drive
>>> -#  if=pflash,unit=0,readonly=on,file=@executable.@filename,format=@executable.@format".
>>> +#  definitions. The preferred corresponding QEMU command
>>> +#  line options are
>>> +#  -drive if=none,id=pflash0,readonly=on,file=@executable.@filename,format=@executable.@format
>>> +#  -machine pflash0=pflash0
>> 
>> I have a question. How is libvirt supposed to query for this? How can
>> it learn it can use this new, preferred command line?
> 
> You can use qom-list-properties to find out whether the machine has
> property pflash0.
> 
> ---> {"execute": "qom-list-properties", "arguments": {"typename": "pc-q35-4.0-machine"}}
> <--- {"return": [... {"name": "pflash0", ...} ...]}
> ---> {"execute": "qom-list-properties", "arguments": {"typename": "isapc-machine"}}
> <--- {"return": [... no such property ...]}



Ah, very well, thank you.

Michal