Re: [PATCH v1 2/3] io: Add zerocopy and errqueue

2021-09-01 Thread Leonardo Bras Soares Passos
Thanks for this feedback Peter!

I ended up reading/replying the e-mails in thread order, so I may have
been redundant
with your argument, sorry about that.

I will add my comments inline, but I will add references to the
previous mail I sent
Daniel, so please read it too.

On Tue, Aug 31, 2021 at 5:27 PM Peter Xu  wrote:
[snip]
> >
> > I can't see how we can introduce MSG_ZEROCOPY in any seemless
> > way. The buffer lifetime requirements imply need for an API
> > design that is fundamentally different for asynchronous usage,
> > with a callback to notify when the write has finished/failed.
>
> Regarding buffer reuse - it indeed has a very deep implication on the buffer
> being available and it's not obvious at all.  Just to mention that the initial
> user of this work will make sure all zero copy buffers will be guest pages 
> only
> (as it's only used in multi-fd), so they should always be there during the
> process.

Thanks for pointing that out, what's what I had in mind at the time.

>
> I think asking for a complete design still makes sense.

Agree, since I am touching QIOChannel, it makes sense to make it work for
other code that uses it too, not only our case.

>  E.g., for precopy
> before we flush device states and completes the migration, we may want to at
> least have a final ack on all the zero-copies of guest pages to guarantee they
> are flushed.
>
> IOW, we need to make sure the last piece of migration stream lands after the
> guest pages so that the dest VM will always contain the latest page data when
> dest VM starts.  So far I don't see how current code guaranteed that.
>
> In short, we may just want to at least having a way to make sure all zero
> copied buffers are finished using and they're sent after some function returns
> (e.g., qio_channel_flush()).  That may require us to do some accounting on 
> when
> we called sendmsg(MSG_ZEROCOPY), meanwhile we should need to read out the
> ee_data field within SO_EE_ORIGIN_ZEROCOPY msg when we do recvmsg() for the
> error queue and keep those information somewhere too.

Yeah, that's correct.
I haven't fully studied what the returned data represents, but I
suppose this could be
a way to fix that. In my previous reply to Daniel I pointed out a way
we may achieve
a flush behavior with poll() too, but it could be a little hacky.

>
> Some other side notes that reached my mind..
>
> The qio_channel_writev_full() may not be suitable for async operations, as the
> name "full" implies synchronous to me.  So maybe we can add a new helper for
> zero copy on the channel?
>
> We may also want a new QIOChannelFeature as QIO_CHANNEL_FEATURE_ZEROCOPY, then
> we fail qio_channel_writv_zerocopy() (or whatever name we come up with) if 
> that
> bit is not set in qio channel features.

I also suggested something like that, but I thought it could be good if we could
fall back to io_writev() if we didn't have the zerocopy feature (or
the async feature).

What do you think?

>
> Thanks,
>
> --
> Peter Xu
>

I really appreciate your suggestions, thanks Peter!

Best regards,
Leonardo




Re: [PATCH v1 2/3] io: Add zerocopy and errqueue

2021-09-01 Thread Leonardo Bras Soares Passos
Hello Daniel, thank you for the feedback!

Comments inline.

On Tue, Aug 31, 2021 at 9:57 AM Daniel P. Berrangé  wrote:
>
> On Tue, Aug 31, 2021 at 08:02:38AM -0300, Leonardo Bras wrote:
> > MSG_ZEROCOPY is a feature that enables copy avoidance in TCP/UDP socket
> > send calls. It does so by avoiding copying user data into kernel buffers.
> >
> > To make it work, three steps are needed:
> > 1 - A setsockopt() system call, enabling SO_ZEROCOPY
> > 2 - Passing down the MSG_ZEROCOPY flag for each send*() syscall
> > 3 - Process the socket's error queue, dealing with any error
>
> AFAICT, this is missing the single most critical aspect of MSG_ZEROCOPY.
>
> It is non-obvious, but setting the MSG_ZEROCOPY flag turns sendmsg()
> from a synchronous call to an asynchronous call.

You are correct. I tried to adapt io_writev() interface to use MSG_ZEROCOPY in
a somehow synchronous way, but returning errp (and sometimes closing the
channel because of it) was a poor implementation.

>
> It is forbidden to overwrite/reuse/free the buffer passed to sendmsg
> until an asynchronous completion notification has been received from
> the socket error queue. These notifications are not required to
> arrive in-order, even for a TCP stream, because the kernel hangs on
> to the buffer if a re-transmit is needed.
>
> https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
>
>   "Page pinning also changes system call semantics. It temporarily
>shares the buffer between process and network stack. Unlike with
>copying, the process cannot immediately overwrite the buffer
>after system call return without possibly modifying the data in
>flight. Kernel integrity is not affected, but a buggy program
>can possibly corrupt its own data stream."
>

By the above piece of documentation, I get there is no problem in
overwriting the buffer, but a corrupt, or newer version of the memory may
be sent instead of the original one. I am pointing this out because there
are workloads like page migration that would not be impacted, given
once the buffer is changed, it will dirty the page and it will be re-sent.

But I agree with you.
It's not a good choice to expect all the users to behave like that,
and since an interface for dealing with those errors is not provided
to the using code, there is no way of using that in other scenarios.

> AFAICT, the design added in this patch does not provide any way
> to honour these requirements around buffer lifetime.
>
> I can't see how we can introduce MSG_ZEROCOPY in any seemless
> way. The buffer lifetime requirements imply need for an API
> design that is fundamentally different for asynchronous usage,
> with a callback to notify when the write has finished/failed.
>

That is a good point.
Proposing a new optional method like io_async_writev() + an error
checking mechanism could do the job.
io_async_writev() could fall-back to io_writev() in cases where it's not
implemented.

I am not sure about the error checking yet.
Options I can see are:
1 - A callback, as you suggested, which IIUC would be provided by
code using the QIOChannel, and would only fix the reported errors,
leaving the responsibility of checking for errors to the IOChannel code.

2 - A new method, maybe io_async_errck(), which would be called
whenever the using code wants to deal with pending errors. It could
return an array/list of IOVs that need to be re-sent, for example,
and code using QIOChannel could deal with it however it wants.

[snip]

> >   * qio_channel_set_cork:
> >   * @ioc: the channel object
> > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > index e377e7303d..a69fec7315 100644
> > --- a/io/channel-socket.c
> > +++ b/io/channel-socket.c
> > @@ -26,8 +26,10 @@
> >  #include "io/channel-watch.h"
> >  #include "trace.h"
> >  #include "qapi/clone-visitor.h"
> > +#include 
>
> That's going to fail to biuld on non-Linux

Good catch, thanks!

[snip]

> > @@ -571,6 +623,14 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > *ioc,
> >   "Unable to write to socket");
> >  return -1;
> >  }
> > +
> > +if ((flags & MSG_ZEROCOPY) && sioc->zerocopy_enabled) {
> > +sioc->errq_pending += niov;
> > +if (sioc->errq_pending > SOCKET_ERRQ_THRESH) {
> > +qio_channel_socket_errq_proc(sioc, errp);
> > +}
> > +}
>
> This silently looses any errors set from upto the final
> SOCKET_ERRQ_THRESH write() calls with MSG_ZEROCOPY set.

You are right.

>
> Also means if you have a series of writes without
> MSG_ZEROCOPY, it'll delay checking any pending
> errors.

That's expected... if there are only happening sends without MSG_ZEROCOPY,
it means the ones sent with zerocopy can wait. The problem would be
the above case.

>
> I would suggest checkig in close(), but as mentioned
> earlier, I think the design is flawed because the caller
> fundamentally needs to know about completion for every
> single write they make in order to know when

Re: [PATCH v1 1/3] io: Enable write flags for QIOChannel

2021-09-01 Thread Eric Blake
On Tue, Aug 31, 2021 at 08:02:37AM -0300, Leonardo Bras wrote:
> Some syscalls used for writting, such as sendmsg(), accept flags that
> can modify their behavior, even allowing the usage of features such as
> MSG_ZEROCOPY.
> 
> Change qio_channel_write*() interface to allow passing down flags,
> allowing a more flexible use of IOChannel.
> 
> At first, it's use is enabled for QIOChannelSocket, but can be easily
> extended to any other QIOChannel implementation.

As a followup to this patch, I wonder if we can also get performance
improvements by implementing MSG_MORE, and using that in preference to
corking/uncorking to better indicate that back-to-back short messages
may behave better when grouped together over the wire.  At least the
NBD code could make use of it (going off of my experience with the
libnbd project demonstrating a performance boost when we added
MSG_MORE support there).

> 
> Signed-off-by: Leonardo Bras 
> ---
>  chardev/char-io.c   |  2 +-
>  hw/remote/mpqemu-link.c |  2 +-
>  include/io/channel.h| 56 -
>  io/channel-buffer.c |  1 +
>  io/channel-command.c|  1 +
>  io/channel-file.c   |  1 +
>  io/channel-socket.c |  4 ++-
>  io/channel-tls.c|  1 +
>  io/channel-websock.c|  1 +
>  io/channel.c| 53 ++-
>  migration/rdma.c|  1 +
>  scsi/pr-manager-helper.c|  2 +-
>  tests/unit/test-io-channel-socket.c |  1 +
>  13 files changed, 81 insertions(+), 45 deletions(-)
> 
> diff --git a/chardev/char-io.c b/chardev/char-io.c
> index 8ced184160..4ea7b1ee2a 100644
> --- a/chardev/char-io.c
> +++ b/chardev/char-io.c
> @@ -122,7 +122,7 @@ int io_channel_send_full(QIOChannel *ioc,
>  
>  ret = qio_channel_writev_full(
>  ioc, &iov, 1,
> -fds, nfds, NULL);
> +fds, 0, nfds, NULL);

0 before nfds here...

>  if (ret == QIO_CHANNEL_ERR_BLOCK) {
>  if (offset) {
>  return offset;
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> index 7e841820e5..0d13321ef0 100644
> --- a/hw/remote/mpqemu-link.c
> +++ b/hw/remote/mpqemu-link.c
> @@ -69,7 +69,7 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error 
> **errp)
>  }
>  
>  if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
> -fds, nfds, errp)) {
> + fds, nfds, 0, errp)) {

Thanks for fixing the broken indentation.

...but after nfds here, so one is wrong; up to this point in a linear
review, I can't tell which was intended...

> +++ b/include/io/channel.h
> @@ -104,6 +104,7 @@ struct QIOChannelClass {
>   size_t niov,
>   int *fds,
>   size_t nfds,
> + int flags,
>   Error **errp);

...and finally I see that in general, you wanted to add the argument
after.  Looks like the change to char-io.c is buggy.

(You can use scripts/git.orderfile as a way to force your patch to
list the .h file first, to make it easier for linear review).

>  ssize_t (*io_readv)(QIOChannel *ioc,
>  const struct iovec *iov,
> @@ -260,6 +261,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
>  size_t niov,
>  int *fds,
>  size_t nfds,
> +int flags,
>  Error **errp);
>  
>  /**
> @@ -325,6 +327,7 @@ int qio_channel_readv_all(QIOChannel *ioc,
>   * @ioc: the channel object
>   * @iov: the array of memory regions to write data from
>   * @niov: the length of the @iov array
> + * @flags: optional sending flags
>   * @errp: pointer to a NULL-initialized error object
>   *
>   * Write data to the IO channel, reading it from the
> @@ -339,10 +342,14 @@ int qio_channel_readv_all(QIOChannel *ioc,
>   *
>   * Returns: 0 if all bytes were written, or -1 on error
>   */
> -int qio_channel_writev_all(QIOChannel *ioc,
> -   const struct iovec *iov,
> -   size_t niov,
> -   Error **erp);
> +int qio_channel_writev_all_flags(QIOChannel *ioc,
> + const struct iovec *iov,
> + size_t niov,
> + int flags,
> + Error **errp);

You changed the function name here, but not in the comment beforehand.

> +
> +#define qio_channel_writev_all(ioc, iov, niov, errp) \
> +qio_channel_writev_all_flags(ioc, iov, niov, 0, errp)

In most cases, you were merely adding a new function to minimize churn
to existing callers while making the old name a macro,...

> @@ -853,6 +876,7

Re: [PATCH v1 0/3] QIOChannel flags + multifd zerocopy

2021-09-01 Thread Leonardo Bras Soares Passos
Hello Peter,

On Tue, Aug 31, 2021 at 6:24 PM Peter Xu  wrote:
>
> On Tue, Aug 31, 2021 at 08:02:36AM -0300, Leonardo Bras wrote:
> > Results:
> > So far, the resource usage of __sys_sendmsg() reduced 15 times, and the
> > overall migration took 13-18% less time, based in synthetic workload.
>
> Leo,
>
> Could you share some of the details of your tests?  E.g., what's the
> configuration of your VM for testing?  What's the migration time before/after
> the patchset applied?  What is the network you're using?
>
> Thanks,
>
> --
> Peter Xu
>

Sure,
- Both receiving and sending hosts have 128GB ram and a 10Gbps network interface
  - There is a direct connection between the network interfaces.
- The guest has 100GB ram, mem-lock=on and enable-kvm.
- Before sending, I use a simple application to completely fill all
guest pages with unique values, to avoid duplicated pages and zeroed
pages.

On a single test:

Without zerocopy (qemu/master)
- Migration took 123355ms, with an average of 6912.58 Mbps
With Zerocopy:
- Migration took 108514ms, with an average of 7858.39 Mbps

This represents a throughput improvement around 13.6%.

Comparing perf recorded during default and zerocopy migration:
Without zerocopy:
- copy_user_generic_string() uses 5.4% of cpu time
- __sys_sendmsg() uses 5.19% of cpu time
With zerocopy:
- copy_user_generic_string() uses 0.02% of cpu time (~1/270 of the original)
- __sys_sendmsg() uses 0.34% of cpu time (~1/15 of the original)




Re: [PATCH v1 2/3] io: Add zerocopy and errqueue

2021-09-01 Thread Peter Xu
On Wed, Sep 01, 2021 at 09:50:56AM +0100, Daniel P. Berrangé wrote:
> On Tue, Aug 31, 2021 at 04:27:04PM -0400, Peter Xu wrote:
> > On Tue, Aug 31, 2021 at 01:57:33PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Aug 31, 2021 at 08:02:38AM -0300, Leonardo Bras wrote:
> > > > MSG_ZEROCOPY is a feature that enables copy avoidance in TCP/UDP socket
> > > > send calls. It does so by avoiding copying user data into kernel 
> > > > buffers.
> > > > 
> > > > To make it work, three steps are needed:
> > > > 1 - A setsockopt() system call, enabling SO_ZEROCOPY
> > > > 2 - Passing down the MSG_ZEROCOPY flag for each send*() syscall
> > > > 3 - Process the socket's error queue, dealing with any error
> > > 
> > > AFAICT, this is missing the single most critical aspect of MSG_ZEROCOPY.
> > > 
> > > It is non-obvious, but setting the MSG_ZEROCOPY flag turns sendmsg()
> > > from a synchronous call to an asynchronous call.
> > > 
> > > It is forbidden to overwrite/reuse/free the buffer passed to sendmsg
> > > until an asynchronous completion notification has been received from
> > > the socket error queue. These notifications are not required to
> > > arrive in-order, even for a TCP stream, because the kernel hangs on
> > > to the buffer if a re-transmit is needed.
> > > 
> > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > 
> > >   "Page pinning also changes system call semantics. It temporarily 
> > >shares the buffer between process and network stack. Unlike with
> > >copying, the process cannot immediately overwrite the buffer 
> > >after system call return without possibly modifying the data in 
> > >flight. Kernel integrity is not affected, but a buggy program
> > >can possibly corrupt its own data stream."
> > > 
> > > AFAICT, the design added in this patch does not provide any way
> > > to honour these requirements around buffer lifetime.
> > > 
> > > I can't see how we can introduce MSG_ZEROCOPY in any seemless
> > > way. The buffer lifetime requirements imply need for an API
> > > design that is fundamentally different for asynchronous usage,
> > > with a callback to notify when the write has finished/failed.
> > 
> > Regarding buffer reuse - it indeed has a very deep implication on the buffer
> > being available and it's not obvious at all.  Just to mention that the 
> > initial
> > user of this work will make sure all zero copy buffers will be guest pages 
> > only
> > (as it's only used in multi-fd), so they should always be there during the
> > process.
> 
> That is not the case when migration is using TLS, because the buffers
> transmitted are the ciphertext buffer, not the plaintext guest page.

Agreed.

> 
> > In short, we may just want to at least having a way to make sure all zero
> > copied buffers are finished using and they're sent after some function 
> > returns
> > (e.g., qio_channel_flush()).  That may require us to do some accounting on 
> > when
> > we called sendmsg(MSG_ZEROCOPY), meanwhile we should need to read out the
> > ee_data field within SO_EE_ORIGIN_ZEROCOPY msg when we do recvmsg() for the
> > error queue and keep those information somewhere too.
> > 
> > Some other side notes that reached my mind..
> > 
> > The qio_channel_writev_full() may not be suitable for async operations, as 
> > the
> > name "full" implies synchronous to me.  So maybe we can add a new helper for
> > zero copy on the channel?
> 
> All the APIs that exist today are fundamentally only suitable for sync
> operations. Supporting async correctly will definitely a brand new APIs
> separate from what exists today.

Yes.  What I wanted to say is maybe we can still keep the io_writev() interface
untouched, but just add a new interface at qio_channel_writev_full() level.

IOW, we could comment on io_writev() that it can be either sync or async now,
just like sendmsg() has that implication too with different flag passed in.
When calling io_writev() with different upper helpers, QIO channel could
identify what flag to pass over to io_writev().

-- 
Peter Xu




Re: [PATCH v1 2/3] io: Add zerocopy and errqueue

2021-09-01 Thread Daniel P . Berrangé
On Wed, Sep 01, 2021 at 11:52:13AM -0400, Peter Xu wrote:
> On Wed, Sep 01, 2021 at 09:50:56AM +0100, Daniel P. Berrangé wrote:
> > On Tue, Aug 31, 2021 at 04:27:04PM -0400, Peter Xu wrote:
> > > On Tue, Aug 31, 2021 at 01:57:33PM +0100, Daniel P. Berrangé wrote:
> > > > On Tue, Aug 31, 2021 at 08:02:38AM -0300, Leonardo Bras wrote:
> > > > > MSG_ZEROCOPY is a feature that enables copy avoidance in TCP/UDP 
> > > > > socket
> > > > > send calls. It does so by avoiding copying user data into kernel 
> > > > > buffers.
> > > > > 
> > > > > To make it work, three steps are needed:
> > > > > 1 - A setsockopt() system call, enabling SO_ZEROCOPY
> > > > > 2 - Passing down the MSG_ZEROCOPY flag for each send*() syscall
> > > > > 3 - Process the socket's error queue, dealing with any error
> > > > 
> > > > AFAICT, this is missing the single most critical aspect of MSG_ZEROCOPY.
> > > > 
> > > > It is non-obvious, but setting the MSG_ZEROCOPY flag turns sendmsg()
> > > > from a synchronous call to an asynchronous call.
> > > > 
> > > > It is forbidden to overwrite/reuse/free the buffer passed to sendmsg
> > > > until an asynchronous completion notification has been received from
> > > > the socket error queue. These notifications are not required to
> > > > arrive in-order, even for a TCP stream, because the kernel hangs on
> > > > to the buffer if a re-transmit is needed.
> > > > 
> > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > 
> > > >   "Page pinning also changes system call semantics. It temporarily 
> > > >shares the buffer between process and network stack. Unlike with
> > > >copying, the process cannot immediately overwrite the buffer 
> > > >after system call return without possibly modifying the data in 
> > > >flight. Kernel integrity is not affected, but a buggy program
> > > >can possibly corrupt its own data stream."
> > > > 
> > > > AFAICT, the design added in this patch does not provide any way
> > > > to honour these requirements around buffer lifetime.
> > > > 
> > > > I can't see how we can introduce MSG_ZEROCOPY in any seemless
> > > > way. The buffer lifetime requirements imply need for an API
> > > > design that is fundamentally different for asynchronous usage,
> > > > with a callback to notify when the write has finished/failed.
> > > 
> > > Regarding buffer reuse - it indeed has a very deep implication on the 
> > > buffer
> > > being available and it's not obvious at all.  Just to mention that the 
> > > initial
> > > user of this work will make sure all zero copy buffers will be guest 
> > > pages only
> > > (as it's only used in multi-fd), so they should always be there during the
> > > process.
> > 
> > That is not the case when migration is using TLS, because the buffers
> > transmitted are the ciphertext buffer, not the plaintext guest page.
> 
> Agreed.
> 
> > 
> > > In short, we may just want to at least having a way to make sure all zero
> > > copied buffers are finished using and they're sent after some function 
> > > returns
> > > (e.g., qio_channel_flush()).  That may require us to do some accounting 
> > > on when
> > > we called sendmsg(MSG_ZEROCOPY), meanwhile we should need to read out the
> > > ee_data field within SO_EE_ORIGIN_ZEROCOPY msg when we do recvmsg() for 
> > > the
> > > error queue and keep those information somewhere too.
> > > 
> > > Some other side notes that reached my mind..
> > > 
> > > The qio_channel_writev_full() may not be suitable for async operations, 
> > > as the
> > > name "full" implies synchronous to me.  So maybe we can add a new helper 
> > > for
> > > zero copy on the channel?
> > 
> > All the APIs that exist today are fundamentally only suitable for sync
> > operations. Supporting async correctly will definitely a brand new APIs
> > separate from what exists today.
> 
> Yes.  What I wanted to say is maybe we can still keep the io_writev() 
> interface
> untouched, but just add a new interface at qio_channel_writev_full() level.
> 
> IOW, we could comment on io_writev() that it can be either sync or async now,
> just like sendmsg() has that implication too with different flag passed in.
> When calling io_writev() with different upper helpers, QIO channel could
> identify what flag to pass over to io_writev().

I don't think we should overload any of the existing methods with extra
parameters for async. Introduce a completely new set of methods exclusively
for this alternative interaction model.  I can kinda understand why they
took the approach they did with sendmsg, but I wouldn't use it as design
motivation for QEMU (except as example of what not to do).

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-01 Thread Peter Xu
On Wed, Sep 01, 2021 at 04:44:30PM +0100, Daniel P. Berrangé wrote:
> QEMU has mptcp support already:
> 
>   commit 8bd1078aebcec5eac196a83ef1a7e74be0ba67b7
>   Author: Dr. David Alan Gilbert 
>   Date:   Wed Apr 21 12:28:34 2021 +0100
> 
> sockets: Support multipath TCP
> 
> Multipath TCP allows combining multiple interfaces/routes into a single
> socket, with very little work for the user/admin.
> 
> It's enabled by 'mptcp' on most socket addresses:
> 
>./qemu-system-x86_64 -nographic -incoming tcp:0:,mptcp

Oops, I totally forgot about that, sorry!

> 
> > KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
> > ZEROCOPY mutual exclusive anyway because at least the userspace TLS code of
> > gnutls won't has a way to maintain the tls buffers used by zerocopy.  So at
> > least we need some knob to detect whether kTLS is enabled in gnutls.
> 
> It isn't possible for gnutls to transparently enable KTLS, because
> GNUTLS doesn't get to see the actual socket directly - it'll need
> some work in QEMU to enable it.  We know MPTCP and KTLS are currently
> mutually exclusive as they both use the same kernel network hooks
> framework.

Then we may need to at least figure out whether zerocopy needs to mask out
mptcp.

-- 
Peter Xu




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-01 Thread Daniel P . Berrangé
On Wed, Sep 01, 2021 at 11:35:33AM -0400, Peter Xu wrote:
> On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:
> > On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> > > On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > > thread.
> > > > > 
> > > > > Change the send_write() interface of multifd, allowing it to pass down
> > > > > flags for qio_channel_write*().
> > > > > 
> > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping 
> > > > > the
> > > > > other data being sent at the default copying approach.
> > > > > 
> > > > > Signed-off-by: Leonardo Bras 
> > > > > ---
> > > > >  migration/multifd-zlib.c | 7 ---
> > > > >  migration/multifd-zstd.c | 7 ---
> > > > >  migration/multifd.c  | 9 ++---
> > > > >  migration/multifd.h  | 3 ++-
> > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > > 
> > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > >  }
> > > > >  
> > > > >  if (used) {
> > > > > -ret = multifd_send_state->ops->send_write(p, used, 
> > > > > &local_err);
> > > > > +ret = multifd_send_state->ops->send_write(p, used, 
> > > > > MSG_ZEROCOPY,
> > > > > +  
> > > > > &local_err);
> > > > 
> > > > I don't think it is valid to unconditionally enable this feature due to 
> > > > the
> > > > resource usage implications
> > > > 
> > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > 
> > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
> > > >if the socket option was not set, the socket exceeds its optmem 
> > > >limit or the user exceeds its ulimit on locked pages."
> > > > 
> > > > The limit on locked pages is something that looks very likely to be
> > > > exceeded unless you happen to be running a QEMU config that already
> > > > implies locked memory (eg PCI assignment)
> > > 
> > > Yes it would be great to be a migration capability in parallel to 
> > > multifd. At
> > > initial phase if it's easy to be implemented on multi-fd only, we can add 
> > > a
> > > dependency between the caps.  In the future we can remove that dependency 
> > > when
> > > the code is ready to go without multifd.  Thanks,
> > 
> > Also, I'm wondering how zerocopy support interacts with kernel support
> > for kTLS and multipath-TCP, both of which we want to be able to use
> > with migration.
> 
> Copying Jason Wang for net implications between these features on kernel side
> and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).
> 
> From the safe side we may want to only enable one of them until we prove
> they'll work together I guess..

MPTCP is good when we're network limited for migration

KTLS will be good when we're CPU limited on AES for migration,
which is essentially always when TLS is used.

ZEROCOPY will be good when we're CPU limited for data copy
on migration, or to reduce the impact on other concurrent
VMs on the same CPUs.

Ultimately we woudld benefit from all of them at the same
time, if it were technically possible todo.

> Not a immediate concern as I don't really think any of them is really
> explicitly supported in qemu.

QEMU has mptcp support already:

  commit 8bd1078aebcec5eac196a83ef1a7e74be0ba67b7
  Author: Dr. David Alan Gilbert 
  Date:   Wed Apr 21 12:28:34 2021 +0100

sockets: Support multipath TCP

Multipath TCP allows combining multiple interfaces/routes into a single
socket, with very little work for the user/admin.

It's enabled by 'mptcp' on most socket addresses:

   ./qemu-system-x86_64 -nographic -incoming tcp:0:,mptcp

> KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
> ZEROCOPY mutual exclusive anyway because at least the userspace TLS code of
> gnutls won't has a way to maintain the tls buffers used by zerocopy.  So at
> least we need some knob to detect whether kTLS is enabled in gnutls.

It isn't possible for gnutls to transparently enable KTLS, because
GNUTLS doesn't get to see the actual socket directly - it'll need
some work in QEMU to enable it.  We know MPTCP and KTLS are currently
mutually exclusive as they both use the same kernel network hooks
framework.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PULL 53/56] iotests/image-fleecing: add test-case for copy-before-write filter

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

New fleecing method becomes available: copy-before-write filter.

Actually we don't need backup job to setup image fleecing. Add test
for new recommended way of image fleecing.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-34-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/tests/image-fleecing | 50 +-
 tests/qemu-iotests/tests/image-fleecing.out | 72 +
 2 files changed, 107 insertions(+), 15 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index e210c00d28..f6318492c6 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -48,7 +48,7 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of 
partial-left [1]
  ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
  ('0xcd', '0x3ff', '64k')] # patterns[3]
 
-def do_test(base_img_path, fleece_img_path, nbd_sock_path, vm):
+def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
 log('--- Setting up images ---')
 log('')
 
@@ -67,6 +67,7 @@ def do_test(base_img_path, fleece_img_path, nbd_sock_path, 
vm):
 
 src_node = 'source'
 tmp_node = 'temp'
+qom_path = '/machine/peripheral/sda'
 vm.add_blockdev(f'driver={iotests.imgfmt},file.driver=file,'
 f'file.filename={base_img_path},node-name={src_node}')
 vm.add_device('virtio-scsi')
@@ -90,12 +91,22 @@ def do_test(base_img_path, fleece_img_path, nbd_sock_path, 
vm):
 'backing': src_node,
 }))
 
-# Establish COW from source to fleecing node
-log(vm.qmp('blockdev-backup',
-   job_id='fleecing',
-   device=src_node,
-   target=tmp_node,
-   sync='none'))
+# Establish CBW from source to fleecing node
+if use_cbw:
+log(vm.qmp('blockdev-add', {
+'driver': 'copy-before-write',
+'node-name': 'fl-cbw',
+'file': src_node,
+'target': tmp_node
+}))
+
+log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
+else:
+log(vm.qmp('blockdev-backup',
+   job_id='fleecing',
+   device=src_node,
+   target=tmp_node,
+   sync='none'))
 
 log('')
 log('--- Setting up NBD Export ---')
@@ -124,7 +135,7 @@ def do_test(base_img_path, fleece_img_path, nbd_sock_path, 
vm):
 for p in overwrite:
 cmd = 'write -P%s %s %s' % p
 log(cmd)
-log(vm.hmp_qemu_io('/machine/peripheral/sda', cmd, qdev=True))
+log(vm.hmp_qemu_io(qom_path, cmd, qdev=True))
 
 log('')
 log('--- Verifying Data ---')
@@ -139,10 +150,15 @@ def do_test(base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 log('--- Cleanup ---')
 log('')
 
-log(vm.qmp('block-job-cancel', device='fleecing'))
-e = vm.event_wait('BLOCK_JOB_CANCELLED')
-assert e is not None
-log(e, filters=[iotests.filter_qmp_event])
+if use_cbw:
+log(vm.qmp('qom-set', path=qom_path, property='drive', value=src_node))
+log(vm.qmp('blockdev-del', node_name='fl-cbw'))
+else:
+log(vm.qmp('block-job-cancel', device='fleecing'))
+e = vm.event_wait('BLOCK_JOB_CANCELLED')
+assert e is not None
+log(e, filters=[iotests.filter_qmp_event])
+
 log(vm.qmp('nbd-server-stop'))
 log(vm.qmp('blockdev-del', node_name=tmp_node))
 vm.shutdown()
@@ -160,13 +176,17 @@ def do_test(base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 log('Done')
 
 
-def test():
+def test(use_cbw):
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
  iotests.FilePath('nbd.sock',
   base_dir=iotests.sock_dir) as nbd_sock_path, \
  iotests.VM() as vm:
-do_test(base_img_path, fleece_img_path, nbd_sock_path, vm)
+do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm)
+
 
+log('=== Test backup(sync=none) based fleecing ===\n')
+test(False)
 
-test()
+log('=== Test filter based fleecing ===\n')
+test(True)
diff --git a/tests/qemu-iotests/tests/image-fleecing.out 
b/tests/qemu-iotests/tests/image-fleecing.out
index 314a1b54e9..e96d122a8b 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -1,3 +1,5 @@
+=== Test backup(sync=none) based fleecing ===
+
 --- Setting up images ---
 
 Done
@@ -65,3 +67,73 @@ read -P0xdc 32M 32k
 read -P0xcd 0x3ff 64k
 
 Done
+=== Test filter based fleecing ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Setting up NBD Export ---
+
+{"return": {}}
+{"return": {}}
+
+--- Sani

[PULL 52/56] iotests/image-fleecing: prepare for adding new test-case

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We are going to add a test-case with some behavior modifications. So,
let's prepare a function to be reused.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-33-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/tests/image-fleecing | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index ec4ef5f3f6..e210c00d28 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -48,12 +48,7 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of 
partial-left [1]
  ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
  ('0xcd', '0x3ff', '64k')] # patterns[3]
 
-with iotests.FilePath('base.img') as base_img_path, \
- iotests.FilePath('fleece.img') as fleece_img_path, \
- iotests.FilePath('nbd.sock',
-  base_dir=iotests.sock_dir) as nbd_sock_path, \
- iotests.VM() as vm:
-
+def do_test(base_img_path, fleece_img_path, nbd_sock_path, vm):
 log('--- Setting up images ---')
 log('')
 
@@ -163,3 +158,15 @@ with iotests.FilePath('base.img') as base_img_path, \
 
 log('')
 log('Done')
+
+
+def test():
+with iotests.FilePath('base.img') as base_img_path, \
+ iotests.FilePath('fleece.img') as fleece_img_path, \
+ iotests.FilePath('nbd.sock',
+  base_dir=iotests.sock_dir) as nbd_sock_path, \
+ iotests.VM() as vm:
+do_test(base_img_path, fleece_img_path, nbd_sock_path, vm)
+
+
+test()
-- 
2.31.1




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-01 Thread Peter Xu
On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:
> On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> > On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > thread.
> > > > 
> > > > Change the send_write() interface of multifd, allowing it to pass down
> > > > flags for qio_channel_write*().
> > > > 
> > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > > > other data being sent at the default copying approach.
> > > > 
> > > > Signed-off-by: Leonardo Bras 
> > > > ---
> > > >  migration/multifd-zlib.c | 7 ---
> > > >  migration/multifd-zstd.c | 7 ---
> > > >  migration/multifd.c  | 9 ++---
> > > >  migration/multifd.h  | 3 ++-
> > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > 
> > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > >  }
> > > >  
> > > >  if (used) {
> > > > -ret = multifd_send_state->ops->send_write(p, used, 
> > > > &local_err);
> > > > +ret = multifd_send_state->ops->send_write(p, used, 
> > > > MSG_ZEROCOPY,
> > > > +  &local_err);
> > > 
> > > I don't think it is valid to unconditionally enable this feature due to 
> > > the
> > > resource usage implications
> > > 
> > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > 
> > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
> > >if the socket option was not set, the socket exceeds its optmem 
> > >limit or the user exceeds its ulimit on locked pages."
> > > 
> > > The limit on locked pages is something that looks very likely to be
> > > exceeded unless you happen to be running a QEMU config that already
> > > implies locked memory (eg PCI assignment)
> > 
> > Yes it would be great to be a migration capability in parallel to multifd. 
> > At
> > initial phase if it's easy to be implemented on multi-fd only, we can add a
> > dependency between the caps.  In the future we can remove that dependency 
> > when
> > the code is ready to go without multifd.  Thanks,
> 
> Also, I'm wondering how zerocopy support interacts with kernel support
> for kTLS and multipath-TCP, both of which we want to be able to use
> with migration.

Copying Jason Wang for net implications between these features on kernel side
and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).

>From the safe side we may want to only enable one of them until we prove
they'll work together I guess..

Not a immediate concern as I don't really think any of them is really
explicitly supported in qemu.

KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
ZEROCOPY mutual exclusive anyway because at least the userspace TLS code of
gnutls won't has a way to maintain the tls buffers used by zerocopy.  So at
least we need some knob to detect whether kTLS is enabled in gnutls.

-- 
Peter Xu




[PULL 51/56] iotests/image-fleecing: rename tgt_node

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Actually target of backup(sync=None) is not a final backup target:
image fleecing is intended to be used with external tool, which will
copy data from fleecing node to some real backup target.

Also, we are going to add a test case for "push backup with fleecing",
where instead of exporting fleecing node by NBD, we'll start a backup
job from fleecing node to real backup target.

To avoid confusion, let's rename temporary fleecing node now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-32-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/tests/image-fleecing | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index 961941bb27..ec4ef5f3f6 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -71,6 +71,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 src_node = 'source'
+tmp_node = 'temp'
 vm.add_blockdev(f'driver={iotests.imgfmt},file.driver=file,'
 f'file.filename={base_img_path},node-name={src_node}')
 vm.add_device('virtio-scsi')
@@ -82,12 +83,11 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Setting up Fleecing Graph ---')
 log('')
 
-tgt_node = 'fleeceNode'
 
-# create tgt_node backed by src_node
+# create tmp_node backed by src_node
 log(vm.qmp('blockdev-add', {
 'driver': 'qcow2',
-'node-name': tgt_node,
+'node-name': tmp_node,
 'file': {
 'driver': 'file',
 'filename': fleece_img_path,
@@ -99,19 +99,19 @@ with iotests.FilePath('base.img') as base_img_path, \
 log(vm.qmp('blockdev-backup',
job_id='fleecing',
device=src_node,
-   target=tgt_node,
+   target=tmp_node,
sync='none'))
 
 log('')
 log('--- Setting up NBD Export ---')
 log('')
 
-nbd_uri = 'nbd+unix:///%s?socket=%s' % (tgt_node, nbd_sock_path)
+nbd_uri = 'nbd+unix:///%s?socket=%s' % (tmp_node, nbd_sock_path)
 log(vm.qmp('nbd-server-start',
{'addr': { 'type': 'unix',
   'data': { 'path': nbd_sock_path } } }))
 
-log(vm.qmp('nbd-server-add', device=tgt_node))
+log(vm.qmp('nbd-server-add', device=tmp_node))
 
 log('')
 log('--- Sanity Check ---')
@@ -149,7 +149,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 assert e is not None
 log(e, filters=[iotests.filter_qmp_event])
 log(vm.qmp('nbd-server-stop'))
-log(vm.qmp('blockdev-del', node_name=tgt_node))
+log(vm.qmp('blockdev-del', node_name=tmp_node))
 vm.shutdown()
 
 log('')
-- 
2.31.1




[PULL 48/56] iotests: move 222 to tests/image-fleecing

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Give a good name to test file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-29-vsement...@virtuozzo.com>
[hreitz: Adjust .gitlab-ci.d/buildtest.yml]
Signed-off-by: Hanna Reitz 
---
 .gitlab-ci.d/buildtest.yml   | 6 +++---
 tests/qemu-iotests/{222 => tests/image-fleecing} | 0
 tests/qemu-iotests/{222.out => tests/image-fleecing.out} | 0
 3 files changed, 3 insertions(+), 3 deletions(-)
 rename tests/qemu-iotests/{222 => tests/image-fleecing} (100%)
 rename tests/qemu-iotests/{222.out => tests/image-fleecing.out} (100%)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 903ee65f32..e74998efb9 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -305,11 +305,11 @@ build-tcg-disabled:
 - cd tests/qemu-iotests/
 - ./check -raw 001 002 003 004 005 008 009 010 011 012 021 025 032 033 048
 052 063 077 086 101 104 106 113 148 150 151 152 157 159 160 163
-170 171 183 184 192 194 208 221 222 226 227 236 253 277
+170 171 183 184 192 194 208 221 226 227 236 253 277 image-fleecing
 - ./check -qcow2 028 051 056 057 058 065 068 082 085 091 095 096 102 122
 124 132 139 142 144 145 151 152 155 157 165 194 196 200 202
-208 209 216 218 222 227 234 246 247 248 250 254 255 257 258
-260 261 262 263 264 270 272 273 277 279
+208 209 216 218 227 234 246 247 248 250 254 255 257 258
+260 261 262 263 264 270 272 273 277 279 image-fleecing
 
 build-user:
   extends: .native_build_job_template
diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/tests/image-fleecing
similarity index 100%
rename from tests/qemu-iotests/222
rename to tests/qemu-iotests/tests/image-fleecing
diff --git a/tests/qemu-iotests/222.out 
b/tests/qemu-iotests/tests/image-fleecing.out
similarity index 100%
rename from tests/qemu-iotests/222.out
rename to tests/qemu-iotests/tests/image-fleecing.out
-- 
2.31.1




[PULL 55/56] block/export/fuse.c: fix fuse-lseek on uclibc or musl

2021-09-01 Thread Hanna Reitz
From: Fabrice Fontaine 

Include linux/fs.h to avoid the following build failure on uclibc or
musl raised since version 6.0.0:

../block/export/fuse.c: In function 'fuse_lseek':
../block/export/fuse.c:641:19: error: 'SEEK_HOLE' undeclared (first use in this 
function)
  641 | if (whence != SEEK_HOLE && whence != SEEK_DATA) {
  |   ^
../block/export/fuse.c:641:19: note: each undeclared identifier is reported 
only once for each function it appears in
../block/export/fuse.c:641:42: error: 'SEEK_DATA' undeclared (first use in this 
function); did you mean 'SEEK_SET'?
  641 | if (whence != SEEK_HOLE && whence != SEEK_DATA) {
  |  ^
  |  SEEK_SET

Fixes:
 - 
http://autobuild.buildroot.org/results/33c90ebf04997f4d3557cfa66abc9cf9a3076137

Signed-off-by: Fabrice Fontaine 
Message-Id: <20210827220301.272887-1-fontaine.fabr...@gmail.com>
Signed-off-by: Hanna Reitz 
---
 block/export/fuse.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index fc7b07d2b5..2e3bf8270b 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -31,6 +31,9 @@
 #include 
 #include 
 
+#ifdef __linux__
+#include 
+#endif
 
 /* Prevent overly long bounce buffer allocations */
 #define FUSE_MAX_BOUNCE_BYTES (MIN(BDRV_REQUEST_MAX_BYTES, 64 * 1024 * 1024))
-- 
2.31.1




[PULL 54/56] block/block-copy: block_copy_state_new(): drop extra arguments

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

The only caller pass copy_range and compress both false. Let's just
drop these arguments.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20210824083856.17408-35-vsement...@virtuozzo.com>
Reviewed-by: Hanna Reitz 
Signed-off-by: Hanna Reitz 
---
 include/block/block-copy.h | 1 -
 block/block-copy.c | 5 ++---
 block/copy-before-write.c  | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index b8a2d63545..99370fa38b 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,7 +25,6 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- bool use_copy_range, bool compress,
  Error **errp);
 
 /* Function should be called prior any actual copy request */
diff --git a/block/block-copy.c b/block/block-copy.c
index 443261e4e4..ce116318b5 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -384,8 +384,7 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- bool use_copy_range,
- bool compress, Error **errp)
+ Error **errp)
 {
 BlockCopyState *s;
 int64_t cluster_size;
@@ -434,7 +433,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 cluster_size),
 };
 
-block_copy_set_copy_opts(s, use_copy_range, compress);
+block_copy_set_copy_opts(s, false, false);
 
 ratelimit_init(&s->rate_limit);
 qemu_co_mutex_init(&s->lock);
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2cd68b480a..2a5e57deca 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -170,7 +170,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, false, false, errp);
+s->bcs = block_copy_state_new(bs->file, s->target, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
-- 
2.31.1




[PULL 49/56] iotests.py: hmp_qemu_io: support qdev

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-Id: <20210824083856.17408-30-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 4c8971d946..11276f380a 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -696,9 +696,10 @@ def resume_drive(self, drive: str) -> None:
 self.hmp(f'qemu-io {drive} "remove_break bp_{drive}"')
 
 def hmp_qemu_io(self, drive: str, cmd: str,
-use_log: bool = False) -> QMPMessage:
+use_log: bool = False, qdev: bool = False) -> QMPMessage:
 """Write to a given drive using an HMP command"""
-return self.hmp(f'qemu-io {drive} "{cmd}"', use_log=use_log)
+d = '-d ' if qdev else ''
+return self.hmp(f'qemu-io {d}{drive} "{cmd}"', use_log=use_log)
 
 def flatten_qmp_object(self, obj, output=None, basestr=''):
 if output is None:
-- 
2.31.1




[PULL 56/56] block/file-win32: add reopen handlers

2021-09-01 Thread Hanna Reitz
From: Viktor Prutyanov 

Make 'qemu-img commit' work on Windows.

Command 'commit' requires reopening backing file in RW mode. So,
add reopen prepare/commit/abort handlers and change dwShareMode
for CreateFile call in order to allow further read/write reopening.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418

Suggested-by: Hanna Reitz 
Signed-off-by: Viktor Prutyanov 
Tested-by: Helge Konetzka 
Message-Id: <20210825173625.19415-1-viktor.prutya...@phystech.edu>
Signed-off-by: Hanna Reitz 
---
 block/file-win32.c | 101 -
 1 file changed, 100 insertions(+), 1 deletion(-)

diff --git a/block/file-win32.c b/block/file-win32.c
index 2642088bd6..b97c58d642 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -58,6 +58,10 @@ typedef struct BDRVRawState {
 QEMUWin32AIOState *aio;
 } BDRVRawState;
 
+typedef struct BDRVRawReopenState {
+HANDLE hfile;
+} BDRVRawReopenState;
+
 /*
  * Read/writes the data to/from a given linear buffer.
  *
@@ -392,7 +396,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 s->hfile = CreateFile(filename, access_flags,
-  FILE_SHARE_READ, NULL,
+  FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
   OPEN_EXISTING, overlapped, NULL);
 if (s->hfile == INVALID_HANDLE_VALUE) {
 int err = GetLastError();
@@ -634,6 +638,97 @@ static int coroutine_fn raw_co_create_opts(BlockDriver 
*drv,
 return raw_co_create(&options, errp);
 }
 
+static int raw_reopen_prepare(BDRVReopenState *state,
+  BlockReopenQueue *queue, Error **errp)
+{
+BDRVRawState *s = state->bs->opaque;
+BDRVRawReopenState *rs;
+int access_flags;
+DWORD overlapped;
+int ret = 0;
+
+if (s->type != FTYPE_FILE) {
+error_setg(errp, "Can only reopen files");
+return -EINVAL;
+}
+
+rs = g_new0(BDRVRawReopenState, 1);
+
+/*
+ * We do not support changing any options (only flags). By leaving
+ * all options in state->options, we tell the generic reopen code
+ * that we do not support changing any of them, so it will verify
+ * that their values did not change.
+ */
+
+raw_parse_flags(state->flags, s->aio != NULL, &access_flags, &overlapped);
+rs->hfile = CreateFile(state->bs->filename, access_flags,
+   FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
+   OPEN_EXISTING, overlapped, NULL);
+
+if (rs->hfile == INVALID_HANDLE_VALUE) {
+int err = GetLastError();
+
+error_setg_win32(errp, err, "Could not reopen '%s'",
+ state->bs->filename);
+if (err == ERROR_ACCESS_DENIED) {
+ret = -EACCES;
+} else {
+ret = -EINVAL;
+}
+goto fail;
+}
+
+if (s->aio) {
+ret = win32_aio_attach(s->aio, rs->hfile);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Could not enable AIO");
+CloseHandle(rs->hfile);
+goto fail;
+}
+}
+
+state->opaque = rs;
+
+return 0;
+
+fail:
+g_free(rs);
+state->opaque = NULL;
+
+return ret;
+}
+
+static void raw_reopen_commit(BDRVReopenState *state)
+{
+BDRVRawState *s = state->bs->opaque;
+BDRVRawReopenState *rs = state->opaque;
+
+assert(rs != NULL);
+
+CloseHandle(s->hfile);
+s->hfile = rs->hfile;
+
+g_free(rs);
+state->opaque = NULL;
+}
+
+static void raw_reopen_abort(BDRVReopenState *state)
+{
+BDRVRawReopenState *rs = state->opaque;
+
+if (!rs) {
+return;
+}
+
+if (rs->hfile != INVALID_HANDLE_VALUE) {
+CloseHandle(rs->hfile);
+}
+
+g_free(rs);
+state->opaque = NULL;
+}
+
 static QemuOptsList raw_create_opts = {
 .name = "raw-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(raw_create_opts.head),
@@ -659,6 +754,10 @@ BlockDriver bdrv_file = {
 .bdrv_co_create_opts = raw_co_create_opts,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
 
+.bdrv_reopen_prepare = raw_reopen_prepare,
+.bdrv_reopen_commit  = raw_reopen_commit,
+.bdrv_reopen_abort   = raw_reopen_abort,
+
 .bdrv_aio_preadv= raw_aio_preadv,
 .bdrv_aio_pwritev   = raw_aio_pwritev,
 .bdrv_aio_flush = raw_aio_flush,
-- 
2.31.1




[PULL 43/56] python/qemu/machine.py: refactor _qemu_args()

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

 - use shorter construction
 - don't create new dict if not needed
 - drop extra unpacking key-val arguments
 - drop extra default values

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-Id: <20210824083856.17408-24-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 python/qemu/machine/machine.py | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 8b935813e9..3fde73cf10 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -570,14 +570,12 @@ def _qmp(self) -> QEMUMonitorProtocol:
 return self._qmp_connection
 
 @classmethod
-def _qmp_args(cls, _conv_keys: bool = True, **args: Any) -> Dict[str, Any]:
-qmp_args = dict()
-for key, value in args.items():
-if _conv_keys:
-qmp_args[key.replace('_', '-')] = value
-else:
-qmp_args[key] = value
-return qmp_args
+def _qmp_args(cls, conv_keys: bool,
+  args: Dict[str, Any]) -> Dict[str, object]:
+if conv_keys:
+return {k.replace('_', '-'): v for k, v in args.items()}
+
+return args
 
 def qmp(self, cmd: str,
 conv_keys: bool = True,
@@ -585,7 +583,7 @@ def qmp(self, cmd: str,
 """
 Invoke a QMP command and return the response dict
 """
-qmp_args = self._qmp_args(conv_keys, **args)
+qmp_args = self._qmp_args(conv_keys, args)
 return self._qmp.cmd(cmd, args=qmp_args)
 
 def command(self, cmd: str,
@@ -596,7 +594,7 @@ def command(self, cmd: str,
 On success return the response dict.
 On failure raise an exception.
 """
-qmp_args = self._qmp_args(conv_keys, **args)
+qmp_args = self._qmp_args(conv_keys, args)
 return self._qmp.command(cmd, **qmp_args)
 
 def get_qmp_event(self, wait: bool = False) -> Optional[QMPMessage]:
-- 
2.31.1




[PULL 38/56] block/copy-before-write: cbw_init(): use options

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

One more step closer to .bdrv_open(): use options instead of plain
arguments. Move to bdrv_open_child() calls, native for drive open
handlers.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
Message-Id: <20210824083856.17408-19-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 1e7180760a..1cefcade78 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -144,25 +144,20 @@ static void cbw_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 }
 }
 
-static int cbw_init(BlockDriverState *bs, BlockDriverState *source,
-BlockDriverState *target, Error **errp)
+static int cbw_init(BlockDriverState *bs, QDict *options, Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 
-bdrv_ref(target);
-s->target = bdrv_attach_child(bs, target, "target", &child_of_bds,
-  BDRV_CHILD_DATA, errp);
-if (!s->target) {
-error_prepend(errp, "Cannot attach target child: ");
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+   false, errp);
+if (!bs->file) {
 return -EINVAL;
 }
 
-bdrv_ref(source);
-bs->file = bdrv_attach_child(bs, source, "file", &child_of_bds,
- BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
- errp);
-if (!bs->file) {
-error_prepend(errp, "Cannot attach file child: ");
+s->target = bdrv_open_child(NULL, options, "target", bs, &child_of_bds,
+BDRV_CHILD_DATA, false, errp);
+if (!s->target) {
 return -EINVAL;
 }
 
@@ -209,6 +204,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 int ret;
 BDRVCopyBeforeWriteState *state;
 BlockDriverState *top;
+QDict *opts;
 
 assert(source->total_sectors == target->total_sectors);
 
@@ -220,7 +216,12 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 }
 state = top->opaque;
 
-ret = cbw_init(top, source, target, errp);
+opts = qdict_new();
+qdict_put_str(opts, "file", bdrv_get_node_name(source));
+qdict_put_str(opts, "target", bdrv_get_node_name(target));
+
+ret = cbw_init(top, opts, errp);
+qobject_unref(opts);
 if (ret < 0) {
 goto fail;
 }
-- 
2.31.1




[PULL 45/56] python:QEMUMachine: template typing for self returning methods

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

mypy thinks that return value of these methods in subclusses is
QEMUMachine, which is wrong. So, make typing smarter.

Suggested-by: John Snow 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20210824083856.17408-26-vsement...@virtuozzo.com>
Reviewed-by: Hanna Reitz 
Signed-off-by: Hanna Reitz 
---
 python/qemu/machine/machine.py | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 6ec18570d9..a7081b1845 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -36,6 +36,7 @@
 Sequence,
 Tuple,
 Type,
+TypeVar,
 )
 
 from qemu.qmp import (  # pylint: disable=import-error
@@ -73,6 +74,9 @@ class AbnormalShutdown(QEMUMachineError):
 """
 
 
+_T = TypeVar('_T', bound='QEMUMachine')
+
+
 class QEMUMachine:
 """
 A QEMU VM.
@@ -169,7 +173,7 @@ def __init__(self,
 self._remove_files: List[str] = []
 self._user_killed = False
 
-def __enter__(self) -> 'QEMUMachine':
+def __enter__(self: _T) -> _T:
 return self
 
 def __exit__(self,
@@ -185,8 +189,8 @@ def add_monitor_null(self) -> None:
 self._args.append('-monitor')
 self._args.append('null')
 
-def add_fd(self, fd: int, fdset: int,
-   opaque: str, opts: str = '') -> 'QEMUMachine':
+def add_fd(self: _T, fd: int, fdset: int,
+   opaque: str, opts: str = '') -> _T:
 """
 Pass a file descriptor to the VM
 """
-- 
2.31.1




[PULL 44/56] python/qemu/machine: QEMUMachine: improve qmp() method

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We often call qmp() with unpacking dict, like qmp('foo', **{...}).
mypy don't really like it, it thinks that passed unpacked dict is a
positional argument and complains that it type should be bool (because
second argument of qmp() is conv_keys: bool).

Allow passing dict directly, simplifying interface, and giving a way to
satisfy mypy.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Message-Id: <20210824083856.17408-25-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 python/qemu/machine/machine.py | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 3fde73cf10..6ec18570d9 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -578,11 +578,21 @@ def _qmp_args(cls, conv_keys: bool,
 return args
 
 def qmp(self, cmd: str,
-conv_keys: bool = True,
+args_dict: Optional[Dict[str, object]] = None,
+conv_keys: Optional[bool] = None,
 **args: Any) -> QMPMessage:
 """
 Invoke a QMP command and return the response dict
 """
+if args_dict is not None:
+assert not args
+assert conv_keys is None
+args = args_dict
+conv_keys = False
+
+if conv_keys is None:
+conv_keys = True
+
 qmp_args = self._qmp_args(conv_keys, args)
 return self._qmp.cmd(cmd, args=qmp_args)
 
-- 
2.31.1




[PULL 30/56] block/copy-before-write: relax permission requirements when no parents

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We are going to publish copy-before-write filter. So, user should be
able to create it with blockdev-add first, specifying both filtered and
target children. And then do blockdev-reopen, to actually insert the
filter where needed.

Currently, filter unshares write permission unconditionally on source
node. It's good, but it will not allow to do blockdev-add. So, let's
relax restrictions when filter doesn't have any parent.

Test output is modified, as now permission conflict happens only when
job creates a blk parent for filter node.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-11-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c  | 8 +---
 tests/qemu-iotests/283.out | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index a7996d54db..2a51cc64e4 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -142,10 +142,12 @@ static void cbw_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 bdrv_default_perms(bs, c, role, reopen_queue,
perm, shared, nperm, nshared);
 
-if (perm & BLK_PERM_WRITE) {
-*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+if (!QLIST_EMPTY(&bs->parents)) {
+if (perm & BLK_PERM_WRITE) {
+*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+}
+*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
 }
-*nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
 }
 }
 
diff --git a/tests/qemu-iotests/283.out b/tests/qemu-iotests/283.out
index f2b7219632..5bb75952ef 100644
--- a/tests/qemu-iotests/283.out
+++ b/tests/qemu-iotests/283.out
@@ -5,7 +5,7 @@
 {"execute": "blockdev-add", "arguments": {"driver": "blkdebug", "image": 
"base", "node-name": "other", "take-child-perms": ["write"]}}
 {"return": {}}
 {"execute": "blockdev-backup", "arguments": {"device": "source", "sync": 
"full", "target": "target"}}
-{"error": {"class": "GenericError", "desc": "Cannot append copy-before-write 
filter: Permission conflict on node 'base': permissions 'write' are both 
required by node 'other' (uses node 'base' as 'image' child) and unshared by 
node 'source' (uses node 'base' as 'image' child)."}}
+{"error": {"class": "GenericError", "desc": "Permission conflict on node 
'base': permissions 'write' are both required by node 'other' (uses node 'base' 
as 'image' child) and unshared by node 'source' (uses node 'base' as 'image' 
child)."}}
 
 === copy-before-write filter should be gone after job-finalize ===
 
-- 
2.31.1




[PULL 29/56] block/backup: move cluster size calculation to block-copy

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

The main consumer of cluster-size is block-copy. Let's calculate it
here instead of passing through backup-top.

We are going to publish copy-before-write filter soon, so it will be
created through options. But we don't want for now to make explicit
option for cluster-size, let's continue to calculate it automatically.
So, now is the time to get rid of cluster_size argument for
bdrv_cbw_append().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-10-vsement...@virtuozzo.com>
[hreitz: Add qemu/error-report.h include to block/block-copy.c]
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.h  |  1 -
 include/block/block-copy.h |  5 +--
 block/backup.c | 62 ++
 block/block-copy.c | 52 +++-
 block/copy-before-write.c  | 10 +++---
 5 files changed, 67 insertions(+), 63 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 538aab8bdb..b386fd8f01 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -32,7 +32,6 @@
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
-  uint64_t cluster_size,
   bool compress,
   BlockCopyState **bcs,
   Error **errp);
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index dca6c4ce36..b8a2d63545 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,8 +25,8 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- int64_t cluster_size, bool use_copy_range,
- bool compress, Error **errp);
+ bool use_copy_range, bool compress,
+ Error **errp);
 
 /* Function should be called prior any actual copy request */
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
@@ -91,6 +91,7 @@ void block_copy_kick(BlockCopyCallState *call_state);
 void block_copy_call_cancel(BlockCopyCallState *call_state);
 
 BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s);
+int64_t block_copy_cluster_size(BlockCopyState *s);
 void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip);
 
 #endif /* BLOCK_COPY_H */
diff --git a/block/backup.c b/block/backup.c
index b31fd99ab3..83516297cb 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -29,8 +29,6 @@
 
 #include "block/copy-before-write.h"
 
-#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
-
 typedef struct BackupBlockJob {
 BlockJob common;
 BlockDriverState *cbw;
@@ -354,43 +352,6 @@ static const BlockJobDriver backup_job_driver = {
 .set_speed = backup_set_speed,
 };
 
-static int64_t backup_calculate_cluster_size(BlockDriverState *target,
- Error **errp)
-{
-int ret;
-BlockDriverInfo bdi;
-bool target_does_cow = bdrv_backing_chain_next(target);
-
-/*
- * If there is no backing file on the target, we cannot rely on COW if our
- * backup cluster size is smaller than the target cluster size. Even for
- * targets with a backing file, try to avoid COW if possible.
- */
-ret = bdrv_get_info(target, &bdi);
-if (ret == -ENOTSUP && !target_does_cow) {
-/* Cluster size is not defined */
-warn_report("The target block device doesn't provide "
-"information about the block size and it doesn't have a "
-"backing file. The default block size of %u bytes is "
-"used. If the actual block size of the target exceeds "
-"this default, the backup may be unusable",
-BACKUP_CLUSTER_SIZE_DEFAULT);
-return BACKUP_CLUSTER_SIZE_DEFAULT;
-} else if (ret < 0 && !target_does_cow) {
-error_setg_errno(errp, -ret,
-"Couldn't determine the cluster size of the target image, "
-"which has no backing file");
-error_append_hint(errp,
-"Aborting, since this may create an unusable destination image\n");
-return ret;
-} else if (ret < 0 && target_does_cow) {
-/* Not fatal; just trudge on ahead. */
-return BACKUP_CLUSTER_SIZE_DEFAULT;
-}
-
-return MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
-}
-
 BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
@@ -448,11 +409,6 @@ BlockJob *backup_job_create(const char

[PULL 41/56] block/copy-before-write: make public block driver

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Finally, copy-before-write gets own .bdrv_open and .bdrv_close
handlers, block_init() call and becomes available through bdrv_open().

To achieve this:

 - cbw_init gets unused flags argument and becomes cbw_open
 - block_copy_state_free() call moved to new cbw_close()
 - in bdrv_cbw_append:
   - options are completed with driver and node-name, and we can simply
 use bdrv_insert_node() to do both open and drained replacing
 - in bdrv_cbw_drop:
   - cbw_close() is now responsible for freeing s->bcs, so don't do it
 here

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-22-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c | 60 ++-
 1 file changed, 28 insertions(+), 32 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2efe098aae..2cd68b480a 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -144,7 +144,8 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 }
 }
 
-static int cbw_init(BlockDriverState *bs, QDict *options, Error **errp)
+static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
+Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 BdrvDirtyBitmap *copy_bitmap;
@@ -181,10 +182,21 @@ static int cbw_init(BlockDriverState *bs, QDict *options, 
Error **errp)
 return 0;
 }
 
+static void cbw_close(BlockDriverState *bs)
+{
+BDRVCopyBeforeWriteState *s = bs->opaque;
+
+block_copy_state_free(s->bcs);
+s->bcs = NULL;
+}
+
 BlockDriver bdrv_cbw_filter = {
 .format_name = "copy-before-write",
 .instance_size = sizeof(BDRVCopyBeforeWriteState),
 
+.bdrv_open  = cbw_open,
+.bdrv_close = cbw_close,
+
 .bdrv_co_preadv = cbw_co_preadv,
 .bdrv_co_pwritev= cbw_co_pwritev,
 .bdrv_co_pwrite_zeroes  = cbw_co_pwrite_zeroes,
@@ -205,56 +217,40 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState 
*source,
   Error **errp)
 {
 ERRP_GUARD();
-int ret;
 BDRVCopyBeforeWriteState *state;
 BlockDriverState *top;
 QDict *opts;
 
 assert(source->total_sectors == target->total_sectors);
 
-top = bdrv_new_open_driver(&bdrv_cbw_filter, filter_node_name,
-   BDRV_O_RDWR, errp);
-if (!top) {
-error_prepend(errp, "Cannot open driver: ");
-return NULL;
-}
-state = top->opaque;
-
 opts = qdict_new();
+qdict_put_str(opts, "driver", "copy-before-write");
+if (filter_node_name) {
+qdict_put_str(opts, "node-name", filter_node_name);
+}
 qdict_put_str(opts, "file", bdrv_get_node_name(source));
 qdict_put_str(opts, "target", bdrv_get_node_name(target));
 
-ret = cbw_init(top, opts, errp);
-qobject_unref(opts);
-if (ret < 0) {
-goto fail;
-}
-
-bdrv_drained_begin(source);
-ret = bdrv_replace_node(source, top, errp);
-bdrv_drained_end(source);
-if (ret < 0) {
-error_prepend(errp, "Cannot append copy-before-write filter: ");
-goto fail;
+top = bdrv_insert_node(source, opts, BDRV_O_RDWR, errp);
+if (!top) {
+return NULL;
 }
 
+state = top->opaque;
 *bcs = state->bcs;
 
 return top;
-
-fail:
-block_copy_state_free(state->bcs);
-bdrv_unref(top);
-return NULL;
 }
 
 void bdrv_cbw_drop(BlockDriverState *bs)
 {
-BDRVCopyBeforeWriteState *s = bs->opaque;
-
 bdrv_drop_filter(bs, &error_abort);
-
-block_copy_state_free(s->bcs);
-
 bdrv_unref(bs);
 }
+
+static void cbw_init(void)
+{
+bdrv_register(&bdrv_cbw_filter);
+}
+
+block_init(cbw_init);
-- 
2.31.1




[PULL 39/56] block/copy-before-write: initialize block-copy bitmap

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We are going to publish copy-before-write filter to be used in separate
of backup. Future step would support bitmap for the filter. But let's
start from full set bitmap.

We have to modify backup, as bitmap is first initialized by
copy-before-write filter, and then backup modifies it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-20-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/backup.c| 16 +++-
 block/copy-before-write.c |  4 
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 4869f1e5da..687d2882bc 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -233,18 +233,16 @@ static void backup_init_bcs_bitmap(BackupBlockJob *job)
 BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(job->bcs);
 
 if (job->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
+bdrv_clear_dirty_bitmap(bcs_bitmap, NULL);
 ret = bdrv_dirty_bitmap_merge_internal(bcs_bitmap, job->sync_bitmap,
NULL, true);
 assert(ret);
-} else {
-if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
-/*
- * We can't hog the coroutine to initialize this thoroughly.
- * Set a flag and resume work when we are able to yield safely.
- */
-block_copy_set_skip_unallocated(job->bcs, true);
-}
-bdrv_set_dirty_bitmap(bcs_bitmap, 0, job->len);
+} else if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
+/*
+ * We can't hog the coroutine to initialize this thoroughly.
+ * Set a flag and resume work when we are able to yield safely.
+ */
+block_copy_set_skip_unallocated(job->bcs, true);
 }
 
 estimate = bdrv_get_dirty_count(bcs_bitmap);
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 1cefcade78..2efe098aae 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -147,6 +147,7 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 static int cbw_init(BlockDriverState *bs, QDict *options, Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
+BdrvDirtyBitmap *copy_bitmap;
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -174,6 +175,9 @@ static int cbw_init(BlockDriverState *bs, QDict *options, 
Error **errp)
 return -EINVAL;
 }
 
+copy_bitmap = block_copy_dirty_bitmap(s->bcs);
+bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
+
 return 0;
 }
 
-- 
2.31.1




[PULL 27/56] block/block-copy: introduce block_copy_set_copy_opts()

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We'll need a possibility to set compress and use_copy_range options
after initialization of the state. So make corresponding part of
block_copy_state_new() separate and public.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20210824083856.17408-8-vsement...@virtuozzo.com>
Reviewed-by: Hanna Reitz 
Signed-off-by: Hanna Reitz 
---
 include/block/block-copy.h |  3 +++
 block/block-copy.c | 49 ++
 2 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 734389d32a..dca6c4ce36 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -28,6 +28,9 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
  bool compress, Error **errp);
 
+/* Function should be called prior any actual copy request */
+void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
+  bool compress);
 void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
diff --git a/block/block-copy.c b/block/block-copy.c
index 58b4345a5a..e83870ff81 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -315,6 +315,33 @@ static uint32_t block_copy_max_transfer(BdrvChild *source, 
BdrvChild *target)
  target->bs->bl.max_transfer));
 }
 
+void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
+  bool compress)
+{
+/* Keep BDRV_REQ_SERIALISING set (or not set) in block_copy_state_new() */
+s->write_flags = (s->write_flags & BDRV_REQ_SERIALISING) |
+(compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+
+if (s->max_transfer < s->cluster_size) {
+/*
+ * copy_range does not respect max_transfer. We don't want to bother
+ * with requests smaller than block-copy cluster size, so fallback to
+ * buffered copying (read and write respect max_transfer on their
+ * behalf).
+ */
+s->method = COPY_READ_WRITE_CLUSTER;
+} else if (compress) {
+/* Compression supports only cluster-size writes and no copy-range. */
+s->method = COPY_READ_WRITE_CLUSTER;
+} else {
+/*
+ * If copy range enabled, start with COPY_RANGE_SMALL, until first
+ * successful copy_range (look at block_copy_do_copy).
+ */
+s->method = use_copy_range ? COPY_RANGE_SMALL : COPY_READ_WRITE;
+}
+}
+
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
  bool compress, Error **errp)
@@ -353,32 +380,14 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 .copy_bitmap = copy_bitmap,
 .cluster_size = cluster_size,
 .len = bdrv_dirty_bitmap_size(copy_bitmap),
-.write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0) |
-(compress ? BDRV_REQ_WRITE_COMPRESSED : 0),
+.write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0),
 .mem = shres_create(BLOCK_COPY_MAX_MEM),
 .max_transfer = QEMU_ALIGN_DOWN(
 block_copy_max_transfer(source, target),
 cluster_size),
 };
 
-if (s->max_transfer < cluster_size) {
-/*
- * copy_range does not respect max_transfer. We don't want to bother
- * with requests smaller than block-copy cluster size, so fallback to
- * buffered copying (read and write respect max_transfer on their
- * behalf).
- */
-s->method = COPY_READ_WRITE_CLUSTER;
-} else if (compress) {
-/* Compression supports only cluster-size writes and no copy-range. */
-s->method = COPY_READ_WRITE_CLUSTER;
-} else {
-/*
- * If copy range enabled, start with COPY_RANGE_SMALL, until first
- * successful copy_range (look at block_copy_do_copy).
- */
-s->method = use_copy_range ? COPY_RANGE_SMALL : COPY_READ_WRITE;
-}
+block_copy_set_copy_opts(s, use_copy_range, compress);
 
 ratelimit_init(&s->rate_limit);
 qemu_co_mutex_init(&s->lock);
-- 
2.31.1




[PULL 26/56] block-copy: move detecting fleecing scheme to block-copy

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We want to simplify initialization interface of copy-before-write
filter as we are going to make it public. So, let's detect fleecing
scheme exactly in block-copy code, to not pass this information through
extra levels.

Why not just set BDRV_REQ_SERIALISING unconditionally: because we are
going to implement new more efficient fleecing scheme which will not
rely on backing feature.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
Message-Id: <20210824083856.17408-7-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.h  |  2 +-
 include/block/block-copy.h |  3 +--
 block/backup.c | 21 +
 block/block-copy.c | 24 +---
 block/copy-before-write.c  |  4 ++--
 5 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 5977b7aa31..f37e2249ae 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -34,7 +34,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   const char *filter_node_name,
   uint64_t cluster_size,
   BackupPerf *perf,
-  BdrvRequestFlags write_flags,
+  bool compress,
   BlockCopyState **bcs,
   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 5c8278895c..734389d32a 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -26,8 +26,7 @@ typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
- BdrvRequestFlags write_flags,
- Error **errp);
+ bool compress, Error **errp);
 
 void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
diff --git a/block/backup.c b/block/backup.c
index ac91821b08..84f9a5f02c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -407,7 +407,6 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 int64_t len, target_len;
 BackupBlockJob *job = NULL;
 int64_t cluster_size;
-BdrvRequestFlags write_flags;
 BlockDriverState *cbw = NULL;
 BlockCopyState *bcs = NULL;
 
@@ -504,26 +503,8 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
-/*
- * If source is in backing chain of target assume that target is going to 
be
- * used for "image fleecing", i.e. it should represent a kind of snapshot 
of
- * source at backup-start point in time. And target is going to be read by
- * somebody (for example, used as NBD export) during backup job.
- *
- * In this case, we need to add BDRV_REQ_SERIALISING write flag to avoid
- * intersection of backup writes and third party reads from target,
- * otherwise reading from target we may occasionally read already updated 
by
- * guest data.
- *
- * For more information see commit f8d59dfb40bb and test
- * tests/qemu-iotests/222
- */
-write_flags = (bdrv_chain_contains(target, bs) ? BDRV_REQ_SERIALISING : 0) 
|
-  (compress ? BDRV_REQ_WRITE_COMPRESSED : 0),
-
 cbw = bdrv_cbw_append(bs, target, filter_node_name,
-cluster_size, perf,
-write_flags, &bcs, errp);
+  cluster_size, perf, compress, &bcs, errp);
 if (!cbw) {
 goto error;
 }
diff --git a/block/block-copy.c b/block/block-copy.c
index 0becad52da..58b4345a5a 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -317,10 +317,11 @@ static uint32_t block_copy_max_transfer(BdrvChild 
*source, BdrvChild *target)
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
- BdrvRequestFlags write_flags, Error 
**errp)
+ bool compress, Error **errp)
 {
 BlockCopyState *s;
 BdrvDirtyBitmap *copy_bitmap;
+bool is_fleecing;
 
 copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
errp);
@@ -329,6 +330,22 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 }
 bdrv_disable_dirty_bitmap(copy_bitmap);
 
+/*
+ * If source is in backing chain of target assume that target is going to 
be
+ * used for "image fleecing", i.e. it should represent a kind of snapshot 
of
+ * source 

[PULL 37/56] block/copy-before-write: bdrv_cbw_append(): drop unused compress arg

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
Message-Id: <20210824083856.17408-18-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.h | 1 -
 block/backup.c| 2 +-
 block/copy-before-write.c | 7 +++
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index b386fd8f01..51847e711a 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -32,7 +32,6 @@
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
-  bool compress,
   BlockCopyState **bcs,
   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/block/backup.c b/block/backup.c
index 83516297cb..4869f1e5da 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -452,7 +452,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
-cbw = bdrv_cbw_append(bs, target, filter_node_name, false, &bcs, errp);
+cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
 if (!cbw) {
 goto error;
 }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 4858dcf8ff..1e7180760a 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -145,7 +145,7 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 }
 
 static int cbw_init(BlockDriverState *bs, BlockDriverState *source,
-BlockDriverState *target, bool compress, Error **errp)
+BlockDriverState *target, Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 
@@ -173,7 +173,7 @@ static int cbw_init(BlockDriverState *bs, BlockDriverState 
*source,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, false, compress, errp);
+s->bcs = block_copy_state_new(bs->file, s->target, false, false, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
@@ -202,7 +202,6 @@ BlockDriver bdrv_cbw_filter = {
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
-  bool compress,
   BlockCopyState **bcs,
   Error **errp)
 {
@@ -221,7 +220,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 }
 state = top->opaque;
 
-ret = cbw_init(top, source, target, compress, errp);
+ret = cbw_init(top, source, target, errp);
 if (ret < 0) {
 goto fail;
 }
-- 
2.31.1




[PULL 36/56] block/copy-before-write: cbw_init(): use file child after attaching

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

In the next commit we'll get rid of source argument of cbw_init().
Prepare to it now, to make next commit simpler: move the code block
that uses source below attaching the child and use bs->file->bs instead
of source variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-17-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index d7f1833efa..4858dcf8ff 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -149,13 +149,6 @@ static int cbw_init(BlockDriverState *bs, BlockDriverState 
*source,
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 
-bs->total_sectors = source->total_sectors;
-bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
-(BDRV_REQ_FUA & source->supported_write_flags);
-bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
-((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
- source->supported_zero_flags);
-
 bdrv_ref(target);
 s->target = bdrv_attach_child(bs, target, "target", &child_of_bds,
   BDRV_CHILD_DATA, errp);
@@ -173,6 +166,13 @@ static int cbw_init(BlockDriverState *bs, BlockDriverState 
*source,
 return -EINVAL;
 }
 
+bs->total_sectors = bs->file->bs->total_sectors;
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+(BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
+ bs->file->bs->supported_zero_flags);
+
 s->bcs = block_copy_state_new(bs->file, s->target, false, compress, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
-- 
2.31.1




[PULL 25/56] block: rename backup-top to copy-before-write

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We are going to convert backup_top to full featured public filter,
which can be used in separate of backup job. Start from renaming from
"how it used" to "what it does".

While updating comments in 283 iotest, drop and rephrase also things
about ".active", as this field is now dropped, and filter doesn't have
"inactive" mode.

Note that this change may be considered as incompatible interface
change, as backup-top filter format name was visible through
query-block and query-named-block-nodes.

Still, consider the following reasoning:

1. backup-top was never documented, so if someone depends on format
   name (for driver that can't be used other than it is automatically
   inserted on backup job start), it's a kind of "undocumented feature
   use". So I think we are free to change it.

2. There is a hope, that there is no such users: it's a lot more native
   to give a good node-name to backup-top filter if need to operate
   with it somehow, and don't touch format name.

3. Another "incompatible" change in further commit would be moving
   copy-before-write filter from using backing child to file child. And
   this is even more reasonable than renaming: for now all public
   filters are file-child based.

So, it's a risky change, but risk seems small and good interface worth
it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-6-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/{backup-top.h => copy-before-write.h} |  28 +++---
 block/backup.c  |  22 ++---
 block/{backup-top.c => copy-before-write.c} | 100 ++--
 MAINTAINERS |   4 +-
 block/meson.build   |   2 +-
 tests/qemu-iotests/283  |  35 +++
 tests/qemu-iotests/283.out  |   4 +-
 7 files changed, 95 insertions(+), 100 deletions(-)
 rename block/{backup-top.h => copy-before-write.h} (56%)
 rename block/{backup-top.c => copy-before-write.c} (62%)

diff --git a/block/backup-top.h b/block/copy-before-write.h
similarity index 56%
rename from block/backup-top.h
rename to block/copy-before-write.h
index b28b0031c4..5977b7aa31 100644
--- a/block/backup-top.h
+++ b/block/copy-before-write.h
@@ -1,10 +1,10 @@
 /*
- * backup-top filter driver
+ * copy-before-write filter driver
  *
  * The driver performs Copy-Before-Write (CBW) operation: it is injected above
  * some node, and before each write it copies _old_ data to the target node.
  *
- * Copyright (c) 2018-2019 Virtuozzo International GmbH.
+ * Copyright (c) 2018-2021 Virtuozzo International GmbH.
  *
  * Author:
  *  Sementsov-Ogievskiy Vladimir 
@@ -23,20 +23,20 @@
  * along with this program. If not, see .
  */
 
-#ifndef BACKUP_TOP_H
-#define BACKUP_TOP_H
+#ifndef COPY_BEFORE_WRITE_H
+#define COPY_BEFORE_WRITE_H
 
 #include "block/block_int.h"
 #include "block/block-copy.h"
 
-BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
- BlockDriverState *target,
- const char *filter_node_name,
- uint64_t cluster_size,
- BackupPerf *perf,
- BdrvRequestFlags write_flags,
- BlockCopyState **bcs,
- Error **errp);
-void bdrv_backup_top_drop(BlockDriverState *bs);
+BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
+  BlockDriverState *target,
+  const char *filter_node_name,
+  uint64_t cluster_size,
+  BackupPerf *perf,
+  BdrvRequestFlags write_flags,
+  BlockCopyState **bcs,
+  Error **errp);
+void bdrv_cbw_drop(BlockDriverState *bs);
 
-#endif /* BACKUP_TOP_H */
+#endif /* COPY_BEFORE_WRITE_H */
diff --git a/block/backup.c b/block/backup.c
index bd3614ce70..ac91821b08 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -27,13 +27,13 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 
-#include "block/backup-top.h"
+#include "block/copy-before-write.h"
 
 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 
 typedef struct BackupBlockJob {
 BlockJob common;
-BlockDriverState *backup_top;
+BlockDriverState *cbw;
 BlockDriverState *source_bs;
 BlockDriverState *target_bs;
 
@@ -104,7 +104,7 @@ static void backup_clean(Job *job)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
 block_job_remove_all_bdrv(&s->common);
-bdrv_backup_top_drop(s->backup_top);
+bdrv_cbw_drop(s->cbw);
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
@@ -408,7 +408,7 @@ Bloc

[PULL 24/56] qdev: allow setting drive property for realized device

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We need an ability to insert filters above top block node, attached to
block device. It can't be achieved with blockdev-reopen command. So, we
want do it with help of qom-set.

Intended usage:

Assume there is a node A that is attached to some guest device.

1. blockdev-add to create a filter node B that has A as its child.

2. qom-set to change the node attached to the guest device’s
   BlockBackend from A to B.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-5-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 hw/core/qdev-properties-system.c | 43 +++-
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 2760c21f11..e71f5d64d1 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -36,11 +36,11 @@
 
 static bool check_prop_still_unset(Object *obj, const char *name,
const void *old_val, const char *new_val,
-   Error **errp)
+   bool allow_override, Error **errp)
 {
 const GlobalProperty *prop = qdev_find_global_prop(obj, name);
 
-if (!old_val) {
+if (!old_val || (!prop && allow_override)) {
 return true;
 }
 
@@ -93,16 +93,34 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
 BlockBackend *blk;
 bool blk_created = false;
 int ret;
+BlockDriverState *bs;
+AioContext *ctx;
 
 if (!visit_type_str(v, name, &str, errp)) {
 return;
 }
 
-/*
- * TODO Should this really be an error?  If no, the old value
- * needs to be released before we store the new one.
- */
-if (!check_prop_still_unset(obj, name, *ptr, str, errp)) {
+if (!check_prop_still_unset(obj, name, *ptr, str, true, errp)) {
+return;
+}
+
+if (*ptr) {
+/* BlockBackend alread exists. So, we want to change attached node */
+blk = *ptr;
+ctx = blk_get_aio_context(blk);
+bs = bdrv_lookup_bs(NULL, str, errp);
+if (!bs) {
+return;
+}
+
+if (ctx != bdrv_get_aio_context(bs)) {
+error_setg(errp, "Different aio context is not supported for new "
+   "node");
+}
+
+aio_context_acquire(ctx);
+blk_replace_bs(blk, bs, errp);
+aio_context_release(ctx);
 return;
 }
 
@@ -114,7 +132,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
 
 blk = blk_by_name(str);
 if (!blk) {
-BlockDriverState *bs = bdrv_lookup_bs(NULL, str, NULL);
+bs = bdrv_lookup_bs(NULL, str, NULL);
 if (bs) {
 /*
  * If the device supports iothreads, it will make sure to move the
@@ -123,8 +141,7 @@ static void set_drive_helper(Object *obj, Visitor *v, const 
char *name,
  * aware of iothreads require their BlockBackends to be in the main
  * AioContext.
  */
-AioContext *ctx = iothread ? bdrv_get_aio_context(bs) :
- qemu_get_aio_context();
+ctx = iothread ? bdrv_get_aio_context(bs) : qemu_get_aio_context();
 blk = blk_new(ctx, 0, BLK_PERM_ALL);
 blk_created = true;
 
@@ -196,6 +213,7 @@ static void release_drive(Object *obj, const char *name, 
void *opaque)
 const PropertyInfo qdev_prop_drive = {
 .name  = "str",
 .description = "Node name or ID of a block device to use as a backend",
+.realized_set_allowed = true,
 .get   = get_drive,
 .set   = set_drive,
 .release = release_drive,
@@ -204,6 +222,7 @@ const PropertyInfo qdev_prop_drive = {
 const PropertyInfo qdev_prop_drive_iothread = {
 .name  = "str",
 .description = "Node name or ID of a block device to use as a backend",
+.realized_set_allowed = true,
 .get   = get_drive,
 .set   = set_drive_iothread,
 .release = release_drive,
@@ -238,7 +257,7 @@ static void set_chr(Object *obj, Visitor *v, const char 
*name, void *opaque,
  * TODO Should this really be an error?  If no, the old value
  * needs to be released before we store the new one.
  */
-if (!check_prop_still_unset(obj, name, be->chr, str, errp)) {
+if (!check_prop_still_unset(obj, name, be->chr, str, false, errp)) {
 return;
 }
 
@@ -408,7 +427,7 @@ static void set_netdev(Object *obj, Visitor *v, const char 
*name,
  * TODO Should this really be an error?  If no, the old value
  * needs to be released before we store the new one.
  */
-if (!check_prop_still_unset(obj, name, ncs[i], str, errp)) {
+if (!check_prop_still_unset(obj, name, ncs[i], str, false, errp)) {
 goto out;
 }
 
-- 
2.31.1




[PULL 22/56] block: introduce blk_replace_bs

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Add function to change bs inside blk.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-3-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 include/sysemu/block-backend.h | 1 +
 block/block-backend.c  | 8 
 2 files changed, 9 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9ac5f7bbd3..29d4fdbf63 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -102,6 +102,7 @@ BlockBackend *blk_by_public(BlockBackendPublic *public);
 BlockDriverState *blk_bs(BlockBackend *blk);
 void blk_remove_bs(BlockBackend *blk);
 int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp);
+int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp);
 bool bdrv_has_blk(BlockDriverState *bs);
 bool bdrv_is_root_node(BlockDriverState *bs);
 int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
diff --git a/block/block-backend.c b/block/block-backend.c
index deb55c272e..6140d133e2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -869,6 +869,14 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, 
Error **errp)
 return 0;
 }
 
+/*
+ * Change BlockDriverState associated with @blk.
+ */
+int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp)
+{
+return bdrv_replace_child_bs(blk->root, new_bs, errp);
+}
+
 /*
  * Sets the permission bitmasks that the user of the BlockBackend needs.
  */
-- 
2.31.1




[PULL 35/56] block/copy-before-write: cbw_init(): rename variables

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

One more step closer to real .bdrv_open() handler: use more usual names
for bs being initialized and its state.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-16-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index a4fee645fd..d7f1833efa 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -144,38 +144,37 @@ static void cbw_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 }
 }
 
-static int cbw_init(BlockDriverState *top, BlockDriverState *source,
+static int cbw_init(BlockDriverState *bs, BlockDriverState *source,
 BlockDriverState *target, bool compress, Error **errp)
 {
-BDRVCopyBeforeWriteState *state = top->opaque;
+BDRVCopyBeforeWriteState *s = bs->opaque;
 
-top->total_sectors = source->total_sectors;
-top->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+bs->total_sectors = source->total_sectors;
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
 (BDRV_REQ_FUA & source->supported_write_flags);
-top->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  source->supported_zero_flags);
 
 bdrv_ref(target);
-state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
-  BDRV_CHILD_DATA, errp);
-if (!state->target) {
+s->target = bdrv_attach_child(bs, target, "target", &child_of_bds,
+  BDRV_CHILD_DATA, errp);
+if (!s->target) {
 error_prepend(errp, "Cannot attach target child: ");
 return -EINVAL;
 }
 
 bdrv_ref(source);
-top->file = bdrv_attach_child(top, source, "file", &child_of_bds,
-  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-  errp);
-if (!top->file) {
+bs->file = bdrv_attach_child(bs, source, "file", &child_of_bds,
+ BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+ errp);
+if (!bs->file) {
 error_prepend(errp, "Cannot attach file child: ");
 return -EINVAL;
 }
 
-state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
-  errp);
-if (!state->bcs) {
+s->bcs = block_copy_state_new(bs->file, s->target, false, compress, errp);
+if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
 }
-- 
2.31.1




[PULL 32/56] block/copy-before-write: use file child instead of backing

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We are going to publish copy-before-write filter, and there no public
backing-child-based filter in Qemu. No reason to create a precedent, so
let's refactor copy-before-write filter instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-13-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 945d9340f4..7a6c15f141 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -43,7 +43,7 @@ static coroutine_fn int cbw_co_preadv(
 BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 QEMUIOVector *qiov, int flags)
 {
-return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
+return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
 }
 
 static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
@@ -71,7 +71,7 @@ static int coroutine_fn cbw_co_pdiscard(BlockDriverState *bs,
 return ret;
 }
 
-return bdrv_co_pdiscard(bs->backing, offset, bytes);
+return bdrv_co_pdiscard(bs->file, offset, bytes);
 }
 
 static int coroutine_fn cbw_co_pwrite_zeroes(BlockDriverState *bs,
@@ -82,7 +82,7 @@ static int coroutine_fn cbw_co_pwrite_zeroes(BlockDriverState 
*bs,
 return ret;
 }
 
-return bdrv_co_pwrite_zeroes(bs->backing, offset, bytes, flags);
+return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
 }
 
 static coroutine_fn int cbw_co_pwritev(BlockDriverState *bs,
@@ -95,29 +95,22 @@ static coroutine_fn int cbw_co_pwritev(BlockDriverState *bs,
 return ret;
 }
 
-return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
+return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
 }
 
 static int coroutine_fn cbw_co_flush(BlockDriverState *bs)
 {
-if (!bs->backing) {
+if (!bs->file) {
 return 0;
 }
 
-return bdrv_co_flush(bs->backing->bs);
+return bdrv_co_flush(bs->file->bs);
 }
 
 static void cbw_refresh_filename(BlockDriverState *bs)
 {
-if (bs->backing == NULL) {
-/*
- * we can be here after failed bdrv_attach_child in
- * bdrv_set_backing_hd
- */
-return;
-}
 pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
-bs->backing->bs->filename);
+bs->file->bs->filename);
 }
 
 static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
@@ -186,6 +179,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 top = bdrv_new_open_driver(&bdrv_cbw_filter, filter_node_name,
BDRV_O_RDWR, errp);
 if (!top) {
+error_prepend(errp, "Cannot open driver: ");
 return NULL;
 }
 
@@ -201,21 +195,32 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState 
*source,
 state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
   BDRV_CHILD_DATA, errp);
 if (!state->target) {
+error_prepend(errp, "Cannot attach target child: ");
+bdrv_unref(top);
+return NULL;
+}
+
+bdrv_ref(source);
+top->file = bdrv_attach_child(top, source, "file", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+  errp);
+if (!top->file) {
+error_prepend(errp, "Cannot attach file child: ");
 bdrv_unref(top);
 return NULL;
 }
 
 bdrv_drained_begin(source);
 
-ret = bdrv_append(top, source, errp);
+ret = bdrv_replace_node(source, top, errp);
 if (ret < 0) {
 error_prepend(errp, "Cannot append copy-before-write filter: ");
 goto fail;
 }
 appended = true;
 
-state->bcs = block_copy_state_new(top->backing, state->target,
-  false, compress, errp);
+state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
+  errp);
 if (!state->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 goto fail;
-- 
2.31.1




[PULL 20/56] iotests: use subprocess.DEVNULL instead of open("/dev/null")

2021-09-01 Thread Hanna Reitz
From: John Snow 

Avoids a warning from pylint not to use open() outside of a
with-statement, and is ... probably more portable anyway. Not that I
think we care too much about running tests *on* Windows, but... eh.

Signed-off-by: John Snow 
Message-Id: <20210720173336.1876937-3-js...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 2ad7a15c8b..4c8971d946 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -237,18 +237,18 @@ def qemu_io_silent(*args):
 default_args = qemu_io_args
 
 args = default_args + list(args)
-exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'))
-if exitcode < 0:
+result = subprocess.run(args, stdout=subprocess.DEVNULL, check=False)
+if result.returncode < 0:
 sys.stderr.write('qemu-io received signal %i: %s\n' %
- (-exitcode, ' '.join(args)))
-return exitcode
+ (-result.returncode, ' '.join(args)))
+return result.returncode
 
 def qemu_io_silent_check(*args):
 '''Run qemu-io and return the true if subprocess returned 0'''
 args = qemu_io_args + list(args)
-exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'),
-   stderr=subprocess.STDOUT)
-return exitcode == 0
+result = subprocess.run(args, stdout=subprocess.DEVNULL,
+stderr=subprocess.STDOUT, check=False)
+return result.returncode == 0
 
 class QemuIoInteractive:
 def __init__(self, *args):
-- 
2.31.1




[PULL 19/56] iotests: use with-statement for open() calls

2021-09-01 Thread Hanna Reitz
From: John Snow 

Silences a new pylint warning. The dangers of *not* doing this are
somewhat unclear; I believe the file object gets garbage collected
eventually, but possibly the way in which it happens is
non-deterministic. Maybe this is a valid warning, but if there are
consequences of not doing it, I am not aware of them at present.

Signed-off-by: John Snow 
Message-Id: <20210720173336.1876937-2-js...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 2cf5ff965b..2ad7a15c8b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -1120,7 +1120,8 @@ def notrun(reason):
 # Each test in qemu-iotests has a number ("seq")
 seq = os.path.basename(sys.argv[0])
 
-open('%s/%s.notrun' % (output_dir, seq), 'w').write(reason + '\n')
+with open('%s/%s.notrun' % (output_dir, seq), 'w') as outfile:
+outfile.write(reason + '\n')
 logger.warning("%s not run: %s", seq, reason)
 sys.exit(0)
 
@@ -1133,8 +1134,8 @@ def case_notrun(reason):
 # Each test in qemu-iotests has a number ("seq")
 seq = os.path.basename(sys.argv[0])
 
-open('%s/%s.casenotrun' % (output_dir, seq), 'a').write(
-'[case not run] ' + reason + '\n')
+with open('%s/%s.casenotrun' % (output_dir, seq), 'a') as outfile:
+outfile.write('[case not run] ' + reason + '\n')
 
 def _verify_image_format(supported_fmts: Sequence[str] = (),
  unsupported_fmts: Sequence[str] = ()) -> None:
-- 
2.31.1




[PULL 18/56] raw-format: drop WRITE and RESIZE child perms when possible

2021-09-01 Thread Hanna Reitz
From: Stefan Hajnoczi 

The following command-line fails due to a permissions conflict:

  $ qemu-storage-daemon \
  --blockdev driver=nvme,node-name=nvme0,device=:08:00.0,namespace=1 \
  --blockdev driver=raw,node-name=l1-1,file=nvme0,offset=0,size=1073741824 \
  --blockdev 
driver=raw,node-name=l1-2,file=nvme0,offset=1073741824,size=1073741824 \
  --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock,max-connections=2 \
  --export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on \
  --export type=nbd,id=nbd-l1-2,node-name=l1-2,name=l1-2,writable=on

  qemu-storage-daemon: --export 
type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on: Permission conflict 
on node 'nvme0': permissions 'resize' are both required by node 'l1-1' (uses 
node 'nvme0' as 'file' child) and unshared by node 'l1-2' (uses node 'nvme0' as 
'file' child).

The problem is that block/raw-format.c relies on bdrv_default_perms() to
set permissions on the nvme node. The default permissions add RESIZE in
anticipation of a format driver like qcow2 that needs to grow the image
file. This fails because RESIZE is unshared, so we cannot get the RESIZE
permission.

Max Reitz pointed out that block/crypto.c already handles this case by
implementing a custom ->bdrv_child_perm() function that adjusts the
result of bdrv_default_perms().

This patch takes the same approach in block/raw-format.c so that RESIZE
is only required if it's actually necessary (e.g. the parent is qcow2).

Cc: Max Reitz 
Cc: Kevin Wolf 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20210726122839.822900-1-stefa...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Hanna Reitz 
---
 block/raw-format.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index 7717578ed6..c26f493688 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -580,6 +580,25 @@ static void raw_cancel_in_flight(BlockDriverState *bs)
 bdrv_cancel_in_flight(bs->file->bs);
 }
 
+static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
+   BdrvChildRole role,
+   BlockReopenQueue *reopen_queue,
+   uint64_t parent_perm, uint64_t parent_shared,
+   uint64_t *nperm, uint64_t *nshared)
+{
+bdrv_default_perms(bs, c, role, reopen_queue, parent_perm,
+   parent_shared, nperm, nshared);
+
+/*
+ * bdrv_default_perms() may add WRITE and/or RESIZE (see comment in
+ * bdrv_default_perms_for_storage() for an explanation) but we only need
+ * them if they are in parent_perm. Drop WRITE and RESIZE whenever possible
+ * to avoid permission conflicts.
+ */
+*nperm &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
+*nperm |= parent_perm & (BLK_PERM_WRITE | BLK_PERM_RESIZE);
+}
+
 BlockDriver bdrv_raw = {
 .format_name  = "raw",
 .instance_size= sizeof(BDRVRawState),
@@ -588,7 +607,7 @@ BlockDriver bdrv_raw = {
 .bdrv_reopen_commit   = &raw_reopen_commit,
 .bdrv_reopen_abort= &raw_reopen_abort,
 .bdrv_open= &raw_open,
-.bdrv_child_perm  = bdrv_default_perms,
+.bdrv_child_perm  = raw_child_perm,
 .bdrv_co_create_opts  = &raw_co_create_opts,
 .bdrv_co_preadv   = &raw_co_preadv,
 .bdrv_co_pwritev  = &raw_co_pwritev,
-- 
2.31.1




[PULL 16/56] docs/devel/testing: add -p option to the debug section of QEMU iotests

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-17-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 docs/devel/testing.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 8ebcf85a31..4a0abbf23d 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -249,6 +249,10 @@ a failing test:
 * ``-d`` (debug) just increases the logging verbosity, showing
   for example the QMP commands and answers.
 
+* ``-p`` (print) redirects QEMU’s stdout and stderr to the test output,
+  instead of saving it into a log file in
+  ``$TEST_DIR/qemu-machine-``.
+
 Test case groups
 
 
-- 
2.31.1




[PULL 47/56] iotests/222: constantly use single quotes for strings

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

The file use both single and double quotes for strings. Let's be
consistent.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-28-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/222 | 68 +-
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/222
index 5e2556f8df..799369e290 100755
--- a/tests/qemu-iotests/222
+++ b/tests/qemu-iotests/222
@@ -30,23 +30,23 @@ iotests.script_initialize(
 supported_platforms=['linux'],
 )
 
-patterns = [("0x5d", "0", "64k"),
-("0xd5", "1M","64k"),
-("0xdc", "32M",   "64k"),
-("0xcd", "0x3ff", "64k")]  # 64M - 64K
+patterns = [('0x5d', '0', '64k'),
+('0xd5', '1M','64k'),
+('0xdc', '32M',   '64k'),
+('0xcd', '0x3ff', '64k')]  # 64M - 64K
 
-overwrite = [("0xab", "0", "64k"), # Full overwrite
- ("0xad", "0x00f8000", "64k"), # Partial-left (1M-32K)
- ("0x1d", "0x2008000", "64k"), # Partial-right (32M+32K)
- ("0xea", "0x3fe", "64k")] # Adjacent-left (64M - 128K)
+overwrite = [('0xab', '0', '64k'), # Full overwrite
+ ('0xad', '0x00f8000', '64k'), # Partial-left (1M-32K)
+ ('0x1d', '0x2008000', '64k'), # Partial-right (32M+32K)
+ ('0xea', '0x3fe', '64k')] # Adjacent-left (64M - 128K)
 
-zeroes = [("0", "0x00f8000", "32k"), # Left-end of partial-left (1M-32K)
-  ("0", "0x201", "32k"), # Right-end of partial-right (32M+64K)
-  ("0", "0x3fe", "64k")] # overwrite[3]
+zeroes = [('0', '0x00f8000', '32k'), # Left-end of partial-left (1M-32K)
+  ('0', '0x201', '32k'), # Right-end of partial-right (32M+64K)
+  ('0', '0x3fe', '64k')] # overwrite[3]
 
-remainder = [("0xd5", "0x108000",  "32k"), # Right-end of partial-left [1]
- ("0xdc", "32M",   "32k"), # Left-end of partial-right [2]
- ("0xcd", "0x3ff", "64k")] # patterns[3]
+remainder = [('0xd5', '0x108000',  '32k'), # Right-end of partial-left [1]
+ ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
+ ('0xcd', '0x3ff', '64k')] # patterns[3]
 
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
@@ -58,7 +58,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
-assert qemu_img('create', '-f', "qcow2", fleece_img_path, '64M') == 0
+assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
 
 for p in patterns:
 qemu_io('-f', iotests.imgfmt,
@@ -78,43 +78,43 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Setting up Fleecing Graph ---')
 log('')
 
-src_node = "drive0"
-tgt_node = "fleeceNode"
+src_node = 'drive0'
+tgt_node = 'fleeceNode'
 
 # create tgt_node backed by src_node
-log(vm.qmp("blockdev-add", {
-"driver": "qcow2",
-"node-name": tgt_node,
-"file": {
-"driver": "file",
-"filename": fleece_img_path,
+log(vm.qmp('blockdev-add', {
+'driver': 'qcow2',
+'node-name': tgt_node,
+'file': {
+'driver': 'file',
+'filename': fleece_img_path,
 },
-"backing": src_node,
+'backing': src_node,
 }))
 
 # Establish COW from source to fleecing node
-log(vm.qmp("blockdev-backup",
+log(vm.qmp('blockdev-backup',
device=src_node,
target=tgt_node,
-   sync="none"))
+   sync='none'))
 
 log('')
 log('--- Setting up NBD Export ---')
 log('')
 
 nbd_uri = 'nbd+unix:///%s?socket=%s' % (tgt_node, nbd_sock_path)
-log(vm.qmp("nbd-server-start",
-   {"addr": { "type": "unix",
-  "data": { "path": nbd_sock_path } } }))
+log(vm.qmp('nbd-server-start',
+   {'addr': { 'type': 'unix',
+  'data': { 'path': nbd_sock_path } } }))
 
-log(vm.qmp("nbd-server-add", device=tgt_node))
+log(vm.qmp('nbd-server-add', device=tgt_node))
 
 log('')
 log('--- Sanity Check ---')
 log('')
 
 for p in patterns + zeroes:
-cmd = "read -P%s %s %s" % p
+cmd = 'read -P%s %s %s' % p
 log(cmd)
 assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
 
@@ -123,7 +123,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 for p in overwrite:
-cmd = "write -P%s %s %s" % p
+cmd = 'write -P%s %s %s' % p
 log(cmd)
 log(vm.hmp_qemu_io(src_node, cmd))
 
@@ -132,7 +132,7 @@ with iotests.FilePath('base.img') as bas

[PULL 50/56] iotests/image-fleecing: proper source device

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Define scsi device to operate with it by qom-set in further patch.

Give a new node-name to source block node, to not look like device
name.

Job now don't want to work without giving explicit id, so, let's call
it "fleecing".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-31-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/tests/image-fleecing | 12 
 tests/qemu-iotests/tests/image-fleecing.out |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index 799369e290..961941bb27 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -70,7 +70,11 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Launching VM ---')
 log('')
 
-vm.add_drive(base_img_path)
+src_node = 'source'
+vm.add_blockdev(f'driver={iotests.imgfmt},file.driver=file,'
+f'file.filename={base_img_path},node-name={src_node}')
+vm.add_device('virtio-scsi')
+vm.add_device(f'scsi-hd,id=sda,drive={src_node}')
 vm.launch()
 log('Done')
 
@@ -78,7 +82,6 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Setting up Fleecing Graph ---')
 log('')
 
-src_node = 'drive0'
 tgt_node = 'fleeceNode'
 
 # create tgt_node backed by src_node
@@ -94,6 +97,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 
 # Establish COW from source to fleecing node
 log(vm.qmp('blockdev-backup',
+   job_id='fleecing',
device=src_node,
target=tgt_node,
sync='none'))
@@ -125,7 +129,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 for p in overwrite:
 cmd = 'write -P%s %s %s' % p
 log(cmd)
-log(vm.hmp_qemu_io(src_node, cmd))
+log(vm.hmp_qemu_io('/machine/peripheral/sda', cmd, qdev=True))
 
 log('')
 log('--- Verifying Data ---')
@@ -140,7 +144,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Cleanup ---')
 log('')
 
-log(vm.qmp('block-job-cancel', device=src_node))
+log(vm.qmp('block-job-cancel', device='fleecing'))
 e = vm.event_wait('BLOCK_JOB_CANCELLED')
 assert e is not None
 log(e, filters=[iotests.filter_qmp_event])
diff --git a/tests/qemu-iotests/tests/image-fleecing.out 
b/tests/qemu-iotests/tests/image-fleecing.out
index 16643dde30..314a1b54e9 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -50,7 +50,7 @@ read -P0 0x3fe 64k
 --- Cleanup ---
 
 {"return": {}}
-{"data": {"device": "drive0", "len": 67108864, "offset": 393216, "speed": 0, 
"type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "fleecing", "len": 67108864, "offset": 393216, "speed": 0, 
"type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 {"return": {}}
 {"return": {}}
 
-- 
2.31.1




[PULL 33/56] block/copy-before-write: bdrv_cbw_append(): replace child at last

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Refactor the function to replace child at last. Thus we don't need to
revert it and code is simplified.

block-copy state initialization being done before replacing the child
doesn't need any drained section.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-14-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 7a6c15f141..adbbc64aa9 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -172,7 +172,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 int ret;
 BDRVCopyBeforeWriteState *state;
 BlockDriverState *top;
-bool appended = false;
 
 assert(source->total_sectors == target->total_sectors);
 
@@ -196,8 +195,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BDRV_CHILD_DATA, errp);
 if (!state->target) {
 error_prepend(errp, "Cannot attach target child: ");
-bdrv_unref(top);
-return NULL;
+goto fail;
 }
 
 bdrv_ref(source);
@@ -206,18 +204,8 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   errp);
 if (!top->file) {
 error_prepend(errp, "Cannot attach file child: ");
-bdrv_unref(top);
-return NULL;
-}
-
-bdrv_drained_begin(source);
-
-ret = bdrv_replace_node(source, top, errp);
-if (ret < 0) {
-error_prepend(errp, "Cannot append copy-before-write filter: ");
 goto fail;
 }
-appended = true;
 
 state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
   errp);
@@ -225,21 +213,22 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState 
*source,
 error_prepend(errp, "Cannot create block-copy-state: ");
 goto fail;
 }
-*bcs = state->bcs;
 
+bdrv_drained_begin(source);
+ret = bdrv_replace_node(source, top, errp);
 bdrv_drained_end(source);
+if (ret < 0) {
+error_prepend(errp, "Cannot append copy-before-write filter: ");
+goto fail;
+}
+
+*bcs = state->bcs;
 
 return top;
 
 fail:
-if (appended) {
-bdrv_cbw_drop(top);
-} else {
-bdrv_unref(top);
-}
-
-bdrv_drained_end(source);
-
+block_copy_state_free(state->bcs);
+bdrv_unref(top);
 return NULL;
 }
 
-- 
2.31.1




[PULL 46/56] iotests/222: fix pylint and mypy complains

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Here:
 - long line
 - move to new interface of vm.qmp() (direct passing dict), to avoid
   mypy false-positive, as it thinks that unpacked dict is a positional
   argument.
 - extra parenthesis
 - handle event_wait possible None value

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-27-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/222 | 20 +++-
 tests/qemu-iotests/297 |  2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/222
index b48afe623e..5e2556f8df 100755
--- a/tests/qemu-iotests/222
+++ b/tests/qemu-iotests/222
@@ -50,7 +50,8 @@ remainder = [("0xd5", "0x108000",  "32k"), # Right-end of 
partial-left [1]
 
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
- iotests.FilePath('nbd.sock', base_dir=iotests.sock_dir) as nbd_sock_path, 
\
+ iotests.FilePath('nbd.sock',
+  base_dir=iotests.sock_dir) as nbd_sock_path, \
  iotests.VM() as vm:
 
 log('--- Setting up images ---')
@@ -81,7 +82,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 tgt_node = "fleeceNode"
 
 # create tgt_node backed by src_node
-log(vm.qmp("blockdev-add", **{
+log(vm.qmp("blockdev-add", {
 "driver": "qcow2",
 "node-name": tgt_node,
 "file": {
@@ -103,8 +104,8 @@ with iotests.FilePath('base.img') as base_img_path, \
 
 nbd_uri = 'nbd+unix:///%s?socket=%s' % (tgt_node, nbd_sock_path)
 log(vm.qmp("nbd-server-start",
-   **{"addr": { "type": "unix",
-"data": { "path": nbd_sock_path } } }))
+   {"addr": { "type": "unix",
+  "data": { "path": nbd_sock_path } } }))
 
 log(vm.qmp("nbd-server-add", device=tgt_node))
 
@@ -112,7 +113,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Sanity Check ---')
 log('')
 
-for p in (patterns + zeroes):
+for p in patterns + zeroes:
 cmd = "read -P%s %s %s" % p
 log(cmd)
 assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
@@ -130,7 +131,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Verifying Data ---')
 log('')
 
-for p in (patterns + zeroes):
+for p in patterns + zeroes:
 cmd = "read -P%s %s %s" % p
 log(cmd)
 assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
@@ -140,8 +141,9 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 log(vm.qmp('block-job-cancel', device=src_node))
-log(vm.event_wait('BLOCK_JOB_CANCELLED'),
-filters=[iotests.filter_qmp_event])
+e = vm.event_wait('BLOCK_JOB_CANCELLED')
+assert e is not None
+log(e, filters=[iotests.filter_qmp_event])
 log(vm.qmp('nbd-server-stop'))
 log(vm.qmp('blockdev-del', node_name=tgt_node))
 vm.shutdown()
@@ -150,7 +152,7 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Confirming writes ---')
 log('')
 
-for p in (overwrite + remainder):
+for p in overwrite + remainder:
 cmd = "read -P%s %s %s" % p
 log(cmd)
 assert qemu_io_silent(base_img_path, '-c', cmd) == 0
diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297
index 433b732336..345b617b34 100755
--- a/tests/qemu-iotests/297
+++ b/tests/qemu-iotests/297
@@ -31,7 +31,7 @@ SKIP_FILES = (
 '096', '118', '124', '132', '136', '139', '147', '148', '149',
 '151', '152', '155', '163', '165', '169', '194', '196', '199', '202',
 '203', '205', '206', '207', '208', '210', '211', '212', '213', '216',
-'218', '219', '222', '224', '228', '234', '235', '236', '237', '238',
+'218', '219', '224', '228', '234', '235', '236', '237', '238',
 '240', '242', '245', '246', '248', '255', '256', '257', '258', '260',
 '262', '264', '266', '274', '277', '280', '281', '295', '296', '298',
 '299', '302', '303', '304', '307',
-- 
2.31.1




[PULL 42/56] qapi: publish copy-before-write filter

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Acked-by: Markus Armbruster 
Message-Id: <20210824083856.17408-23-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 qapi/block-core.json | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 06674c25c9..c8ce1d9d5d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2825,13 +2825,14 @@
 # @blklogwrites: Since 3.0
 # @blkreplay: Since 4.2
 # @compress: Since 5.0
+# @copy-before-write: Since 6.2
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
-'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
-'gluster',
+'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
+'file', 'ftp', 'ftps', 'gluster',
 {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
 {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
 'http', 'https', 'iscsi',
@@ -4049,6 +4050,25 @@
   'base': 'BlockdevOptionsGenericFormat',
   'data': { '*bottom': 'str' } }
 
+##
+# @BlockdevOptionsCbw:
+#
+# Driver specific block device options for the copy-before-write driver,
+# which does so called copy-before-write operations: when data is
+# written to the filter, the filter first reads corresponding blocks
+# from its file child and copies them to @target child. After successfully
+# copying, the write request is propagated to file child. If copying
+# fails, the original write request is failed too and no data is written
+# to file child.
+#
+# @target: The target for copy-before-write operations.
+#
+# Since: 6.2
+##
+{ 'struct': 'BlockdevOptionsCbw',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'target': 'BlockdevRef' } }
+
 ##
 # @BlockdevOptions:
 #
@@ -4101,6 +4121,7 @@
   'bochs':  'BlockdevOptionsGenericFormat',
   'cloop':  'BlockdevOptionsGenericFormat',
   'compress':   'BlockdevOptionsGenericFormat',
+  'copy-before-write':'BlockdevOptionsCbw',
   'copy-on-read':'BlockdevOptionsCor',
   'dmg':'BlockdevOptionsGenericFormat',
   'file':   'BlockdevOptionsFile',
-- 
2.31.1




[PULL 40/56] block/block-copy: make setting progress optional

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Now block-copy will crash if user don't set progress meter by
block_copy_set_progress_meter(). copy-before-write filter will be used
in separate of backup job, and it doesn't want any progress meter (for
now). So, allow not setting it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-21-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/block-copy.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 5d0076ac93..443261e4e4 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -292,9 +292,11 @@ static void coroutine_fn block_copy_task_end(BlockCopyTask 
*task, int ret)
 bdrv_set_dirty_bitmap(task->s->copy_bitmap, task->offset, task->bytes);
 }
 QLIST_REMOVE(task, list);
-progress_set_remaining(task->s->progress,
-   bdrv_get_dirty_count(task->s->copy_bitmap) +
-   task->s->in_flight_bytes);
+if (task->s->progress) {
+progress_set_remaining(task->s->progress,
+   bdrv_get_dirty_count(task->s->copy_bitmap) +
+   task->s->in_flight_bytes);
+}
 qemu_co_queue_restart_all(&task->wait_queue);
 }
 
@@ -594,7 +596,7 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
 t->call_state->ret = ret;
 t->call_state->error_is_read = error_is_read;
 }
-} else {
+} else if (s->progress) {
 progress_work_done(s->progress, t->bytes);
 }
 }
@@ -700,9 +702,11 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
 if (!ret) {
 qemu_co_mutex_lock(&s->lock);
 bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
-progress_set_remaining(s->progress,
-   bdrv_get_dirty_count(s->copy_bitmap) +
-   s->in_flight_bytes);
+if (s->progress) {
+progress_set_remaining(s->progress,
+   bdrv_get_dirty_count(s->copy_bitmap) +
+   s->in_flight_bytes);
+}
 qemu_co_mutex_unlock(&s->lock);
 }
 
-- 
2.31.1




[PULL 31/56] block/copy-before-write: drop extra bdrv_unref on failure path

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

bdrv_attach_child() do bdrv_unref() on failure, so we shouldn't do it
by hand here.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-12-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2a51cc64e4..945d9340f4 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -201,7 +201,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
   BDRV_CHILD_DATA, errp);
 if (!state->target) {
-bdrv_unref(target);
 bdrv_unref(top);
 return NULL;
 }
-- 
2.31.1




[PULL 28/56] block/backup: set copy_range and compress after filter insertion

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

We are going to publish copy-before-write filter, so it would be
initialized through options. Still we don't want to publish compress
and copy-range options, as

1. Modern way to enable compression is to use compress filter.

2. For copy-range it's unclean how to make proper interface:
 - it's has experimental prefix for backup job anyway
 - the whole BackupPerf structure doesn't make sense for the filter
 So, let's just add copy-range possibility to the filter later if
 needed.

Still, we are going to continue support for compression and
experimental copy-range in backup job. So, set these options after
filter creation.

Note, that we can drop "compress" argument of bdrv_cbw_append() now, as
well as "perf". The only reason not doing so is that now, when I
prepare this patch the big series around it is already reviewed and I
want to avoid extra rebase conflicts to simplify review of the
following version.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
Message-Id: <20210824083856.17408-9-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.h | 1 -
 block/backup.c| 3 ++-
 block/copy-before-write.c | 4 +---
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index f37e2249ae..538aab8bdb 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -33,7 +33,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
   uint64_t cluster_size,
-  BackupPerf *perf,
   bool compress,
   BlockCopyState **bcs,
   Error **errp);
diff --git a/block/backup.c b/block/backup.c
index 84f9a5f02c..b31fd99ab3 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -504,7 +504,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 }
 
 cbw = bdrv_cbw_append(bs, target, filter_node_name,
-  cluster_size, perf, compress, &bcs, errp);
+  cluster_size, false, &bcs, errp);
 if (!cbw) {
 goto error;
 }
@@ -530,6 +530,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 job->len = len;
 job->perf = *perf;
 
+block_copy_set_copy_opts(bcs, perf->use_copy_range, compress);
 block_copy_set_progress_meter(bcs, &job->common.job.progress);
 block_copy_set_speed(bcs, speed);
 
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 4337076c1c..235251a640 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -170,7 +170,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
   uint64_t cluster_size,
-  BackupPerf *perf,
   bool compress,
   BlockCopyState **bcs,
   Error **errp)
@@ -217,8 +216,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 
 state->cluster_size = cluster_size;
 state->bcs = block_copy_state_new(top->backing, state->target,
-  cluster_size, perf->use_copy_range,
-  compress, errp);
+  cluster_size, false, compress, errp);
 if (!state->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 goto fail;
-- 
2.31.1




[PULL 34/56] block/copy-before-write: introduce cbw_init()

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Move part of bdrv_cbw_append() to new function cbw_open(). It's an
intermediate step for adding normal .bdrv_open() handler to the
filter. With this commit no logic is changed, but we have a function
which will be turned into .bdrv_open() handler in future commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-15-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 block/copy-before-write.c | 69 +++
 1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index adbbc64aa9..a4fee645fd 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -144,6 +144,45 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 }
 }
 
+static int cbw_init(BlockDriverState *top, BlockDriverState *source,
+BlockDriverState *target, bool compress, Error **errp)
+{
+BDRVCopyBeforeWriteState *state = top->opaque;
+
+top->total_sectors = source->total_sectors;
+top->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+(BDRV_REQ_FUA & source->supported_write_flags);
+top->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
+ source->supported_zero_flags);
+
+bdrv_ref(target);
+state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
+  BDRV_CHILD_DATA, errp);
+if (!state->target) {
+error_prepend(errp, "Cannot attach target child: ");
+return -EINVAL;
+}
+
+bdrv_ref(source);
+top->file = bdrv_attach_child(top, source, "file", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+  errp);
+if (!top->file) {
+error_prepend(errp, "Cannot attach file child: ");
+return -EINVAL;
+}
+
+state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
+  errp);
+if (!state->bcs) {
+error_prepend(errp, "Cannot create block-copy-state: ");
+return -EINVAL;
+}
+
+return 0;
+}
+
 BlockDriver bdrv_cbw_filter = {
 .format_name = "copy-before-write",
 .instance_size = sizeof(BDRVCopyBeforeWriteState),
@@ -181,36 +220,10 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState 
*source,
 error_prepend(errp, "Cannot open driver: ");
 return NULL;
 }
-
 state = top->opaque;
-top->total_sectors = source->total_sectors;
-top->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
-(BDRV_REQ_FUA & source->supported_write_flags);
-top->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
-((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
- source->supported_zero_flags);
-
-bdrv_ref(target);
-state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
-  BDRV_CHILD_DATA, errp);
-if (!state->target) {
-error_prepend(errp, "Cannot attach target child: ");
-goto fail;
-}
 
-bdrv_ref(source);
-top->file = bdrv_attach_child(top, source, "file", &child_of_bds,
-  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-  errp);
-if (!top->file) {
-error_prepend(errp, "Cannot attach file child: ");
-goto fail;
-}
-
-state->bcs = block_copy_state_new(top->file, state->target, false, 
compress,
-  errp);
-if (!state->bcs) {
-error_prepend(errp, "Cannot create block-copy-state: ");
+ret = cbw_init(top, source, target, compress, errp);
+if (ret < 0) {
 goto fail;
 }
 
-- 
2.31.1




[PULL 14/56] docs/devel/testing: add -valgrind option to the debug section of QEMU iotests

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20210809090114.64834-15-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 docs/devel/testing.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 01e1919873..8ebcf85a31 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -240,6 +240,12 @@ a failing test:
   If the ``-gdb`` option is not used, ``$GDB_OPTIONS`` is ignored,
   regardless of whether it is set or not.
 
+* ``-valgrind`` attaches a valgrind instance to QEMU. If it detects
+  warnings, it will print and save the log in
+  ``$TEST_DIR/.valgrind``.
+  The final command line will be ``valgrind --log-file=$TEST_DIR/
+  .valgrind --error-exitcode=99 $QEMU ...``
+
 * ``-d`` (debug) just increases the logging verbosity, showing
   for example the QMP commands and answers.
 
-- 
2.31.1




[PULL 12/56] qemu-iotests: allow valgrind to read/delete the generated log file

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

When using -valgrind on the script tests, it generates a log file
in $TEST_DIR that is either read (if valgrind finds problems) or
otherwise deleted. Provide the same exact behavior when using
-valgrind on the python tests.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-13-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 26c580f9e7..85d8c0abbb 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -598,6 +598,17 @@ def __init__(self, path_suffix=''):
  sock_dir=sock_dir, qmp_timer=timer)
 self._num_drives = 0
 
+def _post_shutdown(self) -> None:
+super()._post_shutdown()
+if not qemu_valgrind or not self._popen:
+return
+valgrind_filename =  f"{test_dir}/{self._popen.pid}.valgrind"
+if self.exitcode() == 99:
+with open(valgrind_filename) as f:
+print(f.read())
+else:
+os.remove(valgrind_filename)
+
 def add_object(self, opts):
 self._args.append('-object')
 self._args.append(opts)
-- 
2.31.1




[PULL 23/56] qdev-properties: PropertyInfo: add realized_set_allowed field

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Add field, so property can declare support for setting the property
when device is realized. To be used in the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-4-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 include/hw/qdev-properties.h | 1 +
 hw/core/qdev-properties.c| 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 0ef97d60ce..f7925f67d0 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -32,6 +32,7 @@ struct PropertyInfo {
 const char *name;
 const char *description;
 const QEnumLookup *enum_table;
+bool realized_set_allowed; /* allow setting property on realized device */
 int (*print)(Object *obj, Property *prop, char *dest, size_t len);
 void (*set_default_value)(ObjectProperty *op, const Property *prop);
 ObjectProperty *(*create)(ObjectClass *oc, const char *name,
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 50f40949f5..c34aac6ebc 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -26,11 +26,11 @@ void qdev_prop_set_after_realize(DeviceState *dev, const 
char *name,
 
 /* returns: true if property is allowed to be set, false otherwise */
 static bool qdev_prop_allow_set(Object *obj, const char *name,
-Error **errp)
+const PropertyInfo *info, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
 
-if (dev->realized) {
+if (dev->realized && !info->realized_set_allowed) {
 qdev_prop_set_after_realize(dev, name, errp);
 return false;
 }
@@ -79,7 +79,7 @@ static void field_prop_set(Object *obj, Visitor *v, const 
char *name,
 {
 Property *prop = opaque;
 
-if (!qdev_prop_allow_set(obj, name, errp)) {
+if (!qdev_prop_allow_set(obj, name, prop->info, errp)) {
 return;
 }
 
-- 
2.31.1




[PULL 21/56] block: introduce bdrv_replace_child_bs()

2021-09-01 Thread Hanna Reitz
From: Vladimir Sementsov-Ogievskiy 

Add function to transactionally replace bs inside BdrvChild.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210824083856.17408-2-vsement...@virtuozzo.com>
Signed-off-by: Hanna Reitz 
---
 include/block/block.h |  2 ++
 block.c   | 31 +++
 2 files changed, 33 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 3477290f9a..740038a892 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -361,6 +361,8 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState 
*bs_top,
 Error **errp);
 int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
   Error **errp);
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp);
 BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options,
int flags, Error **errp);
 int bdrv_drop_filter(BlockDriverState *bs, Error **errp);
diff --git a/block.c b/block.c
index e97ce0b1c8..b2b66263f9 100644
--- a/block.c
+++ b/block.c
@@ -5048,6 +5048,37 @@ out:
 return ret;
 }
 
+/* Not for empty child */
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp)
+{
+int ret;
+Transaction *tran = tran_new();
+g_autoptr(GHashTable) found = NULL;
+g_autoptr(GSList) refresh_list = NULL;
+BlockDriverState *old_bs = child->bs;
+
+bdrv_ref(old_bs);
+bdrv_drained_begin(old_bs);
+bdrv_drained_begin(new_bs);
+
+bdrv_replace_child_tran(child, new_bs, tran);
+
+found = g_hash_table_new(NULL, NULL);
+refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);
+refresh_list = bdrv_topological_dfs(refresh_list, found, new_bs);
+
+ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);
+
+tran_finalize(tran, ret);
+
+bdrv_drained_end(old_bs);
+bdrv_drained_end(new_bs);
+bdrv_unref(old_bs);
+
+return ret;
+}
+
 static void bdrv_delete(BlockDriverState *bs)
 {
 assert(bdrv_op_blocker_is_empty(bs));
-- 
2.31.1




[PULL 15/56] qemu-iotests: add option to show qemu binary logs on stdout

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Using the flag -p, allow the qemu binary to print to stdout.

Also create the common function _close_qemu_log_file() to
avoid accessing machine.py private fields directly and have
duplicate code.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-16-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 python/qemu/machine/machine.py | 9 ++---
 tests/qemu-iotests/check   | 4 +++-
 tests/qemu-iotests/iotests.py  | 8 
 tests/qemu-iotests/testenv.py  | 9 +++--
 4 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 14c4d17eca..8b935813e9 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -348,6 +348,11 @@ def _post_launch(self) -> None:
 if self._qmp_connection:
 self._qmp.accept(self._qmp_timer)
 
+def _close_qemu_log_file(self) -> None:
+if self._qemu_log_file is not None:
+self._qemu_log_file.close()
+self._qemu_log_file = None
+
 def _post_shutdown(self) -> None:
 """
 Called to cleanup the VM instance after the process has exited.
@@ -360,9 +365,7 @@ def _post_shutdown(self) -> None:
 self._qmp.close()
 self._qmp_connection = None
 
-if self._qemu_log_file is not None:
-self._qemu_log_file.close()
-self._qemu_log_file = None
+self._close_qemu_log_file()
 
 self._load_io_log()
 
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index ebd27946db..da1bfb839e 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -36,6 +36,8 @@ def make_argparser() -> argparse.ArgumentParser:
help='pretty print output for make check')
 
 p.add_argument('-d', dest='debug', action='store_true', help='debug')
+p.add_argument('-p', dest='print', action='store_true',
+help='redirects qemu\'s stdout and stderr to the test output')
 p.add_argument('-gdb', action='store_true',
help="start gdbserver with $GDB_OPTIONS options \
 ('localhost:12345' if $GDB_OPTIONS is empty)")
@@ -119,7 +121,7 @@ if __name__ == '__main__':
   aiomode=args.aiomode, cachemode=args.cachemode,
   imgopts=args.imgopts, misalign=args.misalign,
   debug=args.debug, valgrind=args.valgrind,
-  gdb=args.gdb)
+  gdb=args.gdb, qprint=args.print)
 
 if len(sys.argv) > 1 and sys.argv[-len(args.tests)-1] == '--':
 if not args.tests:
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 74fa56840d..2cf5ff965b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -79,6 +79,8 @@
 if gdb_qemu_env:
 qemu_gdb = ['gdbserver'] + gdb_qemu_env.strip().split(' ')
 
+qemu_print = os.environ.get('PRINT_QEMU', False)
+
 imgfmt = os.environ.get('IMGFMT', 'raw')
 imgproto = os.environ.get('IMGPROTO', 'file')
 output_dir = os.environ.get('OUTPUT_DIR', '.')
@@ -613,6 +615,12 @@ def _post_shutdown(self) -> None:
 else:
 os.remove(valgrind_filename)
 
+def _pre_launch(self) -> None:
+super()._pre_launch()
+if qemu_print:
+# set QEMU binary output to stdout
+self._close_qemu_log_file()
+
 def add_object(self, opts):
 self._args.append('-object')
 self._args.append(opts)
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 8bf154376f..70da0d60c8 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -74,7 +74,7 @@ class TestEnv(ContextManager['TestEnv']):
  'AIOMODE', 'CACHEMODE', 'VALGRIND_QEMU',
  'CACHEMODE_IS_DEFAULT', 'IMGFMT_GENERIC', 'IMGOPTSSYNTAX',
  'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_',
- 'GDB_OPTIONS']
+ 'GDB_OPTIONS', 'PRINT_QEMU']
 
 def prepare_subprocess(self, args: List[str]) -> Dict[str, str]:
 if self.debug:
@@ -181,7 +181,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
  misalign: bool = False,
  debug: bool = False,
  valgrind: bool = False,
- gdb: bool = False) -> None:
+ gdb: bool = False,
+ qprint: bool = False) -> None:
 self.imgfmt = imgfmt
 self.imgproto = imgproto
 self.aiomode = aiomode
@@ -189,6 +190,9 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
 self.misalign = misalign
 self.debug = debug
 
+if qprint:
+self.print_qemu = 'y'
+
 if gdb:
 self.gdb_options = os.getenv('GDB_OPTIONS', DEF_GDB_OPTIONS)
 if 

[PULL 13/56] qemu-iotests: insert valgrind command line as wrapper for qemu binary

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

If -gdb and -valgrind are both defined, return an error.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-14-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 85d8c0abbb..74fa56840d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -591,7 +591,11 @@ class VM(qtest.QEMUQtestMachine):
 def __init__(self, path_suffix=''):
 name = "qemu%s-%d" % (path_suffix, os.getpid())
 timer = 15.0 if not (qemu_gdb or qemu_valgrind) else None
-super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb,
+if qemu_gdb and qemu_valgrind:
+sys.stderr.write('gdb and valgrind are mutually exclusive\n')
+sys.exit(1)
+wrapper = qemu_gdb if qemu_gdb else qemu_valgrind
+super().__init__(qemu_prog, qemu_opts, wrapper=wrapper,
  name=name,
  base_temp_dir=test_dir,
  socket_scm_helper=socket_scm_helper,
-- 
2.31.1




[PULL 17/56] block/monitor: Consolidate hmp_handle_error calls to reduce redundant code

2021-09-01 Thread Hanna Reitz
From: Mao Zhongyi 

Signed-off-by: Mao Zhongyi 
Message-Id: <20210802062507.347555-1-maozhon...@cmss.chinamobile.com>
Signed-off-by: Hanna Reitz 
---
 block/monitor/block-hmp-cmds.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 3e6670c963..2ac4aedfff 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -251,10 +251,10 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
 
 if (!filename) {
 error_setg(&err, QERR_MISSING_PARAMETER, "target");
-hmp_handle_error(mon, err);
-return;
+goto end;
 }
 qmp_drive_mirror(&mirror, &err);
+end:
 hmp_handle_error(mon, err);
 }
 
@@ -281,11 +281,11 @@ void hmp_drive_backup(Monitor *mon, const QDict *qdict)
 
 if (!filename) {
 error_setg(&err, QERR_MISSING_PARAMETER, "target");
-hmp_handle_error(mon, err);
-return;
+goto end;
 }
 
 qmp_drive_backup(&backup, &err);
+end:
 hmp_handle_error(mon, err);
 }
 
@@ -356,8 +356,7 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
  * will be taken internally. Today it's actually required.
  */
 error_setg(&err, QERR_MISSING_PARAMETER, "snapshot-file");
-hmp_handle_error(mon, err);
-return;
+goto end;
 }
 
 mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS;
@@ -365,6 +364,7 @@ void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
filename, false, NULL,
!!format, format,
true, mode, &err);
+end:
 hmp_handle_error(mon, err);
 }
 
-- 
2.31.1




[PULL 11/56] qemu-iotests: extend QMP socket timeout when using valgrind

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

As with gdbserver, valgrind delays the test execution, so
the default QMP socket timeout and the generic class
Timeout in iotests.py timeouts too soon.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-12-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6aa1dc48ba..26c580f9e7 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -488,13 +488,13 @@ def __init__(self, seconds, errmsg="Timeout"):
 self.seconds = seconds
 self.errmsg = errmsg
 def __enter__(self):
-if qemu_gdb:
+if qemu_gdb or qemu_valgrind:
 return self
 signal.signal(signal.SIGALRM, self.timeout)
 signal.setitimer(signal.ITIMER_REAL, self.seconds)
 return self
 def __exit__(self, exc_type, value, traceback):
-if qemu_gdb:
+if qemu_gdb or qemu_valgrind:
 return False
 signal.setitimer(signal.ITIMER_REAL, 0)
 return False
@@ -590,7 +590,7 @@ class VM(qtest.QEMUQtestMachine):
 
 def __init__(self, path_suffix=''):
 name = "qemu%s-%d" % (path_suffix, os.getpid())
-timer = 15.0 if not qemu_gdb else None
+timer = 15.0 if not (qemu_gdb or qemu_valgrind) else None
 super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb,
  name=name,
  base_temp_dir=test_dir,
-- 
2.31.1




[PULL 07/56] qemu_iotests: insert gdbserver command line as wrapper for qemu binary

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-8-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index e176a84620..e7e3d92d3e 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -580,7 +580,8 @@ class VM(qtest.QEMUQtestMachine):
 def __init__(self, path_suffix=''):
 name = "qemu%s-%d" % (path_suffix, os.getpid())
 timer = 15.0 if not qemu_gdb else None
-super().__init__(qemu_prog, qemu_opts, name=name,
+super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb,
+ name=name,
  base_temp_dir=test_dir,
  socket_scm_helper=socket_scm_helper,
  sock_dir=sock_dir, qmp_timer=timer)
-- 
2.31.1




[PULL 06/56] qemu-iotests: delay QMP socket timers

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Attaching gdbserver implies that the qmp socket
should wait indefinitely for an answer from QEMU.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-7-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index c86f239d81..e176a84620 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -477,10 +477,14 @@ def __init__(self, seconds, errmsg="Timeout"):
 self.seconds = seconds
 self.errmsg = errmsg
 def __enter__(self):
+if qemu_gdb:
+return self
 signal.signal(signal.SIGALRM, self.timeout)
 signal.setitimer(signal.ITIMER_REAL, self.seconds)
 return self
 def __exit__(self, exc_type, value, traceback):
+if qemu_gdb:
+return False
 signal.setitimer(signal.ITIMER_REAL, 0)
 return False
 def timeout(self, signum, frame):
@@ -575,7 +579,7 @@ class VM(qtest.QEMUQtestMachine):
 
 def __init__(self, path_suffix=''):
 name = "qemu%s-%d" % (path_suffix, os.getpid())
-timer = 15.0
+timer = 15.0 if not qemu_gdb else None
 super().__init__(qemu_prog, qemu_opts, name=name,
  base_temp_dir=test_dir,
  socket_scm_helper=socket_scm_helper,
-- 
2.31.1




[PULL 10/56] qemu-iotests: extend the check script to prepare supporting valgrind for python tests

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Currently, the check script only parses the option and sets the
VALGRIND_QEMU environmental variable to "y".
Add another local python variable that prepares the command line,
identical to the one provided in the test scripts.

Because the python script does not know in advance the valgrind
PID to assign to the log file name, use the "%p" flag in valgrind
log file name that automatically puts the process PID at runtime.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20210809090114.64834-11-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/check  |  7 ---
 tests/qemu-iotests/iotests.py | 11 +++
 tests/qemu-iotests/testenv.py |  1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 4365bb8066..ebd27946db 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -39,6 +39,10 @@ def make_argparser() -> argparse.ArgumentParser:
 p.add_argument('-gdb', action='store_true',
help="start gdbserver with $GDB_OPTIONS options \
 ('localhost:12345' if $GDB_OPTIONS is empty)")
+p.add_argument('-valgrind', action='store_true',
+help='use valgrind, sets VALGRIND_QEMU environment '
+'variable')
+
 p.add_argument('-misalign', action='store_true',
help='misalign memory allocations')
 p.add_argument('--color', choices=['on', 'off', 'auto'],
@@ -88,9 +92,6 @@ def make_argparser() -> argparse.ArgumentParser:
 g_bash.add_argument('-o', dest='imgopts',
 help='options to pass to qemu-img create/convert, '
 'sets IMGOPTS environment variable')
-g_bash.add_argument('-valgrind', action='store_true',
-help='use valgrind, sets VALGRIND_QEMU environment '
-'variable')
 
 g_sel = p.add_argument_group('test selecting options',
  'The following options specify test set '
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index e7e3d92d3e..6aa1dc48ba 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -96,6 +96,17 @@
 sys.stderr.write('Please run this test via the "check" script\n')
 sys.exit(os.EX_USAGE)
 
+qemu_valgrind = []
+if os.environ.get('VALGRIND_QEMU') == "y" and \
+os.environ.get('NO_VALGRIND') != "y":
+valgrind_logfile = "--log-file=" + test_dir
+# %p allows to put the valgrind process PID, since
+# we don't know it a priori (subprocess.Popen is
+# not yet invoked)
+valgrind_logfile += "/%p.valgrind"
+
+qemu_valgrind = ['valgrind', valgrind_logfile, '--error-exitcode=99']
+
 socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
 
 luks_default_secret_object = 'secret,id=keysec0,data=' + \
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 8501c6caf5..8bf154376f 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -298,6 +298,7 @@ def print_env(self) -> None:
 SOCK_DIR  -- {SOCK_DIR}
 SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER}
 GDB_OPTIONS   -- {GDB_OPTIONS}
+VALGRIND_QEMU -- {VALGRIND_QEMU}
 """
 
 args = collections.defaultdict(str, self.get_env())
-- 
2.31.1




[PULL 08/56] qemu-iotests: add gdbserver option to script tests too

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Remove read timer in test script when GDB_OPTIONS are set,
so that the bash tests won't timeout while running gdb.

The only limitation here is that running a script with gdbserver
will make the test output mismatch with the expected
results, making the test fail.

Signed-off-by: Emanuele Giuseppe Esposito 
Message-Id: <20210809090114.64834-9-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/common.qemu | 7 ++-
 tests/qemu-iotests/common.rc   | 8 +++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 0fc52d20d7..0f1fecc68e 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -85,7 +85,12 @@ _timed_wait_for()
 timeout=yes
 
 QEMU_STATUS[$h]=0
-while IFS= read -t ${QEMU_COMM_TIMEOUT} resp <&${QEMU_OUT[$h]}
+read_timeout="-t ${QEMU_COMM_TIMEOUT}"
+if [ -n "${GDB_OPTIONS}" ]; then
+read_timeout=
+fi
+
+while IFS= read ${read_timeout} resp <&${QEMU_OUT[$h]}
 do
 if [ -n "$capture_events" ]; then
 capture=0
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 609d82de89..d8582454de 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -166,8 +166,14 @@ _qemu_wrapper()
 if [ -n "${QEMU_NEED_PID}" ]; then
 echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid"
 fi
+
+GDB=""
+if [ -n "${GDB_OPTIONS}" ]; then
+GDB="gdbserver ${GDB_OPTIONS}"
+fi
+
 VALGRIND_QEMU="${VALGRIND_QEMU_VM}" _qemu_proc_exec 
"${VALGRIND_LOGFILE}" \
-"$QEMU_PROG" $QEMU_OPTIONS "$@"
+$GDB "$QEMU_PROG" $QEMU_OPTIONS "$@"
 )
 RETVAL=$?
 _qemu_proc_valgrind_log "${VALGRIND_LOGFILE}" $RETVAL
-- 
2.31.1




[PULL 05/56] qemu-iotests: add option to attach gdbserver

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Define -gdb flag and GDB_OPTIONS environment variable
to python tests to attach a gdbserver to each qemu instance.
This patch only adds and parses this flag, it does not yet add
the implementation for it.

if -gdb is not provided but $GDB_OPTIONS is set, ignore the
environment variable.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-6-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/check  |  6 +-
 tests/qemu-iotests/iotests.py |  5 +
 tests/qemu-iotests/testenv.py | 17 +++--
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 2dd529eb75..4365bb8066 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -36,6 +36,9 @@ def make_argparser() -> argparse.ArgumentParser:
help='pretty print output for make check')
 
 p.add_argument('-d', dest='debug', action='store_true', help='debug')
+p.add_argument('-gdb', action='store_true',
+   help="start gdbserver with $GDB_OPTIONS options \
+('localhost:12345' if $GDB_OPTIONS is empty)")
 p.add_argument('-misalign', action='store_true',
help='misalign memory allocations')
 p.add_argument('--color', choices=['on', 'off', 'auto'],
@@ -114,7 +117,8 @@ if __name__ == '__main__':
 env = TestEnv(imgfmt=args.imgfmt, imgproto=args.imgproto,
   aiomode=args.aiomode, cachemode=args.cachemode,
   imgopts=args.imgopts, misalign=args.misalign,
-  debug=args.debug, valgrind=args.valgrind)
+  debug=args.debug, valgrind=args.valgrind,
+  gdb=args.gdb)
 
 if len(sys.argv) > 1 and sys.argv[-len(args.tests)-1] == '--':
 if not args.tests:
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6b0db4ce54..c86f239d81 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -74,6 +74,11 @@
 qemu_prog = os.environ.get('QEMU_PROG', 'qemu')
 qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')
 
+gdb_qemu_env = os.environ.get('GDB_OPTIONS')
+qemu_gdb = []
+if gdb_qemu_env:
+qemu_gdb = ['gdbserver'] + gdb_qemu_env.strip().split(' ')
+
 imgfmt = os.environ.get('IMGFMT', 'raw')
 imgproto = os.environ.get('IMGPROTO', 'file')
 output_dir = os.environ.get('OUTPUT_DIR', '.')
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 0c3fe75636..8501c6caf5 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -27,6 +27,7 @@
 import glob
 from typing import List, Dict, Any, Optional, ContextManager
 
+DEF_GDB_OPTIONS = 'localhost:12345'
 
 def isxfile(path: str) -> bool:
 return os.path.isfile(path) and os.access(path, os.X_OK)
@@ -72,7 +73,8 @@ class TestEnv(ContextManager['TestEnv']):
  'QEMU_NBD_OPTIONS', 'IMGOPTS', 'IMGFMT', 'IMGPROTO',
  'AIOMODE', 'CACHEMODE', 'VALGRIND_QEMU',
  'CACHEMODE_IS_DEFAULT', 'IMGFMT_GENERIC', 'IMGOPTSSYNTAX',
- 'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_']
+ 'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_',
+ 'GDB_OPTIONS']
 
 def prepare_subprocess(self, args: List[str]) -> Dict[str, str]:
 if self.debug:
@@ -178,7 +180,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
  imgopts: Optional[str] = None,
  misalign: bool = False,
  debug: bool = False,
- valgrind: bool = False) -> None:
+ valgrind: bool = False,
+ gdb: bool = False) -> None:
 self.imgfmt = imgfmt
 self.imgproto = imgproto
 self.aiomode = aiomode
@@ -186,6 +189,15 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: 
str,
 self.misalign = misalign
 self.debug = debug
 
+if gdb:
+self.gdb_options = os.getenv('GDB_OPTIONS', DEF_GDB_OPTIONS)
+if not self.gdb_options:
+# cover the case 'export GDB_OPTIONS='
+self.gdb_options = DEF_GDB_OPTIONS
+elif 'GDB_OPTIONS' in os.environ:
+# to not propagate it in prepare_subprocess()
+del os.environ['GDB_OPTIONS']
+
 if valgrind:
 self.valgrind_qemu = 'y'
 
@@ -285,6 +297,7 @@ def print_env(self) -> None:
 TEST_DIR  -- {TEST_DIR}
 SOCK_DIR  -- {SOCK_DIR}
 SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER}
+GDB_OPTIONS   -- {GDB_OPTIONS}
 """
 
 args = collections.defaultdict(str, self.get_env())
-- 
2.31.1




[PULL 09/56] docs/devel/testing: add -gdb option to the debugging section of QEMU iotests

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-10-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 docs/devel/testing.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 8359f2ae37..01e1919873 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -229,6 +229,17 @@ Debugging a test case
 The following options to the ``check`` script can be useful when debugging
 a failing test:
 
+* ``-gdb`` wraps every QEMU invocation in a ``gdbserver``, which waits for a
+  connection from a gdb client.  The options given to ``gdbserver`` (e.g. the
+  address on which to listen for connections) are taken from the 
``$GDB_OPTIONS``
+  environment variable.  By default (if ``$GDB_OPTIONS`` is empty), it listens 
on
+  ``localhost:12345``.
+  It is possible to connect to it for example with
+  ``gdb -iex "target remote $addr"``, where ``$addr`` is the address
+  ``gdbserver`` listens on.
+  If the ``-gdb`` option is not used, ``$GDB_OPTIONS`` is ignored,
+  regardless of whether it is set or not.
+
 * ``-d`` (debug) just increases the logging verbosity, showing
   for example the QMP commands and answers.
 
-- 
2.31.1




[PULL 02/56] python: Reduce strictness of pylint's duplicate-code check

2021-09-01 Thread Hanna Reitz
From: John Snow 

Pylint prior to 2.8.3 (We pin at >= 2.8.0) includes function and method
signatures as part of its duplicate checking algorithm. This check does
not listen to pragmas, so the only way to disable it is to turn it off
completely or increase the minimum duplicate lines so that it doesn't
trigger for functions with long, multi-line signatures.

When we decide to upgrade to pylint 2.8.3 or greater, we will be able to
use 'ignore-signatures = true' to the config instead.

I'd prefer not to keep us on the very bleeding edge of pylint if I can
help it -- 2.8.3 came out only three days ago at time of writing.

See: https://github.com/PyCQA/pylint/pull/4474
Signed-off-by: John Snow 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Acked-by: John Snow 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-3-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 python/setup.cfg | 5 +
 1 file changed, 5 insertions(+)

diff --git a/python/setup.cfg b/python/setup.cfg
index 14bab90288..83909c1c97 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -105,6 +105,11 @@ good-names=i,
 # Ignore imports when computing similarities.
 ignore-imports=yes
 
+# Minimum lines number of a similarity.
+# TODO: Remove after we opt in to Pylint 2.8.3. See commit msg.
+min-similarity-lines=6
+
+
 [isort]
 force_grid_wrap=4
 force_sort_within_sections=True
-- 
2.31.1




[PULL 03/56] python: qemu: pass the wrapper field from QEMUQtestmachine to QEMUMachine

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
Acked-by: John Snow 
Message-Id: <20210809090114.64834-4-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 python/qemu/machine/qtest.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/python/qemu/machine/qtest.py b/python/qemu/machine/qtest.py
index 592be263e0..395cc8fbfe 100644
--- a/python/qemu/machine/qtest.py
+++ b/python/qemu/machine/qtest.py
@@ -112,6 +112,7 @@ class QEMUQtestMachine(QEMUMachine):
 def __init__(self,
  binary: str,
  args: Sequence[str] = (),
+ wrapper: Sequence[str] = (),
  name: Optional[str] = None,
  base_temp_dir: str = "/var/tmp",
  socket_scm_helper: Optional[str] = None,
@@ -123,7 +124,8 @@ def __init__(self,
 name = "qemu-%d" % os.getpid()
 if sock_dir is None:
 sock_dir = base_temp_dir
-super().__init__(binary, args, name=name, base_temp_dir=base_temp_dir,
+super().__init__(binary, args, wrapper=wrapper, name=name,
+ base_temp_dir=base_temp_dir,
  socket_scm_helper=socket_scm_helper,
  sock_dir=sock_dir, qmp_timer=qmp_timer)
 self._qtest: Optional[QEMUQtestProtocol] = None
-- 
2.31.1




[PULL 04/56] docs/devel/testing: add debug section to the QEMU iotests chapter

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Introduce the "Debugging a test case" section, in preparation
to the additional flags that will be added in the next patches.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-5-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 docs/devel/testing.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 8a9cda33a5..8359f2ae37 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -224,6 +224,14 @@ another application on the host may have locked the file, 
possibly leading to a
 test failure.  If using such devices are explicitly desired, consider adding
 ``locking=off`` option to disable image locking.
 
+Debugging a test case
+---
+The following options to the ``check`` script can be useful when debugging
+a failing test:
+
+* ``-d`` (debug) just increases the logging verbosity, showing
+  for example the QMP commands and answers.
+
 Test case groups
 
 
-- 
2.31.1




[PULL 00/56] Block patches

2021-09-01 Thread Hanna Reitz
The following changes since commit ec397e90d21269037280633b6058d1f280e27667:

  Merge remote-tracking branch 
'remotes/alistair/tags/pull-riscv-to-apply-20210901-2' into staging (2021-09-01 
08:33:02 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2021-09-01

for you to fetch changes up to ebd979c74e2b8a7275090475df36dde4ab858320:

  block/file-win32: add reopen handlers (2021-09-01 14:38:08 +0200)


**Note:** I’ve signed the pull request tag with my new GPG key, which I
have uploaded here:

  https://xanclic.moe/A1FA40D098019CDF

Included in that key should be the signature I made with my old key
(F407DB0061D5CF40), and I hope that’s sufficient to establish a
reasonable level of trust.

(I’ve also tried uploading this key to several keyservers, but it
appears to me that keyservers are kind of a thing of the past now,
especially when it comes to uploading keys with signatures on them...)


Block patches:
- Make the backup-top filter driver available for user-created block
  nodes (i.e. via blockdev-add)
- Allow running iotests with gdb or valgrind being attached to qemu
  instances
- Fix the raw format driver's permissions: There is no metadata, so we
  only need WRITE or RESIZE when the parent needs it
- Basic reopen implementation for win32 files (file-win32.c) so that
  qemu-img commit can work
- uclibc/musl build fix for the FUSE export code
- Some iotests delinting
- block-hmp-cmds.c refactoring


Emanuele Giuseppe Esposito (15):
  python: qemu: add timer parameter for qmp.accept socket
  python: qemu: pass the wrapper field from QEMUQtestmachine to
QEMUMachine
  docs/devel/testing: add debug section to the QEMU iotests chapter
  qemu-iotests: add option to attach gdbserver
  qemu-iotests: delay QMP socket timers
  qemu_iotests: insert gdbserver command line as wrapper for qemu binary
  qemu-iotests: add gdbserver option to script tests too
  docs/devel/testing: add -gdb option to the debugging section of QEMU
iotests
  qemu-iotests: extend the check script to prepare supporting valgrind
for python tests
  qemu-iotests: extend QMP socket timeout when using valgrind
  qemu-iotests: allow valgrind to read/delete the generated log file
  qemu-iotests: insert valgrind command line as wrapper for qemu binary
  docs/devel/testing: add -valgrind option to the debug section of QEMU
iotests
  qemu-iotests: add option to show qemu binary logs on stdout
  docs/devel/testing: add -p option to the debug section of QEMU iotests

Fabrice Fontaine (1):
  block/export/fuse.c: fix fuse-lseek on uclibc or musl

John Snow (3):
  python: Reduce strictness of pylint's duplicate-code check
  iotests: use with-statement for open() calls
  iotests: use subprocess.DEVNULL instead of open("/dev/null")

Mao Zhongyi (1):
  block/monitor: Consolidate hmp_handle_error calls to reduce redundant
code

Stefan Hajnoczi (1):
  raw-format: drop WRITE and RESIZE child perms when possible

Viktor Prutyanov (1):
  block/file-win32: add reopen handlers

Vladimir Sementsov-Ogievskiy (34):
  block: introduce bdrv_replace_child_bs()
  block: introduce blk_replace_bs
  qdev-properties: PropertyInfo: add realized_set_allowed field
  qdev: allow setting drive property for realized device
  block: rename backup-top to copy-before-write
  block-copy: move detecting fleecing scheme to block-copy
  block/block-copy: introduce block_copy_set_copy_opts()
  block/backup: set copy_range and compress after filter insertion
  block/backup: move cluster size calculation to block-copy
  block/copy-before-write: relax permission requirements when no parents
  block/copy-before-write: drop extra bdrv_unref on failure path
  block/copy-before-write: use file child instead of backing
  block/copy-before-write: bdrv_cbw_append(): replace child at last
  block/copy-before-write: introduce cbw_init()
  block/copy-before-write: cbw_init(): rename variables
  block/copy-before-write: cbw_init(): use file child after attaching
  block/copy-before-write: bdrv_cbw_append(): drop unused compress arg
  block/copy-before-write: cbw_init(): use options
  block/copy-before-write: initialize block-copy bitmap
  block/block-copy: make setting progress optional
  block/copy-before-write: make public block driver
  qapi: publish copy-before-write filter
  python/qemu/machine.py: refactor _qemu_args()
  python/qemu/machine: QEMUMachine: improve qmp() method
  python:QEMUMachine: template typing for self returning methods
  iotests/222: fix pylint and mypy complains
  iotests/222: constantly use single quotes for strings
  iotests: move 222 to tests/image-fleecing
  iotests.py: hmp_qemu_io: support qdev
  iotests/image-fleecing: proper source device
  iotests/image-fleecing: rename tgt_node
  iotests/image-fleecing: prepare for adding new test-case

[PULL 01/56] python: qemu: add timer parameter for qmp.accept socket

2021-09-01 Thread Hanna Reitz
From: Emanuele Giuseppe Esposito 

Also add a new _qmp_timer field to the QEMUMachine class.

Let's change the default socket timeout to None, so that if
a subclass needs to add a timer, it can be done by modifying
this private field.

At the same time, restore the timer to be 15 seconds in iotests.py, to
give an upper bound to the QMP monitor test command execution.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Acked-by: John Snow 
Reviewed-by: Max Reitz 
Message-Id: <20210809090114.64834-2-eespo...@redhat.com>
Signed-off-by: Hanna Reitz 
---
 python/qemu/machine/machine.py | 7 +--
 python/qemu/machine/qtest.py   | 5 +++--
 tests/qemu-iotests/iotests.py  | 3 ++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 971ed7e8c6..14c4d17eca 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -97,7 +97,8 @@ def __init__(self,
  sock_dir: Optional[str] = None,
  drain_console: bool = False,
  console_log: Optional[str] = None,
- log_dir: Optional[str] = None):
+ log_dir: Optional[str] = None,
+ qmp_timer: Optional[float] = None):
 '''
 Initialize a QEMUMachine
 
@@ -112,6 +113,7 @@ def __init__(self,
 @param drain_console: (optional) True to drain console socket to buffer
 @param console_log: (optional) path to console log file
 @param log_dir: where to create and keep log files
+@param qmp_timer: (optional) default QMP socket timeout
 @note: Qemu process is not started until launch() is used.
 '''
 # pylint: disable=too-many-arguments
@@ -121,6 +123,7 @@ def __init__(self,
 self._binary = binary
 self._args = list(args)
 self._wrapper = wrapper
+self._qmp_timer = qmp_timer
 
 self._name = name or "qemu-%d" % os.getpid()
 self._base_temp_dir = base_temp_dir
@@ -343,7 +346,7 @@ def _pre_launch(self) -> None:
 
 def _post_launch(self) -> None:
 if self._qmp_connection:
-self._qmp.accept()
+self._qmp.accept(self._qmp_timer)
 
 def _post_shutdown(self) -> None:
 """
diff --git a/python/qemu/machine/qtest.py b/python/qemu/machine/qtest.py
index d6d9c6a34a..592be263e0 100644
--- a/python/qemu/machine/qtest.py
+++ b/python/qemu/machine/qtest.py
@@ -115,7 +115,8 @@ def __init__(self,
  name: Optional[str] = None,
  base_temp_dir: str = "/var/tmp",
  socket_scm_helper: Optional[str] = None,
- sock_dir: Optional[str] = None):
+ sock_dir: Optional[str] = None,
+ qmp_timer: Optional[float] = None):
 # pylint: disable=too-many-arguments
 
 if name is None:
@@ -124,7 +125,7 @@ def __init__(self,
 sock_dir = base_temp_dir
 super().__init__(binary, args, name=name, base_temp_dir=base_temp_dir,
  socket_scm_helper=socket_scm_helper,
- sock_dir=sock_dir)
+ sock_dir=sock_dir, qmp_timer=qmp_timer)
 self._qtest: Optional[QEMUQtestProtocol] = None
 self._qtest_path = os.path.join(sock_dir, name + "-qtest.sock")
 
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 89663dac06..6b0db4ce54 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -570,10 +570,11 @@ class VM(qtest.QEMUQtestMachine):
 
 def __init__(self, path_suffix=''):
 name = "qemu%s-%d" % (path_suffix, os.getpid())
+timer = 15.0
 super().__init__(qemu_prog, qemu_opts, name=name,
  base_temp_dir=test_dir,
  socket_scm_helper=socket_scm_helper,
- sock_dir=sock_dir)
+ sock_dir=sock_dir, qmp_timer=timer)
 self._num_drives = 0
 
 def add_object(self, opts):
-- 
2.31.1




Re: [PATCH v3 0/4] iotests/297: Cover tests/

2021-09-01 Thread Hanna Reitz

On 01.09.21 15:34, Vladimir Sementsov-Ogievskiy wrote:

A kind of ping:)

Seems that never landed into master?


Yes, that’s true…

I was waiting for John (CC-ed) to send v3 of 
https://lists.nongnu.org/archive/html/qemu-block/2021-07/msg00611.html, 
because in 
https://lists.nongnu.org/archive/html/qemu-block/2021-07/msg00611.html, 
John was proposing to have it include these patches here.


But I guess there was no plan to change the patches (other than 
correcting the comment in patch 3), so I suppose I might as well. (And 
it seems like John has other things going on, so I don’t want to exert 
pressure by waiting for a v3 with these patches O:))


But I think I do have to send v4, because of that comment.  I suppose 
there can be no harm in doing so.  (And now I really wonder why I 
haven’t done so all this time...)


Hanna




Re: [PATCH v3 00/10] qcow2 check: check some reserved bits and subcluster bitmaps

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

Ping again:)

Nothing changed: patches applies to master, 08 doesn't have r-b.

03.07.2021 14:17, Vladimir Sementsov-Ogievskiy wrote:

Ping :)

This still applies to master with no conflicts. All patches reviewed except for 
08.


24.05.2021 17:20, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Here are some good refactorings and new (qemu-img check) checks for
qcow2.

v3: add r-b mark by Alberto and t-b marks by Kirill
  07, 09: add missed "\n"

Vladimir Sementsov-Ogievskiy (10):
   qcow2-refcount: improve style of check_refcounts_l2()
   qcow2: compressed read: simplify cluster descriptor passing
   qcow2: introduce qcow2_parse_compressed_l2_entry() helper
   qcow2-refcount: introduce fix_l2_entry_by_zero()
   qcow2-refcount: fix_l2_entry_by_zero(): also zero L2 entry bitmap
   qcow2-refcount: check_refcounts_l2(): check l2_bitmap
   qcow2-refcount: check_refcounts_l2(): check reserved bits
   qcow2-refcount: improve style of check_refcounts_l1()
   qcow2-refcount: check_refcounts_l1(): check reserved bits
   qcow2-refcount: check_refblocks(): add separate message for reserved

  block/qcow2.h  |   7 +-
  block/qcow2-cluster.c  |  20 ++-
  block/qcow2-refcount.c | 328 ++---
  block/qcow2.c  |  13 +-
  4 files changed, 240 insertions(+), 128 deletions(-)







--
Best regards,
Vladimir



Re: [PATCH v3 0/4] iotests/297: Cover tests/

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

A kind of ping:)

Seems that never landed into master?

14.05.2021 18:43, Max Reitz wrote:

v1: https://lists.nongnu.org/archive/html/qemu-block/2021-03/msg01471.html
v2: https://lists.nongnu.org/archive/html/qemu-block/2021-05/msg00492.html


Hi,

When reviewing Vladimir’s new addition to tests/, I noticed that 297 so
far does not cover named tests.  That isn’t so good.

This series makes it cover them, and because tests/ is rather sparse at
this point, I decided to also fix up the two tests in there that don’t
pass pylint’s scrutiny yet.  I think it would be nice if we could keep
all of tests/ clean.


v3:
- Fixed patch 3: Turns out replacing `lambda self: mc(self)` by just
   `mc` (as pylint suggests) breaks the test.  So leave it as it is and
   instead disable the warning locally.


git-backport-diff against v3:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/4:[] [--] 'iotests/297: Drop 169 and 199 from the skip list'
002/4:[] [--] 'migrate-bitmaps-postcopy-test: Fix pylint warnings'
003/4:[0005] [FC] 'migrate-bitmaps-test: Fix pylint warnings'
004/4:[] [--] 'iotests/297: Cover tests/'


Max Reitz (4):
   iotests/297: Drop 169 and 199 from the skip list
   migrate-bitmaps-postcopy-test: Fix pylint warnings
   migrate-bitmaps-test: Fix pylint warnings
   iotests/297: Cover tests/

  tests/qemu-iotests/297|  7 ++--
  .../tests/migrate-bitmaps-postcopy-test   | 13 +++---
  tests/qemu-iotests/tests/migrate-bitmaps-test | 41 +++
  3 files changed, 34 insertions(+), 27 deletions(-)




--
Best regards,
Vladimir



Re: [PATCH v2 3/3] qapi: deprecate drive-backup

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

24.05.2021 21:37, John Snow wrote:

On 5/24/21 10:06 AM, Vladimir Sementsov-Ogievskiy wrote:

15.05.2021 01:38, John Snow wrote:

On 5/6/21 5:57 AM, Kashyap Chamarthy wrote:

TODO: We also need to deprecate drive-backup transaction action..
But union members in QAPI doesn't support 'deprecated' feature. I tried
to dig a bit, but failed :/ Markus, could you please help with it? At
least by advice?


Oho, I see.

OK, I'm not Markus, but I've been getting into lots of trouble in the QAPI 
generator lately, so let me see if I can help get you started...

https://gitlab.com/jsnow/qemu/-/commits/hack-deprecate-union-branches/

Here's a quick hack that might expose that feature. I suppose we can discuss 
this with Markus and turn these into real patches if that's the direction we 
wanna head.



Hi! Markus is silent.. Maybe, you'll send patches ? )




He just went on PTO for 2 weeks :')

It's going to have to wait, I'm afraid ...



Hi!

Any plans or updates? John, may be you just send your patches?


--
Best regards,
Vladimir



Re: [PATCH v2 00/17] iotests: support zstd

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

Ping.

This applies to current master with a small obvious conflict in 
tests/qemu-iotests/common.filter

20.07.2021 14:38, Vladimir Sementsov-Ogievskiy wrote:

These series makes tests pass with

IMGOPTS='compression_type=zstd'

Also, python iotests start to support IMGOPTS (they didn't before).

v2:
01: add Max's r-b
02: simplify a lot: just add separate -o for IMGOPTS.
 also, don't bother with catching compat=0.10
03,04: new
05: add Max's r-b
06: one more explict compression_type=zlib
07: new
08: reduced to only update filter_img_info
09: adjust commit message, add comment, add Max's r-b
10: s/ = 0/ &= ~QCOW2_INCOMPAT_COMPRESSION/
 add Max's r-b
11-13: add Max's r-b
14: one more regex for "2, 3, 4" case
15-16: add Max's r-b
17: new

Vladimir Sementsov-Ogievskiy (17):
   iotests.py: img_info_log(): rename imgopts argument
   iotests.py: qemu_img*("create"): support
 IMGOPTS='compression_type=zstd'
   iotests: drop qemu_img_verbose() helper
   iotests.py: rewrite default luks support in qemu_img
   iotest 303: explicit compression type
   iotest 065: explicit compression type
   iotests.py: filter out successful output of qemu-img crate
   iotests.py: filter compression type out
   iotest 302: use img_info_log() helper
   qcow2: simple case support for downgrading of qcow2 images with zstd
   iotests/common.rc: introduce _qcow2_dump_header helper
   iotests: massive use _qcow2_dump_header
   iotest 39: use _qcow2_dump_header
   iotests: bash tests: filter compression type
   iotests 60: more accurate set dirty bit in qcow2 header
   iotest 214: explicit compression type
   iotests: declare lack of support for compresion_type in IMGOPTS

  block/qcow2.c| 58 +-
  tests/qemu-iotests/031   | 11 +++--
  tests/qemu-iotests/036   |  6 +--
  tests/qemu-iotests/039   | 22 -
  tests/qemu-iotests/044   |  5 +-
  tests/qemu-iotests/044.out   |  1 +
  tests/qemu-iotests/051   |  5 +-
  tests/qemu-iotests/060   | 22 -
  tests/qemu-iotests/060.out   |  2 +-
  tests/qemu-iotests/061   | 42 
  tests/qemu-iotests/061.out   | 12 ++---
  tests/qemu-iotests/065   | 16 +++---
  tests/qemu-iotests/082.out   | 14 +++---
  tests/qemu-iotests/112   |  3 +-
  tests/qemu-iotests/137   |  2 +-
  tests/qemu-iotests/198.out   |  4 +-
  tests/qemu-iotests/206.out   | 10 ++--
  tests/qemu-iotests/209   |  7 +--
  tests/qemu-iotests/209.out   |  2 +
  tests/qemu-iotests/210   |  8 +--
  tests/qemu-iotests/214   |  2 +-
  tests/qemu-iotests/242.out   | 10 ++--
  tests/qemu-iotests/255.out   |  4 --
  tests/qemu-iotests/274.out   | 39 ++-
  tests/qemu-iotests/280.out   |  1 -
  tests/qemu-iotests/287   |  8 +--
  tests/qemu-iotests/290   |  2 +-
  tests/qemu-iotests/302   |  4 +-
  tests/qemu-iotests/302.out   |  7 ++-
  tests/qemu-iotests/303   | 25 ++
  tests/qemu-iotests/303.out   | 30 +++-
  tests/qemu-iotests/common.filter |  8 +++
  tests/qemu-iotests/common.rc | 22 +
  tests/qemu-iotests/iotests.py| 84 
  34 files changed, 310 insertions(+), 188 deletions(-)




--
Best regards,
Vladimir



Re: [PATCH v8 28/34] iotests: move 222 to tests/image-fleecing

2021-09-01 Thread Daniel P . Berrangé
On Wed, Sep 01, 2021 at 02:37:52PM +0200, Hanna Reitz wrote:
> On 24.08.21 10:38, Vladimir Sementsov-Ogievskiy wrote:
> > Give a good name to test file.
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > Reviewed-by: Max Reitz 
> > ---
> >   tests/qemu-iotests/{222 => tests/image-fleecing} | 0
> >   tests/qemu-iotests/{222.out => tests/image-fleecing.out} | 0
> >   2 files changed, 0 insertions(+), 0 deletions(-)
> >   rename tests/qemu-iotests/{222 => tests/image-fleecing} (100%)
> >   rename tests/qemu-iotests/{222.out => tests/image-fleecing.out} (100%)
> > 
> > diff --git a/tests/qemu-iotests/222 
> > b/tests/qemu-iotests/tests/image-fleecing
> > similarity index 100%
> > rename from tests/qemu-iotests/222
> > rename to tests/qemu-iotests/tests/image-fleecing
> > diff --git a/tests/qemu-iotests/222.out 
> > b/tests/qemu-iotests/tests/image-fleecing.out
> > similarity index 100%
> > rename from tests/qemu-iotests/222.out
> > rename to tests/qemu-iotests/tests/image-fleecing.out
> 
> Good news: Including error-report.h helped with most of the CI errors.
> 
> “Bad” news: .gitlab-ci.d/buildtest.yml has a complete ./check command line
> including test numbers...  Not sure if that’s a great idea, but in any case,
> this means that build-tcg-disabled fails because that command line includes
> 222.  I think the fix should be simply to replace 222 by image-fleecing.  I
> hope that’s alright for you?

Yeah, I don't really like that we have a set of test numbers in the
gitlab CI config file.

We have a 'group' concept for IO tests that works perfectly well.

If one of the existing groups doesn't work, then we should create
a new "ci" group, so users running 'check' directly can accurately
replicate the CI env without typiing out a set of test numbers.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v8 28/34] iotests: move 222 to tests/image-fleecing

2021-09-01 Thread Hanna Reitz

On 01.09.21 14:47, Vladimir Sementsov-Ogievskiy wrote:

01.09.2021 15:37, Hanna Reitz wrote:

On 24.08.21 10:38, Vladimir Sementsov-Ogievskiy wrote:

Give a good name to test file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
  tests/qemu-iotests/{222 => tests/image-fleecing} | 0
  tests/qemu-iotests/{222.out => tests/image-fleecing.out} | 0
  2 files changed, 0 insertions(+), 0 deletions(-)
  rename tests/qemu-iotests/{222 => tests/image-fleecing} (100%)
  rename tests/qemu-iotests/{222.out => tests/image-fleecing.out} 
(100%)


diff --git a/tests/qemu-iotests/222 
b/tests/qemu-iotests/tests/image-fleecing

similarity index 100%
rename from tests/qemu-iotests/222
rename to tests/qemu-iotests/tests/image-fleecing
diff --git a/tests/qemu-iotests/222.out 
b/tests/qemu-iotests/tests/image-fleecing.out

similarity index 100%
rename from tests/qemu-iotests/222.out
rename to tests/qemu-iotests/tests/image-fleecing.out


Good news: Including error-report.h helped with most of the CI errors.

“Bad” news: .gitlab-ci.d/buildtest.yml has a complete ./check command 
line including test numbers...  Not sure if that’s a great idea, but 
in any case, this means that build-tcg-disabled fails because that 
command line includes 222.  I think the fix should be simply to 
replace 222 by image-fleecing.  I hope that’s alright for you?




Yes, that's OK, thanks

Unpredictable thing :( A reminder to always grep for file name when 
rename it..


Yeah, but in this case...  Grepping for three-digit-numbers is just a 
pain.  (And grepping for “check” is not much better.  “./check” works 
reasonably well, but if someone invokes it differently somewhere, I will 
have missed it.)


Hanna




Re: [PATCH for-6.2 v3 05/12] job: @force parameter for job_cancel_sync{,_all}()

2021-09-01 Thread Hanna Reitz

On 01.09.21 13:04, Vladimir Sementsov-Ogievskiy wrote:

06.08.2021 12:38, Max Reitz wrote:
@@ -726,7 +726,7 @@ static void replication_stop(ReplicationState 
*rs, bool failover, Error **errp)

   * disk, secondary disk in backup_job_completed().
   */
  if (s->backup_job) {
-    job_cancel_sync(&s->backup_job->job);
+    job_cancel_sync(&s->backup_job->job, false);


That's not quite correct, as backup is always force cancelled..


Good point.  I think functionally it shouldn’t make a difference, right? 
– but it’s better to be explicit about it and only use force=false where 
it actually makes a difference.


Hanna




Re: [PATCH for-6.2 v3 01/12] job: Context changes in job_completed_txn_abort()

2021-09-01 Thread Hanna Reitz

On 01.09.21 12:05, Vladimir Sementsov-Ogievskiy wrote:

06.08.2021 12:38, Max Reitz wrote:

Finalizing the job may cause its AioContext to change.  This is noted by
job_exit(), which points at job_txn_apply() to take this fact into
account.

However, job_completed() does not necessarily invoke job_txn_apply()
(through job_completed_txn_success()), but potentially also
job_completed_txn_abort().  The latter stores the context in a local
variable, and so always acquires the same context at its end that it has
released in the beginning -- which may be a different context from the
one that job_exit() releases at its end.  If it is different, qemu
aborts ("qemu_mutex_unlock_impl: Operation not permitted").

Drop the local @outer_ctx variable from job_completed_txn_abort(), and
instead re-acquire the actual job's context at the end of the function,
so job_exit() will release the same.

Signed-off-by: Max Reitz 
---
  job.c | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/job.c b/job.c
index e7a5d28854..3fe23bb77e 100644
--- a/job.c
+++ b/job.c
@@ -737,7 +737,6 @@ static void job_cancel_async(Job *job, bool force)
    static void job_completed_txn_abort(Job *job)
  {
-    AioContext *outer_ctx = job->aio_context;
  AioContext *ctx;
  JobTxn *txn = job->txn;
  Job *other_job;
@@ -751,10 +750,14 @@ static void job_completed_txn_abort(Job *job)
  txn->aborting = true;
  job_txn_ref(txn);
  -    /* We can only hold the single job's AioContext lock while 
calling

+    /*
+ * We can only hold the single job's AioContext lock while calling
   * job_finalize_single() because the finalization callbacks can 
involve

- * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
-    aio_context_release(outer_ctx);
+ * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
+ * Note that the job's AioContext may change when it is finalized.
+ */
+    job_ref(job);
+    aio_context_release(job->aio_context);
    /* Other jobs are effectively cancelled by us, set the status 
for

   * them; this job, however, may or may not be cancelled, depending
@@ -769,6 +772,10 @@ static void job_completed_txn_abort(Job *job)
  }
  while (!QLIST_EMPTY(&txn->jobs)) {
  other_job = QLIST_FIRST(&txn->jobs);
+    /*
+ * The job's AioContext may change, so store it in @ctx so we
+ * release the same context that we have acquired before.
+ */
  ctx = other_job->aio_context;
  aio_context_acquire(ctx);
  if (!job_is_completed(other_job)) {
@@ -779,7 +786,13 @@ static void job_completed_txn_abort(Job *job)
  aio_context_release(ctx);
  }
  -    aio_context_acquire(outer_ctx);
+    /*
+ * Use job_ref()/job_unref() so we can read the AioContext here
+ * even if the job went away during job_finalize_single().
+ */
+    ctx = job->aio_context;
+    job_unref(job);
+    aio_context_acquire(ctx);



why to use ctx variable and not do it exactly same as in 
job_txn_apply() :


   aio_context_acquire(job->aio_context);
   job_unref(job);

?


Oh, I just didn’t think of that.  Sounds good, thanks!

Hanna


anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 






Re: [PATCH v8 28/34] iotests: move 222 to tests/image-fleecing

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

01.09.2021 15:37, Hanna Reitz wrote:

On 24.08.21 10:38, Vladimir Sementsov-Ogievskiy wrote:

Give a good name to test file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
  tests/qemu-iotests/{222 => tests/image-fleecing} | 0
  tests/qemu-iotests/{222.out => tests/image-fleecing.out} | 0
  2 files changed, 0 insertions(+), 0 deletions(-)
  rename tests/qemu-iotests/{222 => tests/image-fleecing} (100%)
  rename tests/qemu-iotests/{222.out => tests/image-fleecing.out} (100%)

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/tests/image-fleecing
similarity index 100%
rename from tests/qemu-iotests/222
rename to tests/qemu-iotests/tests/image-fleecing
diff --git a/tests/qemu-iotests/222.out 
b/tests/qemu-iotests/tests/image-fleecing.out
similarity index 100%
rename from tests/qemu-iotests/222.out
rename to tests/qemu-iotests/tests/image-fleecing.out


Good news: Including error-report.h helped with most of the CI errors.

“Bad” news: .gitlab-ci.d/buildtest.yml has a complete ./check command line 
including test numbers...  Not sure if that’s a great idea, but in any case, 
this means that build-tcg-disabled fails because that command line includes 
222.  I think the fix should be simply to replace 222 by image-fleecing.  I 
hope that’s alright for you?



Yes, that's OK, thanks

Unpredictable thing :( A reminder to always grep for file name when rename it..

--
Best regards,
Vladimir



Re: [PATCH for-6.2 v3 05/12] job: @force parameter for job_cancel_sync{,_all}()

2021-09-01 Thread Hanna Reitz

On 01.09.21 12:20, Vladimir Sementsov-Ogievskiy wrote:

06.08.2021 12:38, Max Reitz wrote:

Callers should be able to specify whether they want job_cancel_sync() to
force-cancel the job or not.

In fact, almost all invocations do not care about consistency of the
result and just want the job to terminate as soon as possible, so they
should pass force=true.  The replication block driver is the exception.

This changes some iotest outputs, because quitting qemu while a mirror
job is active will now lead to it being cancelled instead of completed,
which is what we want.  (Cancelling a READY mirror job with force=false
may take an indefinite amount of time, which we do not want when
quitting.  If users want consistent results, they must have all jobs be
done before they quit qemu.)

Buglink:https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz
---
  include/qemu/job.h    | 10 ++---
  block/replication.c   |  4 +-
  blockdev.c    |  4 +-
  job.c | 20 +++--
  qemu-nbd.c    |  2 +-
  softmmu/runstate.c    |  2 +-
  storage-daemon/qemu-storage-daemon.c  |  2 +-
  tests/unit/test-block-iothread.c  |  2 +-
  tests/unit/test-blockjob.c    |  2 +-
  tests/qemu-iotests/109.out    | 60 +++
  tests/qemu-iotests/tests/qsd-jobs.out |  2 +-
  11 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 41162ed494..5e8edbc2c8 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -506,19 +506,19 @@ void job_user_cancel(Job *job, bool force, 
Error **errp);

    /**
   * Synchronously cancel the @job.  The completion callback is called
- * before the function returns.  The job may actually complete
- * instead of canceling itself; the circumstances under which this
- * happens depend on the kind of job that is active.
+ * before the function returns.  If @force is false, the job may
+ * actually complete instead of canceling itself; the circumstances
+ * under which this happens depend on the kind of job that is active.
   *
   * Returns the return value from the job if the job actually completed
   * during the call, or -ECANCELED if it was canceled.
   *
   * Callers must hold the AioContext lock of job->aio_context.
   */
-int job_cancel_sync(Job *job);
+int job_cancel_sync(Job *job, bool force);
    /** Synchronously cancels all jobs using job_cancel_sync(). */
-void job_cancel_sync_all(void);
+void job_cancel_sync_all(bool force);


I think it would be better to keep job_cancel_sync_all(void) prototype 
and just change its behavior to do force-cancel. Anyway, this patch 
always pass true to it. And it would be strange to do soft-cancel-all, 
keeping in mind that soft cancelling only make sense for mirror in 
ready state.


Actually, yes, that’s true.  I’ll drop the parameter.

Hanna




Re: [PATCH v8 28/34] iotests: move 222 to tests/image-fleecing

2021-09-01 Thread Hanna Reitz

On 24.08.21 10:38, Vladimir Sementsov-Ogievskiy wrote:

Give a good name to test file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
  tests/qemu-iotests/{222 => tests/image-fleecing} | 0
  tests/qemu-iotests/{222.out => tests/image-fleecing.out} | 0
  2 files changed, 0 insertions(+), 0 deletions(-)
  rename tests/qemu-iotests/{222 => tests/image-fleecing} (100%)
  rename tests/qemu-iotests/{222.out => tests/image-fleecing.out} (100%)

diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/tests/image-fleecing
similarity index 100%
rename from tests/qemu-iotests/222
rename to tests/qemu-iotests/tests/image-fleecing
diff --git a/tests/qemu-iotests/222.out 
b/tests/qemu-iotests/tests/image-fleecing.out
similarity index 100%
rename from tests/qemu-iotests/222.out
rename to tests/qemu-iotests/tests/image-fleecing.out


Good news: Including error-report.h helped with most of the CI errors.

“Bad” news: .gitlab-ci.d/buildtest.yml has a complete ./check command 
line including test numbers...  Not sure if that’s a great idea, but in 
any case, this means that build-tcg-disabled fails because that command 
line includes 222.  I think the fix should be simply to replace 222 by 
image-fleecing.  I hope that’s alright for you?


Hanna




Re: [PATCH for-6.2 v3 11/12] mirror: Do not clear .cancelled

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

Clearing .cancelled before leaving the main loop when the job has been
soft-cancelled is no longer necessary since job_is_cancelled() only
returns true for jobs that have been force-cancelled.

Therefore, this only makes a differences in places that call
job_cancel_requested().  In block/mirror.c, this is done only before
.cancelled was cleared.

In job.c, there are two callers:
- job_completed_txn_abort() asserts that .cancelled is true, so keeping
   it true will not affect this place.

- job_complete() refuses to let a job complete that has .cancelled set.
   It is correct to refuse to let the user invoke job-complete on mirror
   jobs that have already been soft-cancelled.

With this change, there are no places that reset .cancelled to false and
so we can be sure that .force_cancel can only be true of .cancelled is
true as well.  Assert this in job_is_cancelled().

Signed-off-by: Max Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-6.2 v3 10/12] mirror: Stop active mirroring after force-cancel

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

Once the mirror job is force-cancelled (job_is_cancelled() is true), we
should not generate new I/O requests.  This applies to active mirroring,
too, so stop it once the job is cancelled.

(We must still forward all I/O requests to the source, though, of
course, but those are not really I/O requests generated by the job, so
this is fine.)

Signed-off-by: Max Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-6.2 v3 09/12] mirror: Check job_is_cancelled() earlier

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

Jobs can be cancelled while they yield, and once they are
(force-cancelled), they should not generate new I/O requests.
Therefore, we should put the check after the last yield before
mirror_iteration() is invoked.

Buglink:https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v8 09/34] block/backup: move cluster size calculation to block-copy

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

01.09.2021 14:57, Hanna Reitz wrote:

On 24.08.21 10:38, Vladimir Sementsov-Ogievskiy wrote:

The main consumer of cluster-size is block-copy. Let's calculate it
here instead of passing through backup-top.

We are going to publish copy-before-write filter soon, so it will be
created through options. But we don't want for now to make explicit
option for cluster-size, let's continue to calculate it automatically.
So, now is the time to get rid of cluster_size argument for
bdrv_cbw_append().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
  block/copy-before-write.h  |  1 -
  include/block/block-copy.h |  5 +--
  block/backup.c | 62 ++
  block/block-copy.c | 51 ++-
  block/copy-before-write.c  | 10 +++---
  5 files changed, 66 insertions(+), 63 deletions(-)


[...]


diff --git a/block/block-copy.c b/block/block-copy.c
index e83870ff81..b0e0a38330 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c


[...]


@@ -342,14 +343,57 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool 
use_copy_range,
  }
  }
+static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
+ Error **errp)
+{
+    int ret;
+    BlockDriverInfo bdi;
+    bool target_does_cow = bdrv_backing_chain_next(target);
+
+    /*
+ * If there is no backing file on the target, we cannot rely on COW if our
+ * backup cluster size is smaller than the target cluster size. Even for
+ * targets with a backing file, try to avoid COW if possible.
+ */
+    ret = bdrv_get_info(target, &bdi);
+    if (ret == -ENOTSUP && !target_does_cow) {
+    /* Cluster size is not defined */
+    warn_report("The target block device doesn't provide "
+    "information about the block size and it doesn't have a "
+    "backing file. The default block size of %u bytes is "
+    "used. If the actual block size of the target exceeds "
+    "this default, the backup may be unusable",
+    BLOCK_COPY_CLUSTER_SIZE_DEFAULT);


I get some build errors in the gitlab CI because of this – I think we need to 
add a qemu/error-report.h include in block/block-copy.c now.  (I don’t know why 
this works for some configurations, apparently, but not for others...)

Is it OK if I add it to this patch?


If it helps, than OK of course, thanks!



diff --git a/block/block-copy.c b/block/block-copy.c
index b0e0a38330..5d0076ac93 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -21,6 +21,7 @@
  #include "qemu/units.h"
  #include "qemu/coroutine.h"
  #include "block/aio_task.h"
+#include "qemu/error-report.h"

  #define BLOCK_COPY_MAX_COPY_RANGE (16 * MiB)
  #define BLOCK_COPY_MAX_BUFFER (1 * MiB)

Hanna




--
Best regards,
Vladimir



Re: [PATCH v8 09/34] block/backup: move cluster size calculation to block-copy

2021-09-01 Thread Hanna Reitz

On 24.08.21 10:38, Vladimir Sementsov-Ogievskiy wrote:

The main consumer of cluster-size is block-copy. Let's calculate it
here instead of passing through backup-top.

We are going to publish copy-before-write filter soon, so it will be
created through options. But we don't want for now to make explicit
option for cluster-size, let's continue to calculate it automatically.
So, now is the time to get rid of cluster_size argument for
bdrv_cbw_append().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
  block/copy-before-write.h  |  1 -
  include/block/block-copy.h |  5 +--
  block/backup.c | 62 ++
  block/block-copy.c | 51 ++-
  block/copy-before-write.c  | 10 +++---
  5 files changed, 66 insertions(+), 63 deletions(-)


[...]


diff --git a/block/block-copy.c b/block/block-copy.c
index e83870ff81..b0e0a38330 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c


[...]


@@ -342,14 +343,57 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool 
use_copy_range,
  }
  }
  
+static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,

+ Error **errp)
+{
+int ret;
+BlockDriverInfo bdi;
+bool target_does_cow = bdrv_backing_chain_next(target);
+
+/*
+ * If there is no backing file on the target, we cannot rely on COW if our
+ * backup cluster size is smaller than the target cluster size. Even for
+ * targets with a backing file, try to avoid COW if possible.
+ */
+ret = bdrv_get_info(target, &bdi);
+if (ret == -ENOTSUP && !target_does_cow) {
+/* Cluster size is not defined */
+warn_report("The target block device doesn't provide "
+"information about the block size and it doesn't have a "
+"backing file. The default block size of %u bytes is "
+"used. If the actual block size of the target exceeds "
+"this default, the backup may be unusable",
+BLOCK_COPY_CLUSTER_SIZE_DEFAULT);


I get some build errors in the gitlab CI because of this – I think we 
need to add a qemu/error-report.h include in block/block-copy.c now.  (I 
don’t know why this works for some configurations, apparently, but not 
for others...)


Is it OK if I add it to this patch?

diff --git a/block/block-copy.c b/block/block-copy.c
index b0e0a38330..5d0076ac93 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -21,6 +21,7 @@
 #include "qemu/units.h"
 #include "qemu/coroutine.h"
 #include "block/aio_task.h"
+#include "qemu/error-report.h"

 #define BLOCK_COPY_MAX_COPY_RANGE (16 * MiB)
 #define BLOCK_COPY_MAX_BUFFER (1 * MiB)

Hanna




Re: [PATCH for-6.2 v3 08/12] mirror: Use job_is_cancelled()

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

mirror_drained_poll() returns true whenever the job is cancelled,
because "we [can] be sure that it won't issue more requests".  However,
this is only true for force-cancelled jobs, so use job_is_cancelled().

Signed-off-by: Max Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-6.2 v3 07/12] job: Add job_cancel_requested()

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination.  For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.

A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause.  This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation.  For example, with
on-target-error=stop, the job should stop on write errors.  (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)

Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_requested() as the general variant, which returns true for
any jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.

Finally, here is a justification for how different job_is_cancelled()
invocations are treated by this patch:

- block/mirror.c (mirror_run()):
   - The first invocation is a while loop that should loop until the job
 has been cancelled or scheduled for completion.  What kind of cancel
 does not matter, only the fact that the job is supposed to end.

   - The second invocation wants to know whether the job has been
 soft-cancelled.  Calling job_cancel_requested() is a bit too broad,
 but if the job were force-cancelled, we should leave the main loop
 as soon as possible anyway, so this should not matter here.

   - The last two invocations already check force_cancel, so they should
 continue to use job_is_cancelled().

- block/backup.c, block/commit.c, block/stream.c, anything in tests/:
   These jobs know only force-cancel, so there is no difference between
   job_is_cancelled() and job_cancel_requested().  We can continue using
   job_is_cancelled().

- job.c:
   - job_pause_point(), job_yield(), job_sleep_ns(): Only force-cancelled
 jobs should be prevented from being paused.  Continue using 
job_is_cancelled().

   - job_update_rc(), job_finalize_single(), job_finish_sync(): These
 functions are all called after the job has left its main loop.  The
 mirror job (the only job that can be soft-cancelled) will clear
 .cancelled before leaving the main loop if it has been
 soft-cancelled.  Therefore, these functions will observe .cancelled
 to be true only if the job has been force-cancelled.  We can
 continue to use job_is_cancelled().
 (Furthermore, conceptually, a soft-cancelled mirror job should not
 report to have been cancelled.  It should report completion (see
 also the block-job-cancel QAPI documentation).  Therefore, it makes
 sense for these functions not to distinguish between a
 soft-cancelled mirror job and a job that has completed as normal.)

   - job_completed_txn_abort(): All jobs other than @job have been
 force-cancelled.  job_is_cancelled() must be true for them.
 Regarding @job itself: job_completed_txn_abort() is mostly called
 when the job's return value is not 0.  A soft-cancelled mirror has a
 return value of 0, and so will not end up here then.
 However, job_cancel() invokes job_completed_txn_abort() if the job
 has been deferred to the main loop, which is mostly the case for
 completed jobs (which skip the assertion), but not for sure.
 To be safe, use job_cancel_requested() in this assertion.

   - job_complete(): This is function eventually invoked by the user
 (through qmp_block_job_complete() or qmp_job_complete(), or
 job_complete_sync(), which comes from qemu-img).  The intention here
 is to prevent a user from invoking job-complete after the job has
 been cancelled.  This should also apply to soft cancelling: After a
 mirror job has been soft-cancelled, the user should not be able to
 decide otherwise and have it complete as normal (i.e. pivoting to
 the target).

Buglink:https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v5] block/file-win32: add reopen handlers

2021-09-01 Thread Hanna Reitz

On 25.08.21 19:36, Viktor Prutyanov wrote:

Make 'qemu-img commit' work on Windows.

Command 'commit' requires reopening backing file in RW mode. So,
add reopen prepare/commit/abort handlers and change dwShareMode
for CreateFile call in order to allow further read/write reopening.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418

Suggested-by: Hanna Reitz 
Signed-off-by: Viktor Prutyanov 
Tested-by: Helge Konetzka 
---
  v2:
 - fix indentation in raw_reopen_prepare
 - free rs if raw_reopen_prepare fails
  v3:
 - restore suggested-by field missed in v2
  v4:
 - add file type check
 - add comment about options
 - replace rs check with assert in raw_reopen_commit
  v5:
 - add CloseHandle at AIO attach fail

  block/file-win32.c | 101 -
  1 file changed, 100 insertions(+), 1 deletion(-)


Thanks!  I’ve applied this patch to my block branch:

https://github.com/XanClic/qemu/commits/block

Hanna




Re: [PATCH for-6.2 v3 05/12] job: @force parameter for job_cancel_sync{,_all}()

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

@@ -726,7 +726,7 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)
   * disk, secondary disk in backup_job_completed().
   */
  if (s->backup_job) {
-job_cancel_sync(&s->backup_job->job);
+job_cancel_sync(&s->backup_job->job, false);


That's not quite correct, as backup is always force cancelled..

--
Best regards,
Vladimir



Re: [PATCH] block/export/fuse.c: fix fuse-lseek on uclibc or musl

2021-09-01 Thread Hanna Reitz

On 28.08.21 00:03, Fabrice Fontaine wrote:

Include linux/fs.h to avoid the following build failure on uclibc or
musl raised since version 6.0.0:

../block/export/fuse.c: In function 'fuse_lseek':
../block/export/fuse.c:641:19: error: 'SEEK_HOLE' undeclared (first use in this 
function)
   641 | if (whence != SEEK_HOLE && whence != SEEK_DATA) {
   |   ^
../block/export/fuse.c:641:19: note: each undeclared identifier is reported 
only once for each function it appears in
../block/export/fuse.c:641:42: error: 'SEEK_DATA' undeclared (first use in this 
function); did you mean 'SEEK_SET'?
   641 | if (whence != SEEK_HOLE && whence != SEEK_DATA) {
   |  ^
   |  SEEK_SET

Fixes:
  - 
http://autobuild.buildroot.org/results/33c90ebf04997f4d3557cfa66abc9cf9a3076137

Signed-off-by: Fabrice Fontaine 
---
  block/export/fuse.c | 3 +++
  1 file changed, 3 insertions(+)


Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Hanna




Re: [PATCH for-6.2 v3 05/12] job: @force parameter for job_cancel_sync{,_all}()

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

Callers should be able to specify whether they want job_cancel_sync() to
force-cancel the job or not.

In fact, almost all invocations do not care about consistency of the
result and just want the job to terminate as soon as possible, so they
should pass force=true.  The replication block driver is the exception.

This changes some iotest outputs, because quitting qemu while a mirror
job is active will now lead to it being cancelled instead of completed,
which is what we want.  (Cancelling a READY mirror job with force=false
may take an indefinite amount of time, which we do not want when
quitting.  If users want consistent results, they must have all jobs be
done before they quit qemu.)

Buglink:https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz
---
  include/qemu/job.h| 10 ++---
  block/replication.c   |  4 +-
  blockdev.c|  4 +-
  job.c | 20 +++--
  qemu-nbd.c|  2 +-
  softmmu/runstate.c|  2 +-
  storage-daemon/qemu-storage-daemon.c  |  2 +-
  tests/unit/test-block-iothread.c  |  2 +-
  tests/unit/test-blockjob.c|  2 +-
  tests/qemu-iotests/109.out| 60 +++
  tests/qemu-iotests/tests/qsd-jobs.out |  2 +-
  11 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 41162ed494..5e8edbc2c8 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -506,19 +506,19 @@ void job_user_cancel(Job *job, bool force, Error **errp);
  
  /**

   * Synchronously cancel the @job.  The completion callback is called
- * before the function returns.  The job may actually complete
- * instead of canceling itself; the circumstances under which this
- * happens depend on the kind of job that is active.
+ * before the function returns.  If @force is false, the job may
+ * actually complete instead of canceling itself; the circumstances
+ * under which this happens depend on the kind of job that is active.
   *
   * Returns the return value from the job if the job actually completed
   * during the call, or -ECANCELED if it was canceled.
   *
   * Callers must hold the AioContext lock of job->aio_context.
   */
-int job_cancel_sync(Job *job);
+int job_cancel_sync(Job *job, bool force);
  
  /** Synchronously cancels all jobs using job_cancel_sync(). */

-void job_cancel_sync_all(void);
+void job_cancel_sync_all(bool force);


I think it would be better to keep job_cancel_sync_all(void) prototype and just 
change its behavior to do force-cancel. Anyway, this patch always pass true to 
it. And it would be strange to do soft-cancel-all, keeping in mind that soft 
cancelling only make sense for mirror in ready state.

Anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-6.2 v3 01/12] job: Context changes in job_completed_txn_abort()

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

Finalizing the job may cause its AioContext to change.  This is noted by
job_exit(), which points at job_txn_apply() to take this fact into
account.

However, job_completed() does not necessarily invoke job_txn_apply()
(through job_completed_txn_success()), but potentially also
job_completed_txn_abort().  The latter stores the context in a local
variable, and so always acquires the same context at its end that it has
released in the beginning -- which may be a different context from the
one that job_exit() releases at its end.  If it is different, qemu
aborts ("qemu_mutex_unlock_impl: Operation not permitted").

Drop the local @outer_ctx variable from job_completed_txn_abort(), and
instead re-acquire the actual job's context at the end of the function,
so job_exit() will release the same.

Signed-off-by: Max Reitz 
---
  job.c | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/job.c b/job.c
index e7a5d28854..3fe23bb77e 100644
--- a/job.c
+++ b/job.c
@@ -737,7 +737,6 @@ static void job_cancel_async(Job *job, bool force)
  
  static void job_completed_txn_abort(Job *job)

  {
-AioContext *outer_ctx = job->aio_context;
  AioContext *ctx;
  JobTxn *txn = job->txn;
  Job *other_job;
@@ -751,10 +750,14 @@ static void job_completed_txn_abort(Job *job)
  txn->aborting = true;
  job_txn_ref(txn);
  
-/* We can only hold the single job's AioContext lock while calling

+/*
+ * We can only hold the single job's AioContext lock while calling
   * job_finalize_single() because the finalization callbacks can involve
- * calls of AIO_WAIT_WHILE(), which could deadlock otherwise. */
-aio_context_release(outer_ctx);
+ * calls of AIO_WAIT_WHILE(), which could deadlock otherwise.
+ * Note that the job's AioContext may change when it is finalized.
+ */
+job_ref(job);
+aio_context_release(job->aio_context);
  
  /* Other jobs are effectively cancelled by us, set the status for

   * them; this job, however, may or may not be cancelled, depending
@@ -769,6 +772,10 @@ static void job_completed_txn_abort(Job *job)
  }
  while (!QLIST_EMPTY(&txn->jobs)) {
  other_job = QLIST_FIRST(&txn->jobs);
+/*
+ * The job's AioContext may change, so store it in @ctx so we
+ * release the same context that we have acquired before.
+ */
  ctx = other_job->aio_context;
  aio_context_acquire(ctx);
  if (!job_is_completed(other_job)) {
@@ -779,7 +786,13 @@ static void job_completed_txn_abort(Job *job)
  aio_context_release(ctx);
  }
  
-aio_context_acquire(outer_ctx);

+/*
+ * Use job_ref()/job_unref() so we can read the AioContext here
+ * even if the job went away during job_finalize_single().
+ */
+ctx = job->aio_context;
+job_unref(job);
+aio_context_acquire(ctx);



why to use ctx variable and not do it exactly same as in job_txn_apply() :

   aio_context_acquire(job->aio_context);
   job_unref(job);

?

anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-6.2 v3 04/12] job: Force-cancel jobs in a failed transaction

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

06.08.2021 12:38, Max Reitz wrote:

When a transaction is aborted, no result matters, and so all jobs within
should be force-cancelled.

Signed-off-by: Max Reitz 
---
  job.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/job.c b/job.c
index 3fe23bb77e..24e7c4fcb7 100644
--- a/job.c
+++ b/job.c
@@ -766,7 +766,12 @@ static void job_completed_txn_abort(Job *job)
  if (other_job != job) {
  ctx = other_job->aio_context;
  aio_context_acquire(ctx);
-job_cancel_async(other_job, false);
+/*
+ * This is a transaction: If one job failed, no result will matter.
+ * Therefore, pass force=true to terminate all other jobs as 
quickly
+ * as possible.
+ */
+job_cancel_async(other_job, true);
  aio_context_release(ctx);
  }
  }



Anyway, only backup jobs may be in a transaction, which doesn't distinguish 
force and soft cancelling. So, that doesn't change any logic.

Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH] nbd/server: Advertise MULTI_CONN for shared writable exports

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

27.08.2021 21:45, Eric Blake wrote:

On Fri, Aug 27, 2021 at 07:58:10PM +0300, Vladimir Sementsov-Ogievskiy wrote:

27.08.2021 18:09, Eric Blake wrote:

According to the NBD spec, a server advertising
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu because our block layer serializes any overlapping
operations (see bdrv_find_conflicting_request and friends)


Not any. We serialize only write operations not aligned to request_alignment of 
bs (see bdrv_make_request_serialising() call in bdrv_co_pwritev_part). So, 
actually most of overlapping operations remain overlapping. And that's correct: 
it's not a Qemu work to resolve overlapping requests. We resolve them only when 
we are responsible for appearing of intersection: when we align requests up.


I welcome improvements on the wording.  Maybe what I should be
emphasizing is that even when there are overlapping requests, qemu
itself is multiplexing all of those requests through a single
interface into the backend, without any caching on qemu's part, and
relying on the consistency of the flush operation into that backend.

 From a parallelism perspective, in file-posix.c, we don't distiguish
between two pwrite() syscalls made (potentially out-of-order) by a
single BDS client in two coroutines, from two pwrite() syscalls made
by two separate BDS clients.  Either way, those two syscalls may both
be asynchronous, but both go through a single interface into the
kernel's view of the underlying filesystem or block device.  And we
implement flush via fdatasync(), which the kernel already has some
pretty strong guarantees on cross-thread consistency.

But I am less certain of whether we are guaranteed cross-consistency
like that for all protocol drivers.  Is there any block driver (most
likely a networked one) where we have situations such that even though
we are using the same API for all asynchronous access within the qemu
coroutines, under the hood those APIs can end up diverging on their
destinations such as due to network round-robin effects, and result in
us seeing cache-inconsistent views?  That is, can we ever encounter
this:

-> read()
   -> kicks off networked storage call that resolves to host X
 -> host X caches the read
   <- reply
-> write()
   -> kicks off networked storage call that resolves to host Y
 -> host Y updates the file system
   <- reply
-> flush()
   -> kicks off networked storage call that resolves to host Y
 -> host Y starts flushing, but replies early
   <- reply
-> read()
   -> kicks off networked storage call that resolves to host X
 -> host X does not see effects of Y's flush yet, returns stale data

If we can encounter that, then in those situations we must not
advertise MULTI_CONN.  But I'm confident that file-posix.c does not
have that problem, and even if another driver did have that problem
(where our single API access can result in cache-inconsistent views
over the protocol, rather than flush really being effective to all
further API access to that driver), you'd think we'd be aware of it.
However, if we DO know of a place where that is the case, then now is
the time to design our QAPI control over whether to advertise NBD's
MULTI_CONN bit based on whether the block layer can warn us about a
particular block layer NOT being safe.

But unless we come up with such a scenario, maybe all I need here is
better wording to put in the commit message to state why we think we
ARE safe in advertising MULTI_CONN.  Remember, the NBD flag only has
an impact in relation to how strong flush calls are (it is NOT
required that overlapping write requests have any particular behavior
- that's always been up to the client to be careful with that, and
qemu need not go out of its way to prevent client stupidity with
overlapping writes), but rather that actions with a reply completed
prior to FLUSH are then visible to actions started after the reply to
FLUSH.



Reasonable, I agree

--
Best regards,
Vladimir



Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-01 Thread Daniel P . Berrangé
On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > Call qio_channel_set_zerocopy(true) in the start of every multifd thread.
> > > 
> > > Change the send_write() interface of multifd, allowing it to pass down
> > > flags for qio_channel_write*().
> > > 
> > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > > other data being sent at the default copying approach.
> > > 
> > > Signed-off-by: Leonardo Bras 
> > > ---
> > >  migration/multifd-zlib.c | 7 ---
> > >  migration/multifd-zstd.c | 7 ---
> > >  migration/multifd.c  | 9 ++---
> > >  migration/multifd.h  | 3 ++-
> > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > 
> > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > >  }
> > >  
> > >  if (used) {
> > > -ret = multifd_send_state->ops->send_write(p, used, 
> > > &local_err);
> > > +ret = multifd_send_state->ops->send_write(p, used, 
> > > MSG_ZEROCOPY,
> > > +  &local_err);
> > 
> > I don't think it is valid to unconditionally enable this feature due to the
> > resource usage implications
> > 
> > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > 
> >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
> >if the socket option was not set, the socket exceeds its optmem 
> >limit or the user exceeds its ulimit on locked pages."
> > 
> > The limit on locked pages is something that looks very likely to be
> > exceeded unless you happen to be running a QEMU config that already
> > implies locked memory (eg PCI assignment)
> 
> Yes it would be great to be a migration capability in parallel to multifd. At
> initial phase if it's easy to be implemented on multi-fd only, we can add a
> dependency between the caps.  In the future we can remove that dependency when
> the code is ready to go without multifd.  Thanks,

Also, I'm wondering how zerocopy support interacts with kernel support
for kTLS and multipath-TCP, both of which we want to be able to use
with migration.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 2/3] io: Add zerocopy and errqueue

2021-09-01 Thread Daniel P . Berrangé
On Tue, Aug 31, 2021 at 04:27:04PM -0400, Peter Xu wrote:
> On Tue, Aug 31, 2021 at 01:57:33PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Aug 31, 2021 at 08:02:38AM -0300, Leonardo Bras wrote:
> > > MSG_ZEROCOPY is a feature that enables copy avoidance in TCP/UDP socket
> > > send calls. It does so by avoiding copying user data into kernel buffers.
> > > 
> > > To make it work, three steps are needed:
> > > 1 - A setsockopt() system call, enabling SO_ZEROCOPY
> > > 2 - Passing down the MSG_ZEROCOPY flag for each send*() syscall
> > > 3 - Process the socket's error queue, dealing with any error
> > 
> > AFAICT, this is missing the single most critical aspect of MSG_ZEROCOPY.
> > 
> > It is non-obvious, but setting the MSG_ZEROCOPY flag turns sendmsg()
> > from a synchronous call to an asynchronous call.
> > 
> > It is forbidden to overwrite/reuse/free the buffer passed to sendmsg
> > until an asynchronous completion notification has been received from
> > the socket error queue. These notifications are not required to
> > arrive in-order, even for a TCP stream, because the kernel hangs on
> > to the buffer if a re-transmit is needed.
> > 
> > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > 
> >   "Page pinning also changes system call semantics. It temporarily 
> >shares the buffer between process and network stack. Unlike with
> >copying, the process cannot immediately overwrite the buffer 
> >after system call return without possibly modifying the data in 
> >flight. Kernel integrity is not affected, but a buggy program
> >can possibly corrupt its own data stream."
> > 
> > AFAICT, the design added in this patch does not provide any way
> > to honour these requirements around buffer lifetime.
> > 
> > I can't see how we can introduce MSG_ZEROCOPY in any seemless
> > way. The buffer lifetime requirements imply need for an API
> > design that is fundamentally different for asynchronous usage,
> > with a callback to notify when the write has finished/failed.
> 
> Regarding buffer reuse - it indeed has a very deep implication on the buffer
> being available and it's not obvious at all.  Just to mention that the initial
> user of this work will make sure all zero copy buffers will be guest pages 
> only
> (as it's only used in multi-fd), so they should always be there during the
> process.

That is not the case when migration is using TLS, because the buffers
transmitted are the ciphertext buffer, not the plaintext guest page.

> In short, we may just want to at least having a way to make sure all zero
> copied buffers are finished using and they're sent after some function returns
> (e.g., qio_channel_flush()).  That may require us to do some accounting on 
> when
> we called sendmsg(MSG_ZEROCOPY), meanwhile we should need to read out the
> ee_data field within SO_EE_ORIGIN_ZEROCOPY msg when we do recvmsg() for the
> error queue and keep those information somewhere too.
> 
> Some other side notes that reached my mind..
> 
> The qio_channel_writev_full() may not be suitable for async operations, as the
> name "full" implies synchronous to me.  So maybe we can add a new helper for
> zero copy on the channel?

All the APIs that exist today are fundamentally only suitable for sync
operations. Supporting async correctly will definitely a brand new APIs
separate from what exists today.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] nbd/server: Advertise MULTI_CONN for shared writable exports

2021-09-01 Thread Vladimir Sementsov-Ogievskiy

27.08.2021 18:09, Eric Blake wrote:

According to the NBD spec, a server advertising
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu because our block layer serializes any overlapping
operations (see bdrv_find_conflicting_request and friends): no matter
which client performs a flush, parallel requests coming from distinct
NBD clients will still be well-ordered by the time they are passed on
to the underlying device, with no caching in qemu proper to allow
stale results to leak after a flush.

We don't want to advertise MULTI_CONN when we know that a second
client can connect (which is the default for qemu-nbd, but not for QMP
nbd-server-add), so it does require a QAPI addition.  But other than
that, the actual change to advertise the bit for writable servers is
fairly small.  The harder part of this patch is setting up an iotest
to demonstrate behavior of multiple NBD clients to a single server.
It might be possible with parallel qemu-io processes, but concisely
managing that in shell is painful.  I found it easier to do by relying
on the libnbd project's nbdsh, which means this test will be skipped
on platforms where that is not available.

Signed-off-by: Eric Blake 
---
  docs/interop/nbd.txt   |  1 +
  docs/tools/qemu-nbd.rst|  3 +-
  qapi/block-export.json |  6 +-
  blockdev-nbd.c |  4 +
  nbd/server.c   | 10 +--
  qemu-nbd.c |  2 +
  MAINTAINERS|  1 +
  tests/qemu-iotests/tests/nbd-multiconn | 91 ++
  tests/qemu-iotests/tests/nbd-multiconn.out | 14 
  9 files changed, 124 insertions(+), 8 deletions(-)
  create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
  create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 10ce098a29bf..d03910f1e9eb 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
  * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
  NBD_CMD_FLAG_FAST_ZERO
  * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
+* 6.2: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index 5643da26e982..81be32164a55 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -138,8 +138,7 @@ driver options if ``--image-opts`` is specified.
  .. option:: -e, --shared=NUM

Allow up to *NUM* clients to share the device (default
-  ``1``), 0 for unlimited. Safe for readers, but for now,
-  consistency is not guaranteed between multiple writers.
+  ``1``), 0 for unlimited.

  .. option:: -t, --persistent

diff --git a/qapi/block-export.json b/qapi/block-export.json
index 0ed63442a819..b2085a9fdd4c 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -95,11 +95,15 @@
  #the metadata context name "qemu:allocation-depth" to
  #inspect allocation details. (since 5.2)
  #
+# @shared: True if the server should advertise that multiple clients may
+#  connect, default false. (since 6.2)
+#
  # Since: 5.2
  ##
  { 'struct': 'BlockExportOptionsNbd',
'base': 'BlockExportOptionsNbdBase',
-  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool' } }
+  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool',
+ '*shared': 'bool' } }

  ##
  # @BlockExportOptionsVhostUserBlk:
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index bdfa7ed3a5a9..258feaa76e02 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -221,6 +221,10 @@ void qmp_nbd_server_add(NbdServerAddOptions *arg, Error 
**errp)
  QAPI_LIST_PREPEND(export_opts->u.nbd.bitmaps, g_strdup(arg->bitmap));
  }

+/* nbd-server-add always permits parallel clients */
+export_opts->u.nbd.has_shared = true;
+export_opts->u.nbd.shared = true;


Hmm, I don't follow.. Before the patch we allowed multicon only for readonly 
exports. Now with nbd-server-add we allow it for any kind of export?


+
  /*
   * nbd-server-add doesn't complain when a read-only device should be
   * exported as writable, but simply downgrades it. This is an error with
diff --git a/nbd/server.c b/nbd/server.c
index 3927f7789dcf..1646796a4798 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
  /*
- *  Copyright (C) 2016-2020 Red Hat, Inc.
+ *  Copyright (C) 2016-2021 Red Hat, Inc.
   *  Copyright (C) 2005  Anthony Liguori 
   *
   *  Network Block Device Server Side
@@ -1634,7 +1634,7 @@ static int nbd_export_create(BlockExport *blk_exp, 
BlockExportOptions *exp_args,
  int64_