Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-08 Thread Jason Wang
On Wed, Sep 8, 2021 at 11:19 PM Peter Xu  wrote:
>
> On Wed, Sep 08, 2021 at 09:19:27AM +0100, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasow...@redhat.com) wrote:
> > > On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
> > > >
> > > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > > wrote:
> > > > > > I don't think it is valid to unconditionally enable this feature 
> > > > > > due to the
> > > > > > resource usage implications
> > > > > >
> > > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > > >
> > > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This 
> > > > > > happens
> > > > > >if the socket option was not set, the socket exceeds its optmem
> > > > > >limit or the user exceeds its ulimit on locked pages."
> > > > >
> > > > > You are correct, I unfortunately missed this part in the docs :(
> > > > >
> > > > > > The limit on locked pages is something that looks very likely to be
> > > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > > implies locked memory (eg PCI assignment)
> > > > >
> > > > > Do you mean the limit a user has on locking memory?
> > > > >
> > > > > If so, that makes sense. I remember I needed to set the upper limit 
> > > > > of locked
> > > > > memory for the user before using it, or adding a capability to qemu 
> > > > > before.
> > > >
> > > > So I'm a bit confused on why MSG_ZEROCOPY requires checking 
> > > > RLIMIT_MEMLOCK.
> > > >
> > > > The thing is IIUC that's accounting for pinned pages only with either 
> > > > mlock()
> > > > (FOLL_MLOCK) or vfio (FOLL_PIN).
> > > >
> > > > I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking 
> > > > at
> > > > __zerocopy_sg_from_iter() -> iov_iter_get_pages().
> > >
> > > It happens probably here:
> > >
> > > E.g
> > >
> > > __ip_append_data()
> > > msg_zerocopy_realloc()
> > > mm_account_pinned_pages()
> >
> > When do they get uncounted?  i.e. is it just the data that's in flight
> > that's marked as pinned?
>
> I think so - there's __msg_zerocopy_callback() -> mm_unaccount_pinned_pages()
> correspondingly.  Thanks,

Yes, and actually the memory that can be pinned is limited by the
sndbuf of the TCP socket. So we are fine with the rlimit (e.g. we don't need to
pin all guest pages).
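
A minimal sketch of checking that bound from userspace (illustration only, not
part of the patch; it assumes a connected TCP socket fd, and the helper name is
made up):

  #include <stdio.h>
  #include <sys/socket.h>

  /* The data (and hence the pages) pinned by in-flight MSG_ZEROCOPY sends
   * is bounded by what fits in the socket send buffer, so SO_SNDBUF gives
   * a rough per-connection ceiling for the pinned-page footprint. */
  static void report_pin_bound(int fd)
  {
      int sndbuf = 0;
      socklen_t len = sizeof(sndbuf);

      if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0) {
          /* The kernel reports roughly double the configured value to
           * cover its own bookkeeping overhead, see socket(7). */
          printf("at most ~%d bytes of send data in flight\n", sndbuf);
      }
  }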

Thanks

>
> --
> Peter Xu
>




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-08 Thread Daniel P . Berrangé
On Tue, Sep 07, 2021 at 12:13:28PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > On Wed, Sep 01, 2021 at 11:35:33AM -0400, Peter Xu wrote:
> > > On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:
> > > > On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> > > > > On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> > > > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > > > > thread.
> > > > > > > 
> > > > > > > Change the send_write() interface of multifd, allowing it to pass 
> > > > > > > down
> > > > > > > flags for qio_channel_write*().
> > > > > > > 
> > > > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while 
> > > > > > > keeping the
> > > > > > > other data being sent at the default copying approach.
> > > > > > > 
> > > > > > > Signed-off-by: Leonardo Bras 
> > > > > > > ---
> > > > > > >  migration/multifd-zlib.c | 7 ---
> > > > > > >  migration/multifd-zstd.c | 7 ---
> > > > > > >  migration/multifd.c  | 9 ++---
> > > > > > >  migration/multifd.h  | 3 ++-
> > > > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > > > > 
> > > > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > > > >  }
> > > > > > >  
> > > > > > >  if (used) {
> > > > > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > > > > +  &local_err);
> > > > > > 
> > > > > > I don't think it is valid to unconditionally enable this feature 
> > > > > > due to the
> > > > > > resource usage implications
> > > > > > 
> > > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > > > 
> > > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This 
> > > > > > happens 
> > > > > >if the socket option was not set, the socket exceeds its optmem 
> > > > > >limit or the user exceeds its ulimit on locked pages."
> > > > > > 
> > > > > > The limit on locked pages is something that looks very likely to be
> > > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > > implies locked memory (eg PCI assignment)
> > > > > 
> > > > > Yes it would be great to be a migration capability in parallel to 
> > > > > multifd. At
> > > > > initial phase if it's easy to be implemented on multi-fd only, we can 
> > > > > add a
> > > > > dependency between the caps.  In the future we can remove that 
> > > > > dependency when
> > > > > the code is ready to go without multifd.  Thanks,
> > > > 
> > > > Also, I'm wondering how zerocopy support interacts with kernel support
> > > > for kTLS and multipath-TCP, both of which we want to be able to use
> > > > with migration.
> > > 
> > > Copying Jason Wang for net implications between these features on kernel 
> > > side
> > > and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).
> > > 
> > > From the safe side we may want to only enable one of them until we prove
> > > they'll work together I guess..
> > 
> > MPTCP is good when we're network limited for migration
> > 
> > KTLS will be good when we're CPU limited on AES for migration,
> > which is essentially always when TLS is used.
> > 
> > ZEROCOPY will be good when we're CPU limited for data copy
> > on migration, or to reduce the impact on other concurrent
> > VMs on the same CPUs.
> > 
> > Ultimately we would benefit from all of them at the same
> > time, if it were technically possible to do.
> 
> I think last time I spoke to Paolo Abeni there were some interactions
> between them; I can't remember what though (I think mptcp and ktls
> didn't play at the time).

MPTCP and KTLS use the same kernel hook in the network layer and
only 1 hook is permitted at a time, making them mutually exclusive
for now. In theory this can be fixed but no one is actively working
on it yet.
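
For context, a rough sketch of the two attach points being discussed
(illustration only, nothing here is QEMU code; it assumes a Linux host with
kTLS and upstream MPTCP support, the helper names are made up, and the TLS key
setup is omitted):

  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  #ifndef TCP_ULP
  #define TCP_ULP 31                /* not present in older libc headers */
  #endif
  #ifndef IPPROTO_MPTCP
  #define IPPROTO_MPTCP 262         /* not present in older libc headers */
  #endif

  /* kTLS attaches to an established TCP socket through the upper-layer
   * protocol (ULP) hook; the record keys negotiated by the handshake are
   * then installed via the TLS_TX/TLS_RX socket options (not shown). */
  static int attach_ktls(int fd)
  {
      return setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
  }

  /* Upstream MPTCP, by contrast, is selected when the socket is created;
   * per the above, the two cannot currently be combined on one socket. */
  static int open_mptcp_socket(void)
  {
      return socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
  }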


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-08 Thread Peter Xu
On Wed, Sep 08, 2021 at 09:19:27AM +0100, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasow...@redhat.com) wrote:
> > On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
> > >
> > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > wrote:
> > > > > I don't think it is valid to unconditionally enable this feature due 
> > > > > to the
> > > > > resource usage implications
> > > > >
> > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > >
> > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > > > >if the socket option was not set, the socket exceeds its optmem
> > > > >limit or the user exceeds its ulimit on locked pages."
> > > >
> > > > You are correct, I unfortunately missed this part in the docs :(
> > > >
> > > > > The limit on locked pages is something that looks very likely to be
> > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > implies locked memory (eg PCI assignment)
> > > >
> > > > Do you mean the limit a user has on locking memory?
> > > >
> > > > If so, that makes sense. I remember I needed to set the upper limit of 
> > > > locked
> > > > memory for the user before using it, or adding a capability to qemu 
> > > > before.
> > >
> > > So I'm a bit confused on why MSG_ZEROCOPY requires checking 
> > > RLIMIT_MEMLOCK.
> > >
> > > The thing is IIUC that's accounting for pinned pages only with either 
> > > mlock()
> > > (FOLL_MLOCK) or vfio (FOLL_PIN).
> > >
> > > I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking at
> > > __zerocopy_sg_from_iter() -> iov_iter_get_pages().
> > 
> > It happens probably here:
> > 
> > E.g
> > 
> > __ip_append_data()
> > msg_zerocopy_realloc()
> > mm_account_pinned_pages()
> 
> When do they get uncounted?  i.e. is it just the data that's in flight
> that's marked as pinned?

I think so - there's __msg_zerocopy_callback() -> mm_unaccount_pinned_pages()
correspondingly.  Thanks,
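
For reference, the userspace-visible side of that unaccounting is the
completion notification on the socket error queue; a sketch of draining it
(illustration only, standard MSG_ZEROCOPY usage per the kernel docs rather than
code from the patch, and the helper name is made up):

  #include <linux/errqueue.h>
  #include <stdio.h>
  #include <sys/socket.h>

  /* Each notification covers a range of earlier MSG_ZEROCOPY send() calls;
   * by the time it arrives the kernel has already released (unaccounted)
   * the pages pinned for those sends. */
  static void drain_zerocopy_completions(int fd)
  {
      char control[128];

      for (;;) {
          struct msghdr msg = { 0 };
          struct cmsghdr *cm;
          struct sock_extended_err *serr;

          msg.msg_control = control;
          msg.msg_controllen = sizeof(control);
          if (recvmsg(fd, &msg, MSG_ERRQUEUE) == -1) {
              break;                 /* queue drained (EAGAIN) or real error */
          }
          cm = CMSG_FIRSTHDR(&msg);
          if (!cm) {
              continue;
          }
          serr = (struct sock_extended_err *)CMSG_DATA(cm);
          if (serr->ee_errno != 0 || serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
              continue;              /* not a zerocopy completion */
          }
          /* [ee_info, ee_data] is the range of completed send() calls. */
          printf("sends %u..%u done%s\n", serr->ee_info, serr->ee_data,
                 (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED) ?
                 " (kernel fell back to copying)" : "");
      }
  }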

-- 
Peter Xu




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-08 Thread Dr. David Alan Gilbert
* Jason Wang (jasow...@redhat.com) wrote:
> On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
> >
> > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote:
> > > > I don't think it is valid to unconditionally enable this feature due to 
> > > > the
> > > > resource usage implications
> > > >
> > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > >
> > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > > >if the socket option was not set, the socket exceeds its optmem
> > > >limit or the user exceeds its ulimit on locked pages."
> > >
> > > You are correct, I unfortunately missed this part in the docs :(
> > >
> > > > The limit on locked pages is something that looks very likely to be
> > > > exceeded unless you happen to be running a QEMU config that already
> > > > implies locked memory (eg PCI assignment)
> > >
> > > Do you mean the limit a user has on locking memory?
> > >
> > > If so, that makes sense. I remember I needed to set the upper limit of 
> > > locked
> > > memory for the user before using it, or adding a capability to qemu 
> > > before.
> >
> > So I'm a bit confused on why MSG_ZEROCOPY requires checking RLIMIT_MEMLOCK.
> >
> > The thing is IIUC that's accounting for pinned pages only with either 
> > mlock()
> > (FOLL_MLOCK) or vfio (FOLL_PIN).
> >
> > I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking at
> > __zerocopy_sg_from_iter() -> iov_iter_get_pages().
> 
> It happens probably here:
> 
> E.g
> 
> __ip_append_data()
> msg_zerocopy_realloc()
> mm_account_pinned_pages()

When do they get uncounted?  i.e. is it just the data that's in flight
that's marked as pinned?

Dave

> Thanks
> 
> >
> > Say, I think there're plenty of iov_iter_get_pages() callers and normal 
> > GUPs, I
> > think most of them do not need such accounting.
> >
> > --
> > Peter Xu
> >
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-07 Thread Jason Wang
On Wed, Sep 8, 2021 at 11:24 AM Peter Xu  wrote:
>
> On Wed, Sep 08, 2021 at 10:59:57AM +0800, Jason Wang wrote:
> > On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
> > >
> > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > wrote:
> > > > > I don't think it is valid to unconditionally enable this feature due 
> > > > > to the
> > > > > resource usage implications
> > > > >
> > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > >
> > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > > > >if the socket option was not set, the socket exceeds its optmem
> > > > >limit or the user exceeds its ulimit on locked pages."
> > > >
> > > > You are correct, I unfortunately missed this part in the docs :(
> > > >
> > > > > The limit on locked pages is something that looks very likely to be
> > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > implies locked memory (eg PCI assignment)
> > > >
> > > > Do you mean the limit a user has on locking memory?
> > > >
> > > > If so, that makes sense. I remember I needed to set the upper limit of 
> > > > locked
> > > > memory for the user before using it, or adding a capability to qemu 
> > > > before.
> > >
> > > So I'm a bit confused on why MSG_ZEROCOPY requires checking 
> > > RLIMIT_MEMLOCK.
> > >
> > > The thing is IIUC that's accounting for pinned pages only with either 
> > > mlock()
> > > (FOLL_MLOCK) or vfio (FOLL_PIN).
> > >
> > > I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking at
> > > __zerocopy_sg_from_iter() -> iov_iter_get_pages().
> >
> > It happens probably here:
> >
> > E.g
> >
> > __ip_append_data()
> > msg_zerocopy_realloc()
> > mm_account_pinned_pages()
>
> Right. :)
>
> But my previous question is more about the reason behind it - I thought that's
> a common GUP and it shouldn't rely on locked_vm because it should still be
> temporary GUP rather than a long time GUP, IMHO that's how we use locked_vm 
> (we
> don't account for temp GUPs but only longterm ones). IOW, I'm wondering 
> whether
> all the rest of iov_iter_get_pages() callers should check locked_vm too, and
> AFAIU they're not doing that right now..

You are probably right, but I'm not sure if it's too late to fix that.
(E.g it will break existing userspace)

Thanks

>
> Thanks,
>
> --
> Peter Xu
>




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-07 Thread Peter Xu
On Wed, Sep 08, 2021 at 10:59:57AM +0800, Jason Wang wrote:
> On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
> >
> > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote:
> > > > I don't think it is valid to unconditionally enable this feature due to 
> > > > the
> > > > resource usage implications
> > > >
> > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > >
> > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > > >if the socket option was not set, the socket exceeds its optmem
> > > >limit or the user exceeds its ulimit on locked pages."
> > >
> > > You are correct, I unfortunately missed this part in the docs :(
> > >
> > > > The limit on locked pages is something that looks very likely to be
> > > > exceeded unless you happen to be running a QEMU config that already
> > > > implies locked memory (eg PCI assignment)
> > >
> > > Do you mean the limit a user has on locking memory?
> > >
> > > If so, that makes sense. I remember I needed to set the upper limit of 
> > > locked
> > > memory for the user before using it, or adding a capability to qemu 
> > > before.
> >
> > So I'm a bit confused on why MSG_ZEROCOPY requires checking RLIMIT_MEMLOCK.
> >
> > The thing is IIUC that's accounting for pinned pages only with either 
> > mlock()
> > (FOLL_MLOCK) or vfio (FOLL_PIN).
> >
> > I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking at
> > __zerocopy_sg_from_iter() -> iov_iter_get_pages().
> 
> It happens probably here:
> 
> E.g
> 
> __ip_append_data()
> msg_zerocopy_realloc()
> mm_account_pinned_pages()

Right. :)

But my previous question is more about the reason behind it - I thought that's
a common GUP and it shouldn't rely on locked_vm, because it should still be a
temporary GUP rather than a long-term GUP; IMHO that's how we use locked_vm (we
don't account for temporary GUPs, only long-term ones). IOW, I'm wondering whether
all the rest of the iov_iter_get_pages() callers should check locked_vm too, and
AFAIU they're not doing that right now.

Thanks,

-- 
Peter Xu




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-07 Thread Jason Wang
On Wed, Sep 8, 2021 at 2:32 AM Peter Xu  wrote:
>
> On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote:
> > > I don't think it is valid to unconditionally enable this feature due to 
> > > the
> > > resource usage implications
> > >
> > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > >
> > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > >if the socket option was not set, the socket exceeds its optmem
> > >limit or the user exceeds its ulimit on locked pages."
> >
> > You are correct, I unfortunately missed this part in the docs :(
> >
> > > The limit on locked pages is something that looks very likely to be
> > > exceeded unless you happen to be running a QEMU config that already
> > > implies locked memory (eg PCI assignment)
> >
> > Do you mean the limit a user has on locking memory?
> >
> > If so, that makes sense. I remember I needed to set the upper limit of 
> > locked
> > memory for the user before using it, or adding a capability to qemu before.
>
> So I'm a bit confused on why MSG_ZEROCOPY requires checking RLIMIT_MEMLOCK.
>
> The thing is IIUC that's accounting for pinned pages only with either mlock()
> (FOLL_MLOCK) or vfio (FOLL_PIN).
>
> I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking at
> __zerocopy_sg_from_iter() -> iov_iter_get_pages().

It probably happens here, e.g.:

__ip_append_data()
msg_zerocopy_realloc()
mm_account_pinned_pages()
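
For completeness, the userspace sequence that reaches that accounting path is
just the opt-in plus a flagged send (illustration only, not code from the
patch; the helper name is made up and Linux >= 4.14 headers are assumed for
SO_ZEROCOPY/MSG_ZEROCOPY):

  #include <sys/socket.h>
  #include <sys/types.h>

  /* SO_ZEROCOPY opts the socket in; each MSG_ZEROCOPY send() then pins the
   * referenced pages (the accounting above) until the kernel reports
   * completion on the socket error queue. */
  static ssize_t zerocopy_send(int fd, const void *buf, size_t len)
  {
      int one = 1;

      setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
      return send(fd, buf, len, MSG_ZEROCOPY);
  }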

Thanks

>
> Say, I think there're plenty of iov_iter_get_pages() callers and normal GUPs, 
> I
> think most of them do not need such accounting.
>
> --
> Peter Xu
>




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-07 Thread Peter Xu
On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote:
> > I don't think it is valid to unconditionally enable this feature due to the
> > resource usage implications
> >
> > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> >
> >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> >if the socket option was not set, the socket exceeds its optmem
> >limit or the user exceeds its ulimit on locked pages."
> 
> You are correct, I unfortunately missed this part in the docs :(
> 
> > The limit on locked pages is something that looks very likely to be
> > exceeded unless you happen to be running a QEMU config that already
> > implies locked memory (eg PCI assignment)
> 
> Do you mean the limit a user has on locking memory?
> 
> If so, that makes sense. I remember I needed to set the upper limit of locked
> memory for the user before using it, or adding a capability to qemu before.

So I'm a bit confused on why MSG_ZEROCOPY requires checking RLIMIT_MEMLOCK.

The thing is IIUC that's accounting for pinned pages only with either mlock()
(FOLL_MLOCK) or vfio (FOLL_PIN).

I don't really think MSG_ZEROCOPY is doing that at all...  I'm looking at
__zerocopy_sg_from_iter() -> iov_iter_get_pages().

Say, I think there are plenty of iov_iter_get_pages() callers and normal GUPs, and I
think most of them do not need such accounting.

-- 
Peter Xu




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-07 Thread Dr. David Alan Gilbert
* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Thu, Sep 02, 2021 at 05:52:15AM -0300, Leonardo Bras Soares Passos wrote:
> > On Thu, Sep 2, 2021 at 5:21 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > wrote:
> > > > Hello Daniel, thanks for the feedback !
> > > >
> > > > On Tue, Aug 31, 2021 at 10:17 AM Daniel P. Berrangé 
> > > >  wrote:
> > > > >
> > > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > > > thread.
> > > > > >
> > > > > > Change the send_write() interface of multifd, allowing it to pass 
> > > > > > down
> > > > > > flags for qio_channel_write*().
> > > > > >
> > > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping 
> > > > > > the
> > > > > > other data being sent at the default copying approach.
> > > > > >
> > > > > > Signed-off-by: Leonardo Bras 
> > > > > > ---
> > > > > >  migration/multifd-zlib.c | 7 ---
> > > > > >  migration/multifd-zstd.c | 7 ---
> > > > > >  migration/multifd.c  | 9 ++---
> > > > > >  migration/multifd.h  | 3 ++-
> > > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > > >
> > > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > > >  }
> > > > > >
> > > > > >  if (used) {
> > > > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > > > +  &local_err);
> > > > >
> > > > > I don't think it is valid to unconditionally enable this feature due 
> > > > > to the
> > > > > resource usage implications
> > > > >
> > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > >
> > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > > > >if the socket option was not set, the socket exceeds its optmem
> > > > >limit or the user exceeds its ulimit on locked pages."
> > > >
> > > > You are correct, I unfortunately missed this part in the docs :(
> > > >
> > > > > The limit on locked pages is something that looks very likely to be
> > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > implies locked memory (eg PCI assignment)
> > > >
> > > > Do you mean the limit a user has on locking memory?
> > >
> > > Yes, by default the limit QEMU sees will be something very small.
> > >
> > > > If so, that makes sense. I remember I needed to set the upper limit of 
> > > > locked
> > > > memory for the user before using it, or adding a capability to qemu 
> > > > before.
> > > >
> > > > Maybe an option would be trying to mlock all guest memory before setting
> > > > zerocopy=on in qemu code. If it fails, we can print an error message 
> > > > and fall
> > > > back to not using zerocopy (following the idea of a new 
> > > > io_async_writev()
> > > > I told you in the previous mail).
> > >
> > > Currently ability to lock memory is something that has to be configured
> > > when QEMU starts, and it requires libvirt to grant suitable permissions
> > > to QEMU. Memory locking is generally undesirable because it prevents
> > > memory overcommit. Or rather if you are allowing memory overcommit, then
> > > allowing memory locking is a way to kill your entire host.
> > 
> > You mean it's gonna consume too much memory, or something else?
> 
> Essentially yes. 
> 
> > > I don't think we can unconditionally grant ability to lock arbitrary
> > > guest RAM at startup, just to satisfy a possible desire to use zerocopy
> > > migration later. Granting it at runtime feels questionable as you now
> > > need to track and predict how much locked memory you've allowed, and
> > > also have possible problems with revocation.
> > 
> > (I am really new to this, so please forgive me if I am asking dumb or
> > overly basic questions)
> > 
> > What does revocation mean in this context?
> > You give the process the ability to lock n MB of memory, and then you take it away?
> > Why would that happen? Is locked memory a limited resource?
> 
> Consider a VM host with 64 GB of RAM and 64 GB of swap, and an
> overcommit ratio of 1.5. ie we'll run VMs with 64*1.5 GB of RAM
> total.
> 
> So we can run 3 VMs each with 32 GB of RAM, giving 96 GB of usage
> which exceeds physical RAM. Most of the time this may well be fine
> as the VMs don't concurrently need their full RAM allocation, and
> worst case they'll get pushed to swap as the kernel re-shares
> memory in response to load. So perhaps each VM only needs 20 GB
> resident at any time, but over time one VM can burst up to 32 GB
> and then 12 GB of it get swapped out later when inactive.
> 
> But now consider if we allowed 2 of the VMs to lock memory for
> purposes of migration. Those 2 VMs 

Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-07 Thread Dr. David Alan Gilbert
* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Wed, Sep 01, 2021 at 11:35:33AM -0400, Peter Xu wrote:
> > On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> > > > On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> > > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > > > thread.
> > > > > > 
> > > > > > Change the send_write() interface of multifd, allowing it to pass 
> > > > > > down
> > > > > > flags for qio_channel_write*().
> > > > > > 
> > > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping 
> > > > > > the
> > > > > > other data being sent at the default copying approach.
> > > > > > 
> > > > > > Signed-off-by: Leonardo Bras 
> > > > > > ---
> > > > > >  migration/multifd-zlib.c | 7 ---
> > > > > >  migration/multifd-zstd.c | 7 ---
> > > > > >  migration/multifd.c  | 9 ++---
> > > > > >  migration/multifd.h  | 3 ++-
> > > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > > > 
> > > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > > >  }
> > > > > >  
> > > > > >  if (used) {
> > > > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > > > +  &local_err);
> > > > > 
> > > > > I don't think it is valid to unconditionally enable this feature due 
> > > > > to the
> > > > > resource usage implications
> > > > > 
> > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > > 
> > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
> > > > >if the socket option was not set, the socket exceeds its optmem 
> > > > >limit or the user exceeds its ulimit on locked pages."
> > > > > 
> > > > > The limit on locked pages is something that looks very likely to be
> > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > implies locked memory (eg PCI assignment)
> > > > 
> > > > Yes it would be great to be a migration capability in parallel to 
> > > > multifd. At
> > > > initial phase if it's easy to be implemented on multi-fd only, we can 
> > > > add a
> > > > dependency between the caps.  In the future we can remove that 
> > > > dependency when
> > > > the code is ready to go without multifd.  Thanks,
> > > 
> > > Also, I'm wondering how zerocopy support interacts with kernel support
> > > for kTLS and multipath-TCP, both of which we want to be able to use
> > > with migration.
> > 
> > Copying Jason Wang for net implications between these features on kernel 
> > side
> > and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).
> > 
> > From the safe side we may want to only enable one of them until we prove
> > they'll work together I guess..
> 
> MPTCP is good when we're network limited for migration
> 
> KTLS will be good when we're CPU limited on AES for migration,
> which is essentially always when TLS is used.
> 
> ZEROCOPY will be good when we're CPU limited for data copy
> on migration, or to reduce the impact on other concurrent
> VMs on the same CPUs.
> 
> Ultimately we would benefit from all of them at the same
> time, if it were technically possible to do.

I think last time I spoke to Paolo Abeni there were some interactions
between them; I can't remember what though (I think mptcp and ktls
didn't play at the time).

Dave

> > Not an immediate concern as I don't really think any of them is really
> > explicitly supported in qemu.
> 
> QEMU has mptcp support already:
> 
>   commit 8bd1078aebcec5eac196a83ef1a7e74be0ba67b7
>   Author: Dr. David Alan Gilbert 
>   Date:   Wed Apr 21 12:28:34 2021 +0100
> 
> sockets: Support multipath TCP
> 
> Multipath TCP allows combining multiple interfaces/routes into a single
> socket, with very little work for the user/admin.
> 
> It's enabled by 'mptcp' on most socket addresses:
> 
>./qemu-system-x86_64 -nographic -incoming tcp:0:,mptcp
> 
> > KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
> > ZEROCOPY mutual exclusive anyway because at least the userspace TLS code of
> > gnutls won't has a way to maintain the tls buffers used by zerocopy.  So at
> > least we need some knob to detect whether kTLS is enabled in gnutls.
> 
> It isn't possible for gnutls to transparently enable KTLS, because
> GNUTLS doesn't get to see the actual socket directly - it'll need
> some work in QEMU to enable it.  We know MPTCP and KTLS are currently
> mutually exclusive as they both use the same kernel network hooks
> framework.
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com 

Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Leonardo Bras Soares Passos
On Thu, Sep 2, 2021 at 6:59 AM Daniel P. Berrangé  wrote:
>
> On Thu, Sep 02, 2021 at 06:49:06AM -0300, Leonardo Bras Soares Passos wrote:
> > On Thu, Sep 2, 2021 at 6:20 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Thu, Sep 02, 2021 at 05:52:15AM -0300, Leonardo Bras Soares Passos 
> > > wrote:
> > > > On Thu, Sep 2, 2021 at 5:21 AM Daniel P. Berrangé  
> > > > wrote:
> > > > >
> > > > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > > > wrote:
> > > > > > Hello Daniel, thanks for the feedback !
> > > > > >
> > > > > > On Tue, Aug 31, 2021 at 10:17 AM Daniel P. Berrangé 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > > > > Call qio_channel_set_zerocopy(true) in the start of every 
> > > > > > > > multifd thread.
> > > > > > > >
> > > > > > > > Change the send_write() interface of multifd, allowing it to 
> > > > > > > > pass down
> > > > > > > > flags for qio_channel_write*().
> > > > > > > >
> > > > > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while 
> > > > > > > > keeping the
> > > > > > > > other data being sent at the default copying approach.
> > > > > > > >
> > > > > > > > Signed-off-by: Leonardo Bras 
> > > > > > > > ---
> > > > > > > >  migration/multifd-zlib.c | 7 ---
> > > > > > > >  migration/multifd-zstd.c | 7 ---
> > > > > > > >  migration/multifd.c  | 9 ++---
> > > > > > > >  migration/multifd.h  | 3 ++-
> > > > > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > > > > >
> > > > > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  if (used) {
> > > > > > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > > > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > > > > > +  &local_err);
> > > > > > >
> > > > > > > I don't think it is valid to unconditionally enable this feature 
> > > > > > > due to the
> > > > > > > resource usage implications
> > > > > > >
> > > > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > > > >
> > > > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This 
> > > > > > > happens
> > > > > > >if the socket option was not set, the socket exceeds its optmem
> > > > > > >limit or the user exceeds its ulimit on locked pages."
> > > > > >
> > > > > > You are correct, I unfortunately missed this part in the docs :(
> > > > > >
> > > > > > > The limit on locked pages is something that looks very likely to 
> > > > > > > be
> > > > > > > exceeded unless you happen to be running a QEMU config that 
> > > > > > > already
> > > > > > > implies locked memory (eg PCI assignment)
> > > > > >
> > > > > > Do you mean the limit a user has on locking memory?
> > > > >
> > > > > Yes, by default the limit QEMU sees will be something very small.
> > > > >
> > > > > > If so, that makes sense. I remember I needed to set the upper limit 
> > > > > > of locked
> > > > > > memory for the user before using it, or adding a capability to qemu 
> > > > > > before.
> > > > > >
> > > > > > Maybe an option would be trying to mlock all guest memory before 
> > > > > > setting
> > > > > > zerocopy=on in qemu code. If it fails, we can print an error 
> > > > > > message and fall
> > > > > > back to not using zerocopy (following the idea of a new 
> > > > > > io_async_writev()
> > > > > > I told you in the previous mail).
> > > > >
> > > > > Currently ability to lock memory is something that has to be 
> > > > > configured
> > > > > when QEMU starts, and it requires libvirt to grant suitable 
> > > > > permissions
> > > > > to QEMU. Memory locking is generally undesirable because it prevents
> > > > > memory overcommit. Or rather if you are allowing memory overcommit, 
> > > > > then
> > > > > allowing memory locking is a way to kill your entire host.
> > > >
> > > > You mean it's gonna consume too much memory, or something else?
> > >
> > > Essentially yes.
> >
> > Well, maybe we can check for available memory before doing that,
> > but maybe it's too much effort.
>
> From a mgmt app POV, we assume QEMU is untrustworthy, so the mgmt
> app needs to enforce policy based on the worst case that the
> VM configuration allows for.
>
> Checking current available memory is not viable, because even
> if you see 10 GB free, QEMU can't know if that free memory is
> there to satisfy other VMs' worst-case needs, or its own. QEMU
> needs to be explicitly told whether its OK to use locked memory,
> and an enforcement policy applied to it.


Yeah, it makes sense to let the mgmt app deal with that and enable/disable
the MSG_ZEROCOPY on migration whenever it sees fit.


>
>
> > > Consider a VM host with 64 GB of RAM and 64 GB 

Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Daniel P . Berrangé
On Thu, Sep 02, 2021 at 06:49:06AM -0300, Leonardo Bras Soares Passos wrote:
> On Thu, Sep 2, 2021 at 6:20 AM Daniel P. Berrangé  wrote:
> >
> > On Thu, Sep 02, 2021 at 05:52:15AM -0300, Leonardo Bras Soares Passos wrote:
> > > On Thu, Sep 2, 2021 at 5:21 AM Daniel P. Berrangé  
> > > wrote:
> > > >
> > > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > > wrote:
> > > > > Hello Daniel, thanks for the feedback !
> > > > >
> > > > > On Tue, Aug 31, 2021 at 10:17 AM Daniel P. Berrangé 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > > > > thread.
> > > > > > >
> > > > > > > Change the send_write() interface of multifd, allowing it to pass 
> > > > > > > down
> > > > > > > flags for qio_channel_write*().
> > > > > > >
> > > > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while 
> > > > > > > keeping the
> > > > > > > other data being sent at the default copying approach.
> > > > > > >
> > > > > > > Signed-off-by: Leonardo Bras 
> > > > > > > ---
> > > > > > >  migration/multifd-zlib.c | 7 ---
> > > > > > >  migration/multifd-zstd.c | 7 ---
> > > > > > >  migration/multifd.c  | 9 ++---
> > > > > > >  migration/multifd.h  | 3 ++-
> > > > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > > > >
> > > > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > > > >  }
> > > > > > >
> > > > > > >  if (used) {
> > > > > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > > > > +  &local_err);
> > > > > >
> > > > > > I don't think it is valid to unconditionally enable this feature 
> > > > > > due to the
> > > > > > resource usage implications
> > > > > >
> > > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > > >
> > > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This 
> > > > > > happens
> > > > > >if the socket option was not set, the socket exceeds its optmem
> > > > > >limit or the user exceeds its ulimit on locked pages."
> > > > >
> > > > > You are correct, I unfortunately missed this part in the docs :(
> > > > >
> > > > > > The limit on locked pages is something that looks very likely to be
> > > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > > implies locked memory (eg PCI assignment)
> > > > >
> > > > > Do you mean the limit a user has on locking memory?
> > > >
> > > > Yes, by default the limit QEMU sees will be something very small.
> > > >
> > > > > If so, that makes sense. I remember I needed to set the upper limit 
> > > > > of locked
> > > > > memory for the user before using it, or adding a capability to qemu 
> > > > > before.
> > > > >
> > > > > Maybe an option would be trying to mlock all guest memory before 
> > > > > setting
> > > > > zerocopy=on in qemu code. If it fails, we can print an error message 
> > > > > and fall
> > > > > back to not using zerocopy (following the idea of a new 
> > > > > io_async_writev()
> > > > > I told you in the previous mail).
> > > >
> > > > Currently ability to lock memory is something that has to be configured
> > > > when QEMU starts, and it requires libvirt to grant suitable permissions
> > > > to QEMU. Memory locking is generally undesirable because it prevents
> > > > memory overcommit. Or rather if you are allowing memory overcommit, then
> > > > allowing memory locking is a way to kill your entire host.
> > >
> > > You mean it's gonna consume too much memory, or something else?
> >
> > Essentially yes.
> 
> Well, maybe we can check for available memory before doing that,
> but maybe it's too much effort.

From a mgmt app POV, we assume QEMU is untrustworthy, so the mgmt
app needs to enforce policy based on the worst case that the
VM configuration allows for.

Checking current available memory is not viable, because even
if you see 10 GB free, QEMU can't know if that free memory is
there to satisfy other VMs' worst-case needs, or its own. QEMU
needs to be explicitly told whether its OK to use locked memory,
and an enforcement policy applied to it.


> > Consider a VM host with 64 GB of RAM and 64 GB of swap, and an
> > overcommit ratio of 1.5. ie we'll run VMs with 64*1.5 GB of RAM
> > total.
> >
> > So we can run 3 VMs each with 32 GB of RAM, giving 96 GB of usage
> > which exceeds physical RAM. Most of the time this may well be fine
> > as the VMs don't concurrently need their full RAM allocation, and
> > worst case they'll get pushed to swap as the kernel re-shares
> > memory in response to load. So perhaps each VM only needs 20 GB
> > resident at any time, but over time one 

Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Leonardo Bras Soares Passos
On Thu, Sep 2, 2021 at 6:20 AM Daniel P. Berrangé  wrote:
>
> On Thu, Sep 02, 2021 at 05:52:15AM -0300, Leonardo Bras Soares Passos wrote:
> > On Thu, Sep 2, 2021 at 5:21 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos 
> > > wrote:
> > > > Hello Daniel, thanks for the feedback !
> > > >
> > > > On Tue, Aug 31, 2021 at 10:17 AM Daniel P. Berrangé 
> > > >  wrote:
> > > > >
> > > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > > > thread.
> > > > > >
> > > > > > Change the send_write() interface of multifd, allowing it to pass 
> > > > > > down
> > > > > > flags for qio_channel_write*().
> > > > > >
> > > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping 
> > > > > > the
> > > > > > other data being sent at the default copying approach.
> > > > > >
> > > > > > Signed-off-by: Leonardo Bras 
> > > > > > ---
> > > > > >  migration/multifd-zlib.c | 7 ---
> > > > > >  migration/multifd-zstd.c | 7 ---
> > > > > >  migration/multifd.c  | 9 ++---
> > > > > >  migration/multifd.h  | 3 ++-
> > > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > > >
> > > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > > >  }
> > > > > >
> > > > > >  if (used) {
> > > > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > > > +  &local_err);
> > > > >
> > > > > I don't think it is valid to unconditionally enable this feature due 
> > > > > to the
> > > > > resource usage implications
> > > > >
> > > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > >
> > > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > > > >if the socket option was not set, the socket exceeds its optmem
> > > > >limit or the user exceeds its ulimit on locked pages."
> > > >
> > > > You are correct, I unfortunately missed this part in the docs :(
> > > >
> > > > > The limit on locked pages is something that looks very likely to be
> > > > > exceeded unless you happen to be running a QEMU config that already
> > > > > implies locked memory (eg PCI assignment)
> > > >
> > > > Do you mean the limit a user has on locking memory?
> > >
> > > Yes, by default the limit QEMU sees will be something very small.
> > >
> > > > If so, that makes sense. I remember I needed to set the upper limit of 
> > > > locked
> > > > memory for the user before using it, or adding a capability to qemu 
> > > > before.
> > > >
> > > > Maybe an option would be trying to mlock all guest memory before setting
> > > > zerocopy=on in qemu code. If it fails, we can print an error message 
> > > > and fall
> > > > back to not using zerocopy (following the idea of a new 
> > > > io_async_writev()
> > > > I told you in the previous mail).
> > >
> > > Currently ability to lock memory is something that has to be configured
> > > when QEMU starts, and it requires libvirt to grant suitable permissions
> > > to QEMU. Memory locking is generally undesirable because it prevents
> > > memory overcommit. Or rather if you are allowing memory overcommit, then
> > > allowing memory locking is a way to kill your entire host.
> >
> > You mean it's gonna consume too much memory, or something else?
>
> Essentially yes.

Well, maybe we can check for available memory before doing that,
but maybe it's too much effort.

>
> > > I don't think we can unconditionally grant ability to lock arbitrary
> > > guest RAM at startup, just to satisfy a possible desire to use zerocopy
> > > migration later. Granting it at runtime feels questionable as you now
> > > need to track and predict how much locked memory you've allowed, and
> > > also have possible problems with revocation.
> >
> > (I am really new to this, so please forgive me if I am asking dumb or
> > overly basic questions)
> >
> > What does revocation mean in this context?
> > You give the process the ability to lock n MB of memory, and then you take it away?
> > Why would that happen? Is locked memory a limited resource?
>
> Consider a VM host with 64 GB of RAM and 64 GB of swap, and an
> overcommit ratio of 1.5. ie we'll run VMs with 64*1.5 GB of RAM
> total.
>
> So we can run 3 VMs each with 32 GB of RAM, giving 96 GB of usage
> which exceeds physical RAM. Most of the time this may well be fine
> as the VMs don't concurrently need their full RAM allocation, and
> worst case they'll get pushed to swap as the kernel re-shares
> memory in response to load. So perhaps each VM only needs 20 GB
> resident at any time, but over time one VM can burst up to 32 GB
> and then 12 GB of it get swapped out later when inactive.
>
> 

Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Daniel P . Berrangé
On Thu, Sep 02, 2021 at 05:52:15AM -0300, Leonardo Bras Soares Passos wrote:
> On Thu, Sep 2, 2021 at 5:21 AM Daniel P. Berrangé  wrote:
> >
> > On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote:
> > > Hello Daniel, thanks for the feedback !
> > >
> > > On Tue, Aug 31, 2021 at 10:17 AM Daniel P. Berrangé  
> > > wrote:
> > > >
> > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > > thread.
> > > > >
> > > > > Change the send_write() interface of multifd, allowing it to pass down
> > > > > flags for qio_channel_write*().
> > > > >
> > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping 
> > > > > the
> > > > > other data being sent at the default copying approach.
> > > > >
> > > > > Signed-off-by: Leonardo Bras 
> > > > > ---
> > > > >  migration/multifd-zlib.c | 7 ---
> > > > >  migration/multifd-zstd.c | 7 ---
> > > > >  migration/multifd.c  | 9 ++---
> > > > >  migration/multifd.h  | 3 ++-
> > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > >
> > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > >  }
> > > > >
> > > > >  if (used) {
> > > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > > +  &local_err);
> > > >
> > > > I don't think it is valid to unconditionally enable this feature due to 
> > > > the
> > > > resource usage implications
> > > >
> > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > >
> > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > > >if the socket option was not set, the socket exceeds its optmem
> > > >limit or the user exceeds its ulimit on locked pages."
> > >
> > > You are correct, I unfortunately missed this part in the docs :(
> > >
> > > > The limit on locked pages is something that looks very likely to be
> > > > exceeded unless you happen to be running a QEMU config that already
> > > > implies locked memory (eg PCI assignment)
> > >
> > > Do you mean the limit a user has on locking memory?
> >
> > Yes, by default the limit QEMU sees will be something very small.
> >
> > > If so, that makes sense. I remember I needed to set the upper limit of 
> > > locked
> > > memory for the user before using it, or adding a capability to qemu 
> > > before.
> > >
> > > Maybe an option would be trying to mlock all guest memory before setting
> > > zerocopy=on in qemu code. If it fails, we can print an error message and 
> > > fall
> > > back to not using zerocopy (following the idea of a new io_async_writev()
> > > I told you in the previous mail).
> >
> > Currently ability to lock memory is something that has to be configured
> > when QEMU starts, and it requires libvirt to grant suitable permissions
> > to QEMU. Memory locking is generally undesirable because it prevents
> > memory overcommit. Or rather if you are allowing memory overcommit, then
> > allowing memory locking is a way to kill your entire host.
> 
> You mean it's gonna consume too much memory, or something else?

Essentially yes. 

> > I don't think we can unconditionally grant ability to lock arbitrary
> > guest RAM at startup, just to satisfy a possible desire to use zerocopy
> > migration later. Granting it at runtime feels questionable as you now
> > need to track and predict how much locked memory you've allowed, and
> > also have possible problems with revocation.
> 
> (I am really new to this, so please forgive me if I am asking dumb or
> overly basic questions)
> 
> What does revocation mean in this context?
> You give the process the ability to lock n MB of memory, and then you take it away?
> Why would that happen? Is locked memory a limited resource?

Consider a VM host with 64 GB of RAM and 64 GB of swap, and an
overcommit ratio of 1.5. ie we'll run VMs with 64*1.5 GB of RAM
total.

So we can run 3 VMs each with 32 GB of RAM, giving 96 GB of usage
which exceeds physical RAM. Most of the time this may well be fine
as the VMs don't concurrently need their full RAM allocation, and
worst case they'll get pushed to swap as the kernel re-shares
memory in response to load. So perhaps each VM only needs 20 GB
resident at any time, but over time one VM can burst up to 32 GB
and then 12 GB of it get swapped out later when inactive.

But now consider if we allowed 2 of the VMs to lock memory for
purposes of migration. Those 2 VMs can now pin 64 GB of memory
in the worst case, leaving no free memory for the 3rd VM or
for the OS. This will likely take down the entire host, regardless
of swap availability.

IOW, if you are overcommitting RAM, you have to be extremely
careful about allowing any VM to lock memory. If you do decide

Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Leonardo Bras Soares Passos
On Thu, Sep 2, 2021 at 5:21 AM Daniel P. Berrangé  wrote:
>
> On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Daniel, thanks for the feedback !
> >
> > On Tue, Aug 31, 2021 at 10:17 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > thread.
> > > >
> > > > Change the send_write() interface of multifd, allowing it to pass down
> > > > flags for qio_channel_write*().
> > > >
> > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > > > other data being sent at the default copying approach.
> > > >
> > > > Signed-off-by: Leonardo Bras 
> > > > ---
> > > >  migration/multifd-zlib.c | 7 ---
> > > >  migration/multifd-zstd.c | 7 ---
> > > >  migration/multifd.c  | 9 ++---
> > > >  migration/multifd.h  | 3 ++-
> > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > >
> > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > >  }
> > > >
> > > >  if (used) {
> > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > +  &local_err);
> > >
> > > I don't think it is valid to unconditionally enable this feature due to 
> > > the
> > > resource usage implications
> > >
> > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > >
> > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> > >if the socket option was not set, the socket exceeds its optmem
> > >limit or the user exceeds its ulimit on locked pages."
> >
> > You are correct, I unfortunately missed this part in the docs :(
> >
> > > The limit on locked pages is something that looks very likely to be
> > > exceeded unless you happen to be running a QEMU config that already
> > > implies locked memory (eg PCI assignment)
> >
> > Do you mean the limit a user has on locking memory?
>
> Yes, by default the limit QEMU sees will be something very small.
>
> > If so, that makes sense. I remember I needed to set the upper limit of 
> > locked
> > memory for the user before using it, or adding a capability to qemu before.
> >
> > Maybe an option would be trying to mlock all guest memory before setting
> > zerocopy=on in qemu code. If it fails, we can print an error message and 
> > fall
> > back to not using zerocopy (following the idea of a new io_async_writev()
> > I told you in the previous mail).
>
> Currently ability to lock memory is something that has to be configured
> when QEMU starts, and it requires libvirt to grant suitable permissions
> to QEMU. Memory locking is generally undesirable because it prevents
> memory overcommit. Or rather if you are allowing memory overcommit, then
> allowing memory locking is a way to kill your entire host.

You mean it's gonna consume too much memory, or something else?

>
> I don't think we can unconditionally grant ability to lock arbitrary
> guest RAM at startup, just to satisfy a possible desire to use zerocopy
> migration later. Granting it at runtime feels questionable as you now
> need to track and predict how much locked memory you've allowed, and
> also have possible problems with revocation.

(I am really new to this, so please forgive me if I am asking dumb or
overly basic questions)

What does revocation mean in this context?
You give the process the ability to lock n MB of memory, and then you take it away?
Why would that happen? Is locked memory a limited resource?

>
> Possibly you could unconditionally grant ability to lock a small amount
> of guest RAM at startup, but how small can it be, while still making a
> useful difference to migration. It would imply we also need to be very
> careful with migration to avoid having too large an amount of outstanding
> zerocopy requests to exceed the limit.

Yeah, having to decide on a value that would be ok to lock is very
complex, given we can migrate with multifd, which can make this value grow
a lot. Except if we only allow a few of those fds to really use zerocopy.

>
> IOW, the only clear place in which we can use zerocopy, is where we are
> already forced to accept the penalty of locked memory at startup. eg when
> the guest is using huge pages and no overcommit, or possibly when the guest
> is using PCI device assignment,

It would be something already, given that those scenarios are the ones with
the largest number of pages to migrate. But I understand we could give it a try
on other scenarios.

> though in the latter I can't remember if
> we allow all of guest RAM to be locked or not.

If I recall correctly from a previous discussion, this was the case at least for
PCI passthrough.

>
> Overall the memory locking needs look like a significant constraint that
> 

Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Daniel P . Berrangé
On Thu, Sep 02, 2021 at 04:22:55AM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel, thanks for the feedback !
> 
> On Tue, Aug 31, 2021 at 10:17 AM Daniel P. Berrangé  
> wrote:
> >
> > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > Call qio_channel_set_zerocopy(true) in the start of every multifd thread.
> > >
> > > Change the send_write() interface of multifd, allowing it to pass down
> > > flags for qio_channel_write*().
> > >
> > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > > other data being sent at the default copying approach.
> > >
> > > Signed-off-by: Leonardo Bras 
> > > ---
> > >  migration/multifd-zlib.c | 7 ---
> > >  migration/multifd-zstd.c | 7 ---
> > >  migration/multifd.c  | 9 ++---
> > >  migration/multifd.h  | 3 ++-
> > >  4 files changed, 16 insertions(+), 10 deletions(-)
> >
> > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > >  }
> > >
> > >  if (used) {
> > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > +  &local_err);
> >
> > I don't think it is valid to unconditionally enable this feature due to the
> > resource usage implications
> >
> > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> >
> >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
> >if the socket option was not set, the socket exceeds its optmem
> >limit or the user exceeds its ulimit on locked pages."
> 
> You are correct, I unfortunately missed this part in the docs :(
> 
> > The limit on locked pages is something that looks very likely to be
> > exceeded unless you happen to be running a QEMU config that already
> > implies locked memory (eg PCI assignment)
> 
> Do you mean the limit a user has on locking memory?

Yes, by default the limit QEMU sees will be something very small.

> If so, that makes sense. I remember I needed to set the upper limit of locked
> memory for the user before using it, or adding a capability to qemu before.
> 
> Maybe an option would be trying to mlock all guest memory before setting
> zerocopy=on in qemu code. If it fails, we can print an error message and fall
> back to not using zerocopy (following the idea of a new io_async_writev()
> I told you in the previous mail).

Currently ability to lock memory is something that has to be configured
when QEMU starts, and it requires libvirt to grant suitable permissions
to QEMU. Memory locking is generally undesirable because it prevents
memory overcommit. Or rather if you are allowing memory overcommit, then
allowing memory locking is a way to kill your entire host.

I don't think we can unconditionally grant ability to lock arbitrary
guest RAM at startup, just to satisfy a possible desire to use zerocopy
migration later. Granting it at runtime feels questionable as you now
need to track and predict how much locked memory you've allowed, and
also have possible problems with revocation.

Possibly you could unconditionally grant the ability to lock a small amount
of guest RAM at startup, but how small can it be while still making a
useful difference to migration? It would also imply we need to be very
careful with migration to avoid having so many outstanding zerocopy
requests that they exceed the limit.
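
As an illustration of that kind of throttling, a minimal sketch that caps the
amount of zerocopy data in flight; the names and the 16 MiB budget are invented
for the example, not existing QEMU code:

  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical throttle: refuse new MSG_ZEROCOPY sends while too much
   * pinned data is still awaiting completion notifications. */
  static size_t zc_inflight;               /* bytes sent but not yet completed */
  static const size_t zc_limit = 16 << 20; /* assumed locked-memory budget */

  static bool zc_may_send(size_t len)
  {
      return zc_inflight + len <= zc_limit;
  }

  static void zc_sent(size_t len)      { zc_inflight += len; }
  static void zc_completed(size_t len) { zc_inflight -= len; }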

IOW, the only clear place in which we can use zerocopy is where we are
already forced to accept the penalty of locked memory at startup, e.g. when
the guest is using huge pages and no overcommit, or possibly when the guest
is using PCI device assignment, though in the latter case I can't remember
whether we allow the entirety of guest RAM to be locked or not.

Overall the memory locking needs look like a significant constraint that
will affect the ability to use this feature.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Leonardo Bras Soares Passos
Thanks for contributing Jason!

On Thu, Sep 2, 2021 at 4:23 AM Jason Wang  wrote:
>
>
> On 2021/9/1 at 11:35 PM, Peter Xu wrote:
> > On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:
> >> On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> >>> On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
>  On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > thread.
> >
> > Change the send_write() interface of multifd, allowing it to pass down
> > flags for qio_channel_write*().
> >
> > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > other data being sent at the default copying approach.
> >
> > Signed-off-by: Leonardo Bras 
> > ---
> >   migration/multifd-zlib.c | 7 ---
> >   migration/multifd-zstd.c | 7 ---
> >   migration/multifd.c  | 9 ++---
> >   migration/multifd.h  | 3 ++-
> >   4 files changed, 16 insertions(+), 10 deletions(-)
> > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> >   }
> >
> >   if (used) {
> > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > +  &local_err);
>  I don't think it is valid to unconditionally enable this feature due to 
>  the
>  resource usage implications
> 
>  https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> 
> "A zerocopy failure will return -1 with errno ENOBUFS. This happens
>  if the socket option was not set, the socket exceeds its optmem
>  limit or the user exceeds its ulimit on locked pages."
> 
>  The limit on locked pages is something that looks very likely to be
>  exceeded unless you happen to be running a QEMU config that already
>  implies locked memory (eg PCI assignment)
> >>> Yes it would be great to be a migration capability in parallel to 
> >>> multifd. At
> >>> initial phase if it's easy to be implemented on multi-fd only, we can add 
> >>> a
> >>> dependency between the caps.  In the future we can remove that dependency 
> >>> when
> >>> the code is ready to go without multifd.  Thanks,
> >> Also, I'm wondering how zerocopy support interacts with kernel support
> >> for kTLS and multipath-TCP, both of which we want to be able to use
> >> with migration.
> > Copying Jason Wang for net implications between these features on kernel 
> > side
>
>
> Note that the MSG_ZEROCOPY is contributed by Google :)
>
>
> > and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).
>
>
> I think they can. Anyway, the kernel can choose to do a data copy when necessary.
>
> Note that the name "zerocopy" is probably not correct here. What's better is
> "Enable MSG_ZEROCOPY", since:
>
> 1) the kernel supports various kinds of zerocopy; for TX, it has supported
> sendfile() for many years.
> 2) MSG_ZEROCOPY is only used for TX, not RX.
> 3) TCP RX zerocopy is only supported via mmap(), which requires some
> extra configuration, e.g. a 4K MTU and driver support for header split.

RX would be my next challenge :)

>
> [1] https://www.youtube.com/watch?v=_ZfiQGWFvg0

Thank you for sharing!

Best regards,
Leonardo




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Leonardo Bras Soares Passos
A few more comments on this one:

On Wed, Sep 1, 2021 at 12:44 PM Daniel P. Berrangé  wrote:
>
> > From the safe side we may want to only enable one of them until we prove
> > they'll work together I guess..
>
> MPTCP is good when we're network limited for migration
>
> KTLS will be good when we're CPU limited on AES for migration,
> which is essentially always when TLS is used.
>
> ZEROCOPY will be good when we're CPU limited for data copy
> on migration, or to reduce the impact on other concurrent
> VMs on the same CPUs.
>
> Ultimately we would benefit from all of them at the same
> time, if it were technically possible to do.

Maybe I misunderstood something, but IIRC KTLS kind of does some
'zerocopy'.

I mean, in a usual setup, we would have a buffer A containing the data to
be encrypted, a buffer B that would receive the encrypted data,
and a buffer C, in the kernel, that would receive a copy of buffer B.

With KTLS, buffer A would be passed to the kernel and encrypted directly
into buffer C, avoiding one copy.
It's also possible that buffer A would be copied to a buffer B' inside the
kernel and then encrypted into buffer C, but I am not sure that would make sense.

Anyway, other than the B' case, it would make no sense to have MSG_ZEROCOPY
in QIOChannel_TLS, so that's something I need to change in this patchset.
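
For context, a sketch of how kernel TLS TX is switched on for an
already-connected TCP socket, following the standard Linux TLS_TX setup; the
key material would have to be exported from the finished gnutls handshake and
is omitted here:

  #include <linux/tls.h>
  #include <netinet/tcp.h>
  #include <string.h>
  #include <sys/socket.h>

  #ifndef TCP_ULP
  #define TCP_ULP 31     /* from linux/tcp.h, if the libc headers lack it */
  #endif
  #ifndef SOL_TLS
  #define SOL_TLS 282    /* from linux/socket.h, if the libc headers lack it */
  #endif

  /* Hand TX record encryption to the kernel, i.e. buffer A is encrypted
   * straight into the kernel's buffer C with no userspace buffer B. */
  static int enable_ktls_tx(int sock)
  {
      struct tls12_crypto_info_aes_gcm_128 ci;

      if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0) {
          return -1;
      }
      memset(&ci, 0, sizeof(ci));
      ci.info.version = TLS_1_2_VERSION;
      ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
      /* ci.key / ci.iv / ci.salt / ci.rec_seq: fill from the TLS session state */
      return setsockopt(sock, SOL_TLS, TLS_TX, &ci, sizeof(ci));
  }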

>
> > Not an immediate concern, as I don't really think any of them is
> > explicitly supported in qemu.
>
> QEMU has mptcp support already:
>
>   commit 8bd1078aebcec5eac196a83ef1a7e74be0ba67b7
>   Author: Dr. David Alan Gilbert 
>   Date:   Wed Apr 21 12:28:34 2021 +0100
>
> sockets: Support multipath TCP
>
> Multipath TCP allows combining multiple interfaces/routes into a single
> socket, with very little work for the user/admin.
>
> It's enabled by 'mptcp' on most socket addresses:
>
>./qemu-system-x86_64 -nographic -incoming tcp:0:,mptcp
>
> > KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
> > ZEROCOPY mutually exclusive anyway, because at least the userspace TLS code of
> > gnutls won't have a way to maintain the tls buffers used by zerocopy.  So at
> > least we need some knob to detect whether kTLS is enabled in gnutls.
>
> It isn't possible for gnutls to transparently enable KTLS, because
> GNUTLS doesn't get to see the actual socket directly

Yes, IIRC it uses gnuTLS with callbacks instead.

> - it'll need
> some work in QEMU to enable it.

Maybe it's overkill, but what if we implement KTLS support directly in QEMU,
and fall back to gnuTLS when it's not available?




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Leonardo Bras Soares Passos
Hello Peter, thank you for this feedback!

On Tue, Aug 31, 2021 at 5:29 PM Peter Xu  wrote:
> Yes it would be great to be a migration capability in parallel to multifd. At
> initial phase if it's easy to be implemented on multi-fd only, we can add a
> dependency between the caps.  In the future we can remove that dependency when
> the code is ready to go without multifd.  Thanks,

I thought the idea was to use MSG_ZEROCOPY whenever possible, and otherwise
fall back to the default copying mechanism.

In my replies to 2/3 I mentioned a new method for the QIOChannel interface
that would fall back to copying, and that may be a way to avoid having to
use a capability.
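
A minimal sketch of that fallback pattern at the plain socket level
(illustrative only; the proposed QIOChannel method does not exist yet):

  #include <errno.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Try a zerocopy send first; if the kernel refuses (e.g. ENOBUFS because
   * the locked-pages limit was hit), retry as an ordinary copying send.
   * A real implementation would also have to reap the completion
   * notifications from the socket error queue. */
  static ssize_t send_maybe_zerocopy(int sock, struct msghdr *msg)
  {
      ssize_t ret = sendmsg(sock, msg, MSG_ZEROCOPY);

      if (ret < 0 && errno == ENOBUFS) {
          ret = sendmsg(sock, msg, 0);
      }
      return ret;
  }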

Best regards,
Leonardo




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Jason Wang



On 2021/9/1 at 11:35 PM, Peter Xu wrote:

On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:

On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:

On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:

On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:

Call qio_channel_set_zerocopy(true) in the start of every multifd thread.

Change the send_write() interface of multifd, allowing it to pass down
flags for qio_channel_write*().

Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
other data being sent at the default copying approach.

Signed-off-by: Leonardo Bras 
---
  migration/multifd-zlib.c | 7 ---
  migration/multifd-zstd.c | 7 ---
  migration/multifd.c  | 9 ++---
  migration/multifd.h  | 3 ++-
  4 files changed, 16 insertions(+), 10 deletions(-)
@@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
  }
  
  if (used) {

-ret = multifd_send_state->ops->send_write(p, used, &local_err);
+ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
+  &local_err);

I don't think it is valid to unconditionally enable this feature due to the
resource usage implications

https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html

   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
if the socket option was not set, the socket exceeds its optmem
limit or the user exceeds its ulimit on locked pages."

The limit on locked pages is something that looks very likely to be
exceeded unless you happen to be running a QEMU config that already
implies locked memory (eg PCI assignment)

Yes it would be great to be a migration capability in parallel to multifd. At
initial phase if it's easy to be implemented on multi-fd only, we can add a
dependency between the caps.  In the future we can remove that dependency when
the code is ready to go without multifd.  Thanks,

Also, I'm wondering how zerocopy support interacts with kernel support
for kTLS and multipath-TCP, both of which we want to be able to use
with migration.

Copying Jason Wang for net implications between these features on kernel side



Note that the MSG_ZEROCOPY is contributed by Google :)



and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).



I think they can. Anyway, the kernel can choose to do a data copy when necessary.

Note that the name "zerocopy" is probably not correct here. What's better is
"Enable MSG_ZEROCOPY", since:

1) the kernel supports various kinds of zerocopy; for TX, it has supported
sendfile() for many years.
2) MSG_ZEROCOPY is only used for TX, not RX.
3) TCP RX zerocopy is only supported via mmap(), which requires some
extra configuration, e.g. a 4K MTU and driver support for header split.


[1] https://www.youtube.com/watch?v=_ZfiQGWFvg0
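
For readers unfamiliar with the interface, the documented TX flow looks
roughly like this (a condensed sketch of the kernel's msg_zerocopy usage,
error handling omitted; assumes a kernel/libc new enough to define
SO_ZEROCOPY and MSG_ZEROCOPY):

  #include <linux/errqueue.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  /* Opt in once, send with MSG_ZEROCOPY, then reap the completion from the
   * error queue before reusing (or unpinning) the pages behind 'buf'. */
  static void zerocopy_tx_once(int sock, void *buf, size_t len)
  {
      int one = 1;
      struct iovec iov = { .iov_base = buf, .iov_len = len };
      struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
      char control[128];

      setsockopt(sock, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
      sendmsg(sock, &msg, MSG_ZEROCOPY);

      /* the completion shows up as SO_EE_ORIGIN_ZEROCOPY on the error queue */
      memset(&msg, 0, sizeof(msg));
      msg.msg_control = control;
      msg.msg_controllen = sizeof(control);
      recvmsg(sock, &msg, MSG_ERRQUEUE);
  }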

Thanks




 From the safe side we may want to only enable one of them until we prove
they'll work together I guess..

Not an immediate concern, as I don't really think any of them is
explicitly supported in qemu.

KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
ZEROCOPY mutually exclusive anyway, because at least the userspace TLS code of
gnutls won't have a way to maintain the tls buffers used by zerocopy.  So at
least we need some knob to detect whether kTLS is enabled in gnutls.






Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-02 Thread Leonardo Bras Soares Passos
Hello Daniel, thanks for the feedback !

On Tue, Aug 31, 2021 at 10:17 AM Daniel P. Berrangé  wrote:
>
> On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > Call qio_channel_set_zerocopy(true) in the start of every multifd thread.
> >
> > Change the send_write() interface of multifd, allowing it to pass down
> > flags for qio_channel_write*().
> >
> > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > other data being sent at the default copying approach.
> >
> > Signed-off-by: Leonardo Bras 
> > ---
> >  migration/multifd-zlib.c | 7 ---
> >  migration/multifd-zstd.c | 7 ---
> >  migration/multifd.c  | 9 ++---
> >  migration/multifd.h  | 3 ++-
> >  4 files changed, 16 insertions(+), 10 deletions(-)
>
> > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> >  }
> >
> >  if (used) {
> > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > +  &local_err);
>
> I don't think it is valid to unconditionally enable this feature due to the
> resource usage implications
>
> https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
>
>   "A zerocopy failure will return -1 with errno ENOBUFS. This happens
>if the socket option was not set, the socket exceeds its optmem
>limit or the user exceeds its ulimit on locked pages."

You are correct, I unfortunately missed this part in the docs :(

> The limit on locked pages is something that looks very likely to be
> exceeded unless you happen to be running a QEMU config that already
> implies locked memory (eg PCI assignment)

Do you mean the limit a user has on locking memory?

If so, that makes sense. I remember I needed to raise the user's upper limit on
locked memory before using it, or add a capability to qemu beforehand.

Maybe an option would be to try to mlock all guest memory before setting
zerocopy=on in the qemu code. If that fails, we can print an error message and
fall back to not using zerocopy (following the idea of a new io_async_writev()
I mentioned in the previous mail).
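
As a rough illustration of that idea (a hypothetical helper, not actual qemu
code):

  #include <stdbool.h>
  #include <stdio.h>
  #include <sys/mman.h>

  /* Hypothetical: lock the guest RAM block up front; if the locked-memory
   * limit is too small, warn and keep the copying send path. */
  static bool try_enable_zerocopy(void *guest_ram, size_t ram_size)
  {
      if (mlock(guest_ram, ram_size) != 0) {
          perror("mlock guest RAM; falling back to copying sends");
          return false;
      }
      return true;   /* caller would only then set zerocopy=on */
  }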


>
>
> Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>

Best regards,
Leonardo




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-01 Thread Peter Xu
On Wed, Sep 01, 2021 at 04:44:30PM +0100, Daniel P. Berrangé wrote:
> QEMU has mptcp support already:
> 
>   commit 8bd1078aebcec5eac196a83ef1a7e74be0ba67b7
>   Author: Dr. David Alan Gilbert 
>   Date:   Wed Apr 21 12:28:34 2021 +0100
> 
> sockets: Support multipath TCP
> 
> Multipath TCP allows combining multiple interfaces/routes into a single
> socket, with very little work for the user/admin.
> 
> It's enabled by 'mptcp' on most socket addresses:
> 
>./qemu-system-x86_64 -nographic -incoming tcp:0:,mptcp

Oops, I totally forgot about that, sorry!

> 
> > KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
> > ZEROCOPY mutually exclusive anyway, because at least the userspace TLS code of
> > gnutls won't have a way to maintain the tls buffers used by zerocopy.  So at
> > least we need some knob to detect whether kTLS is enabled in gnutls.
> 
> It isn't possible for gnutls to transparently enable KTLS, because
> GNUTLS doesn't get to see the actual socket directly - it'll need
> some work in QEMU to enable it.  We know MPTCP and KTLS are currently
> mutually exclusive as they both use the same kernel network hooks
> framework.

Then we may need to at least figure out whether zerocopy needs to mask out
mptcp.

-- 
Peter Xu




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-01 Thread Daniel P . Berrangé
On Wed, Sep 01, 2021 at 11:35:33AM -0400, Peter Xu wrote:
> On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:
> > On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> > > On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> > > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > > thread.
> > > > > 
> > > > > Change the send_write() interface of multifd, allowing it to pass down
> > > > > flags for qio_channel_write*().
> > > > > 
> > > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping 
> > > > > the
> > > > > other data being sent at the default copying approach.
> > > > > 
> > > > > Signed-off-by: Leonardo Bras 
> > > > > ---
> > > > >  migration/multifd-zlib.c | 7 ---
> > > > >  migration/multifd-zstd.c | 7 ---
> > > > >  migration/multifd.c  | 9 ++---
> > > > >  migration/multifd.h  | 3 ++-
> > > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > > 
> > > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > > >  }
> > > > >  
> > > > >  if (used) {
> > > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > > +  &local_err);
> > > > 
> > > > I don't think it is valid to unconditionally enable this feature due to 
> > > > the
> > > > resource usage implications
> > > > 
> > > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > > 
> > > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
> > > >if the socket option was not set, the socket exceeds its optmem 
> > > >limit or the user exceeds its ulimit on locked pages."
> > > > 
> > > > The limit on locked pages is something that looks very likely to be
> > > > exceeded unless you happen to be running a QEMU config that already
> > > > implies locked memory (eg PCI assignment)
> > > 
> > > Yes it would be great to be a migration capability in parallel to 
> > > multifd. At
> > > initial phase if it's easy to be implemented on multi-fd only, we can add 
> > > a
> > > dependency between the caps.  In the future we can remove that dependency 
> > > when
> > > the code is ready to go without multifd.  Thanks,
> > 
> > Also, I'm wondering how zerocopy support interacts with kernel support
> > for kTLS and multipath-TCP, both of which we want to be able to use
> > with migration.
> 
> Copying Jason Wang for net implications between these features on kernel side
> and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).
> 
> From the safe side we may want to only enable one of them until we prove
> they'll work together I guess..

MPTCP is good when we're network limited for migration

KTLS will be good when we're CPU limited on AES for migration,
which is essentially always when TLS is used.

ZEROCOPY will be good when we're CPU limited for data copy
on migration, or to reduce the impact on other concurrent
VMs on the same CPUs.

Ultimately we would benefit from all of them at the same
time, if it were technically possible to do.

> Not an immediate concern, as I don't really think any of them is
> explicitly supported in qemu.

QEMU has mptcp support already:

  commit 8bd1078aebcec5eac196a83ef1a7e74be0ba67b7
  Author: Dr. David Alan Gilbert 
  Date:   Wed Apr 21 12:28:34 2021 +0100

sockets: Support multipath TCP

Multipath TCP allows combining multiple interfaces/routes into a single
socket, with very little work for the user/admin.

It's enabled by 'mptcp' on most socket addresses:

   ./qemu-system-x86_64 -nographic -incoming tcp:0:,mptcp

> KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
> ZEROCOPY mutually exclusive anyway, because at least the userspace TLS code of
> gnutls won't have a way to maintain the tls buffers used by zerocopy.  So at
> least we need some knob to detect whether kTLS is enabled in gnutls.

It isn't possible for gnutls to transparently enable KTLS, because
GNUTLS doesn't get to see the actual socket directly - it'll need
some work in QEMU to enable it.  We know MPTCP and KTLS are currently
mutually exclusive as they both use the same kernel network hooks
framework.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-01 Thread Peter Xu
On Wed, Sep 01, 2021 at 09:53:07AM +0100, Daniel P. Berrangé wrote:
> On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> > On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > > Call qio_channel_set_zerocopy(true) in the start of every multifd 
> > > > thread.
> > > > 
> > > > Change the send_write() interface of multifd, allowing it to pass down
> > > > flags for qio_channel_write*().
> > > > 
> > > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > > > other data being sent at the default copying approach.
> > > > 
> > > > Signed-off-by: Leonardo Bras 
> > > > ---
> > > >  migration/multifd-zlib.c | 7 ---
> > > >  migration/multifd-zstd.c | 7 ---
> > > >  migration/multifd.c  | 9 ++---
> > > >  migration/multifd.h  | 3 ++-
> > > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > > 
> > > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > > >  }
> > > >  
> > > >  if (used) {
> > > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > > +  &local_err);
> > > 
> > > I don't think it is valid to unconditionally enable this feature due to 
> > > the
> > > resource usage implications
> > > 
> > > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > > 
> > >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
> > >if the socket option was not set, the socket exceeds its optmem 
> > >limit or the user exceeds its ulimit on locked pages."
> > > 
> > > The limit on locked pages is something that looks very likely to be
> > > exceeded unless you happen to be running a QEMU config that already
> > > implies locked memory (eg PCI assignment)
> > 
> > Yes it would be great to be a migration capability in parallel to multifd. 
> > At
> > initial phase if it's easy to be implemented on multi-fd only, we can add a
> > dependency between the caps.  In the future we can remove that dependency 
> > when
> > the code is ready to go without multifd.  Thanks,
> 
> Also, I'm wondering how zerocopy support interacts with kernel support
> for kTLS and multipath-TCP, both of which we want to be able to use
> with migration.

Copying Jason Wang for net implications between these features on kernel side
and whether they can be enabled together (MSG_ZEROCOPY, mptcp, kTLS).

To be on the safe side, we may want to only enable one of them until we prove
they'll work together, I guess.

Not an immediate concern, as I don't really think any of them is
explicitly supported in qemu.

KTLS may be implicitly included by a new gnutls, but we need to mark TLS and
ZEROCOPY mutually exclusive anyway, because at least the userspace TLS code of
gnutls won't have a way to maintain the tls buffers used by zerocopy.  So at
least we need some knob to detect whether kTLS is enabled in gnutls.

-- 
Peter Xu




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-09-01 Thread Daniel P . Berrangé
On Tue, Aug 31, 2021 at 04:29:09PM -0400, Peter Xu wrote:
> On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > > Call qio_channel_set_zerocopy(true) in the start of every multifd thread.
> > > 
> > > Change the send_write() interface of multifd, allowing it to pass down
> > > flags for qio_channel_write*().
> > > 
> > > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > > other data being sent at the default copying approach.
> > > 
> > > Signed-off-by: Leonardo Bras 
> > > ---
> > >  migration/multifd-zlib.c | 7 ---
> > >  migration/multifd-zstd.c | 7 ---
> > >  migration/multifd.c  | 9 ++---
> > >  migration/multifd.h  | 3 ++-
> > >  4 files changed, 16 insertions(+), 10 deletions(-)
> > 
> > > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> > >  }
> > >  
> > >  if (used) {
> > > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > > +  &local_err);
> > 
> > I don't think it is valid to unconditionally enable this feature due to the
> > resource usage implications
> > 
> > https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> > 
> >   "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
> >if the socket option was not set, the socket exceeds its optmem 
> >limit or the user exceeds its ulimit on locked pages."
> > 
> > The limit on locked pages is something that looks very likely to be
> > exceeded unless you happen to be running a QEMU config that already
> > implies locked memory (eg PCI assignment)
> 
> Yes it would be great to be a migration capability in parallel to multifd. At
> initial phase if it's easy to be implemented on multi-fd only, we can add a
> dependency between the caps.  In the future we can remove that dependency when
> the code is ready to go without multifd.  Thanks,

Also, I'm wondering how zerocopy support interacts with kernel support
for kTLS and multipath-TCP, both of which we want to be able to use
with migration.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-08-31 Thread Peter Xu
On Tue, Aug 31, 2021 at 02:16:42PM +0100, Daniel P. Berrangé wrote:
> On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> > Call qio_channel_set_zerocopy(true) in the start of every multifd thread.
> > 
> > Change the send_write() interface of multifd, allowing it to pass down
> > flags for qio_channel_write*().
> > 
> > Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> > other data being sent at the default copying approach.
> > 
> > Signed-off-by: Leonardo Bras 
> > ---
> >  migration/multifd-zlib.c | 7 ---
> >  migration/multifd-zstd.c | 7 ---
> >  migration/multifd.c  | 9 ++---
> >  migration/multifd.h  | 3 ++-
> >  4 files changed, 16 insertions(+), 10 deletions(-)
> 
> > @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
> >  }
> >  
> >  if (used) {
> > -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> > +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> > +  &local_err);
> 
> I don't think it is valid to unconditionally enable this feature due to the
> resource usage implications
> 
> https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html
> 
>   "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
>if the socket option was not set, the socket exceeds its optmem 
>limit or the user exceeds its ulimit on locked pages."
> 
> The limit on locked pages is something that looks very likely to be
> exceeded unless you happen to be running a QEMU config that already
> implies locked memory (eg PCI assignment)

Yes, it would be great for this to be a migration capability in parallel to
multifd. In the initial phase, if it's easy to implement on multifd only, we can
add a dependency between the caps.  In the future we can remove that dependency
when the code is ready to go without multifd.  Thanks,
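
For illustration, such a capability could gate the flag at the send site shown
in the hunk above roughly like this; migrate_use_zerocopy() is a hypothetical
helper in the style of the existing migrate_use_*() functions, not something
that exists in the tree today:

  #include <sys/socket.h>

  /* Hypothetical: pick the send flags from a (proposed) migration capability,
   * and pass them to the send_write() call quoted in the hunk above:
   *
   *   ret = multifd_send_state->ops->send_write(p, used,
   *                                             multifd_write_flags(),
   *                                             &local_err);
   */
  static int multifd_write_flags(void)
  {
      return migrate_use_zerocopy() ? MSG_ZEROCOPY : 0;
  }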

-- 
Peter Xu




Re: [PATCH v1 3/3] migration: multifd: Enable zerocopy

2021-08-31 Thread Daniel P . Berrangé
On Tue, Aug 31, 2021 at 08:02:39AM -0300, Leonardo Bras wrote:
> Call qio_channel_set_zerocopy(true) in the start of every multifd thread.
> 
> Change the send_write() interface of multifd, allowing it to pass down
> flags for qio_channel_write*().
> 
> Pass down MSG_ZEROCOPY flag for sending memory pages, while keeping the
> other data being sent at the default copying approach.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  migration/multifd-zlib.c | 7 ---
>  migration/multifd-zstd.c | 7 ---
>  migration/multifd.c  | 9 ++---
>  migration/multifd.h  | 3 ++-
>  4 files changed, 16 insertions(+), 10 deletions(-)

> @@ -675,7 +676,8 @@ static void *multifd_send_thread(void *opaque)
>  }
>  
>  if (used) {
> -ret = multifd_send_state->ops->send_write(p, used, &local_err);
> +ret = multifd_send_state->ops->send_write(p, used, MSG_ZEROCOPY,
> +  &local_err);

I don't think it is valid to unconditionally enable this feature due to the
resource usage implications

https://www.kernel.org/doc/html/v5.4/networking/msg_zerocopy.html

  "A zerocopy failure will return -1 with errno ENOBUFS. This happens 
   if the socket option was not set, the socket exceeds its optmem 
   limit or the user exceeds its ulimit on locked pages."

The limit on locked pages is something that looks very likely to be
exceeded unless you happen to be running a QEMU config that already
implies locked memory (eg PCI assignment)


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|