On Thu, May 5, 2022 at 5:05 AM Daniel P. Berrangé <berra...@redhat.com> wrote: > > On Wed, May 04, 2022 at 04:18:31PM -0300, Leonardo Bras wrote: > > For CONFIG_LINUX, implement the new zero copy flag and the optional callback > > io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY > > feature is available in the host kernel, which is checked on > > qio_channel_socket_connect_sync() > > > > qio_channel_socket_flush() was implemented by counting how many times > > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the > > socket's error queue, in order to find how many of them finished sending. > > Flush will loop until those counters are the same, or until some error > > occurs. > > > > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY: > > 1: Buffer > > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid > > copying, > > some caution is necessary to avoid overwriting any buffer before it's sent. > > If something like this happen, a newer version of the buffer may be sent > > instead. > > - If this is a problem, it's recommended to call qio_channel_flush() before > > freeing > > or re-using the buffer. > > > > 2: Locked memory > > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, > > and > > unlocked after it's sent. > > - Depending on the size of each buffer, and how often it's sent, it may > > require > > a larger amount of locked memory than usually available to non-root user. > > - If the required amount of locked memory is not available, writev_zero_copy > > will return an error, which can abort an operation like migration, > > - Because of this, when an user code wants to add zero copy as a feature, it > > requires a mechanism to disable it, so it can still be accessible to less > > privileged users. > > > > Signed-off-by: Leonardo Bras <leob...@redhat.com> > > Reviewed-by: Peter Xu <pet...@redhat.com> > > Reviewed-by: Daniel P. Berrangé <berra...@redhat.com> > > Reviewed-by: Juan Quintela <quint...@redhat.com> > > --- > > include/io/channel-socket.h | 2 + > > io/channel-socket.c | 120 ++++++++++++++++++++++++++++++++++-- > > 2 files changed, 118 insertions(+), 4 deletions(-) > > > > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h > > index e747e63514..513c428fe4 100644 > > --- a/include/io/channel-socket.h > > +++ b/include/io/channel-socket.h > > @@ -47,6 +47,8 @@ struct QIOChannelSocket { > > socklen_t localAddrLen; > > struct sockaddr_storage remoteAddr; > > socklen_t remoteAddrLen; > > + ssize_t zero_copy_queued; > > + ssize_t zero_copy_sent; > > }; > > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c > > index 696a04dc9c..ae756ce166 100644 > > --- a/io/channel-socket.c > > +++ b/io/channel-socket.c > > @@ -25,9 +25,25 @@ > > #include "io/channel-watch.h" > > #include "trace.h" > > #include "qapi/clone-visitor.h" > > +#ifdef CONFIG_LINUX > > +#include <linux/errqueue.h> > > +#include <sys/socket.h> > > +#endif > > > > #define SOCKET_MAX_FDS 16 > > > > +/* > > + * Zero-copy defines bellow are included to avoid breaking builds on > > systems > > + * that don't support MSG_ZEROCOPY, while keeping the functions more > > readable > > + * (without a lot of ifdefs). > > + */ > > +#ifndef MSG_ZEROCOPY > > +#define MSG_ZEROCOPY 0x4000000 > > +#endif > > +#ifndef SO_ZEROCOPY > > +#define SO_ZEROCOPY 60 > > +#endif > > Please put these behind CONFIG_LINUX to make it clear to readers that > this is entirely Linux specific > > > > + > > SocketAddress * > > qio_channel_socket_get_local_address(QIOChannelSocket *ioc, > > Error **errp) > > @@ -54,6 +70,8 @@ qio_channel_socket_new(void) > > > > sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); > > sioc->fd = -1; > > + sioc->zero_copy_queued = 0; > > + sioc->zero_copy_sent = 0; > > > > ioc = QIO_CHANNEL(sioc); > > qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); > > @@ -153,6 +171,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket > > *ioc, > > return -1; > > } > > > > +#ifdef CONFIG_LINUX > > + int ret, v = 1; > > + ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v)); > > + if (ret == 0) { > > + /* Zero copy available on host */ > > + qio_channel_set_feature(QIO_CHANNEL(ioc), > > + QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY); > > + } > > +#endif > > + > > return 0; > > } > > > > @@ -533,6 +561,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel > > *ioc, > > char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)]; > > size_t fdsize = sizeof(int) * nfds; > > struct cmsghdr *cmsg; > > + int sflags = 0; > > > > memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)); > > > > @@ -557,15 +586,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel > > *ioc, > > memcpy(CMSG_DATA(cmsg), fds, fdsize); > > } > > > > + if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) { > > + sflags = MSG_ZEROCOPY; > > + } > > Also should be behind CONFIG_LINUX
Hello Daniel, But what if this gets compiled in a Linux system without MSG_ZEROCOPY support? As qapi will have zero-copy-send as an option we could have this scenario: - User request migration using zero-copy-send - multifd_save_setup() will set write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY - In qio_channel_socket_connect_sync(): setsockopt() part will be compiled-out, so no error here - above part in qio_channel_socket_writev() will be commented-out, which means write_flags will be ignored - sflags will not contain MSG_ZEROCOPY, so sendmsg() will use copy-mode - migration will succeed In the above case, the user has all the reason to think migration is using MSG_ZEROCOPY, but in fact it's quietly falling back to copy-mode. That's why I suggested creating a 'global' config usiing SO_ZEROCOPY & MSG_ZEROCOPY & CONFIG_LINUX so we can use in qapi and have no chance of even offering zero-copy-send if we don't have it. Another local option is to do implement your suggestions, and also change qio_channel_socket_connect_sync() so it returns an error if MSG_ZEROCOPY && SO_ZEROCOPY is not present, such as: +#ifdef CONFIG_LINUX +#if defined(MSG_ZEROCOPY) && defined(SO_ZEROCOPY) + int ret, v = 1; + ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v)); + if (ret == 0) { + /* Zero copy available on host */ + qio_channel_set_feature(QIO_CHANNEL(ioc), + QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY); + } +#else + error_setg_errno(errp, errno,"MSG_ZEROCOPY not available"); + return -1; +#endif +#endif What do you think? Best regards, Leo > > > + > > retry: > > - ret = sendmsg(sioc->fd, &msg, 0); > > + ret = sendmsg(sioc->fd, &msg, sflags); > > if (ret <= 0) { > > - if (errno == EAGAIN) { > > + switch (errno) { > > + case EAGAIN: > > return QIO_CHANNEL_ERR_BLOCK; > > - } > > - if (errno == EINTR) { > > + case EINTR: > > goto retry; > > + case ENOBUFS: > > + if (sflags & MSG_ZEROCOPY) { > > + error_setg_errno(errp, errno, > > + "Process can't lock enough memory for > > using MSG_ZEROCOPY"); > > + return -1; > > + } > > + break; > > And this ENOBUGS case behind CONFIG_LINUX > [...]