This patch series enables MSG_ZEROCOPY in QIOChannel and uses it to improve
multifd migration performance by reducing CPU usage.

Patch #1 conditionally disables liburing on systems where linux/errqueue.h
conflicts with liburing/compat.h (__kernel_timespec redefinition).

Patch #2 creates new callbacks for QIOChannel, allowing the implementation
of zero-copy writing.

Patch #3 implements io_writev flags and io_flush() on QIOChannelSocket,
making use of MSG_ZEROCOPY on Linux.

Patch #4 adds a "zero-copy-send" migration property, only available with
CONFIG_LINUX and compiled out everywhere else. This migration property has
to be enabled before multifd migration starts.

Patch #5 adds a helper function that reports whether TLS is going to be
used. This helper is used later in the series.

Patch #6 changes multifd_send_sync_main() so it returns int instead of
void. The return value indicates whether any error happened in the
function, allowing migration to possibly fail earlier.

Patch #7 implements a workaround: the behavior introduced in d48c3a0445 is
hard to deal with in zerocopy, so the header is sent in a separate
syscall, without MSG_ZEROCOPY.

Patch #8 makes use of the QIOChannelSocket zero_copy implementation in
nocomp multifd migration.

Results:
In preliminary tests, the resource usage of __sys_sendmsg() was reduced
15-fold, and the overall migration took 13-22% less time, based on a
synthetic CPU workload.

In further tests, it was noted that, on multifd migration with 8 channels:
- On idle hosts, migration time was reduced by 10% to 21%.
- On hosts busy with heavy CPU stress (1 stress thread per CPU, but
  not CPU-pinned), migration time was reduced by ~25% by enabling
  zero-copy.
- On hosts with heavy CPU-pinned workloads (1 stress thread per CPU,
  CPU-pinned), migration time was reduced by ~66% by enabling zero-copy.

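As a rough illustration of the patch #7 workaround described above, the
sketch below sends a header with a plain send() and the payload with
caller-chosen flags. The names send_all() and send_packet() are
hypothetical and are not taken from the series. A caller wanting zero copy
would pass MSG_ZEROCOPY as payload_flags, on a socket where SO_ZEROCOPY has
already been enabled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send len bytes, retrying on short writes and EINTR. Returns 0 or -1. */
static int send_all(int fd, const void *buf, size_t len, int flags)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, flags);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: the header always goes out in its own syscall with
 * no flags, so only the payload is affected by payload_flags
 * (0 for a normal copy, MSG_ZEROCOPY where the socket supports it).
 */
static int send_packet(int fd, const void *hdr, size_t hdr_len,
                       const void *payload, size_t payload_len,
                       int payload_flags)
{
    if (send_all(fd, hdr, hdr_len, 0) < 0) {
        return -1;
    }
    return send_all(fd, payload, payload_len, payload_flags);
}
```

Keeping the header out of the zero-copy path means its buffer can be reused
right after send() returns, while a payload sent with MSG_ZEROCOPY has to
stay unmodified until the kernel reports completion.
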
Setup for the tests above:
- Sending and receiving hosts:
  - CPU: Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz (448 CPUS)
  - Network card: E810-C (100Gbps)
  - >1TB RAM
  - QEMU: Upstream master branch + this patchset
  - Linux: Upstream v5.15
- VM configuration:
  - 28 VCPUs
  - 512GB RAM

---
Changes since v12:
- New patch #1 added to solve an issue with Gitlab CI on alpine
- Removed unnecessary newline in patch #3 (previously #2)
- Removed incorrect commit change in roms/skiboot on patch #4
  (previously #3)

Changes since v11:
- Patch #3 now wraps lines around column 75
- Patch #2 now introduces some #ifdefs instead of defining a default
  value for MSG_ZEROCOPY and SO_ZEROCOPY

Changes since v10:
- Patch #2 no longer breaks the build on systems with glibc < 2.27, and
  probably on non-Linux builds.
- Also on patch #2, replaced bits/socket.h with sys/socket.h
  (thanks Peter Xu)

Changes since v9:
- Patch #6 got simplified and improved (thanks Daniel)
- Patch #7 got better comments (thanks Peter Xu)

Changes since v8:
- Inserted two new patches #5 & #6, previous patch #5 is now #7.
- Work around an optimization introduced in d48c3a0445
- Removed unnecessary assert in qio_channel_writev_full_all

Changes since v7:
- Migration property renamed from zero-copy to zero-copy-send
- A few early tests added so that misconfigurations fail earlier
- qio_channel_full*_flags() renamed back to qio_channel_full*()
- multifd_send_sync_main() reverted back to not receiving a flag,
  so it always syncs zero-copy when enabled.
- Improved code quality in a few places

Changes since v6:
- Removed io_writev_zero_copy(); the new io_writev() flags are used to
  achieve the same results.
- Renamed io_flush_zero_copy() to io_flush()
- Previous patch #2 became too small, so it was squashed into previous
  patch #3 (now patch #2)

Changes since v5:
- flush_zero_copy now returns -1 on failure, 0 on success, and 1 when
  none of the processed writes were able to use zerocopy in the kernel.
- qio_channel_socket_poll() removed, using qio_channel_wait() instead
- ENOBUFS is now processed inside qio_channel_socket_writev_flags()
- Most zerocopy parameter validation moved to migrate_params_check(),
  leaving only the feature test to the socket_outgoing_migration()
  callback
- Naming went from *zerocopy to *zero_copy or *zero-copy, due to QAPI/QMP
  preferences
- Improved docs

Changes since v4:
- 3 patches were split into 6
- Flush is used for syncing after each iteration, instead of only at
  the end
- If zerocopy is not available, fail in connect instead of failing on
  write
- 'multifd-zerocopy' property renamed to 'zerocopy'
- Fail migrations that don't support zerocopy, if it's enabled.
- Instead of checking for zerocopy at each write, save the flags in
  MultiFDSendParams->write_flags and use them on write
- Reorganized flag usage in QIOChannelSocket
- A lot of typos fixed
- More docs on buffer restrictions

Changes since v3:
- QIOChannel interface names changed from io_async_{writev,flush} to
  io_{writev,flush}_zerocopy
- Instead of falling back in case zerocopy is not implemented, return
  an error and abort the operation.
- Flush now waits as long as needed, or returns an error in case anything
  goes wrong, aborting the operation.
- Zerocopy is now conditional in multifd, being set by the parameter
  multifd-zerocopy
- Moved zerocopy_flush to multifd_send_sync_main() from
  multifd_save_cleanup so migration can abort if flush goes wrong.
- Several other small improvements

Changes since v2:
- Patch #1: One more fallback
- Patch #2: Fall back to sync if it fails to lock buffer memory in
  MSG_ZEROCOPY send.

Changes since v1:
- Reimplemented the patchset using the async_write + async_flush approach.
- Implemented a flush to be able to tell when all data has been written.

Leonardo Bras (8):
  meson.build: Fix docker-test-build@alpine when including linux/errqueue.h
  QIOChannel: Add flags on io_writev and introduce io_flush callback
  QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
  migration: Add zero-copy-send parameter for QMP/HMP for Linux
  migration: Add migrate_use_tls() helper
  multifd: multifd_send_sync_main now returns negative on error
  multifd: Send header packet without flags if zero-copy-send is enabled
  multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

 meson.build                         |  11 +++
 qapi/migration.json                 |  24 ++++++
 include/io/channel-socket.h         |   2 +
 include/io/channel.h                |  38 ++++++++-
 migration/migration.h               |   6 ++
 migration/multifd.h                 |   4 +-
 chardev/char-io.c                   |   2 +-
 hw/remote/mpqemu-link.c             |   2 +-
 io/channel-buffer.c                 |   1 +
 io/channel-command.c                |   1 +
 io/channel-file.c                   |   1 +
 io/channel-socket.c                 | 118 +++++++++++++++++++++++++++-
 io/channel-tls.c                    |   1 +
 io/channel-websock.c                |   1 +
 io/channel.c                        |  49 +++++++++---
 migration/channel.c                 |   3 +-
 migration/migration.c               |  52 +++++++++++-
 migration/multifd.c                 |  74 ++++++++++++++---
 migration/ram.c                     |  29 +++++--
 migration/rdma.c                    |   1 +
 migration/socket.c                  |  12 ++-
 monitor/hmp-cmds.c                  |   6 ++
 scsi/pr-manager-helper.c            |   2 +-
 tests/unit/test-io-channel-socket.c |   1 +
 24 files changed, 397 insertions(+), 44 deletions(-)

-- 
2.36.1