Zero-copy multifd migration sends both the header and the memory pages in a
single write syscall. Since the header must be flushed before it can be
reused, a header array was introduced: each write uses a different array
slot, and flushing only takes place after all headers have been used,
meaning one flush for every N writes.
This method has a bottleneck, though: after the last write, the flush has to
wait for every outstanding write to finish, so the recvmsg() syscall in
qio_channel_socket_flush() gets called many times. On top of that, it
creates a window in which the I/O queue is empty and nothing is being sent:
between the flush and the next write.

To avoid that, use qio_channel_flush()'s new max_pending parameter to wait
only until at most half of the array is still in use (i.e. the LRU half of
the array can be reused). Flushing the LRU half of the array is much faster,
since it does not have to wait for the most recent writes to finish, which
makes up for having to flush twice per array.

As the main benefit, this approach keeps the I/O queue from going empty
while there is still data to be sent, making it easier to sustain maximum
I/O throughput while consuming less CPU time.

Signed-off-by: Leonardo Bras <leob...@redhat.com>
---
 migration/multifd.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index c5d1f911a4..fe9df460f6 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -569,12 +569,13 @@ void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
-static int multifd_zero_copy_flush(QIOChannel *c)
+static int multifd_zero_copy_flush(QIOChannel *c,
+                                   int max_remaining)
 {
     int ret;
     Error *err = NULL;
 
-    ret = qio_channel_flush(c, 0, &err);
+    ret = qio_channel_flush(c, max_remaining, &err);
     if (ret < 0) {
         error_report_err(err);
         return -1;
@@ -636,7 +637,7 @@ int multifd_send_sync_main(QEMUFile *f)
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
 
-        if (flush_zero_copy && p->c && (multifd_zero_copy_flush(p->c) < 0)) {
+        if (flush_zero_copy && p->c && (multifd_zero_copy_flush(p->c, 0) < 0)) {
             return -1;
         }
     }
@@ -719,12 +720,17 @@ static void *multifd_send_thread(void *opaque)
 
             if (use_zero_copy_send) {
                 p->packet_idx = (p->packet_idx + 1) % HEADER_ARR_SZ;
-
-                if (!p->packet_idx && (multifd_zero_copy_flush(p->c) < 0)) {
+                /*
+                 * When half the array has been used, flush to make sure the
+                 * next half is available
+                 */
+                if (!(p->packet_idx % (HEADER_ARR_SZ / 2)) &&
+                    (multifd_zero_copy_flush(p->c, HEADER_ARR_SZ / 2) < 0)) {
                     break;
                 }
                 header = (void *)p->packet + p->packet_idx * p->packet_len;
             }
+
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);
-- 
2.38.0