On Sat, Jan 17, 2026 at 03:09:11PM +0100, Lukas Straub wrote:
> Like in the normal ram_load() path, put the received pages into the
> colo cache and mark the pages in the bitmap so that they will be
> flushed to the guest later.
> 
> Multifd with COLO is useful to reduce the VM pause time during checkpointing
> for latency sensitive workloads. In such workloads the worst-case latency
> is especially important.
> 
> Also, multifd migration is the preferred way to do migration nowadays and this
> allows to use multifd compression with COLO.
> 
> Benchmark:
> Cluster nodes
>  - Intel Xenon E5-2630 v3
>  - 48Gb RAM
>  - 10G Ethernet
> Guest
>  - Windows Server 2016
>  - 6Gb RAM
>  - 4 cores
> Workload
>  - Upload a file to the guest with SMB to simulate moderate
>    memory dirtying
>  - Measure the memory transfer time portion of each checkpoint
>  - 600ms COLO checkpoint interval
> 
> Results
> Plain
>  idle mean: 4.50ms 99per: 10.33ms
>  load mean: 24.30ms 99per: 78.05ms
> Multifd-4
>  idle mean: 6.48ms 99per: 10.41ms
>  load mean: 14.12ms 99per: 31.27ms

Thanks for the numbers.  They're persuasive at least from 1st look.

Said that, one major question is, multifd should only help with throughput
when cpu is a bottleneck sending, in your case it's 10Gbps NIC.  Normally
any decent cpu should be able to push closer to 10Gbps even without
multifd.

Per my previous experiences, multifd can only show a difference when the
hosts have at least 25GBps+ bandwidth available.

Maybe you turned on compression already?  If so, worth stating the
compressor methods chosen / parameters.

> 
> Evaluation
> While multifd has slightly higher latency when the guest idles, it is
> 10ms faster under load and more importantly it's worst case latency is
> less than 1/2 of plain under load as can be seen in the 99. Percentile.
> 
> Signed-off-by: Juan Quintela <[email protected]>
> Signed-off-by: Lukas Straub <[email protected]>
> ---
>  MAINTAINERS              |  1 +
>  migration/meson.build    |  2 +-
>  migration/multifd-colo.c | 49 
> ++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/multifd-colo.h | 26 +++++++++++++++++++++++++
>  migration/multifd.c      | 12 ++++++++++++
>  migration/multifd.h      |  1 +
>  6 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 
> 563804345fec68ee72793dbb7c1b7e5be4c32083..dbb217255c2cf35dc0ce971c2021b130fac5469b
>  100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3837,6 +3837,7 @@ COLO Framework
>  M: Lukas Straub <[email protected]>
>  S: Maintained
>  F: migration/colo*
> +F: migration/multifd-colo.*
>  F: include/migration/colo.h
>  F: include/migration/failover.h
>  F: docs/COLO-FT.txt
> diff --git a/migration/meson.build b/migration/meson.build
> index 
> 16909d54c5110fc5d8187fd3a68c4a5b08b59ea7..1e59fe4f1f0bbfffed90df38e8f39fa87bceb9b9
>  100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -40,7 +40,7 @@ system_ss.add(files(
>  ), gnutls, zlib)
>  
>  if get_option('replication').allowed()
> -  system_ss.add(files('colo-failover.c', 'colo.c'))
> +  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
>  else
>    system_ss.add(files('colo-stubs.c'))
>  endif
> diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..d8d98e79b12ed52c41f341052a682d7786e221b5
> --- /dev/null
> +++ b/migration/multifd-colo.c
> @@ -0,0 +1,49 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * multifd colo implementation
> + *
> + * Copyright (c) Lukas Straub <[email protected]>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "exec/target_page.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "ram.h"
> +#include "multifd.h"
> +#include "options.h"
> +#include "io/channel-socket.h"
> +#include "migration/colo.h"
> +#include "multifd-colo.h"
> +#include "system/ramblock.h"
> +
> +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> +{
> +    assert(p->block->colo_cache);
> +
> +    /*
> +     * While we're still in precopy state (not yet in colo state), we copy
> +     * received pages to both guest and cache. No need to set dirty bits,
> +     * since guest and cache memory are in sync.
> +     */
> +    if (migration_incoming_in_colo_state()) {
> +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> +    }
> +    p->host = p->block->colo_cache;

I should have mentioned it while reviewing the previous version, anyway..

IMHO it would be better to have one place setting p->host instead of
overwritting it.

So instead of hooking before ->recv(), we should do it in
multifd_ram_unfill_packet(), moving the p->host update to the end of
function and hook it there with COLO (so that you can still record the
bitmaps, only after normal[]).

Another thing, which might be more important: you seem to have ignored
zero[], but I think you need it.  zero[] keeps all pages that are zeros
(which may not used to be zeros).  So IMHO you'll need to record them as
dirty too in COLO's dest bitmap, otherwise you may hit hard to debug RAM
corruptions.

> +}
> +
> +void multifd_colo_process_recv(MultiFDRecvParams *p)
> +{
> +    if (!migration_incoming_in_colo_state()) {
> +        for (int i = 0; i < p->normal_num; i++) {
> +            void *guest = p->block->host + p->normal[i];
> +            void *cache = p->host + p->normal[i];
> +            memcpy(guest, cache, multifd_ram_page_size());
> +        }
> +    }
> +    p->host = p->block->host;

Is resetting the pointer required?  If not, we can skip it.

> +}
> diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> --- /dev/null
> +++ b/migration/multifd-colo.h
> @@ -0,0 +1,26 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * multifd colo header
> + *
> + * Copyright (c) Lukas Straub <[email protected]>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> +#define QEMU_MIGRATION_MULTIFD_COLO_H
> +
> +#ifdef CONFIG_REPLICATION
> +
> +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> +void multifd_colo_process_recv(MultiFDRecvParams *p);
> +
> +#else
> +
> +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> +
> +#endif
> +#endif
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 
> 8e71171fb7a17726ba7eb0705e293c41e8aa32ec..6c85acec3bac134e85cfcee0d32057134f5af8d1
>  100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -29,6 +29,7 @@
>  #include "qemu-file.h"
>  #include "trace.h"
>  #include "multifd.h"
> +#include "multifd-colo.h"
>  #include "threadinfo.h"
>  #include "options.h"
>  #include "qemu/yank.h"
> @@ -1269,7 +1270,18 @@ static int multifd_ram_state_recv(MultiFDRecvParams 
> *p, Error **errp)
>  {
>      int ret;
>  
> +    if (migrate_colo()) {
> +        multifd_colo_prepare_recv(p);
> +    }
> +
>      ret = multifd_recv_state->ops->recv(p, errp);
> +    if (ret != 0) {
> +        return ret;
> +    }
> +
> +    if (migrate_colo()) {
> +        multifd_colo_process_recv(p);
> +    }
>  
>      return ret;
>  }
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 
> 9b6d81e7ede024f05d4cd235de95e73840d0bbc4..7036f438fade1baed2442bfdcf8b5d6397c4a448
>  100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -280,6 +280,7 @@ typedef struct {
>      /* ramblock */
>      RAMBlock *block;
>      /* ramblock host address */
> +    /* or points to the corresponding address in the colo cache */

Nit: we can merge it with /* ... */, and some wording change:

       /*
        * Normally, it points to ramblock's host address.  When COLO
        * enabled, it points to the mirror cache for the ramblock.
        */

>      uint8_t *host;
>      /* buffers to recv */
>      struct iovec *iov;
> 
> -- 
> 2.39.5
> 

-- 
Peter Xu


Reply via email to