Peter Xu <[email protected]> writes:

> On Mon, Jan 26, 2026 at 06:37:31PM -0300, Fabiano Rosas wrote:
>> Lukas Straub <[email protected]> writes:
>> 
>> >> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
>> >> > +{
>> >> > +    /*
>> >> > +     * While we're still in precopy state (not yet in colo state), we copy
>> >> > +     * received pages to both guest and cache. No need to set dirty bits,
>> >> > +     * since guest and cache memory are in sync.
>> >> > +     */
>> >> > +    if (migration_incoming_in_colo_state()) {  
>> >> 
>> >> What's the relationship between migration_incoming_in_colo_state() and
>> >> migration_incoming_colo_enabled()? ram_load_precopy() checks both. Would
>> >> migration_incoming_colo_enabled affect multifd as well?
>> >
>> > So first off, migration_incoming_colo_enabled() and migrate_colo()
>> > are equivalent in practice since commit 121ccedc2b ("migration: block
>> > incoming colo when capability is disabled").
>> >
>> > (I have some cleanup patches lying around, but that will be for later)
>> >
>> 
>> Ok, I think those are important: when we have multifd and non-multifd
>> code for the same feature, it's useful to be able to compare the two,
>> so some degree of uniformity would be nice.
>
> I second.  We can drop those in this series before adding multifd support,
> likely together with MIG_CMD_ENABLE_COLO as well; I don't think COLO needs
> to worry about old binaries.  It should always use the same QEMU binary on
> both sides.
>
> The patch needs to rename MIG_CMD_ENABLE_COLO to MIG_CMD_DEPRECATED_0 or
> something, though, to keep the rest of the MIG_CMDs compatible with old
> binaries.
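>
> Something along these lines (hypothetical sketch; the only point is to
> keep the slot so the later command values don't shift):
>
> enum qemu_vm_cmd {
>     MIG_CMD_INVALID = 0,       /* Must be 0 */
>     /* ... earlier commands unchanged ... */
>     MIG_CMD_DEPRECATED_0,      /* formerly MIG_CMD_ENABLE_COLO */
>     /* ... later commands keep their old numeric values ... */
>     MIG_CMD_MAX
> };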
>
>> 
>> > For colo, we do a normal precopy migration; at the end we go into colo
>> > state, and then migration_incoming_in_colo_state() will be true.
>> >
>> > Here we check migrate_colo() outside of these functions, as Peter
>> > requested in a previous version.
>> >
>> >> 
>> >> The multifd recv threads will be running until after
>> >> process_incoming_migration_bh(), which is when
>> >> migration_incoming_disable_colo() runs.
>> >
>> > That is not an issue as we use migrate_colo() here.
>> >
>> >> 
>> >> Also, is the colo_cache guaranteed to be around until multifd threads
>> >> exit?
>> >
>> > This is an issue. I will fix it in the next version.
>> >
>> >>
>> >> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
>> >> > +        colo_record_bitmap(p->block, p->zero, p->zero_num);
>> >> > +    }
>> >> > +}
>> >> > +
>> >> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
>> >> > +{
>> >> > +    if (!migration_incoming_in_colo_state()) {
>> >> > +        for (int i = 0; i < p->normal_num; i++) {
>> >> > +            void *guest = p->block->host + p->normal[i];
>> >> > +            void *cache = p->host + p->normal[i];
>> >> > +            memcpy(guest, cache, multifd_ram_page_size());
>> >> > +        }  
>> >> 
>> >> I see some differences between what ram.c does and what multifd will do
>> >> after this patch regarding which flags are checked and order of copies
>> >> (code below):
>> >> 
>> >> ram.c:
>> >> 
>> >>   - migration_incoming_colo_enabled && migration_incoming_in_colo_state:
>> >>   Reads from stream into colo_cache.
>> >>   
>> >>   - migration_incoming_colo_enabled && !migration_incoming_in_colo_state:
>> >>   Reads from stream into guest and then memcpy into colo_cache.
>> >> 
>> >>   - !migration_incoming_colo_enabled
>> >>   Reads from stream into guest.
>> >> 
>> >> multifd.c:
>> >> 
>> >>   - migrate_colo:
>> >>   Reads from stream into colo_cache.
>> >>   
>> >>   - !migration_incoming_in_colo_state:
>> >>   memcpy from colo_cache into guest.
>> >> 
>> >>   - !migration_incoming_colo_enabled
>> >>   ???
>> >> 
>> >> The resulting state should be the same, but I wonder if we want to i) use
>> >> the same checks in multifd
>> >
>> > migration_incoming_colo_enabled() shouldn't even exist anymore, so I'm
>> > not using it here. migrate_colo() is much easier to reason about.
>> >
>> >> and ii) when not in colo state, copy first
>> >> into guest (using readv) and later memcpy into the colo_cache.
>> >
>> > I think it is easier the way it is now.
>> >
>> >> 
>> >> ---
>> >> ram.c:
>> >> 
>> >> host = host_from_ram_block_offset(block, addr);
>> >> if (migration_incoming_colo_enabled()) {
>> >>     if (migration_incoming_in_colo_state()) {
>> >>         host = colo_cache_from_block_offset(block, addr, true);
>> >>     } else {
>> >>         host_bak = colo_cache_from_block_offset(block, addr, false);
>> >>     }
>> >> }
>> >> qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>> >> if (host_bak) {
>> >>     memcpy(host_bak, host, TARGET_PAGE_SIZE);
>> >> }
>> >> 
>> >> multifd:
>> >> 
>> >> if (migrate_colo()) {
>> >>     p->host = p->block->colo_cache;
>> >> }
>> >> 
>> >> for (int i = 0; i < p->normal_num; i++) {
>> >>     p->iov[i].iov_base = p->host + p->normal[i];
>> >> }
>> >> return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> >> 
>> >> if (!migration_incoming_in_colo_state()) {
>> >>     for (int i = 0; i < p->normal_num; i++) {
>> >>         void *guest = p->block->host + p->normal[i];
>> >>         void *cache = p->host + p->normal[i];
>> >>         memcpy(guest, cache, multifd_ram_page_size());
>> >>     }
>> >> }
>> >> ---
>> >> 
>> >> > +        for (int i = 0; i < p->zero_num; i++) {
>> >> > +            void *guest = p->block->host + p->zero[i];
>> >> > +            memset(guest, 0, multifd_ram_page_size());
>> >> > +        }  
>> >> 
>> >> At multifd_nocomp_recv, there will be a call to
>> >> multifd_recv_zero_page_process(), which by that point will have p->host
>> >> == p->block->colo_cache, so it looks like that function will do some
>> >> zero page processing in the colo_cache, setting the rb->receivedmap for
>> >> pages in the colo_cache and potentially also doing a memcpy. Is this
>> >> intended?
>> >
>> > rb->receivedmap is only for postcopy, right? So it doesn't apply to
>> > colo.
>> >
>> 
>> It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero
>> page causing multiple page faults"). So it seems we might be doing extra
>> work on top of the colo_cache.
>
> IIUC not extra, but exactly what will be needed.
>
> The logic was about "in a vanilla precopy, if we see a page arriving for
> the first time, we don't need to zero the buffer, because the buffer
> should be zero-allocated".
>
> In COLO's case, COLO always puts RAM data into colo_cache, hence the same
> trick should apply to colo_cache too, avoiding unnecessary memset() on the
> cache.
>
> E.g. colo_cache is allocated with qemu_anon_ram_alloc(), so it is also
> guaranteed to be zeros when never touched.
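>
> Roughly (paraphrasing the current zero-page path from memory, so the
> helper names may be slightly off):
>
> void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
>     for (int i = 0; i < p->zero_num; i++) {
>         void *page = p->host + p->zero[i];
>         bool received =
>             ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i]);
>
>         if (!received) {
>             /* first touch: the buffer is still zero-allocated, skip it */
>             ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
>         }
>         if (received) {
>             memset(page, 0, multifd_ram_page_size());
>         }
>     }
> }
>
> With p->host pointing at colo_cache, the same "skip the memset on first
> touch" trick carries over to the cache pages for free.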
>
>> 
>> >> 
>> >> I'm thinking that maybe it would be better overall to hook colo directly
>> >> into multifd_nocomp_recv:
>> >
>> > But then it will only work for nocomp, right? It feels like the wrong
>> > level of abstraction to me.
>> >
>> 
>> Ah, nocomp != ram indeed.
>> 
>> >> 
>> >> static int multifd_nocomp_recv(MultiFDRecvParams *p, Error **errp)
>> >> {
>> >>     uint32_t flags;
>> >>     int ret;
>> >> 
>> >>     if (migrate_mapped_ram()) {
>> >>         return multifd_file_recv_data(p, errp);
>> >>     }
>> >> 
>> >>     flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>> >> 
>> >>     if (flags != MULTIFD_FLAG_NOCOMP) {
>> >>         error_setg(errp, "multifd %u: flags received %x flags expected %x",
>> >>                    p->id, flags, MULTIFD_FLAG_NOCOMP);
>> >>         return -1;
>> >>     }
>> >> 
>> >> +    if (migration_incoming_colo_enabled() && migration_incoming_in_colo_state()) {
>> >> +        p->host = p->block->colo_cache;
>> >> +    } // or else {}, depending on how we deal with zero pages in the cache
>> >> 
>> >>     multifd_recv_zero_page_process(p);
>> >> 
>> >>     if (!p->normal_num) {
>> >>         return 0;
>> >>     }
>> >> 
>> >>     for (int i = 0; i < p->normal_num; i++) {
>> >>         p->iov[i].iov_base = p->host + p->normal[i];
>> >>         p->iov[i].iov_len = multifd_ram_page_size();
>> >>         ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
>> >>     }
>> >> +    ret = qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> >> +    if (ret != 0) {
>> >> +        return ret;
>> >> +    }
>> >> +
>> >> +    if (migration_incoming_colo_enabled()) {
>> >> +        multifd_colo_process_recv(p);
>> >> +    }
>> >> 
>> >>     return ret;
>> >> }
>> >> 
>> >> 
>> >> > +    }
>> >> > +}
>> >> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
>> >> > new file mode 100644
>> >> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
>> >> > --- /dev/null
>> >> > +++ b/migration/multifd-colo.h
>> >> > @@ -0,0 +1,26 @@
>> >> > +/*
>> >> > + * SPDX-License-Identifier: GPL-2.0-or-later
>> >> > + *
>> >> > + * multifd colo header
>> >> > + *
>> >> > + * Copyright (c) Lukas Straub <[email protected]>
>> >> > + *
>> >> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> >> > + * See the COPYING file in the top-level directory.
>> >> > + */
>> >> > +
>> >> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
>> >> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
>> >> > +
>> >> > +#ifdef CONFIG_REPLICATION
>> >> > +
>> >> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
>> >> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
>> >> > +
>> >> > +#else
>> >> > +
>> >> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
>> >> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
>> >> > +
>> >> > +#endif
>> >> > +#endif
>> >> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
>> >> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
>> >> > --- a/migration/multifd-nocomp.c
>> >> > +++ b/migration/multifd-nocomp.c
>> >> > @@ -16,6 +16,7 @@
>> >> >  #include "file.h"
>> >> >  #include "migration-stats.h"
>> >> >  #include "multifd.h"
>> >> > +#include "multifd-colo.h"
>> >> >  #include "options.h"
>> >> >  #include "migration.h"
>> >> >  #include "qapi/error.h"
>> >> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >> >          return -1;
>> >> >      }
>> >> >  
>> >> > -    p->host = p->block->host;
>> >> >      for (i = 0; i < p->normal_num; i++) {
>> >> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
>> >> >  
>> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >> >          p->zero[i] = offset;
>> >> >      }
>> >> >  
>> >> > +    if (migrate_colo()) {
>> >> > +        multifd_colo_prepare_recv(p);
>> >> > +        assert(p->block->colo_cache);
>> >> > +        p->host = p->block->colo_cache;  
>> >> 
>> >> Can't you just use p->block->colo_cache later? I don't see why p->host
>> >> needs to be set beforehand even in the non-colo case.
>> >
>> > We should not touch the guest ram directly while in colo state, since
>> > the incoming guest is running. We either want to receive a whole
>> > checkpoint (all ram into the colo cache, plus all device state) and
>> > apply it, or, if anything goes wrong during checkpointing, keep the
>> > currently running guest on the incoming side in pristine state.
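>> >
>> > Conceptually (just a sketch, not the real code; the actual flush lives
>> > in colo_flush_ram_cache() in ram.c), the cache only gets applied to
>> > guest RAM once a checkpoint has fully arrived, something like:
>> >
>> > static void colo_apply_checkpoint_sketch(RAMBlock *block)
>> > {
>> >     unsigned long npages = block->used_length >> TARGET_PAGE_BITS;
>> >     unsigned long page = find_next_bit(block->bmap, npages, 0);
>> >
>> >     while (page < npages) {
>> >         ram_addr_t offset = (ram_addr_t)page << TARGET_PAGE_BITS;
>> >
>> >         /* dirty bit was recorded at receive time via colo_record_bitmap() */
>> >         memcpy(block->host + offset, block->colo_cache + offset,
>> >                TARGET_PAGE_SIZE);
>> >         clear_bit(page, block->bmap);
>> >         page = find_next_bit(block->bmap, npages, page + 1);
>> >     }
>> > }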
>> >
>> 
>> I was asking about setting p->host at this specific point. I don't think
>> any of this fits the unfill function. However, I see those were
>> suggested by Peter so let's not go back and forth.
>
> Actually I don't know why p->host existed before this work; IIUC we could
> have always used p->block->host.  Maybe when Juan was developing this he
> kept COLO in mind, or maybe he wanted to avoid frequent p->block pointer
> dereferences.
>

Maybe p->block was being reset at some point and p->host was captured at
the point where the (whatever) lock was released. I checked, and today
there's no such thing. The p->mutex seems to be there just to protect
against this in multifd_recv_sync_main:

WITH_QEMU_LOCK_GUARD(&p->mutex) {
    if (multifd_recv_state->packet_num < p->packet_num) {
        multifd_recv_state->packet_num = p->packet_num;
    }
}

> IIUC, we could remove p->host, but when we need to access "the buffer of
> the ramblock" we'll need to call a helper to fetch that (either ramblock's
> buffer, or colo_cache, per migrate_colo()).  And it might be slightly
> slower than p->host indeed.
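>
> E.g. something like (hypothetical helper, not in the tree):
>
> static inline uint8_t *multifd_recv_buffer(RAMBlock *block)
> {
>     /* pick guest RAM or the COLO mirror cache per the capability */
>     return migrate_colo() ? block->colo_cache : block->host;
> }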
>

Yeah, let's keep it; the compression code also uses it, so there's no
point in removing it now.

>> 
>> > I have written more about colo migration here:
>> >
>> > https://lore.kernel.org/qemu-devel/20260117204913.584e1829@penguin/
>> > https://lore.kernel.org/qemu-devel/[email protected]/
>> >
>> >> 
>> >> > +    } else {
>> >> > +        p->host = p->block->host;
>> >> > +    }
>> >> > +
>> >> >      return 0;
>> >> >  }
>> >> >  
>> >> > diff --git a/migration/multifd.c b/migration/multifd.c
>> >> > index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
>> >> > --- a/migration/multifd.c
>> >> > +++ b/migration/multifd.c
>> >> > @@ -29,6 +29,7 @@
>> >> >  #include "qemu-file.h"
>> >> >  #include "trace.h"
>> >> >  #include "multifd.h"
>> >> > +#include "multifd-colo.h"
>> >> >  #include "options.h"
>> >> >  #include "qemu/yank.h"
>> >> >  #include "io/channel-file.h"
>> >> > @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
>> >> >      int ret;
>> >> >  
>> >> >      ret = multifd_recv_state->ops->recv(p, errp);
>> >> > +    if (ret != 0) {
>> >> > +        return ret;
>> >> > +    }
>> >> > +
>> >> > +    if (migrate_colo()) {
>> >> > +        multifd_colo_process_recv(p);
>> >> > +    }  
>> >> 
>> >> Either put all of the colo hooks in multifd.c or multifd-nocomp.c. I think
>> >> the latter is more appropriate as we have mapped_ram already in
>> >> there. Let's drop patch 3 and put this in multifd_nocomp_recv().
>> >
>> > Again, it should also work with compression, and multifd_nocomp_recv()
>> > is for nocomp only.
>> >
>> >> 
>> >> >  
>> >> >      return ret;
>> >> >  }
>> >> > diff --git a/migration/multifd.h b/migration/multifd.h
>> >> > index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
>> >> > --- a/migration/multifd.h
>> >> > +++ b/migration/multifd.h
>> >> > @@ -279,7 +279,10 @@ typedef struct {
>> >> >      uint64_t packets_recved;
>> >> >      /* ramblock */
>> >> >      RAMBlock *block;
>> >> > -    /* ramblock host address */
>> >> > +    /*
>> >> > +     * Normally, it points to ramblock's host address.  When COLO
>> >> > +     * is enabled, it points to the mirror cache for the ramblock.
>> >> > +     */
>> >> >      uint8_t *host;
>> >> >      /* buffers to recv */
>> >> >      struct iovec *iov;  
>> 
