Lukas Straub <[email protected]> writes:

>> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
>> > +{
>> > +    /*
>> > +     * While we're still in precopy state (not yet in colo state), we copy
>> > +     * received pages to both guest and cache. No need to set dirty bits,
>> > +     * since guest and cache memory are in sync.
>> > +     */
>> > +    if (migration_incoming_in_colo_state()) {  
>> 
>> What's the relationship between migration_incoming_in_colo_state() and
>> migration_incoming_colo_enabled()? ram_load_precopy() checks both. Would
>> migration_incoming_colo_enabled affect multifd as well?
>
> So first off, migration_incoming_colo_enabled() and migrate_colo() are
> equivalent in practice since commit 121ccedc2b ("migration: block
> incoming colo when capability is disabled").
>
> (I have some cleanup patches lying around, but that will be for later)
>

Ok, I think those cleanups are important, because when we have multifd and
non-multifd code for the same feature, it's useful to be able to compare
the two. So some degree of uniformity would be nice.

> For colo, we do normal precopy migration; at the end we go into colo
> state, and only then will migration_incoming_in_colo_state() be true.
>
> Here we check migrate_colo() outside of these functions, as Peter
> requested in a previous version.
>
>> 
>> The multifd recv threads will be running until after
>> process_incoming_migration_bh(), which is when
>> migration_incoming_disable_colo() runs.
>
> That is not an issue as we use migrate_colo() here.
>
>> 
>> Also, is the colo_cache guaranteed to be around until multifd threads
>> exit?
>
> This is an issue. I will fix it in the next version.
>
>>
>> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
>> > +        colo_record_bitmap(p->block, p->zero, p->zero_num);
>> > +    }
>> > +}
>> > +
>> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
>> > +{
>> > +    if (!migration_incoming_in_colo_state()) {
>> > +        for (int i = 0; i < p->normal_num; i++) {
>> > +            void *guest = p->block->host + p->normal[i];
>> > +            void *cache = p->host + p->normal[i];
>> > +            memcpy(guest, cache, multifd_ram_page_size());
>> > +        }  
>> 
>> I see some differences between what ram.c does and what multifd will do
>> after this patch regarding which conditions are checked and the order of
>> copies (code below):
>> 
>> ram.c:
>> 
>>   - migration_incoming_colo_enabled && migration_incoming_in_colo_state:
>>   Reads from stream into colo_cache.
>>   
>>   - migration_incoming_colo_enabled && !migration_incoming_in_colo_state:
>>   Reads from stream into guest and then memcpy into colo_cache.
>> 
>>   - !migration_incoming_colo_enabled
>>   Reads from stream into guest.
>> 
>> multifd.c:
>> 
>>   - migrate_colo:
>>   Reads from stream into colo_cache.
>>   
>>   - !migration_incoming_in_colo_state:
>>   memcpy from colo_cache into guest.
>> 
>>   - !migration_incoming_colo_enabled
>>   ???
>> 
>> The resulting state should be the same, but I wonder if we want to i) use
>> the same checks in multifd
>
> migration_incoming_colo_enabled() shouldn't even exist anymore, so I'm
> not using it here. migrate_colo() is much easier to reason about.
>
>> and ii) when not in colo state, copy first
>> into guest (using readv) and later memcpy into the colo_cache.
>
> I think it is easier the way it is now.
>
>> 
>> ---
>> ram.c:
>> 
>> host = host_from_ram_block_offset(block, addr);
>> if (migration_incoming_colo_enabled()) {
>>     if (migration_incoming_in_colo_state()) {
>>         host = colo_cache_from_block_offset(block, addr, true);
>>     } else {
>>         host_bak = colo_cache_from_block_offset(block, addr, false);
>>     }
>> }
>> qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>> if (host_bak) {
>>     memcpy(host_bak, host, TARGET_PAGE_SIZE);
>> }
>> 
>> multifd:
>> 
>> if (migrate_colo()) {
>>     p->host = p->block->colo_cache;
>> }
>> 
>> for (int i = 0; i < p->normal_num; i++) {
>>     p->iov[i].iov_base = p->host + p->normal[i];
>> }
>> return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> 
>> if (!migration_incoming_in_colo_state()) {
>>     for (int i = 0; i < p->normal_num; i++) {
>>         void *guest = p->block->host + p->normal[i];
>>         void *cache = p->host + p->normal[i];
>>         memcpy(guest, cache, multifd_ram_page_size());
>>     }
>> }
>> ---
>> 
>> > +        for (int i = 0; i < p->zero_num; i++) {
>> > +            void *guest = p->block->host + p->zero[i];
>> > +            memset(guest, 0, multifd_ram_page_size());
>> > +        }  
>> 
>> At multifd_nocomp_recv, there will be a call to
>> multifd_recv_zero_page_process(), which by that point will have
>> p->host == p->block->colo_cache. So it looks like that function will do
>> some zero page processing in the colo_cache, setting rb->receivedmap for
>> pages in the colo_cache and potentially also doing a memset. Is this
>> intended?
>
> rb->receivedmap is only for postcopy, right? So it doesn't apply with
> colo.
>

It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero
page causing multiple page faults"). So it seems we might be doing extra
work on top of the colo_cache.
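
To spell out the extra work I mean (rough sketch from memory, not the
exact code): with p->host repointed at the cache, the zero-page pass ends
up doing its bookkeeping and memsets against the colo_cache:

/* rough sketch, not the exact upstream code */
p->host = p->block->colo_cache;        /* set in multifd_ram_unfill_packet() */

/* multifd_recv_zero_page_process(p) then does, roughly: */
for (int i = 0; i < p->zero_num; i++) {
    void *page = p->host + p->zero[i]; /* i.e. a page inside the colo_cache */

    if (/* offset already marked in p->block->receivedmap */) {
        memset(page, 0, multifd_ram_page_size());
    } else {
        ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
    }
}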

>> 
>> I'm thinking that maybe it would be better overall to hook colo directly
>> into multifd_nocomp_recv():
>
> But then it will only work for nocomp, right? It feels like the wrong
> level of abstraction to me.
>

Ah, nocomp != ram indeed.
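
And to convince myself: the hook added in multifd_ram_state_recv() sits
above ops->recv(), which dispatches to whichever compression method is in
use, so (condensing the hunk further down) every method gets the colo
processing:

/* condensed from the multifd.c hunk below; covers nocomp, zlib, zstd, ... */
ret = multifd_recv_state->ops->recv(p, errp);
if (ret == 0 && migrate_colo()) {
    multifd_colo_process_recv(p);
}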

>> 
>> static int multifd_nocomp_recv(MultiFDRecvParams *p, Error **errp)
>> {
>>     uint32_t flags;
>> +    int ret;
>> 
>>     if (migrate_mapped_ram()) {
>>         return multifd_file_recv_data(p, errp);
>>     }
>> 
>>     flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>> 
>>     if (flags != MULTIFD_FLAG_NOCOMP) {
>>         error_setg(errp, "multifd %u: flags received %x flags expected %x",
>>                    p->id, flags, MULTIFD_FLAG_NOCOMP);
>>         return -1;
>>     }
>> 
>> +    if (migration_incoming_colo_enabled() &&
>> +        migration_incoming_in_colo_state()) {
>> +        p->host = p->block->colo_cache;
>> +    } // or else {}, depending on how we deal with zero pages in the cache
>> 
>>     multifd_recv_zero_page_process(p);
>> 
>>     if (!p->normal_num) {
>>         return 0;
>>     }
>> 
>>     for (int i = 0; i < p->normal_num; i++) {
>>         p->iov[i].iov_base = p->host + p->normal[i];
>>         p->iov[i].iov_len = multifd_ram_page_size();
>>         ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
>>     }
>> +    ret = qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> +    if (ret != 0) {
>> +        return ret;
>> +    }
>> +
>> +    if (migration_incoming_colo_enabled()) {
>> +        multifd_colo_process_recv(p);
>> +    }
>> 
>>     return ret;
>> }
>> 
>> 
>> > +    }
>> > +}
>> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
>> > new file mode 100644
>> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
>> > --- /dev/null
>> > +++ b/migration/multifd-colo.h
>> > @@ -0,0 +1,26 @@
>> > +/*
>> > + * SPDX-License-Identifier: GPL-2.0-or-later
>> > + *
>> > + * multifd colo header
>> > + *
>> > + * Copyright (c) Lukas Straub <[email protected]>
>> > + *
>> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> > + * See the COPYING file in the top-level directory.
>> > + */
>> > +
>> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
>> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
>> > +
>> > +#ifdef CONFIG_REPLICATION
>> > +
>> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
>> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
>> > +
>> > +#else
>> > +
>> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
>> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
>> > +
>> > +#endif
>> > +#endif
>> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
>> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
>> > --- a/migration/multifd-nocomp.c
>> > +++ b/migration/multifd-nocomp.c
>> > @@ -16,6 +16,7 @@
>> >  #include "file.h"
>> >  #include "migration-stats.h"
>> >  #include "multifd.h"
>> > +#include "multifd-colo.h"
>> >  #include "options.h"
>> >  #include "migration.h"
>> >  #include "qapi/error.h"
>> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >          return -1;
>> >      }
>> >  
>> > -    p->host = p->block->host;
>> >      for (i = 0; i < p->normal_num; i++) {
>> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
>> >  
>> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >          p->zero[i] = offset;
>> >      }
>> >  
>> > +    if (migrate_colo()) {
>> > +        multifd_colo_prepare_recv(p);
>> > +        assert(p->block->colo_cache);
>> > +        p->host = p->block->colo_cache;  
>> 
>> Can't you just use p->block->colo_cache later? I don't see why p->host
>> needs to be set beforehand even in the non-colo case.
>
> We should not touch guest RAM directly while in colo state, since the
> incoming guest is running. We either want to receive a whole checkpoint
> (all RAM into the colo cache, plus all device state) and apply it, or,
> if anything goes wrong during checkpointing, keep the currently running
> guest on the incoming side in pristine state.
>

I was asking about setting p->host at this specific point. I don't think
any of this fits the unfill function. However, I see these changes were
suggested by Peter, so let's not go back and forth.
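
Just to make the earlier suggestion concrete (untested sketch, not a
request to redo it): the recv path could resolve the destination itself
instead of repointing p->host in the unfill function:

/* untested sketch of what I had in mind */
uint8_t *base = migrate_colo() ? p->block->colo_cache : p->block->host;

for (int i = 0; i < p->normal_num; i++) {
    p->iov[i].iov_base = base + p->normal[i];
    p->iov[i].iov_len = multifd_ram_page_size();
}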

> I have written more about colo migration here:
>
> https://lore.kernel.org/qemu-devel/20260117204913.584e1829@penguin/
> https://lore.kernel.org/qemu-devel/[email protected]/
>
>> 
>> > +    } else {
>> > +        p->host = p->block->host;
>> > +    }
>> > +
>> >      return 0;
>> >  }
>> >  
>> > diff --git a/migration/multifd.c b/migration/multifd.c
>> > index 332e6fc58053462419f3171f6c320ac37648ef7b..220ed8564960fdabc58e4baa069dd252c8ad293c 100644
>> > --- a/migration/multifd.c
>> > +++ b/migration/multifd.c
>> > @@ -29,6 +29,7 @@
>> >  #include "qemu-file.h"
>> >  #include "trace.h"
>> >  #include "multifd.h"
>> > +#include "multifd-colo.h"
>> >  #include "options.h"
>> >  #include "qemu/yank.h"
>> >  #include "io/channel-file.h"
>> > @@ -1258,6 +1259,13 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
>> >      int ret;
>> >  
>> >      ret = multifd_recv_state->ops->recv(p, errp);
>> > +    if (ret != 0) {
>> > +        return ret;
>> > +    }
>> > +
>> > +    if (migrate_colo()) {
>> > +        multifd_colo_process_recv(p);
>> > +    }  
>> 
>> Either put all of the colo hooks in multifd.c or in multifd-nocomp.c. I
>> think the latter is more appropriate, as we already have mapped_ram in
>> there. Let's drop patch 3 and put this in multifd_nocomp_recv().
>
> Again, it should also work with compression, and multifd_nocomp_recv()
> is for nocomp only.
>
>> 
>> >  
>> >      return ret;
>> >  }
>> > diff --git a/migration/multifd.h b/migration/multifd.h
>> > index 89a395aef2b09a6762c45b5361e0ab63256feff6..fbc35702b062fdc3213ce92baed35994f5967c2b 100644
>> > --- a/migration/multifd.h
>> > +++ b/migration/multifd.h
>> > @@ -279,7 +279,10 @@ typedef struct {
>> >      uint64_t packets_recved;
>> >      /* ramblock */
>> >      RAMBlock *block;
>> > -    /* ramblock host address */
>> > +    /*
>> > +     * Normally, it points to ramblock's host address.  When COLO
>> > +     * is enabled, it points to the mirror cache for the ramblock.
>> > +     */
>> >      uint8_t *host;
>> >      /* buffers to recv */
>> >      struct iovec *iov;  
