On Tue, Sep 26, 2023 at 06:01:02PM +0800, Li Zhijian wrote:
> Migration over RDMA has been failing since
> commit 294e5a4034 ("multifd: Only flush once each full round of memory")
> with errors:
> qemu-system-x86_64: rdma: Too many requests in this message
> (3638950032).Bailing.
> 
> Migration over RDMA differs from TCP: RDMA has its own control
> messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED must not be disturbed.
> 
> find_dirty_block() can be called between RDMA_CONTROL_REGISTER_REQUEST
> and RDMA_CONTROL_REGISTER_FINISHED; it sends extra traffic
> (RAM_SAVE_FLAG_MULTIFD_FLUSH) to the destination and causes the
> migration to fail, even though multifd is disabled.
> 
> This change makes migrate_multifd_flush_after_each_section() return true
> when multifd is disabled, which also means RAM_SAVE_FLAG_MULTIFD_FLUSH
> is no longer sent to the destination when multifd is disabled.
> 
> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> CC: Fabiano Rosas <faro...@suse.de>
> Signed-off-by: Li Zhijian <lizhij...@fujitsu.com>
> ---
> 
> V2: put the check at the entry of migrate_multifd_flush_after_each_section()
> # suggested by Peter

Seeing this, I notice my suggestion wasn't ideal either, as we rely on
both multifd_send_sync_main() and multifd_recv_sync_main() being no-ops
when !multifd.

For the long term, we should not call multifd functions at all if multifd
is not enabled.

Reviewed-by: Peter Xu <pet...@redhat.com>

-- 
Peter Xu