On Wed, Dec 20, 2023 at 6:22 AM Jason Wang <jasow...@redhat.com> wrote:
>
> On Sat, Dec 16, 2023 at 1:28 AM Eugenio Pérez <epere...@redhat.com> wrote:
> >
> > Callers can use this function to setup the incoming migration thread.
> >
> > This thread is able to map the guest memory while the migration is
> > ongoing, without blocking QMP or other important tasks. While this
> > allows the destination QEMU not to block, it expands the mapping time
> > during migration instead of making it pre-migration.
>
> If it's just QMP, can we simply use bh with a quota here?
>
Because QEMU cannot guarantee the quota at write(fd, VHOST_IOTLB_UPDATE, ...).
Also, synchronization with vhost_vdpa_dev_start would get more complicated,
as it would need to be re-scheduled too.

As a half-baked idea, we could split the mapping into chunks of a manageable
size, but I don't like that idea a lot.

> Btw, have you measured the hotspot that causes such slowness? Is it
> pinning or vendor specific mapping that slows down the progress? Or if
> VFIO has a similar issue?
>

Si-Wei did the actual profiling, as he is the one with the 128G guests, but
most of the time was spent in the memory pinning. Si-Wei, please correct me
if I'm wrong.

I didn't check VFIO, but I think it just maps at realize phase with
vfio_realize -> vfio_attach_device -> vfio_connect_container(). In previous
testing, this delayed VM initialization by a lot, as it moves those ~20s of
blocking to every VM start.

Looking for a way to do it only when QEMU is the destination of a live
migration, I think the right place is the .load_setup migration handler
(see the sketch at the end of this mail). But I'm ok to move it, for sure.

> >
> > This thread joins at vdpa backend device start, so it could happen that
> > the guest memory is so large that we still have guest memory to map
> > before this time.
>
> So we would still hit the QMP stall in this case?
>

This paragraph is kind of outdated, sorry. I can only trigger this if I
don't enable the switchover_ack migration capability and if I make memory
pinning in the kernel artificially slow.

But I didn't check QMP, to be honest, so I can try to test it, yes. If QMP
is not responsive in that period, it is not responsive in QEMU master
either, so we're only improving anyway.

Thanks!

> > This can be improved in later iterations, when the
> > destination device can inform QEMU that it is not ready to complete the
> > migration.
> >
> > If the device is not started, the cleanup of the mapped memory is done
> > at .load_cleanup. This is far from ideal, as the destination machine
> > has mapped all the guest RAM for nothing, and now it needs to unmap it.
> > However, we don't have information about the state of the device, so
> > it's the best we can do. Once iterative migration is supported, this
> > will be improved, as we will know the virtio state of the device.
> >
> > If the VM migrates before finishing all the maps, the source will stop
> > but the destination is still not ready to continue, and it will wait
> > until all guest RAM is mapped. It is still an improvement over doing
> > all the maps when the migration finishes, but the next patches use the
> > switchover_ack method to prevent the source from stopping until all
> > the memory is mapped at the destination.
> >
> > The memory unmapping if the device is not started is weird too, as
> > ideally nothing would be mapped. This can be fixed when we migrate the
> > device state iteratively and we know for sure whether the device is
> > started or not. At this moment we don't have such information, so
> > there is no better alternative.
> >
> > Signed-off-by: Eugenio Pérez <epere...@redhat.com>
> >
> > ---
>
> Thanks
>
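In case a concrete example helps the discussion, below is a rough, untested
sketch of the .load_setup / .load_cleanup pair I have in mind. SaveVMHandlers,
qemu_thread_create() and qemu_thread_join() are existing QEMU interfaces
(written from memory, so double-check the exact signatures); the
vhost_vdpa_map_all_guest_memory(), vhost_vdpa_unmap_all_guest_memory() and
vhost_vdpa_dev_is_started() helpers are made-up placeholders for the real
vhost-vdpa mapping code, and the actual patches may look different:

/*
 * Untested sketch: spawn a thread at incoming-migration setup time that maps
 * (and therefore pins) all guest memory, so the main loop and QMP stay
 * responsive while the migration stream is being received.
 */
#include "qemu/osdep.h"
#include "qemu/thread.h"
#include "migration/register.h"

typedef struct VhostVDPAMapThread {
    QemuThread thread;
    bool running;
} VhostVDPAMapThread;

/* Placeholders for the real mapping / device-state helpers */
void vhost_vdpa_map_all_guest_memory(void *opaque);
void vhost_vdpa_unmap_all_guest_memory(void *opaque);
bool vhost_vdpa_dev_is_started(void *opaque);

static void *vhost_vdpa_load_map_worker(void *opaque)
{
    /*
     * Walk the guest memory sections and issue the VHOST_IOTLB_UPDATE
     * writes from here.  The memory pinning that takes tens of seconds on
     * 128G guests now happens off the main loop.
     */
    vhost_vdpa_map_all_guest_memory(opaque);
    return NULL;
}

static int vhost_vdpa_load_setup(QEMUFile *f, void *opaque)
{
    VhostVDPAMapThread *s = opaque;

    s->running = true;
    qemu_thread_create(&s->thread, "vdpa-map", vhost_vdpa_load_map_worker,
                       opaque, QEMU_THREAD_JOINABLE);
    return 0;
}

static int vhost_vdpa_load_cleanup(void *opaque)
{
    VhostVDPAMapThread *s = opaque;

    if (!s->running) {
        return 0;
    }
    qemu_thread_join(&s->thread);
    s->running = false;

    /*
     * If the device never started (e.g. the migration failed), all those
     * maps were for nothing: undo them here.
     */
    if (!vhost_vdpa_dev_is_started(opaque)) {
        vhost_vdpa_unmap_all_guest_memory(opaque);
    }
    return 0;
}

static const SaveVMHandlers vhost_vdpa_load_handlers = {
    .load_setup = vhost_vdpa_load_setup,
    .load_cleanup = vhost_vdpa_load_cleanup,
};

The handlers would be registered on the destination before the migration
stream is processed (via register_savevm_live() or the existing vhost-vdpa
init path), and vhost_vdpa_dev_start() would qemu_thread_join() the worker
before enabling the vrings, which is the "joins at vdpa backend device
start" part of the commit message.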