* Yury Kotov (yury-ko...@yandex-team.ru) wrote:
> Hi,
> 
> I found that it's possible to remove a RAMBlock during migration.
> E.g. device hot-unplugging initiated by a guest (how to reproduce is below).
> And I want to clarify whether RAMBlock removing (or even adding) during
> migration is valid operation or it's a bug.
> 
> Currently, it may cause some race conditions with migration thread and
> migration may fail because of them. For instance, vmstate_unregister_ram
> function which is called during PCIe device removing does these:
> - Memset idstr -> target may receive unknown/zeroed idstr -> migration fail
> - Set RAMBlock flags as non-migratable -> migration fail
> 
> RAMBlock removing itself seems safe for migration thread because of RCU.
> But it seems to me there are other possible race conditions (didn't test it):
> - qemu_put_buffer_async -> saves pointer to RAMBlock's memory
>    -> block will be freed out of RCU (between ram save iterations)
>    -> qemu_fflush -> access to freed memory.
> 
> So, I have the following questions:
> 1. Is RAMBlock removing/adding OK during migration?

I don't think that any hot(un)plug is safe during migration.
While it's true we hold RCUs as we walk lists, we can't hold the RCU
around the entire migration.

There's lots of other problems;  for example we call the .save_setup
methods on devices at the start of migration, but then call the iterate
on those devices later - if the device is added/removed between stages
we'll end up either having done a setup and not calling the actual save,
or the other way around.

Juan added checks to qdev_device_add/qdev_unplug in b06424d ~2.5 years
ago.

> 2. If yes then what should we do with vmstate_unregister_ram?
>    - Just remove vmstate_unregister_ram (my RFC patch)
>    - Refcount RAMBlock's migratable/non-migratable state
>    - Something else?
> 3. If it mustn't be possible, so may be
>    assert(migration_is_idle()) in qemu_ram_free?
> 
> P.S.
> I'm working on a fix of below problem and trying to choose better way:
> allow device removing and fix all problem like this or fix a particular 
> device.
> 
> --------
> How to reproduce device removing during migration:
> 
> 1. Source QEMU command line (target is similar)
>   $ x86_64-softmmu/qemu-system-x86_64 \
>     -nodefaults -no-user-config -m 1024 -M q35 \
>     -qmp unix:./src.sock,server,nowait \
>     -drive file=./image,format=raw,if=virtio \
>     -device ioh3420,id=pcie.1 \
>     -device virtio-net,bus=pcie.1
> 2. Start migration with slow speed (to simplify reproducing)
> 3. Power off a device on the hotplug pcie.1 bus:
>   $ echo 0 > /sys/bus/pci/slots/0/power
> 4. Increase migration speed and wait until fail
> 
> Most likely you will get something like this:
>   qemu-system-x86_64: get_pci_config_device: Bad config data:
>           i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
>   qemu-system-x86_64: Failed to load PCIDevice:config
>   qemu-system-x86_64: Failed to load
>           ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
>   qemu-system-x86_64: error while loading state for instance 0x0 of device
>           '0000:00:03.0/ioh-3240-express-root-port'
>   qemu-system-x86_64: load of migration failed: Invalid argument
> 
> This error is just an illustration of the removing device possibility,
> but not actually an illustration of the race conditions for removing RAMBlock.

What path does this actually take - does it end up going via qdev_unplug
or some other way?

Dave

> Regards,
> Yury
> 
> Yury Kotov (1):
>   migration: Remove vmstate_unregister_ram
> 
>  hw/block/pflash_cfi01.c     | 1 -
>  hw/block/pflash_cfi02.c     | 1 -
>  hw/mem/pc-dimm.c            | 5 -----
>  hw/misc/ivshmem.c           | 2 --
>  hw/pci/pci.c                | 1 -
>  include/migration/vmstate.h | 1 -
>  migration/savevm.c          | 6 ------
>  7 files changed, 17 deletions(-)
> 
> -- 
> 2.24.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Reply via email to