On 11/29/2017 12:47 PM, Cornelia Huck wrote:
> On Wed, 29 Nov 2017 16:17:35 +0800
> Dong Jia Shi <bjsdj...@linux.vnet.ibm.com> wrote:
>
>> * Halil Pasic <pa...@linux.vnet.ibm.com> [2017-11-28 14:07:58 +0100]:
>>
>> [...]
>>> The auto-generated bus ids are affected by both changes. We hope to not
>>> encounter any auto-generated bus ids in production as Libvirt is always
>>> explicit about the bus id. Since 8ed179c937 ("s390x/css: catch section
>>> mismatch on load", 2017-05-18), the worst that can happen because the same
>>> device ended up having a different bus id is a cleanly failed migration.
>>> I find it hard to reason about the impact of changed auto-generated bus
>>> ids on migration for command line users, as I don't know which rules
>>> such a user is supposed to follow.
>> Regarding this paragraph, Halil pointed out to me the case he has in mind:
>> 1. VM configuration with 3 devices:
>>    -device virtio (e.g. virtio-blk-ccw,id=disk0)
>>    -device vfio-ccw (e.g. id=vfio0)
>>    -device virtio (e.g. virtio-rng-ccw,id=rng0)
>> 2. Start the VM.
>> 3. device_del vfio0
>> 4. migrate "exec:gzip -c > /tmp/tmp_vmstate.gz"
>> 5. Modify the command line from step 1 by removing the vfio0 device and adding:
>>    -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz"
>>
>> Let me list my test results here for everybody's reference.
>>
>> Without this patch
>> ==================
>>
>> ------------+---------------+-------------
>>             | squashing off | squashing on
>> ------------+---------------+-------------
>> auto id     | F             | F
>> ------------+---------------+-------------
>> explicit id | F             | S
>> ------------+---------------+-------------
>>
>> T1. squashing off + auto id
>> qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>> qemu-system-s390x: Failed to load s390_css:css
>> qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
>> qemu-system-s390x: load of migration failed: Invalid argument
>> [Failed due to css mismatch: there is no css 0 in the new VM.]
>>
>> T2. squashing off + explicit given id
>> qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>> qemu-system-s390x: Failed to load s390_css:css
>> qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
>> qemu-system-s390x: load of migration failed: Invalid argument
>> [Failed due to css mismatch: there is no css 0 in the new VM.]
>
> Hmm... so should we even try to migrate an empty css 0? It only exists
> because we have created a device that we had to detach anyway because
> it was non-migratable...
>
> [Probably no easy way to deal with this, though.]
>
We could make such a css go away when its last device is gone. I see a
general problem with implicitly generated shared state. Obviously we
can't fix the past.

@Dong Jia: Thanks for doing the experiments and publishing your findings.

Halil