Hi Yu,

On Mon, Apr 3, 2023 at 6:59 PM Yu Zhang <yu.zh...@ionos.com> wrote:
>
> Dear Laurent,
>
> Thank you for your quick reply. We used qemu-7.1, but it is reproducible with
> qemu from v6.2 to the recent v8.0 release candidates.
> I found that it's introduced by the commit 9323f892b39 (between v6.2.0-rc2
> and v6.2.0-rc3).
>
> If it doesn't break anything else, it suffices to remove the line below from
> acpi_pcihp_device_unplug_request_cb():
>
>     pdev->qdev.pending_deleted_event = true;
>
> but you may have a reason to keep it. First of all, I'll open a bug in the
> bug tracker and let you know.
>
> Best regards,
> Yu Zhang

This patch from Igor Mammedov seems relevant,
https://lore.kernel.org/qemu-devel/20230403131833-mutt-send-email-...@kernel.org/T/#t
Can you try it out?
Regards!
Jinpu

>
> On Mon, Apr 3, 2023 at 6:32 PM Laurent Vivier <lviv...@redhat.com> wrote:
>>
>> Hi Yu,
>>
>> please open a bug in the bug tracker:
>>
>> https://gitlab.com/qemu/qemu/-/issues
>>
>> It's easier to track the problem.
>>
>> What is the version of QEMU you are using?
>> Could you provide the QEMU command line?
>>
>> Thanks,
>> Laurent
>>
>>
>> On 4/3/23 15:24, Yu Zhang wrote:
>> > Dear Laurent,
>> >
>> > recently we ran into an issue with the following error:
>> >
>> > command '{ "execute": "device_del", "arguments": { "id": "virtio-diskX" } }'
>> > for VM "id" failed ({ "return": {"class": "GenericError", "desc":
>> > "Device virtio-diskX is already in the process of unplug"} }).
>> >
>> > The issue is reproducible. With a delay of a few seconds before hot-unplug,
>> > hot-unplug just works fine.
>> >
>> > After some digging, we found that the commit 9323f892b39 may cause the
>> > issue.
>> > ------------------
>> >     failover: fix unplug pending detection
>> >
>> >     Failover needs to detect the end of the PCI unplug to start migration
>> >     after the VFIO card has been unplugged.
>> >
>> >     To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and
>> >     reset in pcie_unplug_device().
>> >
>> >     But since
>> >     17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
>> >     we have switched to ACPI unplug and these functions are not called
>> >     anymore and the flag not set. So failover migration is not able to
>> >     detect if card is really unplugged and acts as it's done as soon as
>> >     it's started. So it doesn't wait the end of the unplug to start the
>> >     migration. We don't see any problem when we test that because ACPI
>> >     unplug is faster than PCIe native hotplug and when the migration
>> >     really starts the unplug operation is already done.
>> >
>> >     See c000a9bd06ea ("pci: mark device having guest unplug request pending")
>> >         a99c4da9fc2a ("pci: mark devices partially unplugged")
>> >
>> >     Signed-off-by: Laurent Vivier <lviv...@redhat.com>
>> >     Reviewed-by: Ani Sinha <a...@anisinha.ca>
>> >     Message-Id: <20211118133225.324937-4-lviv...@redhat.com>
>> >     Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
>> >     Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>> > ------------------
>> > The purpose is to detect the end of the PCI device hot-unplug. However, we
>> > find the error confusing. How is it possible that a disk "is already in the
>> > process of unplug" during the first hot-unplug attempt? So far as I know,
>> > the issue was also encountered by libvirt, but they simply ignored it:
>> >
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1878659
>> >
>> > Hence, a question is: should we have the line below in
>> > acpi_pcihp_device_unplug_request_cb()?
>> >
>> >     pdev->qdev.pending_deleted_event = true;
>> >
>> > It would be great if you as the author could give us a few hints.
>> >
>> > Thank you very much for your reply!
>> >
>> > Sincerely,
>> >
>> > Yu Zhang @ Compute Platform IONOS
>> > 03.04.2013
>>
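
For readers following the thread, the flag handling described in the quoted
commit message can be pictured with a small toy model. This is only a sketch
in Python, not QEMU code. The names Device, unplug_request, guest_eject and
UnplugInProgress are hypothetical, and the assumption that device_del refuses
to run while pending_deleted_event is set is inferred from the error text
quoted above rather than taken from the QEMU sources.

```python
from dataclasses import dataclass


class UnplugInProgress(Exception):
    """Raised when an unplug request is already pending for the device."""


@dataclass
class Device:
    dev_id: str
    # Mirrors the flag named in the thread; everything else is illustrative.
    pending_deleted_event: bool = False
    plugged: bool = True


def unplug_request(dev):
    """Toy stand-in for the unplug-request callback: mark the unplug pending."""
    dev.pending_deleted_event = True


def guest_eject(dev):
    """Toy stand-in for the guest completing the unplug: clear the flag."""
    dev.plugged = False
    dev.pending_deleted_event = False


def device_del(dev):
    """Toy stand-in for the device_del command (assumed to check the flag)."""
    if dev.pending_deleted_event:
        raise UnplugInProgress(
            f"Device {dev.dev_id} is already in the process of unplug"
        )
    unplug_request(dev)


if __name__ == "__main__":
    disk = Device("virtio-diskX")
    device_del(disk)          # accepted: the flag is now set
    try:
        device_del(disk)      # rejected until the guest completes the unplug
    except UnplugInProgress as err:
        print(err)
    guest_eject(disk)         # the flag is cleared again
    print(disk.pending_deleted_event)
```

In this model the error can only appear when an earlier request is still
outstanding, i.e. the flag was set and the guest has not yet completed the
unplug. Whether that is what happens in the reported case is exactly the
open question in the thread.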