On Tue, Sep 10, 2019 at 09:50:41AM -0400, John Snow wrote:
> 
> 
> On 9/10/19 3:04 AM, Michael S. Tsirkin wrote:
> > On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote:
> >> If Windows 10 guests have enabled the 'turn off hard disk after idle'
> >> option in power settings, and the guest has a SATA disk plugged in,
> >> the SATA disk will be turned off after a specified idle time.
> >> If the guest is live migrated or saved/loaded with its SATA disk
> >> turned off, the following error will occur:
> >>
> >> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
> >> receive buffer address
> >> qemu-system-x86_64: Failed to load ich9_ahci:ahci
> >> qemu-system-x86_64: error while loading state for instance 0x0 of device
> >> '0000:00:1a.0/ich9_ahci'
> >> qemu-system-x86_64: load of migration failed: Operation not permitted
> >>
> >> Observation from trace logs shows that a while after Windows 10 turns off
> >> a SATA disk (IDE disks don't have the following behavior),
> >> it will disable the PCI_COMMAND_MASTER flag of the PCI device containing
> >> the AHCI device. When the disk is turned back on,
> >> the PCI_COMMAND_MASTER flag will be restored first.
> >> But if the guest is migrated or saved/loaded while the disk is off,
> >> the post_load callback of the AHCI device, ahci_state_post_load(), will fail
> >> at ahci_cond_start_engines() if the MemoryRegion
> >> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing
> >> to the PCIDevice struct containing the AHCI device.
> >>
> >> This patch enables pci_dev->bus_master_enable_region before calling
> >> ahci_cond_start_engines() in ahci_state_post_load(), and restores the
> >> MemoryRegion to its original state afterwards.
> >>
> >> Signed-off-by: andychiu <andyc...@synology.com>
> > 
> > Poking at PCI device internals like this seems fragile. And force
> > enabling bus master can lead to unpleasantness like corrupting guest
> > memory, unhandled interrupts, etc. E.g. it's quite reasonable,
> > spec-wise, for the guest to move things in memory around while bus
> > mastering is off.
> > 
> > Can you teach ahci that the region being disabled
> > during migration is ok, and recover from it?
> 
> That's what I'm wondering.
> 
> I could try to just disable the FIS RX engine if the mapping fails, but
> that will require a change to guest visible state.
> 
> My hunch, though, is that when Windows re-enables the device it will
> need to re-program the address registers anyway, so it might cope well
> with the FIS RX bit getting switched off.
> 
> (I'm wondering if it isn't a mistake that QEMU is trying to re-map this
> address in the first place. Is it legal that the PCI device has PCI bus
> master disabled but we've held on to a mapping?
If you are poking at guest memory when bus master is off, then most
likely yes.

> Should there be some
> callback where AHCI knows to invalidate mappings at that point...?)

ATM the callback is the config write: you check
proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER
and, if it is disabled, invalidate the mapping.

virtio at least has code that pokes at
proxy->pci_dev.config[PCI_COMMAND] too, so I'm quite open to
a function along the lines of pci_is_bus_master_enabled() that
will do this.

--
MST
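As a rough illustration of the direction being discussed, a helper along
these lines might look like the sketch below. This is not an existing QEMU
API: pci_is_bus_master_enabled() is the name MST floats, and
ich9_ahci_write_config(), ahci_unmap_fis_address() and
ahci_unmap_clb_address() are hypothetical stand-ins (the last two mirroring
the static unmap helpers in hw/ide/ahci.c); only pci_default_write_config(),
pci_get_word() and the PCI_COMMAND* constants are existing interfaces.

#include "qemu/osdep.h"
#include "hw/pci/pci.h"
#include "hw/ide/ahci.h"

/* Sketch of the helper MST proposes: report whether the guest currently
 * has bus mastering enabled, based on the config space command word. */
static inline bool pci_is_bus_master_enabled(PCIDevice *pci_dev)
{
    return pci_get_word(pci_dev->config + PCI_COMMAND) & PCI_COMMAND_MASTER;
}

/* Hypothetical config-write hook for the ICH9 AHCI proxy: apply the
 * default config write, then drop the FIS/CLB mappings if the guest has
 * just turned bus mastering off.  The unmap calls are assumed to behave
 * like the static helpers of the same purpose in hw/ide/ahci.c. */
static void ich9_ahci_write_config(PCIDevice *pci_dev, uint32_t addr,
                                   uint32_t val, int len)
{
    AHCIPCIState *proxy = ICH9_AHCI(pci_dev);
    int i;

    pci_default_write_config(pci_dev, addr, val, len);

    if (!pci_is_bus_master_enabled(pci_dev)) {
        for (i = 0; i < proxy->ahci.ports; i++) {
            ahci_unmap_fis_address(&proxy->ahci.dev[i]);
            ahci_unmap_clb_address(&proxy->ahci.dev[i]);
        }
    }
}

Whether ahci_state_post_load() should additionally tolerate a failed
mapping by clearing the FIS RX bit, as John suggests, is a separate,
guest-visible question that this sketch does not answer.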