On Wed, 15 May 2019 14:15:03 +0800 Peter Xu <pet...@redhat.com> wrote:
> On Tue, May 14, 2019 at 02:14:41PM -0600, Alex Williamson wrote:
> > Commit b2fc91db8447 ("q35: set split kernel irqchip as default") changed
> > the default for the pc-q35-4.0 machine type to use split irqchip, which
> > turned out to have disastrous effects on vfio-pci INTx support. KVM
> > resampling irqfds are registered for handling these interrupts, but
> > these are non-functional in split irqchip mode. We can't simply test
> > for split irqchip in QEMU as userspace handling of this interrupt is a
> > significant performance regression versus KVM handling (GeForce GPUs
> > assigned to Windows VMs are non-functional without forcing MSI mode or
> > re-enabling kernel irqchip).
> >
> > The resolution is to revert the change in default irqchip mode in the
> > pc-q35-4.1 machine and create a pc-q35-4.0.1 machine for the 4.0-stable
> > branch. The qemu-q35-4.0 machine type should not be used in vfio-pci
> > configurations for devices requiring legacy INTx support without
> > explicitly modifying the VM configuration to use kernel irqchip.
> >
> > Link: https://bugs.launchpad.net/qemu/+bug/1826422
> > Fixes: b2fc91db8447 ("q35: set split kernel irqchip as default")
> > Signed-off-by: Alex Williamson <alex.william...@redhat.com>
>
> Hi, Alex,
>
> I have two (probably naive) questions about the patch, possibly due to
> lack of context of previous discussions so please let me know if
> there's any upstream discussion that I can read.
>
> Firstly, could I ask why we need this 4.0.1 machine type specific for
> fixing this problem? Asked because this seems to be the first time
> QEMU introduces the X.Y.Z machine type in master. Could it be somehow
> delayed to the release of QEMU 4.1? From the planning page I see that
> it's releasing on Aug 06th/13th, a bit far away but not really that
> much imho. I'm perfectly fine with this, but I just want to make sure
> I have the correct understanding of the motivations.

As I see it, this is a regression from previous releases and should
therefore be fixed in 4.0-stable. Users are encountering this issue and
leaning on support groups like reddit.com/r/VFIO to find workarounds.
It would be a disservice to our user base and downstream consumers to
simply ignore this regression until the 4.1 release. If this is the
first z-stream release of upstream QEMU with a new machine type, we've
been lucky, but previous discussions indicate that we cannot currently
change the irqchip mode without rev'ing the machine type for migration
compatibility.

> The second question is about our previous decision to introduce QEMU
> 4.1 machine type before it's released (which is not related to the
> patch at all). Is it really correct to do so before releasing of 4.1?
> So now even with a development QEMU 4.0 branch the user will be able
> to create 4.1 machines using "-M pc-q35-4.1", then what if the user
> migrated a real 4.1 machine (with the to-be-released QEMU 4.1 binary)
> to some 4.1 machine that was run with such an old 4.0 QEMU binary?
> The problem is we can add more compatible properties into
> pc_q35_4_1_machine_options and future pc_compat_4_1 array before QEMU
> 4.1 is finally released and then "-M pc-q35-4.1" will actually have
> different combination of properties IMHO, which seems to break
> compatibility. Am I wrong somewhere?

Users who expect migration stability from VMs based on unreleased
development code are in for a world of hurt. I assume that the 4.1
machine types are entirely unstable until 4.1 is released. We
introduce them early in the development cycle because we've been burned
in the past by introducing them late and inconsistently. Ideally, this
change would trigger a migration regression test that warns when an
in-development machine type changes in an incompatible way; we'd
acknowledge that, perhaps log it to a changelog, and move on. I
suspect, though, that we don't have such automated testing in place.

Thanks,
Alex
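
For illustration only, here is a minimal Python sketch of the kind of
regression check described above. It is not existing QEMU tooling. The
function names, the idea of capturing each build's machine-type defaults
as a plain dict, and the "kernel-irqchip" key are all assumptions made
for this example. It compares two snapshots of one machine type and
reports every default that differs, as a warning for an in-development
type and as an error for a released one.

```python
# Hypothetical sketch: detect changes to a machine type's defaults between
# two builds. Property names and values are illustrative assumptions and are
# not read from QEMU's source or from a running QEMU.

def diff_machine_props(old, new):
    """Return {prop: (old_value, new_value)} for every property that differs.

    A property missing on one side is reported with None on that side.
    """
    changed = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


def check_machine_type(name, old, new, released):
    """Build one report line per changed default of machine type `name`."""
    # A released machine type must stay stable, so a change there is an
    # error; an in-development type only warrants a warning.
    severity = "ERROR" if released else "WARNING"
    return [
        f"{severity}: {name}: {prop} changed from {before!r} to {after!r}"
        for prop, (before, after) in diff_machine_props(old, new).items()
    ]


# Illustrative snapshots of the same machine type from two development builds.
early_build = {"kernel-irqchip": "split"}
later_build = {"kernel-irqchip": "on"}

for line in check_machine_type("pc-q35-4.1", early_build, later_build,
                               released=False):
    print(line)
```

A real check would have to obtain the snapshots from actual builds and
cover every compat property, which this sketch does not attempt.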