On Mon, Apr 29, 2019 at 08:21:06AM -0600, Alex Williamson wrote:
> On Sat, 27 Apr 2019 10:09:51 +0200
> Paolo Bonzini <pbonz...@redhat.com> wrote:
> 
> > On 27/04/19 07:29, Paolo Bonzini wrote:
> > >   
> > >>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > >>> resample feature, but vfio never gets the unmask notification, so the
> > >>> device remains with DisINTx set and no further interrupts are
> > >>> generated.  Do we expect KVM's IRQFD with resampler to work in the
> > >>> split IRQ mode?  We can certainly hope that "high performance" devices
> > >>> use MSI or MSI/X, but this would be quite a performance regression with
> > >>> split mode if our userspace bypass for INTx goes away.  Thanks,  
> > >>
> > >> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> > >> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> > >> kvm_notify_acked_gsi(),  
> > > 
> > > That wouldn't help because kvm_ioapic_update_eoi would not even be
> > > able to access vcpu->kvm->arch.vioapic (it's NULL).
> > > 
> > > > The following untested patch would signal the resamplefd in
> > > > kvm_ioapic_send_eoi, before requesting the exit to userspace.  However
> > > > I am not sure how QEMU sets up the VFIO eventfds: if I understand
> > > > correctly, when VFIO writes again to the irq eventfd, the interrupt
> > > > request would not reach the userspace IOAPIC, but only the in-kernel
> > > > LAPIC.  That would be incorrect, and if my understanding is correct we
> > > > need to trigger resampling from hw/intc/ioapic.c.
> > 
> > Actually it's worse: because you're bypassing IOAPIC when raising the
> > irq, the IOAPIC's remote_irr for example will not be set.  So split
> > irqchip currently must disable the intx fast path completely.
> > 
> > I guess we could also reimplement irqfd and resamplefd in the userspace
> > IOAPIC, and run the listener in a separate thread (using "-object
> > iothread" on the command line and AioContext in the code).
> 
> This sounds like a performance regression vs KVM irqchip any way we
> slice it.  Was this change a mistake?  Without KVM support, the
> universal support in QEMU kicks in, where device mmaps are disabled
> when an INTx occurs, forcing trapped access to the device, and we
> assume that the next access is in response to an interrupt and trigger
> our own internal EOI and re-enable mmaps.  A timer acts as a
> catch-all.  Needless to say, this is functional but not fast.  It would
> be a massive performance regression for devices depending on INTx and
> previously using the KVM bypass to switch to this.  INTx is largely
> considered a legacy interrupt, so non-x86 archs don't encounter it as
> often, S390 even explicitly disables INTx support.  ARM and POWER
> likely just don't see a lot of these devices, but nearly all devices
> (except SR-IOV VFs) on x86 expect an INTx fallback mode and some
> drivers may run the device in INTx for compatibility.  This split
> irqchip change was likely fine for "enterprise" users concerned only
> with modern high speed devices, but very much not for device assignment
> used for compatibility use cases or commodity hardware users.
> 
> What's a good 4.0.1 strategy to resolve this?  Re-instate KVM irqchip
> as the Q35 default?  I can't see that simply switching to current QEMU
> handling is a viable option for performance?  What about 4.1?  We could
> certainly improve EOI support in QEMU, there's essentially no support
> currently, but it seems like an uphill battle for an iothread based
> userspace ioapic to ever compare to KVM handling?  Thanks,

irqchip=split and irqchip=kernel aren't guest ABI compatible, are
they?  That would make it impossible to fix this in pc-q35-4.0
for a 4.0.1 update.
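
For anyone following along, the fallback INTx path Alex describes above
(mmaps disabled on INTx, next trapped access treated as the EOI, timer
as catch-all) can be modelled roughly as below.  This is only a toy
Python sketch of the description in this thread, not QEMU code; the
class and method names are made up for illustration, and the real
handling is likely more involved.

```python
class IntxFallbackModel:
    """Toy model of the non-KVM INTx handling described in this thread.

    All names here are hypothetical; this does not mirror QEMU's code.
    """

    def __init__(self):
        self.mmap_enabled = True   # guest accesses go straight to the device
        self.pending = False       # an INTx is waiting for an EOI
        self.trapped_accesses = 0  # how many accesses took the slow path

    def intx_fired(self):
        # Device raised INTx: disable mmaps so the next access traps.
        self.pending = True
        self.mmap_enabled = False

    def guest_access(self):
        # With mmaps disabled the access is trapped.  It is assumed to be
        # the guest servicing the interrupt, so it triggers the internal EOI.
        if not self.mmap_enabled:
            self.trapped_accesses += 1
            if self.pending:
                self._eoi()
            return "trapped"
        return "direct"

    def timer_expired(self):
        # Catch-all: if no access arrived, complete the EOI anyway.
        if self.pending:
            self._eoi()

    def _eoi(self):
        # Internal EOI: clear the pending state and re-enable mmaps.
        self.pending = False
        self.mmap_enabled = True


model = IntxFallbackModel()
model.intx_fired()
first = model.guest_access()   # slow, trapped access
second = model.guest_access()  # fast path restored
```

Every interrupt costs at least one trapped access in this model, which
is the "functional but not fast" behaviour being discussed.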

-- 
Eduardo
