On Wed, Oct 19, 2011 at 01:17:03PM +0200, Jan Kiszka wrote:
> On 2011-10-19 11:03, Michael S. Tsirkin wrote:
> >>> I thought we need to match APIC ID. That needs a table lookup, no?
> >>
> >> Yes. But that's completely independent of a concrete MSI message. In
> >> fact, this is the same thing we need when interpreting an IOAPIC
> >> redirection table entry. So let's create an APIC ID lookup table for
> >> the destination ID field, maybe multiple of them to match different
> >> modes, but not a MSI message table.
> >>>
> >>>> Or are you thinking about some cluster mode?
> >>>
> >>> That too.
> >
> > Hmm, might be a good idea. APIC IDs are 8 bit, right?
>
> Yep (more generally: destination IDs). So even if we have to create
> multiple lookup tables for the various modes, that won't consume
> megabytes of RAM.
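To make that concrete, a rough sketch of what such per-mode tables could
look like (type and field names invented here for illustration, nothing
like this exists in kvm at this point):

    /* one direct-mapped table per destination mode; destination IDs are
     * 8 bit, so even several of these stay far below a page per VM */
    struct kvm_lapic;

    struct apic_dest_map {
        struct kvm_lapic *phys_map[256];        /* physical mode: APIC ID */
        struct kvm_lapic *logical_map[16][16];  /* logical: cluster x member */
    };

A destination ID taken from an MSI message or from an IOAPIC redirection
table entry would then resolve to the target LAPIC(s) with one or two
array dereferences instead of a scan over all VCPUs.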
> >>>>>>> An analogy would be if read/write operated on file paths.
> >>>>>>> fd makes it easy to do permission checks and slow lookups
> >>>>>>> in one place. GSI happens to work like this (maybe, by accident).
> >>>>>>
> >>>>>> Think of an opaque file handle as a MSIRoutingCache object. And it
> >>>>>> encodes not only the routing handle but also other useful associated
> >>>>>> information we need from time to time - internally, not in the
> >>>>>> device models.
> >>>>>
> >>>>> Forget qemu abstractions, I am talking about data path
> >>>>> optimizations in kernel in kvm. From that POV the point of an fd is
> >>>>> not that it is opaque. It is that it's an index in an array that
> >>>>> can be used for fast lookups.
> >>>>>
> >>>>>>>>> Another concern is mask bit emulation. We currently
> >>>>>>>>> handle the mask bit in userspace, but patches
> >>>>>>>>> to do it in the kernel for assigned devices were seen,
> >>>>>>>>> and IMO we might want to do that for virtio as well.
> >>>>>>>>>
> >>>>>>>>> For that to work the mask bit needs to be tied to
> >>>>>>>>> a specific gsi or specific device, which does not
> >>>>>>>>> work if we just inject arbitrary writes.
> >>>>>>>>
> >>>>>>>> Yes, but I do not see those valuable plans being negatively
> >>>>>>>> affected.
> >>>>>>>>
> >>>>>>>> Jan
> >>>>>>>
> >>>>>>> I do.
> >>>>>>> How would we maintain a mask/pending bit in kernel if we are not
> >>>>>>> even supplied info on all available vectors?
> >>>>>>
> >>>>>> It's tricky to discuss an undefined interface (there only exists an
> >>>>>> outdated proposal for kvm device assignment). But I suppose that
> >>>>>> user space will have to define the maximum number of vectors when
> >>>>>> creating an in-kernel MSI-X MMIO area. The device already has to
> >>>>>> tell this to msix_init.
> >>>>>>
> >>>>>> The number of used vectors will correlate with the number of
> >>>>>> registered irqfds (in the case of vhost or vfio; device assignment
> >>>>>> still has SET_MSIX_NR). As kernel space would then be responsible
> >>>>>> for mask processing, user space would keep vectors registered with
> >>>>>> irqfds, even if they are masked. It could just continue to play the
> >>>>>> trick and drop data=0 vectors.
> >>>>>
> >>>>> Which trick? We don't play any tricks except for device assignment.
> >>>>>
> >>>>>> The point here is: All those steps have _nothing_ to do with the
> >>>>>> generic MSI-X core. They are KVM-specific "side channels" for which
> >>>>>> KVM provides an API. In contrast, msix_vector_use/unuse were
> >>>>>> generic services that were actually created to please KVM
> >>>>>> requirements. But if we split that up, we can address the generic
> >>>>>> MSI-X requirements in a way that makes more sense for emulated
> >>>>>> devices (and particularly msix_vector_use makes no sense for
> >>>>>> emulation).
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> We need at least msix_vector_unuse
> >>>>
> >>>> Not at all. We rather need some qemu_irq_set(level) for MSI. The spec
> >>>> requires that the device clears pending when the reason for it is
> >>>> removed. And any removal that is device model-originated should
> >>>> simply be signaled like an irq de-assert.
> >>>
> >>> OK, this is a good argument.
> >>> In particular a virtio ISR read could clear the msix pending bit
> >>> (note: it would also need to clear the irqfd, as that is where
> >>> we get the pending bit).
> >>>
> >>> I would prefer not to use qemu_irq_set for this though.
> >>> We can add a level flag to msix_notify.
> >>
> >> No concerns.
> >>
> >>>> Vector "unusage" is just one reason here.
> >>>
> >>> I don't see removing the use/unuse functions as a priority though,
> >>> but if we add an API that also lets devices say
> >>> 'reason for interrupt is removed', that would be nice.
> >>>
> >>> Removing extra code can then be done separately, and on qemu.git,
> >>> not on qemu-kvm.
> >>
> >> If we refrain from hacking KVM logic into the use/unuse services
> >> upstream, we can do this later on. For me it is important that those
> >> obsolete services do not block or complicate further cleanups of the
> >> MSI layer nor bother device model creators with tasks they should not
> >> worry about.
> >
> > My assumption is devices shall keep calling use/unuse until we drop it.
> > Does not seem like a major bother. If you like, use all vectors,
> > or just those with message != 0.
>
> What about letting only those devices call use/unuse that sometimes need
> less than the maximum amount? All others would benefit from a use_all
> executed on enable and an unuse_all on disable/reset/uninit.

Sure, I don't mind adding use_all/unuse_all wrappers.
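Something along these lines, I suppose - an untested sketch on top of the
existing msix_vector_use/unuse, with the vector count taken from what the
device handed to msix_init (wrapper names and the exact field are made up
here):

    #include "pci.h"
    #include "msix.h"

    /* bulk use/unuse for devices that always operate on every vector
     * they declared to msix_init; msix_entries_nr is the count that
     * msix_init stored in the PCIDevice */
    void msix_vector_use_all(PCIDevice *dev)
    {
        unsigned i;

        for (i = 0; i < dev->msix_entries_nr; i++) {
            msix_vector_use(dev, i);
        }
    }

    void msix_vector_unuse_all(PCIDevice *dev)
    {
        unsigned i;

        for (i = 0; i < dev->msix_entries_nr; i++) {
            msix_vector_unuse(dev, i);
        }
    }

Devices with genuinely dynamic per-vector needs would keep calling the
per-vector variants as before.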
> >>>>> - IMO it makes more sense than "clear
> >>>>> pending vector". msix_vector_use is good to keep around for symmetry:
> >>>>> who knows whether we'll need to allocate resources per vector
> >>>>> in the future.
> >>>>
> >>>> For MSI[-X], the spec is already there, and we know that there is no
> >>>> need for further resources when emulating it. Only KVM has special
> >>>> needs.
> >>>>
> >>>> Jan
> >>>
> >>> It's not hard to speculate. Imagine an out-of-process device that
> >>> shares guest memory and sends interrupts to qemu using eventfd.
> >>> Suddenly we need an fd per vector, and this without KVM.
> >>
> >> That's what irqfd was invented for. Already works for vhost, and there
> >> is nothing that prevents communicating the irqfd fd between two
> >> processes. But note: irqfd handle, not a KVM-internal GSI.
> >>
> >> Jan
> >
> > Yes. But this still makes an API for acquiring per-vector resources a
> > requirement.
>
> Yes, but a different one than the current use/unuse.

What's wrong with use/unuse as an API? It's already in place, and virtio
calls it.

> And it will be an
> optional one, only for those devices that need to establish irq/eventfd
> channels.
>
> Jan

Not sure this should be up to the device.

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
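P.S.: To illustrate the out-of-process scenario speculated about above -
purely hypothetical, no such interface exists today, and all names below
are invented - the device process would only need a per-vector eventfd;
whether qemu's event loop or kvm (via an irqfd binding) sits on the other
end is then an implementation detail:

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    #define NR_VECTORS 64   /* made-up vector count for the example */

    /* one eventfd per MSI-X vector, created with eventfd(0, 0) and
     * passed to the qemu process (e.g. over a unix socket with
     * SCM_RIGHTS) when the vector is set up */
    static int vector_fd[NR_VECTORS];

    static void device_signal_vector(unsigned vector)
    {
        uint64_t one = 1;

        /* the consumer - qemu, or kvm if the fd was bound as an irqfd -
         * turns this event into the MSI for this vector */
        if (write(vector_fd[vector], &one, sizeof(one)) != sizeof(one)) {
            /* error handling omitted in this sketch */
        }
    }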