On Wed, Oct 19, 2011 at 01:17:03PM +0200, Jan Kiszka wrote:
> On 2011-10-19 11:03, Michael S. Tsirkin wrote:
> >>> I thought we need to match APIC ID. That needs a table lookup, no?
> >>
> >> Yes. But that's completely independent of a concrete MSI message. In
> >> fact, this is the same thing we need when interpreting an IOAPIC
> >> redirection table entry. So let's create an APIC ID lookup table for
> >> the destination ID field, maybe multiple of them to match different
> >> modes, but not a MSI message table.
> >>>
> >>>> Or are you thinking about some cluster mode?
> >>>
> >>> That too.
> >
> > Hmm, might be a good idea. APIC IDs are 8 bit, right?
>
> Yep (more generally: destination IDs). So even if we have to create
> multiple lookup tables for the various modes, that won't consume
> megabytes of RAM.
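To make that concrete, a rough sketch of what such per-mode tables could
look like (type and field names invented here for illustration, nothing
like this exists in kvm at this point):

    /* one direct-mapped table per destination mode; destination IDs are
     * 8 bit, so even several of these stay far below a page per VM */
    struct kvm_lapic;

    struct apic_dest_map {
        struct kvm_lapic *phys_map[256];        /* physical mode: APIC ID */
        struct kvm_lapic *logical_map[16][16];  /* logical: cluster x member */
    };

A destination ID taken from an MSI message or from an IOAPIC redirection
table entry would then resolve to the target LAPIC(s) with one or two
array dereferences instead of a scan over all VCPUs.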
> >>>>>>> An analogy would be if read/write operated on file paths.
> >>>>>>> fd makes it easy to do permission checks and slow lookups
> >>>>>>> in one place. GSI happens to work like this (maybe, by accident).
> >>>>>>
> >>>>>> Think of an opaque file handle as a MSIRoutingCache object. And it
> >>>>>> encodes not only the routing handle but also other useful associated
> >>>>>> information we need from time to time - internally, not in the
> >>>>>> device models.
> >>>>>
> >>>>> Forget qemu abstractions, I am talking about data path
> >>>>> optimizations in kernel in kvm. From that POV the point of an fd is
> >>>>> not that it is opaque. It is that it's an index in an array that
> >>>>> can be used for fast lookups.
> >>>>>
> >>>>>>>>> Another concern is mask bit emulation. We currently
> >>>>>>>>> handle the mask bit in userspace, but patches
> >>>>>>>>> to do it in the kernel for assigned devices were seen,
> >>>>>>>>> and IMO we might want to do that for virtio as well.
> >>>>>>>>>
> >>>>>>>>> For that to work the mask bit needs to be tied to
> >>>>>>>>> a specific gsi or specific device, which does not
> >>>>>>>>> work if we just inject arbitrary writes.
> >>>>>>>>
> >>>>>>>> Yes, but I do not see those valuable plans being negatively
> >>>>>>>> affected.
> >>>>>>>>
> >>>>>>>> Jan
> >>>>>>>
> >>>>>>> I do.
> >>>>>>> How would we maintain a mask/pending bit in kernel if we are not
> >>>>>>> even supplied info on all available vectors?
> >>>>>>
> >>>>>> It's tricky to discuss an undefined interface (there only exists an
> >>>>>> outdated proposal for kvm device assignment). But I suppose that
> >>>>>> user space will have to define the maximum number of vectors when
> >>>>>> creating an in-kernel MSI-X MMIO area. The device already has to
> >>>>>> tell this to msix_init.
> >>>>>>
> >>>>>> The number of used vectors will correlate with the number of
> >>>>>> registered irqfds (in the case of vhost or vfio; device assignment
> >>>>>> still has SET_MSIX_NR). As kernel space would then be responsible
> >>>>>> for mask processing, user space would keep vectors registered with
> >>>>>> irqfds, even if they are masked. It could just continue to play the
> >>>>>> trick and drop data=0 vectors.
> >>>>>
> >>>>> Which trick? We don't play any tricks except for device assignment.
> >>>>>
> >>>>>> The point here is: All those steps have _nothing_ to do with the
> >>>>>> generic MSI-X core. They are KVM-specific "side channels" for which
> >>>>>> KVM provides an API. In contrast, msix_vector_use/unuse were
> >>>>>> generic services that were actually created to please KVM
> >>>>>> requirements. But if we split that up, we can address the generic
> >>>>>> MSI-X requirements in a way that makes more sense for emulated
> >>>>>> devices (and particularly msix_vector_use makes no sense for
> >>>>>> emulation).
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> We need at least msix_vector_unuse
> >>>>
> >>>> Not at all. We rather need some qemu_irq_set(level) for MSI. The spec
> >>>> requires that the device clears pending when the reason for it is
> >>>> removed. And any removal that is device model-originated should
> >>>> simply be signaled like an irq de-assert.
> >>>
> >>> OK, this is a good argument.
> >>> In particular a virtio ISR read could clear the msix pending bit
> >>> (note: it would also need to clear the irqfd, as that is where
> >>> we get the pending bit).
> >>>
> >>> I would prefer not to use qemu_irq_set for this though.
> >>> We can add a level flag to msix_notify.
> >>
> >> No concerns.
> >>
> >>>> Vector "unusage" is just one reason here.
> >>>
> >>> I don't see removing the use/unuse functions as a priority though,
> >>> but if we add an API that also lets devices say
> >>> 'reason for interrupt is removed', that would be nice.
> >>>
> >>> Removing extra code can then be done separately, and on qemu.git,
> >>> not on qemu-kvm.
> >>
> >> If we refrain from hacking KVM logic into the use/unuse services
> >> upstream, we can do this later on. For me it is important that those
> >> obsolete services do not block or complicate further cleanups of the
> >> MSI layer nor bother device model creators with tasks they should not
> >> worry about.
> >
> > My assumption is devices shall keep calling use/unuse until we drop it.
> > Does not seem like a major bother. If you like, use all vectors,
> > or just those with message != 0.
>
> What about letting only those devices call use/unuse that sometimes need
> less than the maximum amount? All others would benefit from a use_all
> executed on enable and an unuse_all on disable/reset/uninit.

Sure, I don't mind adding use_all/unuse_all wrappers.
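Something along these lines, I suppose - an untested sketch on top of the
existing msix_vector_use/unuse, with the vector count taken from what the
device handed to msix_init (wrapper names and the exact field are made up
here):

    #include "pci.h"
    #include "msix.h"

    /* bulk use/unuse for devices that always operate on every vector
     * they declared to msix_init; msix_entries_nr is the count that
     * msix_init stored in the PCIDevice */
    void msix_vector_use_all(PCIDevice *dev)
    {
        unsigned i;

        for (i = 0; i < dev->msix_entries_nr; i++) {
            msix_vector_use(dev, i);
        }
    }

    void msix_vector_unuse_all(PCIDevice *dev)
    {
        unsigned i;

        for (i = 0; i < dev->msix_entries_nr; i++) {
            msix_vector_unuse(dev, i);
        }
    }

Devices with genuinely dynamic per-vector needs would keep calling the
per-vector variants as before.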
> >>>>> - IMO it makes more sense than "clear
> >>>>> pending vector". msix_vector_use is good to keep around for symmetry:
> >>>>> who knows whether we'll need to allocate resources per vector
> >>>>> in the future.
> >>>>
> >>>> For MSI[-X], the spec is already there, and we know that there is no
> >>>> need for further resources when emulating it. Only KVM has special
> >>>> needs.
> >>>>
> >>>> Jan
> >>>
> >>> It's not hard to speculate. Imagine an out-of-process device that
> >>> shares guest memory and sends interrupts to qemu using eventfd.
> >>> Suddenly we need an fd per vector, and this without KVM.
> >>
> >> That's what irqfd was invented for. Already works for vhost, and there
> >> is nothing that prevents communicating the irqfd fd between two
> >> processes. But note: irqfd handle, not a KVM-internal GSI.
> >>
> >> Jan
> >
> > Yes. But this still makes an API for acquiring per-vector resources a
> > requirement.
>
> Yes, but a different one than the current use/unuse.

What's wrong with use/unuse as an API? It's already in place, and virtio
calls it.

> And it will be an
> optional one, only for those devices that need to establish irq/eventfd
> channels.
>
> Jan

Not sure this should be up to the device.

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
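P.S.: To illustrate the out-of-process scenario speculated about above -
purely hypothetical, no such interface exists today, and all names below
are invented - the device process would only need a per-vector eventfd;
whether qemu's event loop or kvm (via an irqfd binding) sits on the other
end is then an implementation detail:

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    #define NR_VECTORS 64   /* made-up vector count for the example */

    /* one eventfd per MSI-X vector, created with eventfd(0, 0) and
     * passed to the qemu process (e.g. over a unix socket with
     * SCM_RIGHTS) when the vector is set up */
    static int vector_fd[NR_VECTORS];

    static void device_signal_vector(unsigned vector)
    {
        uint64_t one = 1;

        /* the consumer - qemu, or kvm if the fd was bound as an irqfd -
         * turns this event into the MSI for this vector */
        if (write(vector_fd[vector], &one, sizeof(one)) != sizeof(one)) {
            /* error handling omitted in this sketch */
        }
    }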