Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
> From: Rusty Russell <ru...@rustcorp.com.au>
> Date: Mon, 18 May 2009 22:18:47 +0930
>
> > We check for finished xmit skbs on every xmit, or on a timer (unless
> > the host promises to force an interrupt when the xmit ring is empty).
> > This can penalize userspace tasks which fill their sockbuf.  Not much
> > difference with TSO, but measurable with large numbers of packets.
> >
> > There are a finite number of packets which can be in the transmission
> > queue.  We could fire the timer more often than every 100ms, but that
> > would just hurt performance for a corner case.  This seems neatest.
> > ...
> > Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>
>
> If this is so great for virtio it would also be a great idea
> universally, but we don't do it.
>
> What you're doing by orphan'ing is creating a situation where a single
> UDP socket can loop doing sends and monopolize the TX queue of a
> device.  The only control we have over a sender for fairness in
> datagram protocols is that send buffer allocation.

Urgh, that hadn't even occurred to me.  Good point.

> I'm guilty of doing this too in the NIU driver, also because there I
> lack a TX queue empty interrupt and this can keep TCP sockets from
> getting stuck.
>
> I think we need a generic solution to this issue because it is getting
> quite common to see cases where the packets in the TX queue of a
> device can sit there indefinitely.

I haven't thought this through properly, but how about a hack where we
don't orphan packets if the ring is over half full?  Then I guess we
could overload the watchdog as a more general timer-after-no-xmit?

Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
From: Rusty Russell <ru...@rustcorp.com.au>
Date: Thu, 21 May 2009 16:27:05 +0930

> On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
> > What you're doing by orphan'ing is creating a situation where a
> > single UDP socket can loop doing sends and monopolize the TX queue
> > of a device.  The only control we have over a sender for fairness
> > in datagram protocols is that send buffer allocation.
>
> Urgh, that hadn't even occurred to me.  Good point.

Now this all is predicated on this actually mattering. :-)  You could
argue that the scheduler as well as the size of the TX queue should be
limiting and enforcing fairness.

Someone really needs to test this.  Just skb_orphan() every packet at
the beginning of dev_hard_start_xmit(), then run some test program with
two clients looping out UDP packets to see if one can monopolize the
device and get a significantly larger amount of TX resources than the
other.  Repeat for 3, 4, 5, etc. clients.

> I haven't thought this through properly, but how about a hack where we
> don't orphan packets if the ring is over half full?

That would also work.  And for the NIU case this would be great because
I DO have a marker bit for triggering interrupts in the TX descriptors.
There's just no all-empty interrupt on TX (who designs these things? :( ).

> Then I guess we could overload the watchdog as a more general
> timer-after-no-xmit?

Yes, but it means that teardown of a socket can be delayed up to the
amount of that timer.  Factor in all of this crazy round_jiffies()
stuff people do these days and it could cause pauses for real use cases
and drive users batty.

Probably the most profitable avenue is to see if this is a real issue
after all (see above).  If we can get away with having the socket
buffer represent socket --> device space only, that's the most ideal
solution.  It will probably also improve performance a lot across the
board, especially on NUMA/SMP boxes as our TX complete events tend to
be in different places than the SKB producer.
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Wed, May 20, 2009 at 11:44:57PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin <m...@redhat.com> wrote:
>> On Wed, May 20, 2009 at 11:26:42PM +0300, Blue Swirl wrote:
>>> On 5/20/09, Michael S. Tsirkin <m...@redhat.com> wrote:
>>>> On Wed, May 20, 2009 at 11:02:24PM +0300, Michael S. Tsirkin wrote:
>>>>> On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
>>>>>> On 5/20/09, Michael S. Tsirkin <m...@redhat.com> wrote:
>>>>>>> On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
>>>>>>>> On 5/20/09, Michael S. Tsirkin <m...@redhat.com> wrote:
>>>>>>>>> On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
>>>>>>>>>> On 5/20/09, Michael S. Tsirkin <m...@redhat.com> wrote:
>>>>>>>>>>> define api for allocating/setting up msi-x irqs, and for
>>>>>>>>>>> updating them with msi-x vector information, supply
>>>>>>>>>>> implementation in ioapic.  Please comment on this API: I
>>>>>>>>>>> intend to port my msi-x patch to work on top of it.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>>>>>>>>>>
>>>>>>>>>> Sparc64 also uses packets (mondos, not implemented yet) for
>>>>>>>>>> interrupt vector data, there the packet size is 8 * 64 bits.
>>>>>>>>>> I think we should aim for a more generic API that covers this
>>>>>>>>>> case also.
>>>>>>>>>
>>>>>>>>> Are you sure this is a good idea?  MSI is tied to PCI, and PCI
>>>>>>>>> only has MSI, not mondos.  What code would benefit from this
>>>>>>>>> abstraction?
>>>>>>>>
>>>>>>>> Sparc64 emulation, of course.  I think also the API would be
>>>>>>>> neater.
>>>>>>>
>>>>>>> Since mondos are not interrupts, why use irqs for them?
>>>>>>
>>>>>> I just said above that they are used for interrupt vector data.
>>>>>> What makes you think they are not interrupts?
>>>>>
>>>>> I'm sorry, I don't really know anything about sparc.  All I am
>>>>> saying is that in PCI, interrupts never pass data, so qemu_set_irq
>>>>> as it is now is a good API to send them.  For the sparc feature
>>>>> you describe, you probably want to add a message data parameter to
>>>>> qemu_set_irq, but it's not really useful for MSI.
>>>>
>>>> Just to clarify, the main difference is that with MSI/MSI-X both
>>>> data and address fields are mostly static; modifying them involves
>>>> ioapic and device updates, which might be an expensive operation
>>>> (e.g. with kvm, it needs an extra system call).  So I don't think
>>>> it makes sense to pass the MSI-X data field with each call to
>>>> qemu_set_irq.
>>>
>>> No, but I think the Sparc situation is the same: the packet data is
>>> static for the interrupt source in question.  So, ok, we could add a
>>> data update callback and then MSI and sparc would do their thing
>>> there.
>>
>> I'm not convinced I like all this play with untyped buffers -- do you
>> think it's helpful?  If yes, maybe I'll try to code it up and see how
>> it looks.
>
> Well, get/set_data could be limited to irq.c; ioapic.c could export
> something like get/set_msi_data.  MSI callers should only use
> get/set_msi_data.  Ditto for get/set_mondo_data.

I started coding it up and got lost in a maze of callbacks and void
pointers :)  And I also remembered that besides data, there are mask
and pending bits in msi (ignored in the patch I sent, but I'm fixing
that), so there needs to be an interface for this as well.

I am guessing that using the qemu_irq_get_opaque and qemu_irq_get_vector
that I added, you'll find it's easy to implement mondos in very few
lines of code.  We can always add another level of indirection later
when sparc code surfaces, if we find it helps.

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
> On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
>> define api for allocating/setting up msi-x irqs, and for updating
>> them with msi-x vector information, supply implementation in ioapic.
>> Please comment on this API: I intend to port my msi-x patch to work
>> on top of it.
>
> I thought the point of MSI is that they are just regular memory
> writes, and don't require any special bus support.

The PCI bus doesn't need any special support (I think) but something on
the other end needs to interpret those writes.  In any case we need
some internal API for this, and qemu_irq looks like a good choice.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 11:34:11AM +0100, Paul Brook wrote:
>> The PCI bus doesn't need any special support (I think) but something
>> on the other end needs to interpret those writes.
>
> Sure.  But there's definitely nothing PCI specific about it.  I
> assumed this would all be contained within the APIC.

Exactly.  The APIC supplies callbacks to set up/free/mask/unmask MSI
interrupts.  For kvm, we'll have another implementation that passes
these requests on to the kernel.

>> In any case we need some internal API for this, and qemu_irq looks
>> like a good choice.
>
> What do you expect to be using this API?

Emulated PCI devices such as virtio.  Hope to send a patch shortly.

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>> The PCI bus doesn't need any special support (I think) but something
>> on the other end needs to interpret those writes.
>
> Sure.  But there's definitely nothing PCI specific about it.  I
> assumed this would all be contained within the APIC.

MSIs are defined by PCI and their configuration is done using the PCI
configuration space.

>> In any case we need some internal API for this, and qemu_irq looks
>> like a good choice.
>
> What do you expect to be using this API?

virtio, emulated devices capable of supporting MSI (e1000?), device
assignment (not yet in qemu.git).

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>>>> In any case we need some internal API for this, and qemu_irq looks
>>>> like a good choice.
>>>
>>> What do you expect to be using this API?
>>
>> virtio, emulated devices capable of supporting MSI (e1000?), device
>> assignment (not yet in qemu.git).
>
> It probably makes sense to have common infrastructure in pci.c to
> expose/implement device side MSI functionality.  However I see no need
> for a direct API between the device and the APIC.  We already have an
> API for memory accesses and MMIO regions.  I'm pretty sure a system
> could implement MSI by pointing the device at system ram, and having
> the CPU periodically poll that.

Instead of writing directly, let's abstract it behind a qemu_set_irq().
This is easier for device authors.  The default implementation of the
irq callback could write to apic memory, while for kvm we can directly
trigger the interrupt via the kvm APIs.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 03:08:18PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>>>> In any case we need some internal API for this, and qemu_irq looks
>>>>> like a good choice.
>>>>
>>>> What do you expect to be using this API?
>>>
>>> virtio, emulated devices capable of supporting MSI (e1000?), device
>>> assignment (not yet in qemu.git).
>>
>> It probably makes sense to have common infrastructure in pci.c to
>> expose/implement device side MSI functionality.  However I see no
>> need for a direct API between the device and the APIC.  We already
>> have an API for memory accesses and MMIO regions.  I'm pretty sure a
>> system could implement MSI by pointing the device at system ram, and
>> having the CPU periodically poll that.
>
> Instead of writing directly, let's abstract it behind a
> qemu_set_irq().  This is easier for device authors.  The default
> implementation of the irq callback could write to apic memory, while
> for kvm we can directly trigger the interrupt via the kvm APIs.

Right.

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>> Instead of writing directly, let's abstract it behind a
>> qemu_set_irq().  This is easier for device authors.  The default
>> implementation of the irq callback could write to apic memory, while
>> for kvm we can directly trigger the interrupt via the kvm APIs.
>
> I'm still not convinced.  A tight coupling between PCI devices and the
> APIC is just going to cause us problems later on.  I'm going to come
> back to the fact that these are memory writes, so once we get IOMMU
> support they will presumably be subject to remapping by that, just
> like any other memory access.

I'm not suggesting the qemu_irq will extend all the way to the apic.
Think of it as connecting the device core with its interrupt unit.

> Even ignoring that, qemu_irq isn't really the right interface.  An MSI
> is a one-off event, not a level state.  OTOH stl_phys is exactly the
> right interface.

The qemu_irq callback should do an stl_phys().  The device is happy
since it's using the same API it uses for non-MSI.  The APIC is happy
since it isn't connected directly to the device.  stl_phys() is happy
since it sees more traffic and can serve more ads.  kvm is happy since
it can hijack the callback to throw the interrupt directly into the
kernel.

> The KVM interface should be contained within the APIC implementation.

Tricky, but doable.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 01:29:37PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Avi Kivity wrote:
>> Paul Brook wrote:
>>>>>> In any case we need some internal API for this, and qemu_irq
>>>>>> looks like a good choice.
>>>>>
>>>>> What do you expect to be using this API?
>>>>
>>>> virtio, emulated devices capable of supporting MSI (e1000?), device
>>>> assignment (not yet in qemu.git).
>>>
>>> It probably makes sense to have common infrastructure in pci.c to
>>> expose/implement device side MSI functionality.  However I see no
>>> need for a direct API between the device and the APIC.  We already
>>> have an API for memory accesses and MMIO regions.  I'm pretty sure a
>>> system could implement MSI by pointing the device at system ram, and
>>> having the CPU periodically poll that.
>>
>> Instead of writing directly, let's abstract it behind a
>> qemu_set_irq().  This is easier for device authors.  The default
>> implementation of the irq callback could write to apic memory, while
>> for kvm we can directly trigger the interrupt via the kvm APIs.
>
> I'm still not convinced.  A tight coupling between PCI devices and the
> APIC is just going to cause us problems later on.  I'm going to come
> back to the fact that these are memory writes, so once we get IOMMU
> support they will presumably be subject to remapping by that, just
> like any other memory access.

Actually, MSI messages are handled by the IOMMU as interrupts, not as
regular memory accesses.  The iommu book has comments such as:

  * Interrupt addresses are never translated to memory addresses, but
    other special address ranges may be reclaimed to be backed with
    memory.

> Even ignoring that, qemu_irq isn't really the right interface.  An MSI
> is a one-off event, not a level state.

Yes, I just chose to ignore the level value.  It does not look like
such a big issue ... Do you advocate a new qemu_msi structure then?

> OTOH stl_phys is exactly the right interface.

Not really.  These are writes, but not into physical memory.  For
example, on intel 32 bit, stl_phys gets a 32 bit address, while MSI
writes encode the interrupt vector in the high 32 bits of the address
- way outside actual physical memory.

> The KVM interface should be contained within the APIC implementation.

Unfortunately, kvm capabilities that are present in current kernels do
not map well to this interface.  You need to perform expensive setup
for each interrupt vector you are going to use, be it MSI or a regular
interrupt.

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>> Instead of writing directly, let's abstract it behind a
>>> qemu_set_irq().  This is easier for device authors.  The default
>>> implementation of the irq callback could write to apic memory,
>>> while for kvm we can directly trigger the interrupt via the kvm
>>> APIs.
>>
>> I'm still not convinced.  A tight coupling between PCI devices and
>> the APIC is just going to cause us problems later on.  I'm going to
>> come back to the fact that these are memory writes, so once we get
>> IOMMU support they will presumably be subject to remapping by that,
>> just like any other memory access.
>
> I'm not suggesting the qemu_irq will extend all the way to the apic.
> Think of it as connecting the device core with its interrupt unit.
>
>> Even ignoring that, qemu_irq isn't really the right interface.  An
>> MSI is a one-off event, not a level state.  OTOH stl_phys is exactly
>> the right interface.
>
> The qemu_irq callback should do an stl_phys().  The device is happy
> since it's using the same API it uses for non-MSI.  The APIC is happy
> since it isn't connected directly to the device.  stl_phys() is happy
> since it sees more traffic and can serve more ads.  kvm is happy
> since it can hijack the callback to throw the interrupt directly into
> the kernel.

I like the monetization angle here.

>> The KVM interface should be contained within the APIC implementation.
>
> Tricky, but doable.

I'd rather keep it simple.

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
>>> MSI provides multiple edge triggered interrupts, whereas traditional
>>> mode provides a single level triggered interrupt.  My guess is most
>>> devices will want to treat these differently anyway.
>>
>> So, is qemu_send_msi better than qemu_set_irq?
>
> Neither.  pci_send_msi,

Works for me.

> which is a trivial wrapper around stl_phys.

OK, but I'm adding another level of indirection in the middle, to allow
us to tie in a kvm backend.

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>>> which is a trivial wrapper around stl_phys.
>>
>> OK, but I'm adding another level of indirection in the middle, to
>> allow us to tie in a kvm backend.
>
> kvm has no business messing with the PCI device code.

kvm has a fast path for irq injection.  If qemu wants to support it we
need some abstraction here.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 02:53:14PM +0100, Paul Brook wrote:
>>> which is a trivial wrapper around stl_phys.
>>
>> OK, but I'm adding another level of indirection in the middle, to
>> allow us to tie in a kvm backend.
>
> kvm has no business messing with the PCI device code.

Yes it has :)  kvm needs data on MSI entries: that's the interface the
current kernel exposes for injecting these interrupts.

I think we also need to support in-kernel devices which would inject an
MSI interrupt directly from the kernel.  For these, kvm would need to
know when the mask bit changes and give us info on the pending bit.
That's a fair amount of PCI specific code in kvm.

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
> define api for allocating/setting up msi-x irqs, and for updating them
> with msi-x vector information, supply implementation in ioapic.
> Please comment on this API: I intend to port my msi-x patch to work on
> top of it.

I thought the point of MSI is that they are just regular memory writes,
and don't require any special bus support.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
> The PCI bus doesn't need any special support (I think) but something
> on the other end needs to interpret those writes.

Sure.  But there's definitely nothing PCI specific about it.  I assumed
this would all be contained within the APIC.

> In any case we need some internal API for this, and qemu_irq looks
> like a good choice.

What do you expect to be using this API?

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
>>>>> In any case we need some internal API for this, and qemu_irq looks
>>>>> like a good choice.
>>>>
>>>> What do you expect to be using this API?
>>>
>>> virtio, emulated devices capable of supporting MSI (e1000?), device
>>> assignment (not yet in qemu.git).
>>
>> It probably makes sense to have common infrastructure in pci.c to
>> expose/implement device side MSI functionality.  However I see no
>> need for a direct API between the device and the APIC.  We already
>> have an API for memory accesses and MMIO regions.  I'm pretty sure a
>> system could implement MSI by pointing the device at system ram, and
>> having the CPU periodically poll that.
>
> Instead of writing directly, let's abstract it behind a
> qemu_set_irq().  This is easier for device authors.  The default
> implementation of the irq callback could write to apic memory, while
> for kvm we can directly trigger the interrupt via the kvm APIs.

I'm still not convinced.  A tight coupling between PCI devices and the
APIC is just going to cause us problems later on.  I'm going to come
back to the fact that these are memory writes, so once we get IOMMU
support they will presumably be subject to remapping by that, just like
any other memory access.

Even ignoring that, qemu_irq isn't really the right interface.  An MSI
is a one-off event, not a level state.  OTOH stl_phys is exactly the
right interface.

The KVM interface should be contained within the APIC implementation.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
>> A tight coupling between PCI devices and the APIC is just going to
>> cause us problems later on.  I'm going to come back to the fact that
>> these are memory writes, so once we get IOMMU support they will
>> presumably be subject to remapping by that, just like any other
>> memory access.
>
> I'm not suggesting the qemu_irq will extend all the way to the apic.
> Think of it as connecting the device core with its interrupt unit.

>> Even ignoring that, qemu_irq isn't really the right interface.  An
>> MSI is a one-off event, not a level state.  OTOH stl_phys is exactly
>> the right interface.
>
> The qemu_irq callback should do an stl_phys().  The device is happy
> since it's using the same API it uses for non-MSI.

MSI provides multiple edge triggered interrupts, whereas traditional
mode provides a single level triggered interrupt.  My guess is most
devices will want to treat these differently anyway.  Either way, this
is an implementation detail between pci.c and individual devices.  It
has nothing to do with the APIC.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
>> MSI provides multiple edge triggered interrupts, whereas traditional
>> mode provides a single level triggered interrupt.  My guess is most
>> devices will want to treat these differently anyway.
>
> So, is qemu_send_msi better than qemu_set_irq?

Neither.  pci_send_msi, which is a trivial wrapper around stl_phys.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thursday 21 May 2009, Paul Brook wrote:
>>> MSI provides multiple edge triggered interrupts, whereas traditional
>>> mode provides a single level triggered interrupt.  My guess is most
>>> devices will want to treat these differently anyway.
>>
>> So, is qemu_send_msi better than qemu_set_irq?
>
> Neither.  pci_send_msi, which is a trivial wrapper around stl_phys.

To clarify, you seem to be trying to fuse two largely separate features
together.

MSI is a standard PCI device capability[1] that involves the device
performing a 32-bit memory write when something interesting occurs.
These writes may or may not be directed at an APIC.

The x86 APIC has a memory mapped interface that allows generation of
CPU interrupts in response to memory writes.  These may or may not come
from an MSI capable PCI device.

Paul

[1] Note a *device* capability, not a bus capability.
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
>> which is a trivial wrapper around stl_phys.
>
> OK, but I'm adding another level of indirection in the middle, to
> allow us to tie in a kvm backend.

kvm has no business messing with the PCI device code.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
>>>> which is a trivial wrapper around stl_phys.
>>>
>>> OK, but I'm adding another level of indirection in the middle, to
>>> allow us to tie in a kvm backend.
>>
>> kvm has no business messing with the PCI device code.
>
> kvm has a fast path for irq injection.  If qemu wants to support it we
> need some abstraction here.

Fast path from where to where?  Having the PCI layer bypass/re-implement
the APIC and inject the interrupt directly into the cpu core sounds a
particularly bad idea.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
> On Thursday 21 May 2009, Avi Kivity wrote:
>> Paul Brook wrote:
>>>>> which is a trivial wrapper around stl_phys.
>>>>
>>>> OK, but I'm adding another level of indirection in the middle, to
>>>> allow us to tie in a kvm backend.
>>>
>>> kvm has no business messing with the PCI device code.
>>
>> kvm has a fast path for irq injection.  If qemu wants to support it
>> we need some abstraction here.
>
> Fast path from where to where?  Having the PCI layer
> bypass/re-implement the APIC and inject the interrupt directly into
> the cpu core sounds a particularly bad idea.

kvm implements the APIC in the host kernel (qemu upstream doesn't
support this yet).  The fast path is wired to the in-kernel APIC, not
the cpu core directly.

The idea is to wire it to UIO for device assignment, to a virtio device
implemented in the kernel, and to qemu.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>> kvm implements the APIC in the host kernel (qemu upstream doesn't
>> support this yet).  The fast path is wired to the in-kernel APIC, not
>> the cpu core directly.
>>
>> The idea is to wire it to UIO for device assignment, to a virtio
>> device implemented in the kernel, and to qemu.
>
> I still don't see why you're trying to bypass straight from the pci
> layer to the apic.  Why can't you just pass the apic MMIO writes to
> the kernel?  You've presumably got to update the apic state anyway.

The fast path is an eventfd, so that we don't have to teach all the
clients about the details of MSI.  Userspace programs the MSI details
into kvm and hands the client an eventfd.  All the client has to do is
bang on the eventfd for the interrupt to be queued.  The eventfd
provides event coalescing and is equally useful from the kernel and
userspace, and can be used with targets other than kvm.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
>>> kvm implements the APIC in the host kernel (qemu upstream doesn't
>>> support this yet).  The fast path is wired to the in-kernel APIC,
>>> not the cpu core directly.  The idea is to wire it to UIO for device
>>> assignment, to a virtio device implemented in the kernel, and to
>>> qemu.
>>
>> I still don't see why you're trying to bypass straight from the pci
>> layer to the apic.  Why can't you just pass the apic MMIO writes to
>> the kernel?  You've presumably got to update the apic state anyway.
>
> The fast path is an eventfd, so that we don't have to teach all the
> clients about the details of MSI.  Userspace programs the MSI details
> into kvm and hands the client an eventfd.  All the client has to do is
> bang on the eventfd for the interrupt to be queued.  The eventfd
> provides event coalescing and is equally useful from the kernel and
> userspace, and can be used with targets other than kvm.

So presumably if a device triggers an APIC interrupt using a write that
isn't from one of the currently configured PCI devices, it all explodes
horribly?

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 03:50:18PM +0100, Paul Brook wrote:
>>>>> kvm has no business messing with the PCI device code.
>>>>
>>>> kvm has a fast path for irq injection.  If qemu wants to support it
>>>> we need some abstraction here.
>>>
>>> Fast path from where to where?  Having the PCI layer
>>> bypass/re-implement the APIC and inject the interrupt directly into
>>> the cpu core sounds a particularly bad idea.
>>
>> kvm implements the APIC in the host kernel (qemu upstream doesn't
>> support this yet).  The fast path is wired to the in-kernel APIC, not
>> the cpu core directly.  The idea is to wire it to UIO for device
>> assignment, to a virtio device implemented in the kernel, and to
>> qemu.
>
> I still don't see why you're trying to bypass straight from the pci
> layer to the apic.  Why can't you just pass the apic MMIO writes to
> the kernel?  You've presumably got to update the apic state anyway.

As far as I can tell, at least on Intel, MSI interrupts are not MMIO
writes.  They are PCI memory writes to a hard-coded address range that
are passed to the APIC.  I don't think MMIO writes can trigger MSI, or
at least this does not seem to be documented.

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Paul Brook wrote:
>>>> MSI provides multiple edge triggered interrupts, whereas
>>>> traditional mode provides a single level triggered interrupt.  My
>>>> guess is most devices will want to treat these differently anyway.
>>>
>>> So, is qemu_send_msi better than qemu_set_irq?
>>
>> Neither.  pci_send_msi, which is a trivial wrapper around stl_phys.
>
> To clarify, you seem to be trying to fuse two largely separate
> features together.
>
> MSI is a standard PCI device capability[1] that involves the device
> performing a 32-bit memory write when something interesting occurs.
> These writes may or may not be directed at an APIC.
>
> The x86 APIC has a memory mapped interface that allows generation of
> CPU interrupts in response to memory writes.  These may or may not
> come from an MSI capable PCI device.
>
> [1] Note a *device* capability, not a bus capability.

Paul, so I went over the specs, and what you say about the APIC here
does not seem to be what Intel actually implemented.  Specifically,
Intel implemented *MSI support in the APIC*.  This lets PCI devices,
but not the CPU, signal interrupts by memory writes.

For example, after reset, when the CPU writes to address 0xfee00000
this is an access to a reserved register in the APIC, but when a PCI
device writes to 0xfee00000, this triggers an interrupt to
destination 0.

See section 9.12 in the Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 3A: System Programming Guide, Part 1:
http://www.intel.com/Assets/PDF/manual/253668.pdf

So it seems that what we need to do in pci is:

	if (!msi_ops || msi_ops->send_msi(address, data))
		stl_phys(address, data);

where send_msi is wired to apic_send_msi, and where apic_send_msi
returns an error for an address outside of the MSI range
0xfee00000 - 0xfeefffff.

Makes sense?

-- MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 07:45:20PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> > On Thursday 21 May 2009, Paul Brook wrote:
> > > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > > mode provides a single level triggered interrupt. My guess is most
> > > > devices will want to treat these differently anyway.
> > > So, is qemu_send_msi better than qemu_set_irq?
> > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
> >
> > To clarify, you seem to be trying to fuse two largely separate features
> > together.
> >
> > MSI is a standard PCI device capability[1] that involves the device
> > performing a 32-bit memory write when something interesting occurs.
> > These writes may or may not be directed at an APIC.
> >
> > The x86 APIC has a memory mapped interface that allows generation of CPU
> > interrupts in response to memory writes. These may or may not come from
> > an MSI capable PCI device.
> >
> > Paul
> >
> > [1] Note a *device* capability, not a bus capability.
>
> Paul, so I went over the specs, and what you say about the APIC here does
> not seem to be what Intel actually implemented. Specifically, Intel
> implemented *MSI support in the APIC*: this lets PCI devices, but not the
> CPU, signal interrupts by memory writes. For example, after reset, when
> the CPU writes to address 0xfee00000 this is an access to a reserved
> register in the APIC, but when a PCI device writes to 0xfee00000, it
> triggers an interrupt to destination 0.
>
> See section 9.12 in the Intel® 64 and IA-32 Architectures Software
> Developer's Manual, Volume 3A: System Programming Guide, Part 1:
> http://www.intel.com/Assets/PDF/manual/253668.pdf
>
> So it seems that what we need to do in pci is:
>
> 	if (!msi_ops || msi_ops->send_msi(address, data))
> 		stl_phys(address, data);
>
> where send_msi is wired to apic_send_msi, and where apic_send_msi returns
> an error for an address outside of the MSI range 0xfee00000 - 0xfeefffff.
>
> Makes sense?

So I ended up with these ops:

	allocate
	free
	update
	send

which the APIC will define and MSI emulation will use.
Here, send will return an error for addresses outside the 0xfeex range, and
the device will then do a plain stl_phys.

-- 
MST
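The dispatch MST proposes — try the registered ops first, fall back to a plain physical write if there are no ops or the address is refused — can be sketched as below. This is a standalone illustration, not qemu source: `stl_phy` here is a local stand-in for qemu's `stl_phys`, `demo_send_msi` is a hypothetical APIC-side handler, and the counter exists only to make the fallback observable.

```c
#include <assert.h>
#include <stdint.h>

/* Ops table as proposed: send_msi returns 0 if the write was handled
 * as an MSI, nonzero if the address is outside the MSI range. */
struct msi_ops {
    int (*send_msi)(uint32_t addr, uint32_t data);
};

static int plain_writes;   /* counts fallback writes, for illustration */

/* Stand-in for qemu's stl_phys(): a plain 32-bit physical write. */
static void stl_phy(uint32_t addr, uint32_t data)
{
    (void)addr;
    (void)data;
    plain_writes++;
}

/* The proposed dispatch: no ops registered, or ops decline the
 * address => fall back to an ordinary memory write. */
static void pci_send_msi(const struct msi_ops *ops,
                         uint32_t addr, uint32_t data)
{
    if (!ops || ops->send_msi(addr, data))
        stl_phy(addr, data);
}

/* Hypothetical APIC-side handler: claim only the 0xFEExxxxx window. */
static int demo_send_msi(uint32_t addr, uint32_t data)
{
    (void)data;
    return (addr & 0xFFF00000u) == 0xFEE00000u ? 0 : 1;
}

static const struct msi_ops demo_ops = { demo_send_msi };
```

With this shape the device side never needs to know where the APIC lives; only the `send` op encodes the address range.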
Re: [PATCH] xen/blkfront: beyond ARRAY_SIZE of info->shadow
Roel Kluin wrote:
> Do not go beyond ARRAY_SIZE of info->shadow
>
> Signed-off-by: Roel Kluin roel.kl...@gmail.com

Acked-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Jens, can you put this into a next-merge-window branch?

Thanks,
	J

---
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a6cbf7b..d395986 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -122,7 +122,7 @@ static DEFINE_SPINLOCK(blkif_io_lock);
 static int get_id_from_freelist(struct blkfront_info *info)
 {
 	unsigned long free = info->shadow_free;
-	BUG_ON(free > BLK_RING_SIZE);
+	BUG_ON(free >= BLK_RING_SIZE);
 	info->shadow_free = info->shadow[free].req.id;
 	info->shadow[free].req.id = 0x0fee; /* debug */
 	return free;
[PATCH] xen/blkfront: beyond ARRAY_SIZE of info->shadow
Do not go beyond ARRAY_SIZE of info->shadow

Signed-off-by: Roel Kluin roel.kl...@gmail.com
---
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a6cbf7b..d395986 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -122,7 +122,7 @@ static DEFINE_SPINLOCK(blkif_io_lock);
 static int get_id_from_freelist(struct blkfront_info *info)
 {
 	unsigned long free = info->shadow_free;
-	BUG_ON(free > BLK_RING_SIZE);
+	BUG_ON(free >= BLK_RING_SIZE);
 	info->shadow_free = info->shadow[free].req.id;
 	info->shadow[free].req.id = 0x0fee; /* debug */
 	return free;
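The one-character fix above matters because `get_id_from_freelist()` indexes `shadow[]` with the very value it checks: the array has exactly BLK_RING_SIZE entries, so an index *equal* to BLK_RING_SIZE is already one past the end, and the guard must be `>=`, not `>`. A minimal, self-contained sketch of the same embedded-freelist pattern (structure and field names are illustrative, not the real xen-blkfront code):

```c
#include <assert.h>

#define BLK_RING_SIZE 32   /* illustrative ring size */

/* Each free slot stores the index of the next free slot, so the free
 * list is threaded through the shadow array itself. */
struct shadow_entry {
    unsigned long next_free;
};

struct blkfront_info {
    unsigned long shadow_free;              /* head of the free list */
    struct shadow_entry shadow[BLK_RING_SIZE];
};

/* Pop the head of the free list and return its index. The corrected
 * bound: any index >= BLK_RING_SIZE would read past the array. */
static unsigned long get_id_from_freelist(struct blkfront_info *info)
{
    unsigned long free = info->shadow_free;
    assert(free < BLK_RING_SIZE);           /* i.e. BUG_ON(free >= ...) */
    info->shadow_free = info->shadow[free].next_free;
    return free;
}
```

With the old `>` check, a (corrupt) head value of exactly BLK_RING_SIZE would pass the guard and then index out of bounds on the very next line.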