[PATCH] xen/blkfront: beyond ARRAY_SIZE of info->shadow
Do not go beyond ARRAY_SIZE of info->shadow

Signed-off-by: Roel Kluin
---
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a6cbf7b..d395986 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -122,7 +122,7 @@ static DEFINE_SPINLOCK(blkif_io_lock);
 static int get_id_from_freelist(struct blkfront_info *info)
 {
 	unsigned long free = info->shadow_free;
-	BUG_ON(free > BLK_RING_SIZE);
+	BUG_ON(free >= BLK_RING_SIZE);
 	info->shadow_free = info->shadow[free].req.id;
 	info->shadow[free].req.id = 0x0fee; /* debug */
 	return free;

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
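The fix above is the classic off-by-one in bounds checks: with an array of BLK_RING_SIZE entries, the last valid index is BLK_RING_SIZE - 1, so index == BLK_RING_SIZE must also trip the check. A minimal sketch of the two predicates (names and the size value are illustrative, not the kernel code itself):

```c
#include <stddef.h>

#define BLK_RING_SIZE 32  /* illustrative; the real value comes from the ring layout */

/* 'free > BLK_RING_SIZE' wrongly accepts free == BLK_RING_SIZE,
 * which is one past the end of shadow[]. */
static int old_check_accepts(unsigned long free)
{
    return !(free > BLK_RING_SIZE);    /* buggy: lets 32 through */
}

/* The corrected check rejects it; only 0 .. BLK_RING_SIZE-1 pass. */
static int new_check_accepts(unsigned long free)
{
    return !(free >= BLK_RING_SIZE);   /* fixed: last valid index is 31 */
}
```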
Re: [PATCH] xen/blkfront: beyond ARRAY_SIZE of info->shadow
Roel Kluin wrote:
> Do not go beyond ARRAY_SIZE of info->shadow
>
> Signed-off-by: Roel Kluin

Acked-by: Jeremy Fitzhardinge

Jens, can you put this into a next-merge-window branch?

Thanks,
J

> ---
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index a6cbf7b..d395986 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -122,7 +122,7 @@ static DEFINE_SPINLOCK(blkif_io_lock);
>  static int get_id_from_freelist(struct blkfront_info *info)
>  {
>  	unsigned long free = info->shadow_free;
> -	BUG_ON(free > BLK_RING_SIZE);
> +	BUG_ON(free >= BLK_RING_SIZE);
>  	info->shadow_free = info->shadow[free].req.id;
>  	info->shadow[free].req.id = 0x0fee; /* debug */
>  	return free;
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 07:45:20PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> > On Thursday 21 May 2009, Paul Brook wrote:
> > > > > MSI provides multiple edge triggered interrupts, whereas traditional mode provides a single level triggered interrupt. My guess is most devices will want to treat these differently anyway.
> > > >
> > > > So, is qemu_send_msi better than qemu_set_irq.
> > >
> > > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
> >
> > To clarify, you seem to be trying to fuse two largely separate features together.
> >
> > MSI is a standard PCI device capability[1] that involves the device performing a 32-bit memory write when something interesting occurs. These writes may or may not be directed at an APIC.
> >
> > The x86 APIC has a memory mapped interface that allows generation of CPU interrupts in response to memory writes. These may or may not come from an MSI capable PCI device.
> >
> > Paul
> >
> > [1] Note a *device* capability, not a bus capability.
>
> Paul, so I went over specs, and what you say about APIC here does not seem to be what Intel actually implemented. Specifically, Intel implemented *MSI support in APIC*. This lets PCI devices, but not the CPU, signal interrupts by memory writes.
>
> For example, after reset, when CPU writes to address 0xfee00000 this is an access to a reserved register in APIC, but when PCI device does write to 0xfee00000, this triggers an interrupt to destination 0.
>
> See section 9.12 in Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1
> http://www.intel.com/Assets/PDF/manual/253668.pdf
>
> So it seems that what we need to do in pci is:
>
> if (!msi_ops || msi_ops->send_msi(address, data))
> 	stl_phys(address, data);
>
> where send_msi is wired to apic_send_msi and where apic_send_msi returns an error for an address outside of the MSI range 0xfee00000 - 0xfeefffff
>
> Makes sense?

So I ended up with these ops:
	allocate
	free
	update
	send
which APIC will define and MSI emulation will use. Here, send will return an error for addresses outside the 0xfeexxxxx range, and the device will do a plain stl_phys.

-- 
MST
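The dispatch MST proposes can be sketched as a tiny ops table with a fallback to the plain physical write. Everything here — MSIOps, apic_send_msi, the toy stl_phys and RAM array — is an illustrative stand-in following the names in the thread, not qemu's real code:

```c
#include <stdint.h>

/* send_msi returns 0 if the write was consumed as an MSI, nonzero otherwise. */
typedef struct {
    int (*send_msi)(uint64_t addr, uint32_t data);
} MSIOps;

/* Toy backing store standing in for guest RAM + stl_phys(). */
static uint32_t toy_ram[16];
static void stl_phys(uint64_t addr, uint32_t data)
{
    toy_ram[(addr / 4) % 16] = data;
}

#define MSI_LO 0xfee00000ull
#define MSI_HI 0xfeefffffull

static int last_vector = -1;

/* The APIC claims only writes inside the MSI window. */
static int apic_send_msi(uint64_t addr, uint32_t data)
{
    if (addr < MSI_LO || addr > MSI_HI)
        return -1;                 /* not ours: caller falls back */
    last_vector = data & 0xff;     /* vector sits in the low data byte */
    return 0;
}

static MSIOps apic_msi_ops = { apic_send_msi };

/* The device-side write path: try the MSI hook, else plain memory write. */
static void pci_memory_write(MSIOps *msi_ops, uint64_t addr, uint32_t data)
{
    if (!msi_ops || msi_ops->send_msi(addr, data))
        stl_phys(addr, data);
}
```

The device code stays oblivious: it issues the same write either way, and only the dispatch decides whether the APIC or memory sees it.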
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Paul Brook wrote:
> > > > MSI provides multiple edge triggered interrupts, whereas traditional mode provides a single level triggered interrupt. My guess is most devices will want to treat these differently anyway.
> > >
> > > So, is qemu_send_msi better than qemu_set_irq.
> >
> > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
>
> To clarify, you seem to be trying to fuse two largely separate features together.
>
> MSI is a standard PCI device capability[1] that involves the device performing a 32-bit memory write when something interesting occurs. These writes may or may not be directed at an APIC.
>
> The x86 APIC has a memory mapped interface that allows generation of CPU interrupts in response to memory writes. These may or may not come from an MSI capable PCI device.
>
> Paul
>
> [1] Note a *device* capability, not a bus capability.

Paul, so I went over specs, and what you say about APIC here does not seem to be what Intel actually implemented. Specifically, Intel implemented *MSI support in APIC*. This lets PCI devices, but not the CPU, signal interrupts by memory writes.

For example, after reset, when CPU writes to address 0xfee00000 this is an access to a reserved register in APIC, but when PCI device does write to 0xfee00000, this triggers an interrupt to destination 0.

See section 9.12 in Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1
http://www.intel.com/Assets/PDF/manual/253668.pdf

So it seems that what we need to do in pci is:

if (!msi_ops || msi_ops->send_msi(address, data))
	stl_phys(address, data);

where send_msi is wired to apic_send_msi and where apic_send_msi returns an error for an address outside of the MSI range 0xfee00000 - 0xfeefffff

Makes sense?

-- 
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 03:50:18PM +0100, Paul Brook wrote:
> > >>> kvm has no business messing with the PCI device code.
> > >>
> > >> kvm has a fast path for irq injection. If qemu wants to support it we need some abstraction here.
> > >
> > > Fast path from where to where? Having the PCI layer bypass/re-implement the APIC and inject the interrupt directly into the cpu core sounds a particularly bad idea.
> >
> > kvm implements the APIC in the host kernel (qemu upstream doesn't support this yet). The fast path is wired to the in-kernel APIC, not the cpu core directly.
> >
> > The idea is to wire it to UIO for device assignment, to a virtio-device implemented in the kernel, and to qemu.
>
> I still don't see why you're trying to bypass straight from the pci layer to the apic. Why can't you just pass the apic MMIO writes to the kernel? You've presumably got to update the apic state anyway.
>
> Paul

As far as I can tell, at least on Intel, MSI interrupts are not MMIO writes. They are PCI memory writes to a hard-coded address range that are passed to APIC. I don't think MMIO writes can trigger MSI, or at least this does not seem to be documented.

-- 
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>> The fast path is an eventfd so that we don't have to teach all the clients about the details of MSI. Userspace programs the MSI details into kvm and hands the client an eventfd. All the client has to do is bang on the eventfd for the interrupt to be queued. The eventfd provides event coalescing and is equally useful from the kernel and userspace, and can be used with targets other than kvm.
>
> So presumably if a device triggers an APIC interrupt using a write that isn't one of the currently configured PCI devices, it all explodes horribly?

I don't follow. Can you elaborate on your scenario?

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >> kvm implements the APIC in the host kernel (qemu upstream doesn't support this yet). The fast path is wired to the in-kernel APIC, not the cpu core directly.
> >>
> >> The idea is to wire it to UIO for device assignment, to a virtio-device implemented in the kernel, and to qemu.
> >
> > I still don't see why you're trying to bypass straight from the pci layer to the apic. Why can't you just pass the apic MMIO writes to the kernel? You've presumably got to update the apic state anyway.
>
> The fast path is an eventfd so that we don't have to teach all the clients about the details of MSI. Userspace programs the MSI details into kvm and hands the client an eventfd. All the client has to do is bang on the eventfd for the interrupt to be queued. The eventfd provides event coalescing and is equally useful from the kernel and userspace, and can be used with targets other than kvm.

So presumably if a device triggers an APIC interrupt using a write that isn't one of the currently configured PCI devices, it all explodes horribly?

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>> kvm implements the APIC in the host kernel (qemu upstream doesn't support this yet). The fast path is wired to the in-kernel APIC, not the cpu core directly.
>>
>> The idea is to wire it to UIO for device assignment, to a virtio-device implemented in the kernel, and to qemu.
>
> I still don't see why you're trying to bypass straight from the pci layer to the apic. Why can't you just pass the apic MMIO writes to the kernel? You've presumably got to update the apic state anyway.

The fast path is an eventfd so that we don't have to teach all the clients about the details of MSI. Userspace programs the MSI details into kvm and hands the client an eventfd. All the client has to do is bang on the eventfd for the interrupt to be queued. The eventfd provides event coalescing and is equally useful from the kernel and userspace, and can be used with targets other than kvm.

-- 
error compiling committee.c: too many arguments to function
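The eventfd fast path described above can be sketched with nothing but the Linux eventfd(2) API (the kvm-side wiring is omitted; the function below is an illustrative stand-in for the client "banging on the eventfd"):

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal an interrupt event n times from the "device" side, then drain
 * once from the "consumer" side and return the counter value observed.
 * Writes that land before the read are coalesced into one wakeup. */
static uint64_t inject_and_drain(unsigned n)
{
    int efd = eventfd(0, 0);
    if (efd < 0)
        return 0;

    uint64_t one = 1, val = 0;
    for (unsigned i = 0; i < n; i++) {
        /* client side: all it has to do is write to the fd */
        if (write(efd, &one, sizeof(one)) != sizeof(one))
            break;
    }
    /* consumer side: a single read returns the coalesced count */
    if (read(efd, &val, sizeof(val)) != sizeof(val))
        val = 0;
    close(efd);
    return val;
}
```

The coalescing is exactly the property Avi mentions: three injections before the consumer runs still cost only one wakeup.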
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode provides a single level triggered interrupt. My guess is most devices will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
>
> Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

I guess I'll start with that. This only works if target_phys_addr_t is a 64 bit field (because MSI addresses are typically outside the 32 bit memory space), I guess the simplest solution is to disable MSI if it's not so.

-- 
MST
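The concern can be made concrete: if the build's physical-address type is only 32 bits wide, a 64-bit MSI target address is silently truncated on its way into stl_phys, and two distinct targets collide. The typedef below is an assumed stand-in for a 32-bit target_phys_addr_t build, not qemu's actual definition:

```c
#include <stdint.h>

/* Stand-in for target_phys_addr_t on a 32-bit-address build. */
typedef uint32_t target_phys_addr_32;

/* What an implicit conversion into stl_phys would do to a 64-bit
 * MSI address: the high 32 bits are silently dropped. */
static target_phys_addr_32 narrow(uint64_t msi_addr)
{
    return (target_phys_addr_32)msi_addr;
}
```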
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
> >>> kvm has no business messing with the PCI device code.
> >>
> >> kvm has a fast path for irq injection. If qemu wants to support it we need some abstraction here.
> >
> > Fast path from where to where? Having the PCI layer bypass/re-implement the APIC and inject the interrupt directly into the cpu core sounds a particularly bad idea.
>
> kvm implements the APIC in the host kernel (qemu upstream doesn't support this yet). The fast path is wired to the in-kernel APIC, not the cpu core directly.
>
> The idea is to wire it to UIO for device assignment, to a virtio-device implemented in the kernel, and to qemu.

I still don't see why you're trying to bypass straight from the pci layer to the apic. Why can't you just pass the apic MMIO writes to the kernel? You've presumably got to update the apic state anyway.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
> On Thursday 21 May 2009, Avi Kivity wrote:
>> Paul Brook wrote:
>>>>> which is a trivial wrapper around stl_phys.
>>>> OK, but I'm adding another level of indirection in the middle, to allow us to tie in a kvm backend.
>>> kvm has no business messing with the PCI device code.
>> kvm has a fast path for irq injection. If qemu wants to support it we need some abstraction here.
>
> Fast path from where to where? Having the PCI layer bypass/re-implement the APIC and inject the interrupt directly into the cpu core sounds a particularly bad idea.

kvm implements the APIC in the host kernel (qemu upstream doesn't support this yet). The fast path is wired to the in-kernel APIC, not the cpu core directly.

The idea is to wire it to UIO for device assignment, to a virtio-device implemented in the kernel, and to qemu.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >>> which is a trivial wrapper around stl_phys.
> >>
> >> OK, but I'm adding another level of indirection in the middle, to allow us to tie in a kvm backend.
> >
> > kvm has no business messing with the PCI device code.
>
> kvm has a fast path for irq injection. If qemu wants to support it we need some abstraction here.

Fast path from where to where? Having the PCI layer bypass/re-implement the APIC and inject the interrupt directly into the cpu core sounds a particularly bad idea.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
> > which is a trivial wrapper around stl_phys.
>
> OK, but I'm adding another level of indirection in the middle, to allow us to tie in a kvm backend.

kvm has no business messing with the PCI device code.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thursday 21 May 2009, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode provides a single level triggered interrupt. My guess is most devices will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
>
> Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

To clarify, you seem to be trying to fuse two largely separate features together.

MSI is a standard PCI device capability[1] that involves the device performing a 32-bit memory write when something interesting occurs. These writes may or may not be directed at an APIC.

The x86 APIC has a memory mapped interface that allows generation of CPU interrupts in response to memory writes. These may or may not come from an MSI capable PCI device.

Paul

[1] Note a *device* capability, not a bus capability.
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
> > MSI provides multiple edge triggered interrupts, whereas traditional mode provides a single level triggered interrupt. My guess is most devices will want to treat these differently anyway.
>
> So, is qemu_send_msi better than qemu_set_irq.

Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
> > A tight coupling between PCI devices and the APIC is just going to cause us problems later on. I'm going to come back to the fact that these are memory writes so once we get IOMMU support they will presumably be subject to remapping by that, just like any other memory access.
>
> I'm not suggesting the qemu_irq will extend all the way to the apic. Think of it as connecting the device core with its interrupt unit.
>
> > Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-off event, not a level state. OTOH stl_phys is exactly the right interface.
>
> The qemu_irq callback should do an stl_phys(). The device is happy since it's using the same API it uses for non-MSI.

MSI provides multiple edge triggered interrupts, whereas traditional mode provides a single level triggered interrupt. My guess is most devices will want to treat these differently anyway.

Either way, this is an implementation detail between pci.c and individual devices. It has nothing to do with the APIC.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >>>>> In any case we need some internal API for this, and qemu_irq looks like a good choice.
> >>>>
> >>>> What do you expect to be using this API?
> >>>
> >>> virtio, emulated devices capable of supporting MSI (e1000?), device assignment (not yet in qemu.git).
> >>
> >> It probably makes sense to have common infrastructure in pci.c to expose/implement device side MSI functionality. However I see no need for a direct API between the device and the APIC. We already have an API for memory accesses and MMIO regions. I'm pretty sure a system could implement MSI by pointing the device at system ram, and having the CPU periodically poll that.
>
> Instead of writing directly, let's abstract it behind a qemu_set_irq(). This is easier for device authors. The default implementation of the irq callback could write to apic memory, while for kvm we can directly trigger the interrupt via the kvm APIs.

I'm still not convinced.

A tight coupling between PCI devices and the APIC is just going to cause us problems later on. I'm going to come back to the fact that these are memory writes so once we get IOMMU support they will presumably be subject to remapping by that, just like any other memory access.

Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-off event, not a level state. OTOH stl_phys is exactly the right interface.

The KVM interface should be contained within the APIC implementation.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
> >> The PCI bus doesn't need any special support (I think) but something on the other end needs to interpret those writes.
> >
> > Sure. But there's definitely nothing PCI specific about it. I assumed this would all be contained within the APIC.
>
> MSIs are defined by PCI and their configuration is done using the PCI configuration space.

A MSI is just a regular memory write, and the PCI spec explicitly states that a target (e.g. the APIC) is unable to distinguish between a MSI and any other write. The PCI config bits just provide a way of telling the device where/what to write.

> >> In any case we need some internal API for this, and qemu_irq looks like a good choice.
> >
> > What do you expect to be using this API?
>
> virtio, emulated devices capable of supporting MSI (e1000?), device assignment (not yet in qemu.git).

It probably makes sense to have common infrastructure in pci.c to expose/implement device side MSI functionality. However I see no need for a direct API between the device and the APIC. We already have an API for memory accesses and MMIO regions. I'm pretty sure a system could implement MSI by pointing the device at system ram, and having the CPU periodically poll that.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
> The PCI bus doesn't need any special support (I think) but something on the other end needs to interpret those writes.

Sure. But there's definitely nothing PCI specific about it. I assumed this would all be contained within the APIC.

> In any case we need some internal API for this, and qemu_irq looks like a good choice.

What do you expect to be using this API?

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
> define api for allocating/setting up msi-x irqs, and for updating them with msi-x vector information, supply implementation in ioapic. Please comment on this API: I intend to port my msi-x patch to work on top of it.

I thought the point of MSI is that they are just regular memory writes, and don't require any special bus support.

Paul
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 02:53:14PM +0100, Paul Brook wrote:
> > > which is a trivial wrapper around stl_phys.
> >
> > OK, but I'm adding another level of indirection in the middle, to allow us to tie in a kvm backend.
>
> kvm has no business messing with the PCI device code.

Yes it has :) kvm needs data on MSI entries: that's the interface the current kernel exposes for injecting these interrupts. I think we also need to support in-kernel devices which would inject MSI interrupts directly from the kernel. For these, kvm would need to know when the mask bit changes and give us info on the pending bit. That's a fair amount of PCI specific code in kvm.

-- 
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>>> which is a trivial wrapper around stl_phys.
>>
>> OK, but I'm adding another level of indirection in the middle, to allow us to tie in a kvm backend.
>
> kvm has no business messing with the PCI device code.

kvm has a fast path for irq injection. If qemu wants to support it we need some abstraction here.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode provides a single level triggered interrupt. My guess is most devices will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
>
> Neither. pci_send_msi,

Works for me.

> which is a trivial wrapper around stl_phys.

OK, but I'm adding another level of indirection in the middle, to allow us to tie in a kvm backend.

-- 
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 02:09:32PM +0100, Paul Brook wrote:
> > > A tight coupling between PCI devices and the APIC is just going to cause us problems later on. I'm going to come back to the fact that these are memory writes so once we get IOMMU support they will presumably be subject to remapping by that, just like any other memory access.
> >
> > I'm not suggesting the qemu_irq will extend all the way to the apic. Think of it as connecting the device core with its interrupt unit.
> >
> > > Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-off event, not a level state. OTOH stl_phys is exactly the right interface.
> >
> > The qemu_irq callback should do an stl_phys(). The device is happy since it's using the same API it uses for non-MSI.
>
> MSI provides multiple edge triggered interrupts, whereas traditional mode provides a single level triggered interrupt. My guess is most devices will want to treat these differently anyway.

So, is qemu_send_msi better than qemu_set_irq?

> Either way, this is an implementation detail between pci.c and individual devices. It has nothing to do with the APIC.
>
> Paul

-- 
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>> Instead of writing directly, let's abstract it behind a qemu_set_irq(). This is easier for device authors. The default implementation of the irq callback could write to apic memory, while for kvm we can directly trigger the interrupt via the kvm APIs.
>>
>> I'm still not convinced.
>>
>> A tight coupling between PCI devices and the APIC is just going to cause us problems later on. I'm going to come back to the fact that these are memory writes so once we get IOMMU support they will presumably be subject to remapping by that, just like any other memory access.
>
> I'm not suggesting the qemu_irq will extend all the way to the apic. Think of it as connecting the device core with its interrupt unit.
>
>> Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-off event, not a level state. OTOH stl_phys is exactly the right interface.
>
> The qemu_irq callback should do an stl_phys().

Actually, it seems we can't do it this way now as stl_phys only gets a 32 bit address. So I'll use apic_deliver for now, but yes, it will be easy to later rewrite the MSI implementation this way if that limitation is lifted.

> The device is happy since it's using the same API it uses for non-MSI. The APIC is happy since it isn't connected directly to the device. stl_phys() is happy since it sees more traffic and can serve more ads. kvm is happy since it can hijack the callback to throw the interrupt directly into the kernel.
>
>> The KVM interface should be contained within the APIC implementation.
>
> Tricky, but doable.

-- 
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>> Instead of writing directly, let's abstract it behind a qemu_set_irq(). This is easier for device authors. The default implementation of the irq callback could write to apic memory, while for kvm we can directly trigger the interrupt via the kvm APIs.
>>
>> I'm still not convinced.
>>
>> A tight coupling between PCI devices and the APIC is just going to cause us problems later on. I'm going to come back to the fact that these are memory writes so once we get IOMMU support they will presumably be subject to remapping by that, just like any other memory access.
>
> I'm not suggesting the qemu_irq will extend all the way to the apic. Think of it as connecting the device core with its interrupt unit.
>
>> Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-off event, not a level state. OTOH stl_phys is exactly the right interface.
>
> The qemu_irq callback should do an stl_phys(). The device is happy since it's using the same API it uses for non-MSI. The APIC is happy since it isn't connected directly to the device. stl_phys() is happy since it sees more traffic and can serve more ads. kvm is happy since it can hijack the callback to throw the interrupt directly into the kernel.

I like the monetization angle here.

>> The KVM interface should be contained within the APIC implementation.
>
> Tricky, but doable.

I'd rather keep it simple.

-- 
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 01:29:37PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Avi Kivity wrote:
> > Paul Brook wrote:
> > >>>>> In any case we need some internal API for this, and qemu_irq looks like a good choice.
> > >>>>
> > >>>> What do you expect to be using this API?
> > >>>
> > >>> virtio, emulated devices capable of supporting MSI (e1000?), device assignment (not yet in qemu.git).
> > >>
> > >> It probably makes sense to have common infrastructure in pci.c to expose/implement device side MSI functionality. However I see no need for a direct API between the device and the APIC. We already have an API for memory accesses and MMIO regions. I'm pretty sure a system could implement MSI by pointing the device at system ram, and having the CPU periodically poll that.
> >
> > Instead of writing directly, let's abstract it behind a qemu_set_irq(). This is easier for device authors. The default implementation of the irq callback could write to apic memory, while for kvm we can directly trigger the interrupt via the kvm APIs.
>
> I'm still not convinced.
>
> A tight coupling between PCI devices and the APIC is just going to cause us problems later on. I'm going to come back to the fact that these are memory writes so once we get IOMMU support they will presumably be subject to remapping by that, just like any other memory access.

Actually, MSI messages are handled by IOMMU as interrupts, not as regular memory accesses. The iommu book has comments such as:

• Interrupt addresses are never translated to memory addresses, but other special address ranges may be reclaimed to be backed with memory.

> Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-off event, not a level state.

Yes, I just chose to ignore the level value. It does not look like such a big issue ... Do you advocate a new qemu_msi structure then?

> OTOH stl_phys is exactly the right interface.

Not really. These are writes but not into physical memory. For example, on intel 32 bit, stl_phys gets a 32 bit address, while MSI writes encode the interrupt vector in the high 32 bits of the address - way outside actual physical memory.

> The KVM interface should be contained within the APIC implementation.
>
> Paul

Unfortunately kvm capabilities that are present in current kernels do not map well to this interface. You need to perform expensive set up for each interrupt vector you are going to use, be it MSI or a regular interrupt.

-- 
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
>> Instead of writing directly, let's abstract it behind a qemu_set_irq().
>> This is easier for device authors. The default implementation of the
>> irq callback could write to apic memory, while for kvm we can directly
>> trigger the interrupt via the kvm APIs.
>
> I'm still not convinced.
>
> A tight coupling between PCI devices and the APIC is just going to cause us
> problems later on. I'm going to come back to the fact that these are memory
> writes so once we get IOMMU support they will presumably be subject to
> remapping by that, just like any other memory access.

I'm not suggesting the qemu_irq will extend all the way to the apic. Think
of it as connecting the device core with its interrupt unit.

> Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
> one-off event, not a level state. OTOH stl_phys is exactly the right
> interface.

The qemu_irq callback should do an stl_phys(). The device is happy since
it's using the same API it uses for non-MSI. The APIC is happy since it
isn't connected directly to the device. stl_phys() is happy since it sees
more traffic and can serve more ads. kvm is happy since it can hijack the
callback to throw the interrupt directly into the kernel.

> The KVM interface should be contained within the APIC implementation.

Tricky, but doable.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 03:08:18PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
> > > > > In any case we need some internal API for this, and qemu_irq looks
> > > > > like a good choice.
> > > >
> > > > What do you expect to be using this API?
> > >
> > > virtio, emulated devices capable of supporting MSI (e1000?), device
> > > assignment (not yet in qemu.git).
> >
> > It probably makes sense to have common infrastructure in pci.c to
> > expose/implement device side MSI functionality. However I see no need
> > for a direct API between the device and the APIC. We already have an
> > API for memory accesses and MMIO regions. I'm pretty sure a system
> > could implement MSI by pointing the device at system ram, and having
> > the CPU periodically poll that.
>
> Instead of writing directly, let's abstract it behind a qemu_set_irq().
> This is easier for device authors. The default implementation of the
> irq callback could write to apic memory, while for kvm we can directly
> trigger the interrupt via the kvm APIs.

Right.

--
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
> > > > In any case we need some internal API for this, and qemu_irq looks
> > > > like a good choice.
> > >
> > > What do you expect to be using this API?
> >
> > virtio, emulated devices capable of supporting MSI (e1000?), device
> > assignment (not yet in qemu.git).
>
> It probably makes sense to have common infrastructure in pci.c to
> expose/implement device side MSI functionality. However I see no need for a
> direct API between the device and the APIC. We already have an API for memory
> accesses and MMIO regions. I'm pretty sure a system could implement MSI by
> pointing the device at system ram, and having the CPU periodically poll that.

Instead of writing directly, let's abstract it behind a qemu_set_irq().
This is easier for device authors. The default implementation of the irq
callback could write to apic memory, while for kvm we can directly trigger
the interrupt via the kvm APIs.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
> > The PCI bus doesn't need any special support (I think) but something on
> > the other end needs to interpret those writes.
>
> Sure. But there's definitely nothing PCI specific about it. I assumed this
> would all be contained within the APIC.

MSIs are defined by PCI and their configuration is done using the PCI
configuration space.

> > In any case we need some internal API for this, and qemu_irq looks like
> > a good choice.
>
> What do you expect to be using this API?

virtio, emulated devices capable of supporting MSI (e1000?), device
assignment (not yet in qemu.git).

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Thu, May 21, 2009 at 11:34:11AM +0100, Paul Brook wrote:
> > The PCI bus doesn't need any special support (I think) but something on
> > the other end needs to interpret those writes.
>
> Sure. But there's definitely nothing PCI specific about it. I assumed this
> would all be contained within the APIC.

Exactly. The APIC supplies callbacks to set up/free/mask/unmask MSI
interrupts. For kvm, we'll have another implementation that passes these
requests on to the kernel.

> > In any case we need some internal API for this, and qemu_irq looks like
> > a good choice.
>
> What do you expect to be using this API?
>
> Paul

Emulated PCI devices such as virtio. Hope to send a patch shortly.

--
MST
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
Paul Brook wrote:
> On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
> > define api for allocating/setting up msi-x irqs, and for updating them
> > with msi-x vector information, supply implementation in ioapic. Please
> > comment on this API: I intend to port my msi-x patch to work on top of
> > it.
>
> I thought the point of MSI is that they are just regular memory writes, and
> don't require any special bus support.

The PCI bus doesn't need any special support (I think) but something on the
other end needs to interpret those writes.

In any case we need some internal API for this, and qemu_irq looks like a
good choice.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api
On Wed, May 20, 2009 at 11:44:57PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin wrote:
> > On Wed, May 20, 2009 at 11:26:42PM +0300, Blue Swirl wrote:
> > > On 5/20/09, Michael S. Tsirkin wrote:
> > > > On Wed, May 20, 2009 at 11:02:24PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> > > > > > On 5/20/09, Michael S. Tsirkin wrote:
> > > > > > > On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> > > > > > > > On 5/20/09, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> > > > > > > > > > On 5/20/09, Michael S. Tsirkin wrote:
> > > > > > > > > > > define api for allocating/setting up msi-x irqs, and for
> > > > > > > > > > > updating them with msi-x vector information, supply
> > > > > > > > > > > implementation in ioapic. Please comment on this API: I
> > > > > > > > > > > intend to port my msi-x patch to work on top of it.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Michael S. Tsirkin
> > > > > > > > > >
> > > > > > > > > > Sparc64 also uses packets ("mondos", not implemented yet)
> > > > > > > > > > for interrupt vector data, there the packet size is
> > > > > > > > > > 8 * 64 bits. I think we should aim for a more generic API
> > > > > > > > > > that covers this case also.
> > > > > > > > >
> > > > > > > > > Are you sure this is a good idea? MSI is tied to PCI, and
> > > > > > > > > PCI only has MSI, not "mondos". What code would benefit
> > > > > > > > > from this abstraction?
> > > > > > > >
> > > > > > > > Sparc64 emulation, of course. I think also the API would be
> > > > > > > > neater.
> > > > > > >
> > > > > > > Since "mondos" are not interrupts, why use irqs for them?
> > > > > >
> > > > > > I just said above that they are used for interrupt vector data.
> > > > > > What makes you think they are not interrupts?
> > > > >
> > > > > I'm sorry, I don't really know anything about sparc.
> > > > > All I am saying is that in PCI, interrupts never pass data,
> > > > > so qemu_set_irq as it is now, is a good API to send them.
> > > > >
> > > > > For the sparc feature you describe, you probably want to add
> > > > > a message data parameter to qemu_set_irq, but it's not
> > > > > really useful for MSI.
> > > >
> > > > Just to clarify, the main difference is that with MSI/MSI-X
> > > > both data and address fields are mostly static, modifying them
> > > > involves ioapic and device updates which might be an expensive
> > > > operation (e.g. with kvm, needs an extra system call).
> > > >
> > > > So I don't think it makes sense to pass the MSI-X data field
> > > > with each call to qemu_set_irq.
> > >
> > > No, but I think the Sparc situation is the same, the packet data is
> > > static for the interrupt source in question.
> >
> > So, ok, we could add a data update callback and then MSI and sparc
> > would do their thing there. I'm not convinced I like all this
> > play with untyped buffers, do you think it's helpful?
> >
> > If yes, maybe I'll try to code it up and see how it looks.
>
> Well, get/set_data could be limited to irq.c, ioapic.c could export
> something like get/set_msi_data. MSI callers should only use
> get/set_msi_data. Ditto for get/set_mondo_data.

I started coding it up and got lost in a maze of callbacks and void
pointers :)

And I also remembered that besides data, there are mask and pending bits
in MSI (ignored in the patch I sent, but I'm fixing that), so there needs
to be an interface for this as well.

I am guessing that using the qemu_irq_get_opaque and qemu_irq_get_vector
that I added, you'll find it's easy to implement mondos in very few lines
of code. We can always add another level of indirection later when sparc
code surfaces, if we find it helps.

--
MST
Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
From: Rusty Russell
Date: Thu, 21 May 2009 16:27:05 +0930

> On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
> > What you're doing by orphan'ing is creating a situation where a single
> > UDP socket can loop doing sends and monopolize the TX queue of a
> > device. The only control we have over a sender for fairness in
> > datagram protocols is that send buffer allocation.
>
> Urgh, that hadn't even occurred to me. Good point.

Now this all is predicated on this actually mattering. :-)

You could argue that the scheduler as well as the size of the TX queue
should be limiting and enforcing fairness.

Someone really needs to test this. Just skb_orphan() every packet at the
beginning of dev_hard_start_xmit(), then run some test program with two
clients looping out UDP packets to see if one can monopolize the device
and get a significantly larger amount of TX resources than the other.
Repeat for 3, 4, 5, etc. clients.

> I haven't thought this through properly, but how about a hack where
> we don't orphan packets if the ring is over half full?

That would also work. And for the NIU case this would be great because I
DO have a marker bit for triggering interrupts in the TX descriptors.
There's just no "all empty" interrupt on TX (who designs these things? :( ).

> Then I guess we could overload the watchdog as a more general
> timer-after-no-xmit?

Yes, but it means that teardown of a socket can be delayed up to the
amount of that timer. Factor in all of this crazy round_jiffies() stuff
people do these days and it could cause pauses for real use cases and
drive users batty.

Probably the most profitable avenue is to see if this is a real issue
after all (see above). If we can get away with having the socket buffer
represent socket --> device space only, that's the most ideal solution.
It will probably also improve performance a lot across the board,
especially on NUMA/SMP boxes as our TX complete events tend to be in
different places than the SKB producer.
Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
> From: Rusty Russell
> Date: Mon, 18 May 2009 22:18:47 +0930
> > We check for finished xmit skbs on every xmit, or on a timer (unless
> > the host promises to force an interrupt when the xmit ring is empty).
> > This can penalize userspace tasks which fill their sockbuf. Not much
> > difference with TSO, but measurable with large numbers of packets.
> >
> > There are a finite number of packets which can be in the transmission
> > queue. We could fire the timer more often than every 100ms, but that
> > would just hurt performance for a corner case. This seems neatest.
...
> > Signed-off-by: Rusty Russell
>
> If this is so great for virtio it would also be a great idea
> universally, but we don't do it.
>
> What you're doing by orphan'ing is creating a situation where a single
> UDP socket can loop doing sends and monopolize the TX queue of a
> device. The only control we have over a sender for fairness in
> datagram protocols is that send buffer allocation.

Urgh, that hadn't even occurred to me. Good point.

> I'm guilty of doing this too in the NIU driver, also because there I
> lack a "TX queue empty" interrupt and this can keep TCP sockets from
> getting stuck.
>
> I think we need a generic solution to this issue because it is getting
> quite common to see cases where the packets in the TX queue of a
> device can sit there indefinitely.

I haven't thought this through properly, but how about a hack where we
don't orphan packets if the ring is over half full?

Then I guess we could overload the watchdog as a more general
timer-after-no-xmit?

Rusty.