[PATCH] xen/blkfront: beyond ARRAY_SIZE of info->shadow

2009-05-21 Thread Roel Kluin
Do not go beyond ARRAY_SIZE of info->shadow

Signed-off-by: Roel Kluin 
---
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a6cbf7b..d395986 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -122,7 +122,7 @@ static DEFINE_SPINLOCK(blkif_io_lock);
 static int get_id_from_freelist(struct blkfront_info *info)
 {
unsigned long free = info->shadow_free;
-   BUG_ON(free > BLK_RING_SIZE);
+   BUG_ON(free >= BLK_RING_SIZE);
info->shadow_free = info->shadow[free].req.id;
info->shadow[free].req.id = 0x0fee; /* debug */
return free;

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] xen/blkfront: beyond ARRAY_SIZE of info->shadow

2009-05-21 Thread Jeremy Fitzhardinge
Roel Kluin wrote:
> Do not go beyond ARRAY_SIZE of info->shadow
>
> Signed-off-by: Roel Kluin 
>   
Acked-by: Jeremy Fitzhardinge 

Jens, can you put this into a next-merge-window branch?

Thanks,
J
> ---
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index a6cbf7b..d395986 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -122,7 +122,7 @@ static DEFINE_SPINLOCK(blkif_io_lock);
>  static int get_id_from_freelist(struct blkfront_info *info)
>  {
>   unsigned long free = info->shadow_free;
> - BUG_ON(free > BLK_RING_SIZE);
> + BUG_ON(free >= BLK_RING_SIZE);
>   info->shadow_free = info->shadow[free].req.id;
>   info->shadow[free].req.id = 0x0fee; /* debug */
>   return free;
>



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 07:45:20PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> > On Thursday 21 May 2009, Paul Brook wrote:
> > > > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > > > mode provides a single level triggered interrupt. My guess is most
> > > > > devices will want to treat these differently anyway.
> > > >
> > > > So, is qemu_send_msi better than qemu_set_irq.
> > >
> > > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
> > 
> > To clarify, you seem to be trying to fuse two largely separate features 
> > together.
> > 
> > MSI is a standard PCI device capability[1] that involves the device 
> > performing 
> > a 32-bit memory write when something interesting occurs. These writes may 
> > or 
> > may not be directed at an APIC.
> > 
> > The x86 APIC has a memory mapped interface that allows generation of CPU 
> > interrupts in response to memory writes. These may or may not come 
> > from an MSI capable PCI device.
> > 
> > Paul
> > 
> > [1] Note a *device* capability, not a bus capability.
> 
> Paul, so I went over specs, and what you say about APIC here does not
> seem to be what Intel actually implemented.  Specifically, Intel
> implemented *MSI support in APIC*. This lets PCI devices, but not the CPU,
> signal interrupts by memory writes.
> 
> For example, after reset, when the CPU writes to address 0xfee00000 this
> is an access to a reserved register in the APIC, but when a PCI device
> writes to 0xfee00000, this triggers an interrupt to destination 0.
> 
> See section 9.12 in Intel® 64 and IA-32 Architectures Software
> Developer’s Manual Volume 3A: System Programming Guide, Part 1
> http://www.intel.com/Assets/PDF/manual/253668.pdf
> 
> So it seems that what we need to do in pci is:
> 
> if (!msi_ops || msi_ops->send_msi(address, data))
>   stl_phys(address, data);
> 
> where send_msi is wired to apic_send_msi and
> where apic_send_msi returns an error for an address
> outside of the MSI range 0xfee00000 - 0xfeefffff
> 
> Makes sense?


So I ended up with these ops:
allocate
free
update
send
which APIC will define and MSI emulation
will use. Here, send will return an error for addresses
outside the 0xfeexxxxx range, and the device will do a plain
stl_phys.

-- 
MST

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Paul Brook wrote:
> > > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > > mode provides a single level triggered interrupt. My guess is most
> > > > devices will want to treat these differently anyway.
> > >
> > > So, is qemu_send_msi better than qemu_set_irq.
> >
> > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
> 
> To clarify, you seem to be trying to fuse two largely separate features 
> together.
> 
> MSI is a standard PCI device capability[1] that involves the device 
> performing 
> a 32-bit memory write when something interesting occurs. These writes may or 
> may not be directed at an APIC.
> 
> The x86 APIC has a memory mapped interface that allows generation of CPU 
> interrupts in response to memory writes. These may or may not come 
> from an MSI capable PCI device.
> 
> Paul
> 
> [1] Note a *device* capability, not a bus capability.

Paul, so I went over specs, and what you say about APIC here does not
seem to be what Intel actually implemented.  Specifically, Intel
implemented *MSI support in APIC*. This lets PCI devices, but not the CPU,
signal interrupts by memory writes.

For example, after reset, when the CPU writes to address 0xfee00000 this
is an access to a reserved register in the APIC, but when a PCI device
writes to 0xfee00000, this triggers an interrupt to destination 0.

See section 9.12 in Intel® 64 and IA-32 Architectures Software
Developer’s Manual Volume 3A: System Programming Guide, Part 1
http://www.intel.com/Assets/PDF/manual/253668.pdf

So it seems that what we need to do in pci is:

if (!msi_ops || msi_ops->send_msi(address, data))
stl_phys(address, data);

where send_msi is wired to apic_send_msi and
where apic_send_msi returns an error for an address
outside of the MSI range 0xfee00000 - 0xfeefffff

Makes sense?

-- 
MST

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:50:18PM +0100, Paul Brook wrote:
> > >>> kvm has no business messing with the PCI device code.
> > >>
> > >> kvm has a fast path for irq injection.  If qemu wants to support it we
> > >> need some abstraction here.
> > >
> > > Fast path from where to where? Having the PCI layer bypass/re-implement
> > > the APIC and inject the interrupt directly into the cpu core sounds a
> > > particularly bad idea.
> >
> > kvm implements the APIC in the host kernel (qemu upstream doesn't
> > support this yet).  The fast path is wired to the in-kernel APIC, not
> > the cpu core directly.
> >
> > The idea is to wire it to UIO for device assignment, to a virtio-device
> > implemented in the kernel, and to qemu.
> 
> I still don't see why you're trying to bypass straight from the pci layer to 
> the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
> presumably got to update the apic state anyway.
> 
> Paul

As far as I can tell, at least on Intel, MSI interrupts are not MMIO writes.
They are PCI memory writes to a hard-coded address range that
are passed to the APIC. I don't think MMIO writes can trigger MSI,
or at least this does not seem to be documented.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
>> The fast path is an eventfd so that we don't have to teach all the
>> clients about the details of MSI.  Userspace programs the MSI details
>> into kvm and hands the client an eventfd.  All the client has to do is
>> bang on the eventfd for the interrupt to be queued.  The eventfd
>> provides event coalescing and is equally useful from the kernel and
>> userspace, and can be used with targets other than kvm.
>> 
>
> So presumably if a device triggers an APIC interrupt using a write that isn't 
> one of the currently configured PCI devices, it all explodes horribly?
>   

I don't follow.  Can you elaborate on your scenario?

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >> kvm implements the APIC in the host kernel (qemu upstream doesn't
> >> support this yet).  The fast path is wired to the in-kernel APIC, not
> >> the cpu core directly.
> >>
> >> The idea is to wire it to UIO for device assignment, to a virtio-device
> >> implemented in the kernel, and to qemu.
> >
> > I still don't see why you're trying to bypass straight from the pci layer
> > to the apic. Why can't you just pass the apic MMIO writes to the kernel?
> > You've presumably got to update the apic state anyway.
>
> The fast path is an eventfd so that we don't have to teach all the
> clients about the details of MSI.  Userspace programs the MSI details
> into kvm and hands the client an eventfd.  All the client has to do is
> bang on the eventfd for the interrupt to be queued.  The eventfd
> provides event coalescing and is equally useful from the kernel and
> userspace, and can be used with targets other than kvm.

So presumably if a device triggers an APIC interrupt using a write that isn't 
one of the currently configured PCI devices, it all explodes horribly?

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
>> kvm implements the APIC in the host kernel (qemu upstream doesn't
>> support this yet).  The fast path is wired to the in-kernel APIC, not
>> the cpu core directly.
>>
>> The idea is to wire it to UIO for device assignment, to a virtio-device
>> implemented in the kernel, and to qemu.
>> 
>
> I still don't see why you're trying to bypass straight from the pci layer to 
> the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
> presumably got to update the apic state anyway.
>   

The fast path is an eventfd so that we don't have to teach all the 
clients about the details of MSI.  Userspace programs the MSI details 
into kvm and hands the client an eventfd.  All the client has to do is 
bang on the eventfd for the interrupt to be queued.  The eventfd 
provides event coalescing and is equally useful from the kernel and 
userspace, and can be used with targets other than kvm.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > > provides a single level triggered interrupt. My guess is most devices
> > > will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
> 
> Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

I guess I'll start with that. This only works if target_phys_addr_t is
a 64 bit field (because MSI addresses are typically outside the 32 bit
memory space), I guess the simplest solution is to disable MSI if it's
not so.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> >>> kvm has no business messing with the PCI device code.
> >>
> >> kvm has a fast path for irq injection.  If qemu wants to support it we
> >> need some abstraction here.
> >
> > Fast path from where to where? Having the PCI layer bypass/re-implement
> > the APIC and inject the interrupt directly into the cpu core sounds a
> > particularly bad idea.
>
> kvm implements the APIC in the host kernel (qemu upstream doesn't
> support this yet).  The fast path is wired to the in-kernel APIC, not
> the cpu core directly.
>
> The idea is to wire it to UIO for device assignment, to a virtio-device
> implemented in the kernel, and to qemu.

I still don't see why you're trying to bypass straight from the pci layer to 
the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
presumably got to update the apic state anyway.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
> On Thursday 21 May 2009, Avi Kivity wrote:
>   
>> Paul Brook wrote:
>> 
> which is a trivial wrapper around stl_phys.
>   
 OK, but I'm adding another level of indirection in the middle,
 to allow us to tie in a kvm backend.
 
>>> kvm has no business messing with the PCI device code.
>>>   
>> kvm has a fast path for irq injection.  If qemu wants to support it we
>> need some abstraction here.
>> 
>
> Fast path from where to where? Having the PCI layer bypass/re-implement the 
> APIC and inject the interrupt directly into the cpu core sounds a 
> particularly 
> bad idea.
>   

kvm implements the APIC in the host kernel (qemu upstream doesn't 
support this yet).  The fast path is wired to the in-kernel APIC, not 
the cpu core directly.

The idea is to wire it to UIO for device assignment, to a virtio-device 
implemented in the kernel, and to qemu.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >>> which is a trivial wrapper around stl_phys.
> >>
> >> OK, but I'm adding another level of indirection in the middle,
> >> to allow us to tie in a kvm backend.
> >
> > kvm has no business messing with the PCI device code.
>
> kvm has a fast path for irq injection.  If qemu wants to support it we
> need some abstraction here.

Fast path from where to where? Having the PCI layer bypass/re-implement the 
APIC and inject the interrupt directly into the cpu core sounds a particularly 
bad idea.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> > which is a trivial wrapper around stl_phys.
>
> OK, but I'm adding another level of indirection in the middle,
> to allow us to tie in a kvm backend.

kvm has no business messing with the PCI device code.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > mode provides a single level triggered interrupt. My guess is most
> > > devices will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
>
> Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

To clarify, you seem to be trying to fuse two largely separate features 
together.

MSI is a standard PCI device capability[1] that involves the device performing 
a 32-bit memory write when something interesting occurs. These writes may or 
may not be directed at an APIC.

The x86 APIC has a memory mapped interface that allows generation of CPU 
interrupts in response to memory writes. These may or may not come 
from an MSI capable PCI device.

Paul

[1] Note a *device* capability, not a bus capability.



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > provides a single level triggered interrupt. My guess is most devices
> > will want to treat these differently anyway.
>
> So, is qemu_send_msi better than qemu_set_irq.

Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> > A tight coupling between PCI devices and the APIC is just going to cause
> > us problems later on. I'm going to come back to the fact that these are
> > memory writes so once we get IOMMU support they will presumably be
> > subject to remapping by that, just like any other memory access.
>
> I'm not suggesting the qemu_irq will extend all the way to the apic.
> Think of it as connecting the device core with its interrupt unit.
>
> > Even ignoring that, qemu_irq isn't really the right interface. A MSI is a
> > one- off event, not a level state. OTOH stl_phys is exactly the right
> > interface.
>
> The qemu_irq callback should do an stl_phys().  The device is happy
> since it's using the same API it uses for non-MSI. 

MSI provides multiple edge triggered interrupts, whereas traditional mode 
provides a single level triggered interrupt. My guess is most devices will 
want to treat these differently anyway.

Either way, this is an implementation detail between pci.c and individual 
devices. It has nothing to do with the APIC.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
>  In any case we need some internal API for this, and qemu_irq looks
>  like a good choice.
> >>>
> >>> What do you expect to be using this API?
> >>
> >> virtio, emulated devices capable of supporting MSI (e1000?), device
> >> assignment (not yet in qemu.git).
> >
> > It probably makes sense to have common infrastructure in pci.c to
> > expose/implement device side MSI functionality. However I see no need for
> > a direct API between the device and the APIC. We already have an API for
> > memory accesses and MMIO regions. I'm pretty sure a system could
> > implement MSI by pointing the device at system ram, and having the CPU
> > periodically poll that.
>
> Instead of writing directly, let's abstract it behind a qemu_set_irq().
> This is easier for device authors.  The default implementation of the
> irq callback could write to apic memory, while for kvm we can directly
> trigger the interrupt via the kvm APIs.

I'm still not convinced.

A tight coupling between PCI devices and the APIC is just going to cause us 
problems later on. I'm going to come back to the fact that these are memory 
writes so once we get IOMMU support they will presumably be subject to 
remapping by that, just like any other memory access.

Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-
off event, not a level state. OTOH stl_phys is exactly the right interface.

The KVM interface should be contained within the APIC implementation.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> >> The PCI bus doesn't need any special support (I think) but something on
> >> the other end needs to interpret those writes.
> >
> > Sure. But there's definitely nothing PCI specific about it. I assumed
> > this would all be contained within the APIC.
>
> MSIs are defined by PCI and their configuration is done using the PCI
> configuration space.

A MSI is just a regular memory write, and the PCI spec explicitly states that 
a target (e.g. the APIC) is unable to distinguish between a MSI and any other 
write. The PCI config bits just provide a way of telling the device where/what 
to write.

> >> In any case we need some internal API for this, and qemu_irq looks like
> >> a good choice.
> >
> > What do you expect to be using this API?
>
> virtio, emulated devices capable of supporting MSI (e1000?), device
> assignment (not yet in qemu.git).

It probably makes sense to have common infrastructure in pci.c to 
expose/implement device side MSI functionality. However I see no need for a 
direct API between the device and the APIC. We already have an API for memory 
accesses and MMIO regions. I'm pretty sure a system could implement MSI by 
pointing the device at system ram, and having the CPU periodically poll that.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> The PCI bus doesn't need any special support (I think) but something on
> the other end needs to interpret those writes.

Sure. But there's definitely nothing PCI specific about it. I assumed this 
would all be contained within the APIC.

> In any case we need some internal API for this, and qemu_irq looks like
> a good choice.

What do you expect to be using this API?

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
> define api for allocating/setting up msi-x irqs, and for updating them
> with msi-x vector information, supply implementation in ioapic. Please
> comment on this API: I intend to port my msi-x patch to work on top of
> it.

I though the point of MSI is that they are just a regular memory writes, and 
don't require any special bus support.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:53:14PM +0100, Paul Brook wrote:
> > > which is a trivial wrapper around stl_phys.
> >
> > OK, but I'm adding another level of indirection in the middle,
> > to allow us to tie in a kvm backend.
> 
> kvm has no business messing with the PCI device code.

Yes it has :)

kvm needs data on MSI entries: that's the interface
current kernel exposes for injecting these interrupts.

I think we also need to support in-kernel devices which
would inject MSI interrupt directly from kernel.
For these, kvm would need to know when mask bit changes
and give us info on pending bit.

That's a fair amount of PCI specific code in kvm.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
>>> which is a trivial wrapper around stl_phys.
>>>   
>> OK, but I'm adding another level of indirection in the middle,
>> to allow us to tie in a kvm backend.
>> 
>
> kvm has no business messing with the PCI device code.
>   

kvm has a fast path for irq injection.  If qemu wants to support it we 
need some abstraction here.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > > provides a single level triggered interrupt. My guess is most devices
> > > will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
> 
> Neither. pci_send_msi,

Works for me.

> which is a trivial wrapper around stl_phys.

OK, but I'm adding another level of indirection in the middle,
to allow us to tie in a kvm backend.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:09:32PM +0100, Paul Brook wrote:
> > > A tight coupling between PCI devices and the APIC is just going to cause
> > > us problems later on. I'm going to come back to the fact that these are
> > > memory writes so once we get IOMMU support they will presumably be
> > > subject to remapping by that, just like any other memory access.
> >
> > I'm not suggesting the qemu_irq will extend all the way to the apic.
> > Think of it as connecting the device core with its interrupt unit.
> >
> > > Even ignoring that, qemu_irq isn't really the right interface. A MSI is a
> > > one- off event, not a level state. OTOH stl_phys is exactly the right
> > > interface.
> >
> > The qemu_irq callback should do an stl_phys().  The device is happy
> > since it's using the same API it uses for non-MSI. 
> 
> MSI provides multiple edge triggered interrupts, whereas traditional mode 
> provides a single level triggered interrupt. My guess is most devices will 
> want to treat these differently anyway.

So, is qemu_send_msi better than qemu_set_irq.

> Either way, this is an implementation detail between pci.c and individual 
> devices. It has nothing to do with the APIC.
> 
> Paul

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>> Instead of writing directly, let's abstract it behind a qemu_set_irq().
>>> This is easier for device authors.  The default implementation of the
>>> irq callback could write to apic memory, while for kvm we can directly
>>> trigger the interrupt via the kvm APIs.
>>> 
>>
>> I'm still not convinced.
>>
>> A tight coupling between PCI devices and the APIC is just going to 
>> cause us problems later on. I'm going to come back to the fact that 
>> these are memory writes so once we get IOMMU support they will 
>> presumably be subject to remapping by that, just like any other memory 
>> access.
>>   
>
> I'm not suggesting the qemu_irq will extend all the way to the apic.   
> Think of it as connecting the device core with its interrupt unit.
>
>> Even ignoring that, qemu_irq isn't really the right interface. A MSI is a 
>> one-
>> off event, not a level state. OTOH stl_phys is exactly the right interface.
>>   
>
> The qemu_irq callback should do an stl_phys().

Actually, it seems we can't do it this way now, as stl_phys
only gets a 32 bit address. So I'll use apic_deliver for now,
but yes, it will be easy to later rewrite the MSI implementation this way
if that limitation is lifted.


> The device is happy  
> since it's using the same API it uses for non-MSI.  The APIC is happy  
> since it isn't connected directly to the device.  stl_phys() is happy  
> since it sees more traffic and can serve more ads.  kvm is happy since  
> it can hijack the callback to throw the interrupt directly into the 
> kernel.
>
>> The KVM interface should be contained within the APIC implementation.
>>   
>
> Tricky, but doable.
>
> -- 
> error compiling committee.c: too many arguments to function

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>> Instead of writing directly, let's abstract it behind a qemu_set_irq().
>>> This is easier for device authors.  The default implementation of the
>>> irq callback could write to apic memory, while for kvm we can directly
>>> trigger the interrupt via the kvm APIs.
>>> 
>>
>> I'm still not convinced.
>>
>> A tight coupling between PCI devices and the APIC is just going to 
>> cause us problems later on. I'm going to come back to the fact that 
>> these are memory writes so once we get IOMMU support they will 
>> presumably be subject to remapping by that, just like any other memory 
>> access.
>>   
>
> I'm not suggesting the qemu_irq will extend all the way to the apic.   
> Think of it as connecting the device core with its interrupt unit.
>
>> Even ignoring that, qemu_irq isn't really the right interface. A MSI is a 
>> one-
>> off event, not a level state. OTOH stl_phys is exactly the right interface.
>>   
>
> The qemu_irq callback should do an stl_phys().  The device is happy  
> since it's using the same API it uses for non-MSI.  The APIC is happy  
> since it isn't connected directly to the device.  stl_phys() is happy  
> since it sees more traffic and can serve more ads.  kvm is happy since  
> it can hijack the callback to throw the interrupt directly into the 
> kernel.

I like the monetization angle here.

>> The KVM interface should be contained within the APIC implementation.
>>   
>
> Tricky, but doable.

I'd rather keep it simple.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 01:29:37PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Avi Kivity wrote:
> > Paul Brook wrote:
> >  In any case we need some internal API for this, and qemu_irq looks
> >  like a good choice.
> > >>>
> > >>> What do you expect to be using this API?
> > >>
> > >> virtio, emulated devices capable of supporting MSI (e1000?), device
> > >> assignment (not yet in qemu.git).
> > >
> > > It probably makes sense to have common infrastructure in pci.c to
> > > expose/implement device side MSI functionality. However I see no need for
> > > a direct API between the device and the APIC. We already have an API for
> > > memory accesses and MMIO regions. I'm pretty sure a system could
> > > implement MSI by pointing the device at system ram, and having the CPU
> > > periodically poll that.
> >
> > Instead of writing directly, let's abstract it behind a qemu_set_irq().
> > This is easier for device authors.  The default implementation of the
> > irq callback could write to apic memory, while for kvm we can directly
> > trigger the interrupt via the kvm APIs.
> 
> I'm still not convinced.
> 
> A tight coupling between PCI devices and the APIC is just going to cause us 
> problems later on. I'm going to come back to the fact that these are memory 
> writes so once we get IOMMU support they will presumably be subject to 
> remapping by that, just like any other memory access.

Actually, MSI messages are handled by the IOMMU as interrupts, not as
regular memory accesses. The IOMMU documentation has comments such as:
• Interrupt addresses are never translated to memory addresses, but
other special address ranges may be reclaimed to be backed with memory.

> Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
> one-off event, not a level state.

Yes, I just chose to ignore the level value. It does not look like such
a big issue ... Do you advocate a new qemu_msi structure then?

> OTOH stl_phys is exactly the right interface.

Not really. These are writes, but not into physical memory.
For example, on 32-bit Intel, stl_phys gets a 32-bit address, while
MSI writes encode the interrupt vector in the high 32 bits of
the address - way outside actual physical memory.

> The KVM interface should be contained within the APIC implementation.
> 
> Paul

Unfortunately kvm capabilities that are present in current kernels
do not map well to this interface. You need to perform
expensive setup for each interrupt vector you are going to use,
be it MSI or a regular interrupt.

-- 
MST

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
>> Instead of writing directly, let's abstract it behind a qemu_set_irq().
>> This is easier for device authors.  The default implementation of the
>> irq callback could write to apic memory, while for kvm we can directly
>> trigger the interrupt via the kvm APIs.
>> 
>
> I'm still not convinced.
>
> A tight coupling between PCI devices and the APIC is just going to cause us 
> problems later on. I'm going to come back to the fact that these are memory 
> writes so once we get IOMMU support they will presumably be subject to 
> remapping by that, just like any other memory access.
>   

I'm not suggesting the qemu_irq will extend all the way to the apic.  
Think of it as connecting the device core with its interrupt unit.

> Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
> one-off event, not a level state. OTOH stl_phys is exactly the right interface.
>   

The qemu_irq callback should do an stl_phys().  The device is happy 
since it's using the same API it uses for non-MSI.  The APIC is happy 
since it isn't connected directly to the device.  stl_phys() is happy 
since it sees more traffic and can serve more ads.  kvm is happy since 
it can hijack the callback to throw the interrupt directly into the kernel.

> The KVM interface should be contained within the APIC implementation.
>   

Tricky, but doable.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:08:18PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>>>> In any case we need some internal API for this, and qemu_irq looks like
>>>>> a good choice.
>>>>>
>>>> What do you expect to be using this API?
>>>>
>>> virtio, emulated devices capable of supporting MSI (e1000?), device
>>> assignment (not yet in qemu.git).
>>> 
>>
>> It probably makes sense to have common infrastructure in pci.c to  
>> expose/implement device side MSI functionality. However I see no need 
>> for a direct API between the device and the APIC. We already have an 
>> API for memory accesses and MMIO regions. I'm pretty sure a system 
>> could implement MSI by pointing the device at system ram, and having 
>> the CPU periodically poll that.
>>   
>
> Instead of writing directly, let's abstract it behind a qemu_set_irq().   
> This is easier for device authors.  The default implementation of the  
> irq callback could write to apic memory, while for kvm we can directly  
> trigger the interrupt via the kvm APIs.

Right.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
>>>> In any case we need some internal API for this, and qemu_irq looks like
>>>> a good choice.
>>>>
>>> What do you expect to be using this API?
>>>   
>> virtio, emulated devices capable of supporting MSI (e1000?), device
>> assignment (not yet in qemu.git).
>> 
>
> It probably makes sense to have common infrastructure in pci.c to 
> expose/implement device side MSI functionality. However I see no need for a 
> direct API between the device and the APIC. We already have an API for memory 
> accesses and MMIO regions. I'm pretty sure a system could implement MSI by 
> pointing the device at system ram, and having the CPU periodically poll that.
>   

Instead of writing directly, let's abstract it behind a qemu_set_irq().  
This is easier for device authors.  The default implementation of the 
irq callback could write to apic memory, while for kvm we can directly 
trigger the interrupt via the kvm APIs.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
>> The PCI bus doesn't need any special support (I think) but something on
>> the other end needs to interpret those writes.
>> 
>
> Sure. But there's definitely nothing PCI specific about it. I assumed this 
> would all be contained within the APIC.
>   

MSIs are defined by PCI and their configuration is done using the PCI 
configuration space.

>> In any case we need some internal API for this, and qemu_irq looks like
>> a good choice.
>> 
>
> What do you expect to be using this API?
>   

virtio, emulated devices capable of supporting MSI (e1000?), device 
assignment (not yet in qemu.git).

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 11:34:11AM +0100, Paul Brook wrote:
> > The PCI bus doesn't need any special support (I think) but something on
> > the other end needs to interpret those writes.
> 
> Sure. But there's definitely nothing PCI specific about it. I assumed this 
> would all be contained within the APIC.

Exactly. APIC supplies callbacks to set up/free/mask/unmask MSI interrupts.
For kvm, we'll have another implementation that passes these requests
on to the kernel.

> > In any case we need some internal API for this, and qemu_irq looks like
> > a good choice.
> 
> What do you expect to be using this API?
> 
> Paul

emulated PCI devices such as virtio.
Hope to send a patch shortly.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
> On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
>   
>> define api for allocating/setting up msi-x irqs, and for updating them
>> with msi-x vector information, supply implementation in ioapic. Please
>> comment on this API: I intend to port my msi-x patch to work on top of
>> it.
>> 
>
> I thought the point of MSI is that they are just regular memory writes, and 
> don't require any special bus support.
>   

The PCI bus doesn't need any special support (I think) but something on 
the other end needs to interpret those writes.

In any case we need some internal API for this, and qemu_irq looks like 
a good choice.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 11:44:57PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin  wrote:
> > On Wed, May 20, 2009 at 11:26:42PM +0300, Blue Swirl wrote:
> >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > On Wed, May 20, 2009 at 11:02:24PM +0300, Michael S. Tsirkin wrote:
> >  > >  > On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> >  > >  > > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > > > On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> >  > >  > > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > > >  > > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> >  > >  > > >  > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > > >  > >  > > define api for allocating/setting up msi-x irqs, and for updating them
> >  > >  > > >  > >  > >  with msi-x vector information, supply implementation in ioapic. Please
> >  > >  > > >  > >  > >  comment on this API: I intend to port my msi-x patch to work on top of
> >  > >  > > >  > >  > >  it.
> >  > >  > > >  > >  > >
> >  > >  > > >  > >  > >  Signed-off-by: Michael S. Tsirkin 
> >  > >  > > >  > >  >
> >  > >  > > >  > >  > Sparc64 also uses packets ("mondos", not implemented yet) for
> >  > >  > > >  > >  > interrupt vector data, there the packet size is 8 * 64 bits.
> >  > >  > > >  > >  > I think we should aim for a more generic API that covers this case also.
> >  > >  > > >  > >
> >  > >  > > >  > >
> >  > >  > > >  > >  Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
> >  > >  > > >  > >  MSI, not "mondos". What code would benefit from this abstraction?
> >  > >  > > >  >
> >  > >  > > >  > Sparc64 emulation, of course. I think also the API would be neater.
> >  > >  > > >
> >  > >  > > >
> >  > >  > > > Since "mondos" are not interrupts, why use irqs for them?
> >  > >  > >
> >  > >  > > I just said above that they are used for interrupt vector data. What
> >  > >  > > makes you think they are not interrupts?
> >  > >  >
> >  > >  > I'm sorry, I don't really know anything about sparc.
> >  > >  > All I am saying is that in PCI, interrupts never pass data,
> >  > >  > so qemu_set_irq as it is now, is a good API to send them.
> >  > >  >
> >  > >  > For the sparc feature you describe, you probably want to add
> >  > >  > a message data parameter to qemu_set_irq, but it's not
> >  > >  > really useful for MSI.
> >  > >
> >  > >
> >  > > Just to clarify, the main difference is that with MSI/MSI-X
> >  > >  both data and address fields are mostly static, modifying them
> >  > >  involves ioapic and device updates which might be an expensive
> >  > >  operation (e.g. with kvm, needs an extra system call).
> >  > >
> >  > >  So I don't think it makes sense to pass MSI-X data field
> >  > >  with each call to qemu_set_irq.
> >  >
> >  > No, but I think the Sparc situation is the same, the packet data is
> >  > static for the interrupt source in question.
> >
> >
> > So, ok, we could add a data update callback and then MSI and sparc
> >  would do their thing there. I'm not convinced I like all this
> >  play with untyped buffers; do you think it's helpful?
> >
> >  If yes, maybe I'll try to code it up and see how does it look.
> 
> Well, get/set_data could be limited to irq.c, ioapic.c could export
> something like get/set_msi_data. MSI callers should only use
> get/set_msi_data. Ditto for get/set_mondo_data.

I started coding it up and got lost in a maze of callbacks and void pointers :)

And I also remembered that besides data, there's a mask and pending bits in msi
(it's ignored in the patch I sent, but I'm fixing that), so there needs
to be an interface for this as well.

I am guessing that, using the qemu_irq_get_opaque and qemu_irq_get_vector that I
added, you'll find it easy to implement mondos in very few lines of code.
We can always add another level of indirection later when sparc code
surfaces, if we find it helps.


-- 
MST


Re: [RFC] virtio: orphan skbs if we're relying on timer to free them

2009-05-21 Thread David Miller
From: Rusty Russell 
Date: Thu, 21 May 2009 16:27:05 +0930

> On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
>> What you're doing by orphan'ing is creating a situation where a single
>> UDP socket can loop doing sends and monopolize the TX queue of a
>> device.  The only control we have over a sender for fairness in
>> datagram protocols is that send buffer allocation.
> 
> Urgh, that hadn't even occurred to me.  Good point.

Now this is all predicated on it actually mattering. :-)

You could argue that the scheduler as well as the size of the
TX queue should be limiting and enforcing fairness.

Someone really needs to test this.  Just skb_orphan() every packet
at the beginning of dev_hard_start_xmit(), then run some test
program with two clients looping out UDP packets to see if one
can monopolize the device and get a significantly larger amount
of TX resources than the other.  Repeat for 3, 4, 5, etc. clients.

> I haven't thought this through properly, but how about a hack where
> we don't orphan packets if the ring is over half full?

That would also work.  And for the NIU case this would be great
because I DO have a marker bit for triggering interrupts in the TX
descriptors.  There's just no "all empty" interrupt on TX (who
designs these things? :( ).

> Then I guess we could overload the watchdog as a more general
> timer-after-no-xmit?

Yes, but it means that teardown of a socket can be delayed up to
the amount of that timer.  Factor in all of this crazy
round_jiffies() stuff people do these days and it could cause
pauses for real use cases and drive users batty.

Probably the most profitable avenue is to see if this is a real issue
afterall (see above).  If we can get away with having the socket
buffer represent socket --> device space only, that's the most ideal
solution.  It will probably also improve performance a lot across the
board, especially on NUMA/SMP boxes as our TX complete events tend to
be in different places than the SKB producer.


Re: [RFC] virtio: orphan skbs if we're relying on timer to free them

2009-05-21 Thread Rusty Russell
On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
> From: Rusty Russell 
> Date: Mon, 18 May 2009 22:18:47 +0930
> > We check for finished xmit skbs on every xmit, or on a timer (unless
> > the host promises to force an interrupt when the xmit ring is empty).
> > This can penalize userspace tasks which fill their sockbuf.  Not much
> > difference with TSO, but measurable with large numbers of packets.
> >
> > There are a finite number of packets which can be in the transmission
> > queue.  We could fire the timer more than every 100ms, but that would
> > just hurt performance for a corner case.  This seems neatest.
...
> > Signed-off-by: Rusty Russell 
>
> If this is so great for virtio it would also be a great idea
> universally, but we don't do it.
>
> What you're doing by orphan'ing is creating a situation where a single
> UDP socket can loop doing sends and monopolize the TX queue of a
> device.  The only control we have over a sender for fairness in
> datagram protocols is that send buffer allocation.

Urgh, that hadn't even occurred to me.  Good point.

> I'm guilty of doing this too in the NIU driver, also because there I
> lack a "TX queue empty" interrupt and this can keep TCP sockets from
> getting stuck.
>
> I think we need a generic solution to this issue because it is getting
> quite common to see cases where the packets in the TX queue of a
> device can sit there indefinitely.

I haven't thought this through properly, but how about a hack where we don't 
orphan packets if the ring is over half full?

Then I guess we could overload the watchdog as a more general timer-after-no-xmit?

Rusty.