Re: [RFC] virtio: orphan skbs if we're relying on timer to free them

2009-05-21 Thread Rusty Russell
On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
 From: Rusty Russell ru...@rustcorp.com.au
 Date: Mon, 18 May 2009 22:18:47 +0930
  We check for finished xmit skbs on every xmit, or on a timer (unless
  the host promises to force an interrupt when the xmit ring is empty).
  This can penalize userspace tasks which fill their sockbuf.  Not much
  difference with TSO, but measurable with large numbers of packets.
 
  There are a finite number of packets which can be in the transmission
  queue.  We could fire the timer more often than every 100ms, but that would
  just hurt performance for a corner case.  This seems neatest.
...
  Signed-off-by: Rusty Russell ru...@rustcorp.com.au

 If this is so great for virtio it would also be a great idea
 universally, but we don't do it.

 What you're doing by orphan'ing is creating a situation where a single
 UDP socket can loop doing sends and monopolize the TX queue of a
 device.  The only control we have over a sender for fairness in
 datagram protocols is that send buffer allocation.

Urgh, that hadn't even occurred to me.  Good point.

 I'm guilty of doing this too in the NIU driver, also because there I
 lack a TX queue empty interrupt and this can keep TCP sockets from
 getting stuck.

 I think we need a generic solution to this issue because it is getting
 quite common to see cases where the packets in the TX queue of a
 device can sit there indefinitely.

I haven't thought this through properly, but how about a hack where we don't 
orphan packets if the ring is over half full?
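
Concretely, something like this at the top of the xmit path (a sketch
only -- the capacity helpers are made up, the driver doesn't actually
peek at ring internals like this):

	/* Only orphan while the TX ring is under half full: a fast
	 * sender then still runs into its own sockbuf limit before
	 * it can monopolize the ring. */
	if (ring_free_slots(vi->svq) > ring_size(vi->svq) / 2)
		skb_orphan(skb);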

Then I guess we could overload the watchdog as a more general
timer-after-no-xmit?

Rusty.


Re: [RFC] virtio: orphan skbs if we're relying on timer to free them

2009-05-21 Thread David Miller
From: Rusty Russell ru...@rustcorp.com.au
Date: Thu, 21 May 2009 16:27:05 +0930

 On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
 What you're doing by orphan'ing is creating a situation where a single
 UDP socket can loop doing sends and monopolize the TX queue of a
 device.  The only control we have over a sender for fairness in
 datagram protocols is that send buffer allocation.
 
 Urgh, that hadn't even occurred to me.  Good point.

Now this all is predicated on this actually mattering. :-)

You could argue that the scheduler as well as the size of the
TX queue should be limiting and enforcing fairness.

Someone really needs to test this.  Just skb_orphan() every packet
at the beginning of dev_hard_start_xmit(), then run some test
program with two clients looping out UDP packets to see if one
can monopolize the device and get a significantly larger amount
of TX resources than the other.  Repeat for 3, 4, 5, etc. clients.
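
The userspace side of that test can be as dumb as this (a sketch; the
destination address is a placeholder, and the kernel side is just the
one-line skb_orphan() described above):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
	char buf[1024] = {0};
	struct sockaddr_in dst = {0};
	int s = socket(AF_INET, SOCK_DGRAM, 0);

	dst.sin_family = AF_INET;
	dst.sin_port = htons(9);	/* discard port */
	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

	for (;;)			/* loop doing sends */
		sendto(s, buf, sizeof(buf), 0,
		       (struct sockaddr *)&dst, sizeof(dst));
}

Run N copies and compare per-socket packet counts.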

 I haven't thought this through properly, but how about a hack where
 we don't orphan packets if the ring is over half full?

That would also work.  And for the NIU case this would be great
because I DO have a marker bit for triggering interrupts in the TX
descriptors.  There's just no "all empty" interrupt on TX (who
designs these things? :( ).

 Then I guess we could overload the watchdog as a more general
 timer-after-no-xmit?

Yes, but it means that teardown of a socket can be delayed up to
the amount of that timer.  Factor in all of this crazy
round_jiffies() stuff people do these days and it could cause
pauses for real use cases and drive users batty.

Probably the most profitable avenue is to see if this is a real issue
after all (see above).  If we can get away with having the socket
buffer represent socket -- device space only, that's the most ideal
solution.  It will probably also improve performance a lot across the
board, especially on NUMA/SMP boxes as our TX complete events tend to
be in different places than the SKB producer.


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 11:44:57PM +0300, Blue Swirl wrote:
 On 5/20/09, Michael S. Tsirkin m...@redhat.com wrote:
  On Wed, May 20, 2009 at 11:26:42PM +0300, Blue Swirl wrote:
On 5/20/09, Michael S. Tsirkin m...@redhat.com wrote:
 On Wed, May 20, 2009 at 11:02:24PM +0300, Michael S. Tsirkin wrote:
   On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
On 5/20/09, Michael S. Tsirkin m...@redhat.com wrote:
 On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
   On 5/20/09, Michael S. Tsirkin m...@redhat.com wrote:
On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
  On 5/20/09, Michael S. Tsirkin m...@redhat.com wrote:
    define api for allocating/setting up msi-x irqs, and for updating them
    with msi-x vector information, supply implementation in ioapic. Please
    comment on this API: I intend to port my msi-x patch to work on top of it.
  
Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
   Sparc64 also uses packets (mondos, not implemented yet) for
   interrupt vector data, there the packet size is 8 * 64 bits.
   I think we should aim for a more generic API that covers this case also.
   
   
 Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
 MSI, not mondos. What code would benefit from this abstraction?
  
   Sparc64 emulation, of course. I think also the API would be neater.


 Since mondos are not interrupts, why use irqs for them?
   
 I just said above that they are used for interrupt vector data. What
 makes you think they are not interrupts?
  
   I'm sorry, I don't really know anything about sparc.
   All I am saying is that in PCI, interrupts never pass data,
   so qemu_set_irq as it is now, is a good API to send them.
  
   For the sparc feature you describe, you probably want to add
   a message data parameter to qemu_set_irq, but it's not
   really useful for MSI.


 Just to clarify, the main difference is that with MSI/MSI-X
  both data and address fields are mostly static, modifying them
  involves ioapic and device updates which might be an expensive
  operation (e.g. with kvm, needs an extra system call).

  So I don't think it makes sense to pass MSI-X data field
  with each call to qemu_set_irq.
   
No, but I think the Sparc situation is the same, the packet data is
static for the interrupt source in question.
 
 
  So, ok, we could add a data update callback and then MSI and sparc
   would do their thing there. I'm not convinced I like all this
   play with untyped buffers, do you think it's helpful?
 
   If yes, maybe I'll try to code it up and see how does it look.
 
 Well, get/set_data could be limited to irq.c, ioapic.c could export
 something like get/set_msi_data. MSI callers should only use
 get/set_msi_data. Ditto for get/set_mondo_data.

I started coding it up and got lost in a maze of callbacks and void pointers :)

And I also remembered that besides data, there are mask and pending bits in
msi (they're ignored in the patch I sent, but I'm fixing that), so there
needs to be an interface for those as well.
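
For reference, in MSI-X the mask lives in the vector table and the
pending bits in the PBA; the per-vector layout from the PCI spec,
sketched as a struct (not code from the patch):

struct msix_table_entry {	/* one 16-byte entry per vector */
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	uint32_t vector_ctrl;	/* bit 0: per-vector mask */
};

The Pending Bit Array sits separately, one bit per vector.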

I am guessing that with the qemu_irq_get_opaque and qemu_irq_get_vector that I
added, you'll find it's easy to implement mondos in very few lines of code.
We can always add another level of indirection later when sparc code
surfaces, if we find it helps.


-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
 On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
   
 define api for allocating/setting up msi-x irqs, and for updating them
 with msi-x vector information, supply implementation in ioapic. Please
 comment on this API: I intend to port my msi-x patch to work on top of
 it.
 

 I thought the point of MSI is that they are just regular memory writes, and 
 don't require any special bus support.
   

The PCI bus doesn't need any special support (I think) but something on 
the other end needs to interpret those writes.

In any case we need some internal API for this, and qemu_irq looks like 
a good choice.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 11:34:11AM +0100, Paul Brook wrote:
  The PCI bus doesn't need any special support (I think) but something on
  the other end needs to interpret those writes.
 
 Sure. But there's definitely nothing PCI specific about it. I assumed this 
 would all be contained within the APIC.

Exactly. APIC supplies callbacks to set up/free/mask/unmask MSI interrupts.
For kvm, we'll have another implementation that passes these requests
on to the kernel.

  In any case we need some internal API for this, and qemu_irq looks like
  a good choice.
 
 What do you expect to be using this API?
 
 Paul

emulated PCI devices such as virtio.
Hope to send a patch shortly.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
 The PCI bus doesn't need any special support (I think) but something on
 the other end needs to interpret those writes.
 

 Sure. But there's definitely nothing PCI specific about it. I assumed this 
 would all be contained within the APIC.
   

MSIs are defined by PCI and their configuration is done using the PCI 
configuration space.

 In any case we need some internal API for this, and qemu_irq looks like
 a good choice.
 

 What do you expect to be using this API?
   

virtio, emulated devices capable of supporting MSI (e1000?), device 
assignment (not yet in qemu.git).

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
 In any case we need some internal API for this, and qemu_irq looks like
 a good choice.
 
 What do you expect to be using this API?
   
 virtio, emulated devices capable of supporting MSI (e1000?), device
 assignment (not yet in qemu.git).
 

 It probably makes sense to have common infrastructure in pci.c to 
 expose/implement device side MSI functionality. However I see no need for a 
 direct API between the device and the APIC. We already have an API for memory 
 accesses and MMIO regions. I'm pretty sure a system could implement MSI by 
 pointing the device at system ram, and having the CPU periodically poll that.
   

Instead of writing directly, let's abstract it behind a qemu_set_irq().  
This is easier for device authors.  The default implementation of the 
irq callback could write to apic memory, while for kvm we can directly 
trigger the interrupt via the kvm APIs.
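
A minimal sketch of that default callback (the handler name and the
MSIVector struct are assumptions, not existing qemu code):

typedef struct MSIVector {
    target_phys_addr_t addr;	/* message address the guest programmed */
    uint32_t data;		/* message data */
} MSIVector;

/* qemu_irq handler: "raising" the irq performs the MSI memory write */
static void msi_irq_handler(void *opaque, int n, int level)
{
    MSIVector *v = opaque;

    if (level)
        stl_phys(v->addr, v->data);
}

A kvm build would install a different handler that calls the kernel's
injection API instead.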

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:08:18PM +0300, Avi Kivity wrote:
 Paul Brook wrote:
 In any case we need some internal API for this, and qemu_irq looks like
 a good choice.
 
 What do you expect to be using this API?
   
 virtio, emulated devices capable of supporting MSI (e1000?), device
 assignment (not yet in qemu.git).
 

 It probably makes sense to have common infrastructure in pci.c to  
 expose/implement device side MSI functionality. However I see no need 
 for a direct API between the device and the APIC. We already have an 
 API for memory accesses and MMIO regions. I'm pretty sure a system 
 could implement MSI by pointing the device at system ram, and having 
 the CPU periodically poll that.
   

 Instead of writing directly, let's abstract it behind a qemu_set_irq().   
 This is easier for device authors.  The default implementation of the  
 irq callback could write to apic memory, while for kvm we can directly  
 trigger the interrupt via the kvm APIs.

Right.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
 Instead of writing directly, let's abstract it behind a qemu_set_irq().
 This is easier for device authors.  The default implementation of the
 irq callback could write to apic memory, while for kvm we can directly
 trigger the interrupt via the kvm APIs.
 

 I'm still not convinced.

 A tight coupling between PCI devices and the APIC is just going to cause us 
 problems later on. I'm going to come back to the fact that these are memory 
 writes so once we get IOMMU support they will presumably be subject to 
 remapping by that, just like any other memory access.
   

I'm not suggesting the qemu_irq will extend all the way to the apic.  
Think of it as connecting the device core with its interrupt unit.

 Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
 one-off event, not a level state. OTOH stl_phys is exactly the right interface.
   

The qemu_irq callback should do an stl_phys().  The device is happy 
since it's using the same API it uses for non-MSI.  The APIC is happy 
since it isn't connected directly to the device.  stl_phys() is happy 
since it sees more traffic and can serve more ads.  kvm is happy since 
it can hijack the callback to throw the interrupt directly into the kernel.

 The KVM interface should be contained within the APIC implementation.
   

Tricky, but doable.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 01:29:37PM +0100, Paul Brook wrote:
 On Thursday 21 May 2009, Avi Kivity wrote:
  Paul Brook wrote:
   In any case we need some internal API for this, and qemu_irq looks
   like a good choice.
  
   What do you expect to be using this API?
  
   virtio, emulated devices capable of supporting MSI (e1000?), device
   assignment (not yet in qemu.git).
  
   It probably makes sense to have common infrastructure in pci.c to
   expose/implement device side MSI functionality. However I see no need for
   a direct API between the device and the APIC. We already have an API for
   memory accesses and MMIO regions. I'm pretty sure a system could
   implement MSI by pointing the device at system ram, and having the CPU
   periodically poll that.
 
  Instead of writing directly, let's abstract it behind a qemu_set_irq().
  This is easier for device authors.  The default implementation of the
  irq callback could write to apic memory, while for kvm we can directly
  trigger the interrupt via the kvm APIs.
 
 I'm still not convinced.
 
 A tight coupling between PCI devices and the APIC is just going to cause us 
 problems later on. I'm going to come back to the fact that these are memory 
 writes so once we get IOMMU support they will presumably be subject to 
 remapping by that, just like any other memory access.

Actually, MSI messages are handled by the IOMMU as interrupts, not as
regular memory accesses. The IOMMU book has comments such as:
• Interrupt addresses are never translated to memory addresses, but
other special address ranges may be reclaimed to be backed with memory.

 Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
 one-off event, not a level state.

Yes, I just chose to ignore the level value. It does not look like such
a big issue ... Do you advocate a new qemu_msi structure then?

 OTOH stl_phys is exactly the right interface.

Not really. These are writes but not into physical memory.
For example, on intel 32 bit, stl_phys gets a 32 bit address, while
MSI writes encode the interrupt vector in the high 32 bits of
the address - way outside actual physical memory.

 The KVM interface should be contained within the APIC implementation.
 
 Paul

Unfortunately kvm capabilities that are present in current kernels
do not map well to this interface. You need to perform an
expensive setup for each interrupt vector you are going to use,
be it MSI or regular interrupt.

-- 
MST

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
 Paul Brook wrote:
 Instead of writing directly, let's abstract it behind a qemu_set_irq().
 This is easier for device authors.  The default implementation of the
 irq callback could write to apic memory, while for kvm we can directly
 trigger the interrupt via the kvm APIs.
 

 I'm still not convinced.

 A tight coupling between PCI devices and the APIC is just going to 
 cause us problems later on. I'm going to come back to the fact that 
 these are memory writes so once we get IOMMU support they will 
 presumably be subject to remapping by that, just like any other memory 
 access.
   

 I'm not suggesting the qemu_irq will extend all the way to the apic.   
 Think of it as connecting the device core with its interrupt unit.

 Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
 one-off event, not a level state. OTOH stl_phys is exactly the right interface.
   

 The qemu_irq callback should do an stl_phys().  The device is happy  
 since it's using the same API it uses for non-MSI.  The APIC is happy  
 since it isn't connected directly to the device.  stl_phys() is happy  
 since it sees more traffic and can serve more ads.  kvm is happy since  
 it can hijack the callback to throw the interrupt directly into the 
 kernel.

I like the monetization angle here.

 The KVM interface should be contained within the APIC implementation.
   

 Tricky, but doable.

I'd rather keep it simple.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
   MSI provides multiple edge triggered interrupts, whereas traditional mode
   provides a single level triggered interrupt. My guess is most devices
   will want to treat these differently anyway.
 
  So, is qemu_send_msi better than qemu_set_irq.
 
 Neither. pci_send_msi,

Works for me.

 which is a trivial wrapper around stl_phys.

OK, but I'm adding another level of indirection in the middle,
to allow us to tie in a kvm backend.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
 which is a trivial wrapper around stl_phys.
   
 OK, but I'm adding another level of indirection in the middle,
 to allow us to tie in a kvm backend.
 

 kvm has no business messing with the PCI device code.
   

kvm has a fast path for irq injection.  If qemu wants to support it we 
need some abstraction here.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:53:14PM +0100, Paul Brook wrote:
   which is a trivial wrapper around stl_phys.
 
  OK, but I'm adding another level of indirection in the middle,
  to allow us to tie in a kvm backend.
 
 kvm has no business messing with the PCI device code.

Yes it has :)

kvm needs data on MSI entries: that's the interface the
current kernel exposes for injecting these interrupts.

I think we also need to support in-kernel devices which
would inject MSI interrupts directly from the kernel.
For these, kvm would need to know when the mask bit changes
and give us info on the pending bit.

That's a fair amount of PCI-specific code in kvm.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
 define api for allocating/setting up msi-x irqs, and for updating them
 with msi-x vector information, supply implementation in ioapic. Please
 comment on this API: I intend to port my msi-x patch to work on top of
 it.

I thought the point of MSI is that they are just regular memory writes, and 
don't require any special bus support.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
 The PCI bus doesn't need any special support (I think) but something on
 the other end needs to interpret those writes.

Sure. But there's definitely nothing PCI specific about it. I assumed this 
would all be contained within the APIC.

 In any case we need some internal API for this, and qemu_irq looks like
 a good choice.

What do you expect to be using this API?

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
 Paul Brook wrote:
  In any case we need some internal API for this, and qemu_irq looks
  like a good choice.
 
  What do you expect to be using this API?
 
  virtio, emulated devices capable of supporting MSI (e1000?), device
  assignment (not yet in qemu.git).
 
  It probably makes sense to have common infrastructure in pci.c to
  expose/implement device side MSI functionality. However I see no need for
  a direct API between the device and the APIC. We already have an API for
  memory accesses and MMIO regions. I'm pretty sure a system could
  implement MSI by pointing the device at system ram, and having the CPU
  periodically poll that.

 Instead of writing directly, let's abstract it behind a qemu_set_irq().
 This is easier for device authors.  The default implementation of the
 irq callback could write to apic memory, while for kvm we can directly
 trigger the interrupt via the kvm APIs.

I'm still not convinced.

A tight coupling between PCI devices and the APIC is just going to cause us 
problems later on. I'm going to come back to the fact that these are memory 
writes so once we get IOMMU support they will presumably be subject to 
remapping by that, just like any other memory access.

Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
one-off event, not a level state. OTOH stl_phys is exactly the right interface.

The KVM interface should be contained within the APIC implementation.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
  A tight coupling between PCI devices and the APIC is just going to cause
  us problems later on. I'm going to come back to the fact that these are
  memory writes so once we get IOMMU support they will presumably be
  subject to remapping by that, just like any other memory access.

 I'm not suggesting the qemu_irq will extend all the way to the apic.
 Think of it as connecting the device core with its interrupt unit.

  Even ignoring that, qemu_irq isn't really the right interface. An MSI is a
  one-off event, not a level state. OTOH stl_phys is exactly the right
  interface.

 The qemu_irq callback should do an stl_phys().  The device is happy
 since it's using the same API it uses for non-MSI. 

MSI provides multiple edge triggered interrupts, whereas traditional mode 
provides a single level triggered interrupt. My guess is most devices will 
want to treat these differently anyway.

Either way, this is an implementation detail between pci.c and individual 
devices. It has nothing to do with the APIC.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
  MSI provides multiple edge triggered interrupts, whereas traditional mode
  provides a single level triggered interrupt. My guess is most devices
  will want to treat these differently anyway.

 So, is qemu_send_msi better than qemu_set_irq.

Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
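
Taken literally (the signature is an assumption; stl_phys is the
existing qemu call):

static inline void pci_send_msi(target_phys_addr_t addr, uint32_t data)
{
    stl_phys(addr, data);	/* an MSI message is just a 32-bit write */
}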

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Paul Brook wrote:
   MSI provides multiple edge triggered interrupts, whereas traditional
   mode provides a single level triggered interrupt. My guess is most
   devices will want to treat these differently anyway.
 
  So, is qemu_send_msi better than qemu_set_irq.

 Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

To clarify, you seem to be trying to fuse two largely separate features 
together.

MSI is a standard PCI device capability[1] that involves the device performing 
a 32-bit memory write when something interesting occurs. These writes may or 
may not be directed at a APIC.

The x86 APIC has a memory mapped interface that allows generation of CPU 
interrupts in response to memory writes. These may or may not come 
from an MSI capable PCI device.

Paul

[1] Note a *device* capability, not a bus capability.



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
  which is a trivial wrapper around stl_phys.

 OK, but I'm adding another level of indirection in the middle,
 to allow us to tie in a kvm backend.

kvm has no business messing with the PCI device code.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
 Paul Brook wrote:
  which is a trivial wrapper around stl_phys.
 
  OK, but I'm adding another level of indirection in the middle,
  to allow us to tie in a kvm backend.
 
  kvm has no business messing with the PCI device code.

 kvm has a fast path for irq injection.  If qemu wants to support it we
 need some abstraction here.

Fast path from where to where? Having the PCI layer bypass/re-implement the 
APIC and inject the interrupt directly into the cpu core sounds like a 
particularly bad idea.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
 On Thursday 21 May 2009, Avi Kivity wrote:
   
 Paul Brook wrote:
 
 which is a trivial wrapper around stl_phys.
   
 OK, but I'm adding another level of indirection in the middle,
 to allow us to tie in a kvm backend.
 
 kvm has no business messing with the PCI device code.
   
 kvm has a fast path for irq injection.  If qemu wants to support it we
 need some abstraction here.
 

 Fast path from where to where? Having the PCI layer bypass/re-implement the 
 APIC and inject the interrupt directly into the cpu core sounds like a 
 particularly bad idea.
   

kvm implements the APIC in the host kernel (qemu upstream doesn't 
support this yet).  The fast path is wired to the in-kernel APIC, not 
the cpu core directly.

The idea is to wire it to UIO for device assignment, to a virtio-device 
implemented in the kernel, and to qemu.

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Avi Kivity
Paul Brook wrote:
 kvm implements the APIC in the host kernel (qemu upstream doesn't
 support this yet).  The fast path is wired to the in-kernel APIC, not
 the cpu core directly.

 The idea is to wire it to UIO for device assignment, to a virtio-device
 implemented in the kernel, and to qemu.
 

 I still don't see why you're trying to bypass straight from the pci layer to 
 the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
 presumably got to update the apic state anyway.
   

The fast path is an eventfd so that we don't have to teach all the 
clients about the details of MSI.  Userspace programs the MSI details 
into kvm and hands the client an eventfd.  All the client has to do is 
bang on the eventfd for the interrupt to be queued.  The eventfd 
provides event coalescing and is equally useful from the kernel and 
userspace, and can be used with targets other than kvm.
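
"Banging on it" is just an eventfd write (a userspace sketch; the fd
itself comes from the kvm setup code):

#include <stdint.h>
#include <unistd.h>

static void kick_irq(int irqfd)
{
    uint64_t one = 1;

    /* eventfd adds to a counter, so kicks issued before the
     * consumer runs coalesce into a single event */
    write(irqfd, &one, sizeof(one));
}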

-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
 Paul Brook wrote:
  kvm implements the APIC in the host kernel (qemu upstream doesn't
  support this yet).  The fast path is wired to the in-kernel APIC, not
  the cpu core directly.
 
  The idea is to wire it to UIO for device assignment, to a virtio-device
  implemented in the kernel, and to qemu.
 
  I still don't see why you're trying to bypass straight from the pci layer
  to the apic. Why can't you just pass the apic MMIO writes to the kernel?
  You've presumably got to update the apic state anyway.

 The fast path is an eventfd so that we don't have to teach all the
 clients about the details of MSI.  Userspace programs the MSI details
 into kvm and hands the client an eventfd.  All the client has to do is
 bang on the eventfd for the interrupt to be queued.  The eventfd
 provides event coalescing and is equally useful from the kernel and
 userspace, and can be used with targets other than kvm.

So presumably if a device triggers an APIC interrupt using a write that isn't 
from one of the currently configured PCI devices, it all explodes horribly?

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:50:18PM +0100, Paul Brook wrote:
   kvm has no business messing with the PCI device code.
  
   kvm has a fast path for irq injection.  If qemu wants to support it we
   need some abstraction here.
  
   Fast path from where to where? Having the PCI layer bypass/re-implement
   the APIC and inject the interrupt directly into the cpu core sounds like a
   particularly bad idea.
 
  kvm implements the APIC in the host kernel (qemu upstream doesn't
  support this yet).  The fast path is wired to the in-kernel APIC, not
  the cpu core directly.
 
  The idea is to wire it to UIO for device assignment, to a virtio-device
  implemented in the kernel, and to qemu.
 
 I still don't see why you're trying to bypass straight from the pci layer to 
 the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
 presumably got to update the apic state anyway.
 
 Paul

As far as I can tell, at least on Intel, MSI interrupts are not MMIO writes.
They are PCI memory writes to a hard-coded address range that
are passed to the APIC. I don't think MMIO writes can trigger MSI,
or at least this does not seem to be documented.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
 On Thursday 21 May 2009, Paul Brook wrote:
MSI provides multiple edge triggered interrupts, whereas traditional
mode provides a single level triggered interrupt. My guess is most
devices will want to treat these differently anyway.
  
   So, is qemu_send_msi better than qemu_set_irq.
 
  Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
 
 To clarify, you seem to be trying to fuse two largely separate features 
 together.
 
 MSI is a standard PCI device capability[1] that involves the device 
 performing 
 a 32-bit memory write when something interesting occurs. These writes may or 
 may not be directed at a APIC.
 
 The x86 APIC has a memory mapped interface that allows generation of CPU 
 interrupts in response to memory writes. These may or may not come 
 from an MSI capable PCI device.
 
 Paul
 
 [1] Note a *device* capability, not a bus capability.

Paul, so I went over the specs, and what you say about APIC here does not
seem to be what Intel actually implemented.  Specifically, Intel
implemented *MSI support in APIC*. This lets PCI devices, but not the CPU,
signal interrupts by memory writes.

For example, after reset, when the CPU writes to address 0xfee00000 this
is an access to a reserved register in the APIC, but when a PCI device
does a write to 0xfee00000, this triggers an interrupt to destination 0.

See section 9.12 in Intel® 64 and IA-32 Architectures Software
Developer’s Manual Volume 3A: System Programming Guide, Part 1
http://www.intel.com/Assets/PDF/manual/253668.pdf
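
The message layout from that section, for reference (a sketch, not code
from any patch):

#define MSI_ADDR_BASE         0xfee00000u          /* address bits 31:20 fixed */
#define MSI_ADDR_DEST_ID(a)   (((a) >> 12) & 0xff) /* bits 19:12 */
#define MSI_DATA_VECTOR(d)    ((d) & 0xff)         /* bits 7:0 */
#define MSI_DATA_DELIVERY(d)  (((d) >> 8) & 0x7)   /* bits 10:8 */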

So it seems that what we need to do in pci is:

if (!msi_ops || msi_ops->send_msi(address, data))
	stl_phys(address, data);

where send_msi is wired to apic_send_msi and
where apic_send_msi returns an error for an address
outside of the MSI range 0xfee00000 - 0xfeefffff

Makes sense?

-- 
MST

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 07:45:20PM +0300, Michael S. Tsirkin wrote:
 On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
  On Thursday 21 May 2009, Paul Brook wrote:
 MSI provides multiple edge triggered interrupts, whereas traditional
 mode provides a single level triggered interrupt. My guess is most
 devices will want to treat these differently anyway.
   
So, is qemu_send_msi better than qemu_set_irq.
  
   Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
  
  To clarify, you seem to be trying to fuse two largely separate features 
  together.
  
  MSI is a standard PCI device capability[1] that involves the device 
  performing 
  a 32-bit memory write when something interesting occurs. These writes may 
  or 
  may not be directed at a APIC.
  
  The x86 APIC has a memory mapped interface that allows generation of CPU 
  interrupts in response to memory writes. These may or may not come 
  from an MSI capable PCI device.
  
  Paul
  
  [1] Note a *device* capability, not a bus capability.
 
 Paul, so I went over the specs, and what you say about APIC here does not
 seem to be what Intel actually implemented.  Specifically, Intel
 implemented *MSI support in APIC*. This lets PCI devices, but not the CPU,
 signal interrupts by memory writes.
 
 For example, after reset, when the CPU writes to address 0xfee00000 this
 is an access to a reserved register in the APIC, but when a PCI device
 does a write to 0xfee00000, this triggers an interrupt to destination 0.
 
 See section 9.12 in Intel® 64 and IA-32 Architectures Software
 Developer’s Manual Volume 3A: System Programming Guide, Part 1
 http://www.intel.com/Assets/PDF/manual/253668.pdf
 
 So it seems that what we need to do in pci is:
 
 if (!msi_ops || msi_ops->send_msi(address, data))
 	stl_phys(address, data);
 
 where send_msi is wired to apic_send_msi and
 where apic_send_msi returns an error for an address
 outside of the MSI range 0xfee00000 - 0xfeefffff
 
 Makes sense?


So I ended up with these ops:
allocate
free
update
send
which APIC will define and MSI emulation
will use. Here, send will return an error for addresses
outside the 0xfeexxxxx range, and the device will do a plain
stl_phys.
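
As a C table (the signatures are guesses at this point):

typedef struct MSIOps {
    int  (*allocate)(int *vector);
    void (*free)(int vector);
    int  (*update)(int vector, uint64_t addr, uint32_t data);
    int  (*send)(uint64_t addr, uint32_t data); /* <0 outside 0xfeexxxxx */
} MSIOps;

so device code stays:

if (!msi_ops || msi_ops->send(addr, data) < 0)
	stl_phys(addr, data);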

-- 
MST

Re: [PATCH] xen/blkfront: beyond ARRAY_SIZE of info->shadow

2009-05-21 Thread Jeremy Fitzhardinge
Roel Kluin wrote:
 Do not go beyond ARRAY_SIZE of info->shadow

 Signed-off-by: Roel Kluin roel.kl...@gmail.com
   
Acked-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Jens, can you put this into a next-merge-window branch?

Thanks,
J
 ---
 diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
 index a6cbf7b..d395986 100644
 --- a/drivers/block/xen-blkfront.c
 +++ b/drivers/block/xen-blkfront.c
 @@ -122,7 +122,7 @@ static DEFINE_SPINLOCK(blkif_io_lock);
  static int get_id_from_freelist(struct blkfront_info *info)
  {
   unsigned long free = info->shadow_free;
 - BUG_ON(free > BLK_RING_SIZE);
 + BUG_ON(free >= BLK_RING_SIZE);
   info->shadow_free = info->shadow[free].req.id;
   info->shadow[free].req.id = 0x0fffffee; /* debug */
   return free;




[PATCH] xen/blkfront: beyond ARRAY_SIZE of info->shadow

2009-05-21 Thread Roel Kluin
Do not go beyond ARRAY_SIZE of info->shadow

Signed-off-by: Roel Kluin roel.kl...@gmail.com
---
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a6cbf7b..d395986 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -122,7 +122,7 @@ static DEFINE_SPINLOCK(blkif_io_lock);
 static int get_id_from_freelist(struct blkfront_info *info)
 {
	unsigned long free = info->shadow_free;
-	BUG_ON(free > BLK_RING_SIZE);
+	BUG_ON(free >= BLK_RING_SIZE);
	info->shadow_free = info->shadow[free].req.id;
	info->shadow[free].req.id = 0x0fffffee; /* debug */
return free;
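
(Why >= : the array being indexed is declared in struct blkfront_info as

	struct blk_shadow shadow[BLK_RING_SIZE];

so BLK_RING_SIZE itself is already one past the end; the old check let
free == BLK_RING_SIZE through.)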
