Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Jan Kiszka
On 2011-10-24 19:23, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
 On 2011-10-24 18:05, Michael S. Tsirkin wrote:
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checks & clears PBA bit on unmask, injects message if bit was set
  - devices clear PBA bit if message reason is resolved before unmask (*)

 OK, but practically, when exactly does the device clear PBA?

 Consider a network adapter that signals messages in an RX ring: If the
 corresponding vector is masked while the guest empties the ring, I
 strongly assume that the device is supposed to take back the pending bit
 in that case so that there is no interrupt injection on a later vector
 unmask operation.

 Jan

 Do you mean virtio here?

Maybe, but I'm also thinking of fully emulated devices.

 Do you expect this optimization to give
 a significant performance gain?

Hard to assess in general. But I have a silly guest here that obviously
masks MSI vectors for each event. This currently not only kicks us into
a heavy-weight exit, it also enforces serialization on qemu_global_mutex
(while we have the rest already isolated).

 
 It would also be challenging to implement this in
 a race free manner. Clearing on interrupt status read
 seems straight-forward.

With an in-kernel MSI-X MMIO handler, this race will be naturally
unavoidable as there is no more global lock shared between table/PBA
accesses and the device model. But when using atomic bit ops, I don't
think that will cause headaches.
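
A rough sketch of the kind of lock-free PBA access I mean, with made-up
names (kvm_msix_state and kvm_msix_deliver are illustrative only, not
existing interfaces):

    #include <linux/bitops.h>

    struct kvm_msix_state {
        unsigned long *pba;                 /* shared pending bit array */
    };

    /* hypothetical helper that injects the vector's MSI message */
    static void kvm_msix_deliver(struct kvm_msix_state *s, unsigned int vec);

    /* device model: message arrived while the vector was masked */
    static void msix_set_pending(struct kvm_msix_state *s, unsigned int vec)
    {
        set_bit(vec, s->pba);
    }

    /* in-kernel MMIO handler: guest unmasked the vector */
    static void msix_unmask(struct kvm_msix_state *s, unsigned int vec)
    {
        if (test_and_clear_bit(vec, s->pba))
            kvm_msix_deliver(s, vec);
    }

Both sides only ever touch the PBA with atomic bitops, so no lock has to
be shared between the MMIO handler and the device model.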

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Avi Kivity
On 10/24/2011 02:06 PM, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
  
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.

 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *) So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.

Good point.


 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.

Right.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Michael S. Tsirkin
On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
 On 2011-10-24 19:23, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
  On 2011-10-24 18:05, Michael S. Tsirkin wrote:
  This is what I have in mind:
   - devices set PBA bit if MSI message cannot be sent due to mask (*)
   - core checks & clears PBA bit on unmask, injects message if bit was set
   - devices clear PBA bit if message reason is resolved before unmask (*)
 
  OK, but practically, when exactly does the device clear PBA?
 
  Consider a network adapter that signals messages in a RX ring: If the
  corresponding vector is masked while the guest empties the ring, I
  strongly assume that the device is supposed to take back the pending bit
  in that case so that there is no interrupt inject on a later vector
  unmask operation.
 
  Jan
 
  Do you mean virtio here?
 
 Maybe, but I'm also thinking of fully emulated devices.

One thing seems certain: actual, assigned devices don't
have this fake msi-x level so they don't notify the host
when that changes.

  Do you expect this optimization to give
  a significant performance gain?
 
 Hard to assess in general. But I have a silly guest here that obviously
 masks MSI vectors for each event. This currently not only kicks us into
 a heavy-weight exit, it also enforces serialization on qemu_global_mutex
 (while we have the rest already isolated).

It's easy to see how MSIX mask support in the kernel would help.
Not sure whether it's worth it to also add special APIs to
reduce the number of spurious interrupts for such silly guests.

  
  It would also be challenging to implement this in
  a race free manner. Clearing on interrupt status read
  seems straight-forward.
 
 With an in-kernel MSI-X MMIO handler, this race will be naturally
 unavoidable as there is no more global lock shared between table/PBA
 accesses and the device model. But, when using atomic bit ops, I don't
 think that will cause headache.
 
 Jan

This is not the race I meant.  The challenge is for the device to
determine that it can clear the PBA.  Atomic accesses on the PBA won't help
here, I think.


 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Jan Kiszka
On 2011-10-25 13:20, Michael S. Tsirkin wrote:
 On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
 On 2011-10-24 19:23, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
 On 2011-10-24 18:05, Michael S. Tsirkin wrote:
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checks & clears PBA bit on unmask, injects message if bit was set
  - devices clear PBA bit if message reason is resolved before unmask (*)

 OK, but practically, when exactly does the device clear PBA?

 Consider a network adapter that signals messages in a RX ring: If the
 corresponding vector is masked while the guest empties the ring, I
 strongly assume that the device is supposed to take back the pending bit
 in that case so that there is no interrupt inject on a later vector
 unmask operation.

 Jan

 Do you mean virtio here?

 Maybe, but I'm also thinking of fully emulated devices.
 
 One thing seems certain: actual, assigned devices don't
 have this fake msi-x level so they don't notify host
 when that changes.

But they have real PBA. We just need to replicate the emulated vector
mask state into real hw. Doesn't this happen anyway when we disable the
IRQ on the host?

If not, that may require a bit more work, maybe a special masking mode
that can be requested by the managing backend of an assigned device from
the MSI-X in-kernel service.

 
 Do you expect this optimization to give
 a significant performance gain?

  Hard to assess in general. But I have a silly guest here that obviously
 masks MSI vectors for each event. This currently not only kicks us into
 a heavy-weight exit, it also enforces serialization on qemu_global_mutex
 (while we have the rest already isolated).
 
 It's easy to see how MSIX mask support in the kernel would help.
 Not sure whether it's worth it to also add special APIs to
 reduce the number of spurious interrupts for such silly guests.

I do not get the latter point. What could be simplified (without making
it incorrect) when ignoring excessive mask accesses? Also, if sane
guests do not access the mask that frequently, why was in-kernel MSI-X
MMIO proposed at all?

 

 It would also be challenging to implement this in
 a race free manner. Clearing on interrupt status read
 seems straight-forward.

 With an in-kernel MSI-X MMIO handler, this race will be naturally
 unavoidable as there is no more global lock shared between table/PBA
 accesses and the device model. But, when using atomic bit ops, I don't
 think that will cause headache.

 Jan
 
 This is not the race I meant.  The challenge is for the device to
 determine that it can clear the PBA.  atomic accesses on PBA won't help
 here I think.

The device knows best if the interrupt reason persists. It can
synchronize MSI assertion and PBA bit clearance. If it clears too
late, then this reflects what may happen on real hw as well when host
and device race for changing vector mask vs. device state. It's not
stated that those changes need to be serialized inside the device, is it?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Michael S. Tsirkin
On Tue, Oct 25, 2011 at 01:41:39PM +0200, Jan Kiszka wrote:
 On 2011-10-25 13:20, Michael S. Tsirkin wrote:
  On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
  On 2011-10-24 19:23, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
  On 2011-10-24 18:05, Michael S. Tsirkin wrote:
  This is what I have in mind:
   - devices set PBA bit if MSI message cannot be sent due to mask (*)
   - core checks & clears PBA bit on unmask, injects message if bit was 
  set
   - devices clear PBA bit if message reason is resolved before unmask 
  (*)
 
  OK, but practically, when exactly does the device clear PBA?
 
  Consider a network adapter that signals messages in a RX ring: If the
  corresponding vector is masked while the guest empties the ring, I
  strongly assume that the device is supposed to take back the pending bit
  in that case so that there is no interrupt inject on a later vector
  unmask operation.
 
  Jan
 
  Do you mean virtio here?
 
  Maybe, but I'm also thinking of fully emulated devices.
  
  One thing seems certain: actual, assigned devices don't
  have this fake msi-x level so they don't notify host
  when that changes.
 
 But they have real PBA. We just need to replicate the emulated vector
 mask state into real hw. Doesn't this happen anyway when we disable the
 IRQ on the host?

Not immediately I think.

 If not, that may require a bit more work, maybe a special masking mode
 that can be requested by the managing backend of an assigned device from
 the MSI-X in-kernel service.

True. OTOH this might have cost (extra mmio) for the
doubtful benefit of making PBA values exact.

  
  Do you expect this optimization to give
  a significant performance gain?
 
  Hard to assess in general. But I have a silly guest here that obviously
  masks MSI vectors for each event. This currently not only kicks us into
  a heavy-weight exit, it also enforces serialization on qemu_global_mutex
  (while we have the rest already isolated).
  
  It's easy to see how MSIX mask support in the kernel would help.
  Not sure whether it's worth it to also add special APIs to
  reduce the number of spurious interrupts for such silly guests.
 
 I do not get the latter point. What could be simplified (without making
 it incorrect) when ignoring excessive mask accesses?

Clearing PBA when we detect an empty ring in the host is not required,
IMO. It's an optimization.

 Also, if sane
 guests do not access the mask that frequently, why was in-kernel MSI-X
 MMIO proposed at all?

Apparently whether mask accesses happen a lot depends on the workload.

  
 
  It would also be challenging to implement this in
  a race free manner. Clearing on interrupt status read
  seems straight-forward.
 
  With an in-kernel MSI-X MMIO handler, this race will be naturally
  unavoidable as there is no more global lock shared between table/PBA
  accesses and the device model. But, when using atomic bit ops, I don't
  think that will cause headache.
 
  Jan
  
  This is not the race I meant.  The challenge is for the device to
  determine that it can clear the PBA.  atomic accesses on PBA won't help
  here I think.
 
 The device knows best if the interrupt reason persists.

It might not know this unless notified by the driver.
E.g. virtio drivers currently don't do interrupt status
reads.

 It can
 synchronize MSI assertion and PBA bit clearance. If it clears too
 late, then this reflects what may happen on real hw as well when host
 and device race for changing vector mask vs. device state. It's not
 stated that those changes need to be serialized inside the device, is it?
 
 Jan

Talking about emulated devices?  It's not certain that real
hardware clears the PBA. Considering that no guests I know of use PBA ATM,
I would not be surprised if many devices had broken PBA support.


 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Jan Kiszka
On 2011-10-25 14:05, Michael S. Tsirkin wrote:
 On Tue, Oct 25, 2011 at 01:41:39PM +0200, Jan Kiszka wrote:
 On 2011-10-25 13:20, Michael S. Tsirkin wrote:
 On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
 On 2011-10-24 19:23, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
 On 2011-10-24 18:05, Michael S. Tsirkin wrote:
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checks & clears PBA bit on unmask, injects message if bit was 
 set
  - devices clear PBA bit if message reason is resolved before unmask 
 (*)

 OK, but practically, when exactly does the device clear PBA?

 Consider a network adapter that signals messages in a RX ring: If the
 corresponding vector is masked while the guest empties the ring, I
 strongly assume that the device is supposed to take back the pending bit
 in that case so that there is no interrupt inject on a later vector
 unmask operation.

 Jan

 Do you mean virtio here?

 Maybe, but I'm also thinking of fully emulated devices.

 One thing seems certain: actual, assigned devices don't
 have this fake msi-x level so they don't notify host
 when that changes.

 But they have real PBA. We just need to replicate the emulated vector
 mask state into real hw. Doesn't this happen anyway when we disable the
 IRQ on the host?
 
 Not immediately I think.
 
 If not, that may require a bit more work, maybe a special masking mode
 that can be requested by the managing backend of an assigned device from
 the MSI-X in-kernel service.
 
 True. OTOH this might have cost (extra mmio) for the
 doubtful benefit of making PBA values exact.

I think correctness comes before performance unless the latter hurts
significantly.

 

 Do you expect this optimization to give
 a significant performance gain?

 Hard to assess in general. But I have a silly guest here that obviously
 masks MSI vectors for each event. This currently not only kicks us into
 a heavy-weight exit, it also enforces serialization on qemu_global_mutex
 (while we have the rest already isolated).

 It's easy to see how MSIX mask support in the kernel would help.
 Not sure whether it's worth it to also add special APIs to
 reduce the number of spurious interrupts for such silly guests.

 I do not get the latter point. What could be simplified (without making
 it incorrect) when ignoring excessive mask accesses?
 
 Clearing PBA when we detect an empty ring in host is not required,
 IMO. It's an optimization.

For virtio that might be true - as we are free to define the device
behaviour to our benefit. What emulated real devices do is another thing.

 
 Also, if sane
 guests do not access the mask that frequently, why was in-kernel MSI-X
 MMIO proposed at all?
 
 Apparently whether mask accesses happen a lot depends on the workload.
 


 It would also be challenging to implement this in
 a race free manner. Clearing on interrupt status read
 seems straight-forward.

 With an in-kernel MSI-X MMIO handler, this race will be naturally
 unavoidable as there is no more global lock shared between table/PBA
 accesses and the device model. But, when using atomic bit ops, I don't
 think that will cause headache.

 Jan

 This is not the race I meant.  The challenge is for the device to
 determine that it can clear the PBA.  atomic accesses on PBA won't help
 here I think.

 The device knows best if the interrupt reason persists.
 
 It might not know this unless notified by driver.
 E.g. virtio drivers currently don't do interrupt status
 reads.

Talking about real devices, they obviously know as they maintain the
hardware state.

 
 It can
 synchronize MSI assertion and PBA bit clearance. If it clears too
 late, then this reflects what may happen on real hw as well when host
 and device race for changing vector mask vs. device state. It's not
 stated that those changes need to be serialized inside the device, is it?

 Jan
 
 Talking about emulated devices?  It's not sure that real
 hardware clears PBA. Considering that no guests I know of use PBA ATM,
 I would not be surprised if many devices had broken PBA support.

OK, if there are no conforming MSI-X devices out there, then we can
forget about all the PBA maintenance beyond "set if a message hits the mask,
cleared again on unmask". But I doubt that this is generally true.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Michael S. Tsirkin
On Tue, Oct 25, 2011 at 02:21:01PM +0200, Jan Kiszka wrote:
 On 2011-10-25 14:05, Michael S. Tsirkin wrote:
  On Tue, Oct 25, 2011 at 01:41:39PM +0200, Jan Kiszka wrote:
  On 2011-10-25 13:20, Michael S. Tsirkin wrote:
  On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
  On 2011-10-24 19:23, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
  On 2011-10-24 18:05, Michael S. Tsirkin wrote:
  This is what I have in mind:
   - devices set PBA bit if MSI message cannot be sent due to mask (*)
   - core checks & clears PBA bit on unmask, injects message if bit was 
  set
   - devices clear PBA bit if message reason is resolved before 
  unmask (*)
 
  OK, but practically, when exactly does the device clear PBA?
 
  Consider a network adapter that signals messages in a RX ring: If the
  corresponding vector is masked while the guest empties the ring, I
  strongly assume that the device is supposed to take back the pending 
  bit
  in that case so that there is no interrupt inject on a later vector
  unmask operation.
 
  Jan
 
  Do you mean virtio here?
 
  Maybe, but I'm also thinking of fully emulated devices.
 
  One thing seems certain: actual, assigned devices don't
  have this fake msi-x level so they don't notify host
  when that changes.
 
  But they have real PBA. We just need to replicate the emulated vector
  mask state into real hw. Doesn't this happen anyway when we disable the
  IRQ on the host?
  
  Not immediately I think.
  
  If not, that may require a bit more work, maybe a special masking mode
  that can be requested by the managing backend of an assigned device from
  the MSI-X in-kernel service.
  
  True. OTOH this might have cost (extra mmio) for the
  doubtful benefit of making PBA values exact.
 
 I think correctness comes before performance unless the latter hurts
 significantly.
 
  
 
  Do you expect this optimization to give
  a significant performance gain?
 
  Hard to assess in general. But I have a silly guest here that obviously
  masks MSI vectors for each event. This currently not only kicks us into
  a heavy-weight exit, it also enforces serialization on qemu_global_mutex
  (while we have the rest already isolated).
 
  It's easy to see how MSIX mask support in the kernel would help.
  Not sure whether it's worth it to also add special APIs to
  reduce the number of spurious interrupts for such silly guests.
 
  I do not get the latter point. What could be simplified (without making
  it incorrect) when ignoring excessive mask accesses?
  
  Clearing PBA when we detect an empty ring in host is not required,
  IMO. It's an optimization.
 
 For virtio that might be true - as we are free to define the device
 behaviour to our benefit. What emulated real devices do is another thing.

Anything specific in mind?

  
  Also, if sane
  guests do not access the mask that frequently, why was in-kernel MSI-X
  MMIO proposed at all?
  
  Apparently whether mask accesses happen a lot depends on the workload.
  
 
 
  It would also be challenging to implement this in
  a race free manner. Clearing on interrupt status read
  seems straight-forward.
 
  With an in-kernel MSI-X MMIO handler, this race will be naturally
  unavoidable as there is no more global lock shared between table/PBA
  accesses and the device model. But, when using atomic bit ops, I don't
  think that will cause headache.
 
  Jan
 
  This is not the race I meant.  The challenge is for the device to
  determine that it can clear the PBA.  atomic accesses on PBA won't help
  here I think.
 
  The device knows best if the interrupt reason persists.
  
  It might not know this unless notified by driver.
  E.g. virtio drivers currently don't do interrupt status
  reads.
 
 Talking about real devices, they obviously know as they maintain the
 hardware state.

Not necessarily. It's quite common to keep the ring in coherent memory
allocated by the driver, not within the device; the state is then
maintained by driver and device together.

  
  It can
  synchronize MSI assertion and PBA bit clearance. If it clears too
  late, then this reflects what may happen on real hw as well when host
  and device race for changing vector mask vs. device state. It's not
  stated that those changes need to be serialized inside the device, is it?
 
  Jan
  
  Talking about emulated devices?  It's not sure that real
  hardware clears PBA. Considering that no guests I know of use PBA ATM,
  I would not be surprised if many devices had broken PBA support.
 
 OK, if there are no conforming MSI-X devices out there,

Oh, I'm guessing some devices are conforming :)

 then we can
 forget about all the PBA maintenance beyond set if message hit mask,
 cleared again on unmask. But I doubt that this is generally true.
 
 Jan

We seem to get by basically with what you describe but I'm not
saying it's perfect, just that it's hard to make it perfect.

 

Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Avi Kivity
On 10/21/2011 11:19 AM, Jan Kiszka wrote:
 Currently, MSI messages can only be injected into in-kernel irqchips by
 defining a corresponding IRQ route for each message. This is not only
 unhandy if the MSI messages are generated on the fly by user space;
 IRQ routes are also a limited resource that user space has to manage
 carefully.

By itself, this does not provide enough value to offset the cost of a
new ABI, especially as userspace will need to continue supporting the
old method for a very long while.

 By providing a direct injection path, we can both avoid using up limited
 resources and simplify the necessary steps for user land. The API
 already provides a channel (flags) to revoke an injected but not yet
 delivered message, which will become important for in-kernel MSI-X vector
 masking support.


With the new feature it may be worthwhile, but I'd like to see the whole
thing, with numbers attached.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 11:45, Avi Kivity wrote:
 On 10/21/2011 11:19 AM, Jan Kiszka wrote:
 Currently, MSI messages can only be injected into in-kernel irqchips by
 defining a corresponding IRQ route for each message. This is not only
 unhandy if the MSI messages are generated on the fly by user space;
 IRQ routes are also a limited resource that user space has to manage
 carefully.
 
 By itself, this does not provide enough value to offset the cost of a
 new ABI, especially as userspace will need to continue supporting the
 old method for a very long while.

Yes, but less sophisticatedly than it has to now.

 
 By providing a direct injection path, we can both avoid using up limited
 resources and simplify the necessary steps for user land. The API
 already provides a channel (flags) to revoke an injected but not yet
 delivered message, which will become important for in-kernel MSI-X vector
 masking support.

 
 With the new feature it may be worthwhile, but I'd like to see the whole
 thing, with numbers attached.

It's not a performance issue, it's a resource limitation issue: With the
new API we can stop worrying about user space device models consuming
limited IRQ routes of the KVM subsystem.
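
To illustrate, a rough sketch of the direct injection path from user space
(the struct layout and ioctl name below are an assumption about the final
ABI along the lines of what this RFC proposes, not its exact interface;
error handling omitted):

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Inject one MSI message; no IRQ route has to be allocated beforehand. */
    static int inject_msi(int vmfd, uint64_t address, uint32_t data)
    {
        struct kvm_msi msi;

        memset(&msi, 0, sizeof(msi));
        msi.address_lo = (uint32_t)address;
        msi.address_hi = (uint32_t)(address >> 32);
        msi.data       = data;

        return ioctl(vmfd, KVM_SIGNAL_MSI, &msi);
    }

The message is consumed on the spot; nothing persists in the routing table.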

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Avi Kivity
On 10/24/2011 12:19 PM, Jan Kiszka wrote:
  
  With the new feature it may be worthwhile, but I'd like to see the whole
  thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.


Only if those devices are in the same process (or have access to the
vmfd).  Interrupt routing together with irqfd allows you to disaggregate
the device model.  Instead of providing a competing implementation with
new limitations, we need to remove the limitations of the old
implementation.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 13:09, Avi Kivity wrote:
 On 10/24/2011 12:19 PM, Jan Kiszka wrote:

 With the new feature it may be worthwhile, but I'd like to see the whole
 thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.

 
 Only if those devices are in the same process (or have access to the
 vmfd).  Interrupt routing together with irqfd allows you to disaggregate
 the device model.  Instead of providing a competing implementation with
 new limitations, we need to remove the limitations of the old
 implementation.

That depends on where we do the cut. Currently we let the IRQ source
signal an abstract edge on a pre-allocated pseudo IRQ line. But we
cannot build correct MSI-X on top of the current irqfd model as we lack
the level information (for PBA emulation). *) So we either need to
extend the existing model anyway -- or push per-vector masking back to
the IRQ source. In the latter case, it would be a very good chance to
give up on limited pseudo GSIs with static routes and do MSI messaging
from external IRQ sources to KVM directly.
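
For reference, a rough sketch of the pseudo-GSI model in question: every MSI
message first consumes a routing entry, and the IRQ source afterwards only
signals an abstract edge through an eventfd bound to that GSI (error handling
omitted; a real implementation has to rewrite the complete table, as
KVM_SET_GSI_ROUTING replaces all routes):

    #include <stdint.h>
    #include <string.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* one routing entry per MSI message: gsi -> (address, data) */
    static int route_msi(int vmfd, uint32_t gsi, uint64_t address, uint32_t data)
    {
        struct {
            struct kvm_irq_routing       hdr;
            struct kvm_irq_routing_entry entry;
        } r;

        memset(&r, 0, sizeof(r));
        r.hdr.nr                 = 1;
        r.entry.gsi              = gsi;
        r.entry.type             = KVM_IRQ_ROUTING_MSI;
        r.entry.u.msi.address_lo = (uint32_t)address;
        r.entry.u.msi.address_hi = (uint32_t)(address >> 32);
        r.entry.u.msi.data       = data;

        return ioctl(vmfd, KVM_SET_GSI_ROUTING, &r);
    }

    /* the source then only toggles the eventfd to raise an edge on that line */
    static int bind_irqfd(int vmfd, uint32_t gsi)
    {
        struct kvm_irqfd irqfd;
        int fd = eventfd(0, 0);

        memset(&irqfd, 0, sizeof(irqfd));
        irqfd.fd  = fd;
        irqfd.gsi = gsi;
        ioctl(vmfd, KVM_IRQFD, &irqfd);

        return fd;
    }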

But all those considerations affect different APIs than what I'm
proposing here. We will always need a way to inject MSIs in the context
of the VM as there will always be scenarios where devices are better run
in that very same context, for performance or simplicity or whatever
reasons. E.g., I could imagine that one would like to run an
emulated IRQ remapper in the hypervisor context rather than
over-microkernelized in a separate process.

Jan

*) Realized this while trying to generalize the proposed MSI-X MMIO
acceleration for assigned devices to arbitrary device models, vhost-net,
and specifically vfio.

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
  
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.
 
 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *)


I don't agree here. IMO PBA emulation would need to
clear pending bits on interrupt status register read.
So clearing pending bits could be done by ioctl from qemu
while setting them would be done from irqfd.

 So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.
 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.
 
 Jan
 
 *) Realized this while trying to generalize the proposed MSI-X MMIO
 acceleration for assigned devices to arbitrary device models, vhost-net,

I'm actually working on a qemu patch to get pba emulation working correctly.
I think it's doable with existing irqfd.

 and specifically vfio.

Interesting. How would you clear the pseudo interrupt level?

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 14:43, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
 On 10/24/2011 12:19 PM, Jan Kiszka wrote:

 With the new feature it may be worthwhile, but I'd like to see the whole
 thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.


 Only if those devices are in the same process (or have access to the
 vmfd).  Interrupt routing together with irqfd allows you to disaggregate
 the device model.  Instead of providing a competing implementation with
 new limitations, we need to remove the limitations of the old
 implementation.

 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *)
 
 
 I don't agree here. IMO PBA emulation would need to
 clear pending bits on interrupt status register read.
 So clearing pending bits could be done by ioctl from qemu
 while setting them would be done from irqfd.

How should QEMU know if the reason for pending has been cleared at
device level if the device is outside the scope of QEMU? This model only
works for PV devices when you agree that spurious IRQs are OK.

 
 So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.
 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.

 Jan

 *) Realized this while trying to generalize the proposed MSI-X MMIO
 acceleration for assigned devices to arbitrary device models, vhost-net,
 
 I'm actually working on a qemu patch to get pba emulation working correctly.
 I think it's doable with existing irqfd.

irqfd has no notion of level. You can only communicate a rising edge and
then need a side channel for the state of the edge reason.

 
 and specifically vfio.
 
 Interesting. How would you clear the pseudo interrupt level?

Ideally: not at all (for MSI). If we manage the mask at device level, we
only need to send the message if there is actually something to deliver
to the interrupt controller and masked input events would be lost on
real HW as well.

That said, we still need to address the irqfd level topic for the finite
number of legacy interrupt lines. If a line is masked at an IRQ
controller, the device needs to keep the controller up to date w.r.t.
the line state, or the controller has to poll the current state on
unmask to avoid spurious injections.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 15:11, Jan Kiszka wrote:
 On 2011-10-24 14:43, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
 On 10/24/2011 12:19 PM, Jan Kiszka wrote:

 With the new feature it may be worthwhile, but I'd like to see the whole
 thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.


 Only if those devices are in the same process (or have access to the
 vmfd).  Interrupt routing together with irqfd allows you to disaggregate
 the device model.  Instead of providing a competing implementation with
 new limitations, we need to remove the limitations of the old
 implementation.

 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *)


 I don't agree here. IMO PBA emulation would need to
 clear pending bits on interrupt status register read.
 So clearing pending bits could be done by ioctl from qemu
 while setting them would be done from irqfd.
 
 How should QEMU know if the reason for pending has been cleared at
 device level if the device is outside the scope of QEMU? This model only
 works for PV devices when you agree that spurious IRQs are OK.
 

 So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.
 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.

 Jan

 *) Realized this while trying to generalize the proposed MSI-X MMIO
 acceleration for assigned devices to arbitrary device models, vhost-net,

 I'm actually working on a qemu patch to get pba emulation working correctly.
 I think it's doable with existing irqfd.
 
 irqfd has no notion of level. You can only communicate a rising edge and
 then need a side channel for the state of the edge reason.
 

 and specifically vfio.

 Interesting. How would you clear the pseudo interrupt level?
 
 Ideally: not at all (for MSI). If we manage the mask at device level, we
 only need to send the message if there is actually something to deliver
 to the interrupt controller and masked input events would be lost on
 real HW as well.

This wouldn't work out nicely either. We rather need a combined model:

Devices need to maintain the PBA actively, i.e. set & clear them
themselves and do not rely on the core here (with the core being either
QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
checks the PBA if it is about to deliver some message and refrains from
doing so if the bit became 0 in the meantime (specifically during the
masked period). For QEMU device models, that means no additional IOCTLs,
just memory sharing of the PBA which is required anyway.
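
A rough sketch of that combined model, assuming the PBA page is shared
between the device model and the MSI-X core (names are illustrative only;
a userspace device model could use C11 atomics like this, an in-kernel
accelerator would use the kernel's atomic bitops instead):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* PBA shared between device model and MSI-X core, e.g. an mmap'ed page */
    struct msix_shared {
        _Atomic uint64_t pba[1];            /* one bit per vector */
    };

    /* device: interrupt condition hit a masked vector */
    static void dev_set_pending(struct msix_shared *s, unsigned int vec)
    {
        atomic_fetch_or(&s->pba[vec / 64], UINT64_C(1) << (vec % 64));
    }

    /* device: condition resolved (e.g. ring emptied) before the unmask */
    static void dev_clear_pending(struct msix_shared *s, unsigned int vec)
    {
        atomic_fetch_and(&s->pba[vec / 64], ~(UINT64_C(1) << (vec % 64)));
    }

    /* core, on unmask: deliver only if the bit is still set */
    static bool core_take_pending(struct msix_shared *s, unsigned int vec)
    {
        uint64_t bit = UINT64_C(1) << (vec % 64);

        return atomic_fetch_and(&s->pba[vec / 64], ~bit) & bit;
    }

core_take_pending() is what the unmask path would call right before
injecting the message.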

But that means QEMU-external device models need to gain at least basic
MSI-X knowledge. And if they gain this awareness, they could also use it
to send full-blown messages directly (e.g. device-id/vector tuples)
instead of encoding them into finite GSI numbers. But that's an add-on
topic.

Moreover, we still need a corresponding side channel for line-based
interrupts.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 03:11:25PM +0200, Jan Kiszka wrote:
 On 2011-10-24 14:43, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
  On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
 
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.
 
  That depends on where we do the cut. Currently we let the IRQ source
  signal an abstract edge on a pre-allocated pseudo IRQ line. But we
  cannot build correct MSI-X on top of the current irqfd model as we lack
  the level information (for PBA emulation). *)
  
  
  I don't agree here. IMO PBA emulation would need to
  clear pending bits on interrupt status register read.
  So clearing pending bits could be done by ioctl from qemu
  while setting them would be done from irqfd.
 
 How should QEMU know if the reason for pending has been cleared at
 device level if the device is outside the scope of QEMU? This model only
 works for PV devices when you agree that spurious IRQs are OK.

A read of the irq status clears pending in the same way it clears the
irq line for level.  I don't think this generates spurious irqs. Yes, it
only works for PV.

For assigned devices, the only way I see to implement PBA
correctly is by masking the vector in the device
and looking at the actual pending bit.
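
Roughly like this, assuming the device's MSI-X PBA region has already been
mapped into the host (how it gets mapped - sysfs resource file, VFIO region
access, or similar - is out of scope here); per the MSI-X spec the PBA is an
array of QWORDs with one pending bit per vector:

    #include <stdbool.h>
    #include <stdint.h>

    /* read the real pending bit of an assigned device's vector */
    static bool msix_hw_pending(const volatile uint64_t *pba, unsigned int vec)
    {
        return (pba[vec / 64] >> (vec % 64)) & 1;
    }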

  
  So we either need to
  extend the existing model anyway -- or push per-vector masking back to
  the IRQ source. In the latter case, it would be a very good chance to
  give up on limited pseudo GSIs with static routes and do MSI messaging
  from external IRQ sources to KVM directly.
  But all those considerations affect different APIs than what I'm
  proposing here. We will always need a way to inject MSIs in the context
  of the VM as there will always be scenarios where devices are better run
  in that very same context, for performance or simplicity or whatever
  reasons. E.g., I could imagine that one would like to execute an
  emulated IRQ remapper rather in the hypervisor context than
  over-microkernelized in a separate process.
 
  Jan
 
  *) Realized this while trying to generalize the proposed MSI-X MMIO
  acceleration for assigned devices to arbitrary device models, vhost-net,
  
  I'm actually working on a qemu patch to get pba emulation working correctly.
  I think it's doable with existing irqfd.
 
 irqfd has no notion of level. You can only communicate a rising edge and
 then need a side channel for the state of the edge reason.

True. But we only need that for PBA read which is unused ATM.
So kvm can just send the read to userspace, have qemu query
vfio or whatever.

  
  and specifically vfio.
  
  Interesting. How would you clear the pseudo interrupt level?
 
 Ideally: not at all (for MSI). If we manage the mask at device level, we
 only need to send the message if there is actually something to deliver
 to the interrupt controller and masked input events would be lost on
 real HW as well.

Not sure I understand. We certainly shouldn't send masked
interrupts to the APIC, if for no other reason than that
the message value is invalid while masked.

 That said, we still need to address the irqfd level topic for the finite
 amount of legacy interrupt lines. If a line is masked at an IRQ
 controller, the device need to keep the controller up to date /wrt to
 the line state, or the controller has to poll the current state on
 unmask to avoid spurious injections.
 
 Jan

Yes, level interrupts are tricky.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote:
 On 2011-10-24 15:11, Jan Kiszka wrote:
  On 2011-10-24 14:43, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
  On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the 
  whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
 
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.
 
  That depends on where we do the cut. Currently we let the IRQ source
  signal an abstract edge on a pre-allocated pseudo IRQ line. But we
  cannot build correct MSI-X on top of the current irqfd model as we lack
  the level information (for PBA emulation). *)
 
 
  I don't agree here. IMO PBA emulation would need to
  clear pending bits on interrupt status register read.
  So clearing pending bits could be done by ioctl from qemu
  while setting them would be done from irqfd.
  
  How should QEMU know if the reason for pending has been cleared at
  device level if the device is outside the scope of QEMU? This model only
  works for PV devices when you agree that spurious IRQs are OK.
  
 
  So we either need to
  extend the existing model anyway -- or push per-vector masking back to
  the IRQ source. In the latter case, it would be a very good chance to
  give up on limited pseudo GSIs with static routes and do MSI messaging
  from external IRQ sources to KVM directly.
  But all those considerations affect different APIs than what I'm
  proposing here. We will always need a way to inject MSIs in the context
  of the VM as there will always be scenarios where devices are better run
  in that very same context, for performance or simplicity or whatever
  reasons. E.g., I could imagine that one would like to execute an
  emulated IRQ remapper rather in the hypervisor context than
  over-microkernelized in a separate process.
 
  Jan
 
  *) Realized this while trying to generalize the proposed MSI-X MMIO
  acceleration for assigned devices to arbitrary device models, vhost-net,
 
  I'm actually working on a qemu patch to get pba emulation working 
  correctly.
  I think it's doable with existing irqfd.
  
  irqfd has no notion of level. You can only communicate a rising edge and
  then need a side channel for the state of the edge reason.
  
 
  and specifically vfio.
 
  Interesting. How would you clear the pseudo interrupt level?
  
  Ideally: not at all (for MSI). If we manage the mask at device level, we
  only need to send the message if there is actually something to deliver
  to the interrupt controller and masked input events would be lost on
  real HW as well.
 
 This wouldn't work out nicely either. We rather need a combined model:
 
 Devices need to maintain the PBA actively, i.e. set & clear them
 themselves and do not rely on the core here (with the core being either
 QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
 checks the PBA if it is about to deliver some message and refrains from
 doing so if the bit became 0 in the meantime (specifically during the
 masked period).

 For QEMU device models, that means no additional IOCTLs,
 just memory sharing of the PBA which is required anyway.

Sorry, I don't understand the above two paragraphs. Maybe I am
confused by terminology here. We really only need to check PBA when it's
read.  Whether the message is delivered only depends on the mask bit.


 
 But that means QEMU-external device models need to gain at least basic
 MSI-X knowledge. And if they gain this awareness, they could also use it
 to send full-blown messages directly (e.g. device-id/vector tuples)
 instead of encoding them into finite GSI numbers. But that's an add-on
 topic.
 
 Moreover, we still need a corresponding side channel for line-based
 interrupts.
 
 Jan

Agree on all points with the above.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 16:40, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote:
 On 2011-10-24 15:11, Jan Kiszka wrote:
 On 2011-10-24 14:43, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
 On 10/24/2011 12:19 PM, Jan Kiszka wrote:

 With the new feature it may be worthwhile, but I'd like to see the 
 whole
 thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.


 Only if those devices are in the same process (or have access to the
 vmfd).  Interrupt routing together with irqfd allows you to disaggregate
 the device model.  Instead of providing a competing implementation with
 new limitations, we need to remove the limitations of the old
 implementation.

 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *)


 I don't agree here. IMO PBA emulation would need to
 clear pending bits on interrupt status register read.
 So clearing pending bits could be done by ioctl from qemu
 while setting them would be done from irqfd.

 How should QEMU know if the reason for pending has been cleared at
 device level if the device is outside the scope of QEMU? This model only
 works for PV devices when you agree that spurious IRQs are OK.


 So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.
 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.

 Jan

 *) Realized this while trying to generalize the proposed MSI-X MMIO
 acceleration for assigned devices to arbitrary device models, vhost-net,

 I'm actually working on a qemu patch to get pba emulation working 
 correctly.
 I think it's doable with existing irqfd.

 irqfd has no notion of level. You can only communicate a rising edge and
 then need a side channel for the state of the edge reason.


 and specifically vfio.

 Interesting. How would you clear the pseudo interrupt level?

 Ideally: not at all (for MSI). If we manage the mask at device level, we
 only need to send the message if there is actually something to deliver
 to the interrupt controller and masked input events would be lost on
 real HW as well.

 This wouldn't work out nicely either. We rather need a combined model:

 Devices need to maintain the PBA actively, i.e. set & clear them
 themselves and do not rely on the core here (with the core being either
 QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
 checks the PBA if it is about to deliver some message and refrains from
 doing so if the bit became 0 in the meantime (specifically during the
 masked period).

 For QEMU device models, that means no additional IOCTLs,
 just memory sharing of the PBA which is required anyway.
 
 Sorry, I don't understand the above two paragraphs. Maybe I am
 confused by terminology here. We really only need to check PBA when it's
 read.  Whether the message is delivered only depends on the mask bit.

This is what I have in mind:
 - devices set PBA bit if MSI message cannot be sent due to mask (*)
 - core checks & clears PBA bit on unmask, injects message if bit was set
 - devices clear PBA bit if message reason is resolved before unmask (*)

The marked (*) lines differ from the current user space model where only
the core does PBA manipulation (including clearance via a special
function). Basically, the PBA becomes a communication channel also
between device and MSI core. And this model also works if core and
device run in different processes provided they set up the PBA as shared
memory.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 05:00:27PM +0200, Jan Kiszka wrote:
 On 2011-10-24 16:40, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote:
  On 2011-10-24 15:11, Jan Kiszka wrote:
  On 2011-10-24 14:43, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
  On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the 
  whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With 
  the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
 
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to 
  disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.
 
  That depends on where we do the cut. Currently we let the IRQ source
  signal an abstract edge on a pre-allocated pseudo IRQ line. But we
  cannot build correct MSI-X on top of the current irqfd model as we lack
  the level information (for PBA emulation). *)
 
 
  I don't agree here. IMO PBA emulation would need to
  clear pending bits on interrupt status register read.
  So clearing pending bits could be done by ioctl from qemu
  while setting them would be done from irqfd.
 
  How should QEMU know if the reason for pending has been cleared at
  device level if the device is outside the scope of QEMU? This model only
  works for PV devices when you agree that spurious IRQs are OK.
 
 
  So we either need to
  extend the existing model anyway -- or push per-vector masking back to
  the IRQ source. In the latter case, it would be a very good chance to
  give up on limited pseudo GSIs with static routes and do MSI messaging
  from external IRQ sources to KVM directly.
  But all those considerations affect different APIs than what I'm
  proposing here. We will always need a way to inject MSIs in the context
  of the VM as there will always be scenarios where devices are better run
  in that very same context, for performance or simplicity or whatever
  reasons. E.g., I could imagine that one would like to execute an
  emulated IRQ remapper rather in the hypervisor context than
  over-microkernelized in a separate process.
 
  Jan
 
  *) Realized this while trying to generalize the proposed MSI-X MMIO
  acceleration for assigned devices to arbitrary device models, vhost-net,
 
  I'm actually working on a qemu patch to get pba emulation working 
  correctly.
  I think it's doable with existing irqfd.
 
  irqfd has no notion of level. You can only communicate a rising edge and
  then need a side channel for the state of the reason behind that edge.
 
 
  and specifically vfio.
 
  Interesting. How would you clear the pseudo interrupt level?
 
  Ideally: not at all (for MSI). If we manage the mask at device level, we
  only need to send the message if there is actually something to deliver
  to the interrupt controller and masked input events would be lost on
  real HW as well.
 
  This wouldn't work out nicely either. We rather need a combined model:
 
  Devices need to maintain the PBA actively, i.e. set & clear the bits
  themselves and not rely on the core here (with the core being either
  QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
  checks the PBA if it is about to deliver some message and refrains from
  doing so if the bit became 0 in the meantime (specifically during the
  masked period).
 
  For QEMU device models, that means no additional IOCTLs,
  just memory sharing of the PBA which is required anyway.
  
  Sorry, I don't understand the above two paragraphs. Maybe I am
  confused by terminology here. We really only need to check PBA when it's
  read.  Whether the message is delivered only depends on the mask bit.
 
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checks & clears PBA bit on unmask, injects message if bit was set
  - devices clear PBA bit if message reason is resolved before unmask (*)

OK, but practically, when exactly does the device clear PBA?

 The marked (*) lines differ from the current user space model where only
 the core does PBA manipulation (including clearance via a special
 function). Basically, the PBA becomes a communication channel also
 between device and MSI core. And this model also works if core and
 device run in different processes provided they set up the PBA as shared
 memory.
 
 Jan
 


 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 18:05, Michael S. Tsirkin wrote:
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checks & clears PBA bit on unmask, injects message if bit was set
  - devices clear PBA bit if message reason is resolved before unmask (*)
 
 OK, but practically, when exactly does the device clear PBA?

Consider a network adapter that signals messages in an RX ring: If the
corresponding vector is masked while the guest empties the ring, I
strongly assume that the device is supposed to take back the pending bit
in that case so that there is no interrupt injection on a later vector
unmask operation.
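
As a rough sketch of that case (a hypothetical device-model helper continuing the shared-PBA idea from above; this is not real QEMU or kernel code):

#include <stdatomic.h>
#include <stdint.h>

/* Called by the device model once the guest has consumed all RX buffers
 * while the corresponding MSI-X vector was masked. */
static void rx_ring_emptied(_Atomic uint64_t *pba, unsigned rx_vector)
{
        /* Take back the pending bit so that a later unmask does not inject
         * a stale interrupt for work that no longer exists. */
        atomic_fetch_and(&pba[rx_vector / 64], ~(1ULL << (rx_vector % 64)));
}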

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
 On 2011-10-24 18:05, Michael S. Tsirkin wrote:
  This is what I have in mind:
   - devices set PBA bit if MSI message cannot be sent due to mask (*)
   - core checks & clears PBA bit on unmask, injects message if bit was set
   - devices clear PBA bit if message reason is resolved before unmask (*)
  
  OK, but practically, when exactly does the device clear PBA?
 
 Consider a network adapter that signals messages in an RX ring: If the
 corresponding vector is masked while the guest empties the ring, I
 strongly assume that the device is supposed to take back the pending bit
 in that case so that there is no interrupt injection on a later vector
 unmask operation.
 
 Jan

Do you mean virtio here? Do you expect this optimization to give
a significant performance gain?

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
  On 2011-10-24 18:05, Michael S. Tsirkin wrote:
   This is what I have in mind:
- devices set PBA bit if MSI message cannot be sent due to mask (*)
- core checks & clears PBA bit on unmask, injects message if bit was set
- devices clear PBA bit if message reason is resolved before unmask (*)
   
   OK, but practically, when exactly does the device clear PBA?
  
  Consider a network adapter that signals messages in an RX ring: If the
  corresponding vector is masked while the guest empties the ring, I
  strongly assume that the device is supposed to take back the pending bit
  in that case so that there is no interrupt injection on a later vector
  unmask operation.
  
  Jan
 
 Do you mean virtio here? Do you expect this optimization to give
 a significant performance gain?

It would also be challenging to implement this in
a race-free manner. Clearing on interrupt status read
seems straightforward.

  -- 
  Siemens AG, Corporate Technology, CT T DE IT 1
  Corporate Competence Center Embedded Linux


[RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-21 Thread Jan Kiszka
Currently, MSI messages can only be injected to in-kernel irqchips by
defining a corresponding IRQ route for each message. This is not only
unhandy if the MSI messages are generated on the fly by user space;
IRQ routes are also a limited resource that user space has to manage
carefully.

By providing a direct injection path, we can both avoid using up limited
resources and simplify the necessary steps for user land. The API
already provides a channel (flags) to revoke an injected but not yet
delivered message which will become important for in-kernel MSI-X vector
masking support.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Documentation/virtual/kvm/api.txt |   23 +++
 include/linux/kvm.h   |   15 +++
 virt/kvm/kvm_main.c   |   18 ++
 3 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 7945b0b..f4c3de3 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1383,6 +1383,29 @@ The following flags are defined:
 If datamatch flag is set, the event will be signaled only if the written value
 to the registered address is equal to datamatch in struct kvm_ioeventfd.
 
+4.59 KVM_SET_MSI
+
+Capability: KVM_CAP_SET_MSI
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_msi (in)
+Returns: 0 on success, -1 on error
+
+Directly inject a MSI message. Only valid with in-kernel irqchip that handles
+MSI messages.
+
+struct kvm_msi {
+   __u32 address_lo;
+   __u32 address_hi;
+   __u32 data;
+   __u32 flags;
+   __u8  pad[16];
+};
+
+The following flags are defined:
+
+#define KVM_MSI_FLAG_RAISE (1 << 0)
+
 4.62 KVM_CREATE_SPAPR_TCE
 
 Capability: KVM_CAP_SPAPR_TCE
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6884054..83875ed 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -557,6 +557,9 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_HIOR 67
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
+#ifdef __KVM_HAVE_MSI
+#define KVM_CAP_SET_MSI 72
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -636,6 +639,16 @@ struct kvm_clock_data {
__u32 pad[9];
 };
 
+#define KVM_MSI_FLAG_RAISE (1 << 0)
+
+struct kvm_msi {
+   __u32 address_lo;
+   __u32 address_hi;
+   __u32 data;
+   __u32 flags;
+   __u8  pad[16];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -696,6 +709,8 @@ struct kvm_clock_data {
 /* Available with KVM_CAP_TSC_CONTROL */
 #define KVM_SET_TSC_KHZ   _IO(KVMIO,  0xa2)
 #define KVM_GET_TSC_KHZ   _IO(KVMIO,  0xa3)
+/* Available with KVM_CAP_SET_MSI */
+#define KVM_SET_MSI   _IOW(KVMIO,  0xa4, struct kvm_msi)
 
 /*
  * ioctls for vcpu fds
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9cfb78..0e3a947 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2058,6 +2058,24 @@ static long kvm_vm_ioctl(struct file *filp,
mutex_unlock(&kvm->lock);
break;
 #endif
+#ifdef __KVM_HAVE_MSI
+   case KVM_SET_MSI: {
+   struct kvm_kernel_irq_routing_entry route;
+   struct kvm_msi msi;
+
+   r = -EFAULT;
+   if (copy_from_user(&msi, argp, sizeof msi))
+   goto out;
+   route.msi.address_lo = msi.address_lo;
+   route.msi.address_hi = msi.address_hi;
+   route.msi.data = msi.data;
+   r = 0;
+   if (msi.flags & KVM_MSI_FLAG_RAISE)
+   r = kvm_set_msi(&route, kvm,
+            KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+   break;
+   }
+#endif
default:
r = kvm_arch_vm_ioctl(filp, ioctl, arg);
if (r == -ENOTTY)
-- 
1.7.3.4
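
For completeness, a hedged sketch of how user space could drive the proposed ioctl. KVM_SET_MSI and KVM_MSI_FLAG_RAISE only exist with this RFC applied; the helper name, the error handling and checking KVM_CAP_SET_MSI beforehand are assumptions of this example.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* vm_fd: VM file descriptor on a kernel that advertises KVM_CAP_SET_MSI. */
static int inject_msi(int vm_fd, uint32_t address_lo, uint32_t address_hi,
                      uint32_t data)
{
        struct kvm_msi msi;

        memset(&msi, 0, sizeof(msi));
        msi.address_lo = address_lo;
        msi.address_hi = address_hi;
        msi.data = data;
        msi.flags = KVM_MSI_FLAG_RAISE;  /* actually deliver, do not just revoke */

        return ioctl(vm_fd, KVM_SET_MSI, &msi);  /* 0 on success, -1 on error */
}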


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-21 Thread Sasha Levin
On Fri, 2011-10-21 at 11:19 +0200, Jan Kiszka wrote:
 Currently, MSI messages can only be injected to in-kernel irqchips by
 defining a corresponding IRQ route for each message. This is not only
 unhandy if the MSI messages are generated on the fly by user space;
 IRQ routes are also a limited resource that user space has to manage
 carefully.
 
 By providing a direct injection path, we can both avoid using up limited
 resources and simplify the necessary steps for user land. The API
 already provides a channel (flags) to revoke an injected but not yet
 delivered message which will become important for in-kernel MSI-X vector
 masking support.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  Documentation/virtual/kvm/api.txt |   23 +++
  include/linux/kvm.h   |   15 +++
  virt/kvm/kvm_main.c   |   18 ++
  3 files changed, 56 insertions(+), 0 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 7945b0b..f4c3de3 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -1383,6 +1383,29 @@ The following flags are defined:
  If datamatch flag is set, the event will be signaled only if the written 
 value
  to the registered address is equal to datamatch in struct kvm_ioeventfd.
  
 +4.59 KVM_SET_MSI
 +
 +Capability: KVM_CAP_SET_MSI
 +Architectures: x86 ia64
 +Type: vm ioctl
 +Parameters: struct kvm_msi (in)
 +Returns: 0 on success, -1 on error
 +
 +Directly inject a MSI message. Only valid with in-kernel irqchip that handles
 +MSI messages.
 +
 +struct kvm_msi {
 + __u32 address_lo;
 + __u32 address_hi;
 + __u32 data;
 + __u32 flags;
 + __u8  pad[16];
 +};
 +
 +The following flags are defined:
 +
 +#define KVM_MSI_FLAG_RAISE (1 << 0)
 +
  4.62 KVM_CREATE_SPAPR_TCE
  
  Capability: KVM_CAP_SPAPR_TCE
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 6884054..83875ed 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -557,6 +557,9 @@ struct kvm_ppc_pvinfo {
  #define KVM_CAP_PPC_HIOR 67
  #define KVM_CAP_PPC_PAPR 68
  #define KVM_CAP_S390_GMAP 71
 +#ifdef __KVM_HAVE_MSI
 +#define KVM_CAP_SET_MSI 72
 +#endif
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -636,6 +639,16 @@ struct kvm_clock_data {
   __u32 pad[9];
  };
  
 +#define KVM_MSI_FLAG_RAISE (1 << 0)
 +
 +struct kvm_msi {
 + __u32 address_lo;
 + __u32 address_hi;
 + __u32 data;
 + __u32 flags;
 + __u8  pad[16];
 +};
 +

How about defining it as:

struct kvm_msi {
        struct msi_msg msi;
        __u32 flags;
        __u8 pad[16];
};

It would allow keeping everything in a msi_msg all the way from
userspace up to kvm_set_msi()

-- 

Sasha.



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-21 Thread Michael S. Tsirkin
On Fri, Oct 21, 2011 at 11:19:19AM +0200, Jan Kiszka wrote:
 Currently, MSI messages can only be injected to in-kernel irqchips by
 defining a corresponding IRQ route for each message. This is not only
 unhandy if the MSI messages are generated on the fly by user space;
 IRQ routes are also a limited resource that user space has to manage
 carefully.
 
 By providing a direct injection path, we can both avoid using up limited
 resources and simplify the necessary steps for user land. The API
 already provides a channel (flags) to revoke an injected but not yet
 delivered message which will become important for in-kernel MSI-X vector
 masking support.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

I would love to see how you envision extending this to add the masking
support at least at the API level, not necessarily the supporting code.

It would seem hard to use flags field for that since MSIX mask is per
device per vector, not per message.
Which gets us back to resource per vector which userspace has to manage
...

interrupt remapping is also per device, so it isn't any easier
with this API.

 ---
  Documentation/virtual/kvm/api.txt |   23 +++
  include/linux/kvm.h   |   15 +++
  virt/kvm/kvm_main.c   |   18 ++
  3 files changed, 56 insertions(+), 0 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 7945b0b..f4c3de3 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -1383,6 +1383,29 @@ The following flags are defined:
  If datamatch flag is set, the event will be signaled only if the written 
 value
  to the registered address is equal to datamatch in struct kvm_ioeventfd.
  
 +4.59 KVM_SET_MSI
 +
 +Capability: KVM_CAP_SET_MSI
 +Architectures: x86 ia64
 +Type: vm ioctl
 +Parameters: struct kvm_msi (in)
 +Returns: 0 on success, -1 on error
 +
 +Directly inject a MSI message. Only valid with in-kernel irqchip that handles
 +MSI messages.
 +
 +struct kvm_msi {
 + __u32 address_lo;
 + __u32 address_hi;
 + __u32 data;
 + __u32 flags;
 + __u8  pad[16];
 +};
 +
 +The following flags are defined:
 +
 +#define KVM_MSI_FLAG_RAISE (1 << 0)
 +
  4.62 KVM_CREATE_SPAPR_TCE
  
  Capability: KVM_CAP_SPAPR_TCE
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 6884054..83875ed 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -557,6 +557,9 @@ struct kvm_ppc_pvinfo {
  #define KVM_CAP_PPC_HIOR 67
  #define KVM_CAP_PPC_PAPR 68
  #define KVM_CAP_S390_GMAP 71
 +#ifdef __KVM_HAVE_MSI
 +#define KVM_CAP_SET_MSI 72
 +#endif
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -636,6 +639,16 @@ struct kvm_clock_data {
   __u32 pad[9];
  };
  
 +#define KVM_MSI_FLAG_RAISE (1 << 0)
 +
 +struct kvm_msi {
 + __u32 address_lo;
 + __u32 address_hi;
 + __u32 data;
 + __u32 flags;
 + __u8  pad[16];
 +};
 +
  /*
   * ioctls for VM fds
   */
 @@ -696,6 +709,8 @@ struct kvm_clock_data {
  /* Available with KVM_CAP_TSC_CONTROL */
  #define KVM_SET_TSC_KHZ   _IO(KVMIO,  0xa2)
  #define KVM_GET_TSC_KHZ   _IO(KVMIO,  0xa3)
 +/* Available with KVM_CAP_SET_MSI */
 +#define KVM_SET_MSI   _IOW(KVMIO,  0xa4, struct kvm_msi)
  
  /*
   * ioctls for vcpu fds
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index d9cfb78..0e3a947 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -2058,6 +2058,24 @@ static long kvm_vm_ioctl(struct file *filp,
   mutex_unlock(&kvm->lock);
   break;
  #endif
 +#ifdef __KVM_HAVE_MSI
 + case KVM_SET_MSI: {
 + struct kvm_kernel_irq_routing_entry route;
 + struct kvm_msi msi;
 +
 + r = -EFAULT;
 + if (copy_from_user(&msi, argp, sizeof msi))
 + goto out;
 + route.msi.address_lo = msi.address_lo;
 + route.msi.address_hi = msi.address_hi;
 + route.msi.data = msi.data;
 + r = 0;
 + if (msi.flags & KVM_MSI_FLAG_RAISE)
 + r = kvm_set_msi(&route, kvm,
 +  KVM_USERSPACE_IRQ_SOURCE_ID, 1);
 + break;
 + }
 +#endif
   default:
   r = kvm_arch_vm_ioctl(filp, ioctl, arg);
   if (r == -ENOTTY)
 -- 
 1.7.3.4


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-21 Thread Jan Kiszka
On 2011-10-21 13:06, Michael S. Tsirkin wrote:
 On Fri, Oct 21, 2011 at 11:19:19AM +0200, Jan Kiszka wrote:
 Currently, MSI messages can only be injected to in-kernel irqchips by
 defining a corresponding IRQ route for each message. This is not only
 unhandy if the MSI messages are generated on the fly by user space;
 IRQ routes are also a limited resource that user space has to manage
 carefully.

 By providing a direct injection path, we can both avoid using up limited
 resources and simplify the necessary steps for user land. The API
 already provides a channel (flags) to revoke an injected but not yet
 delivered message which will become important for in-kernel MSI-X vector
 masking support.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 I would love to see how you envision extending this to add the masking
 support at least at the API level, not necessarily the supporting code.
 
 It would seem hard to use flags field for that since MSIX mask is per
 device per vector, not per message.
 Which gets us back to resource per vector which userspace has to manage
 ...
 
 interrupt remapping is also per device, so it isn't any easier
 with this API.

Yes, we will need an additional field to associate the message with its
source device. Could be a PCI address or a handle (like the one assigned
devices get) returned on MSI-X kernel region setup. We will need a flag
to declare that address/handle valid, also to tell apart platform MSI
messages (e.g. coming from HPET on x86). I see no obstacles ATM that
prevent doing that on top of this API, do you?
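
Purely to illustrate that direction (not part of the patch; the devid field and the KVM_MSI_FLAG_DEVID_VALID flag are invented here), the struct could be extended roughly like this:

#include <linux/types.h>

struct kvm_msi {
        __u32 address_lo;
        __u32 address_hi;
        __u32 data;
        __u32 flags;   /* KVM_MSI_FLAG_RAISE, hypothetical KVM_MSI_FLAG_DEVID_VALID, ... */
        __u32 devid;   /* handle or BDF of the source device, if flagged valid */
        __u8  pad[12]; /* shrunk from 16 so the overall struct size stays at 32 bytes */
};

A platform MSI, e.g. one generated by the emulated HPET, would simply leave the flag cleared and devid at zero.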

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-21 Thread Michael S. Tsirkin
On Fri, Oct 21, 2011 at 01:51:15PM +0200, Jan Kiszka wrote:
 On 2011-10-21 13:06, Michael S. Tsirkin wrote:
  On Fri, Oct 21, 2011 at 11:19:19AM +0200, Jan Kiszka wrote:
  Currently, MSI messages can only be injected to in-kernel irqchips by
  defining a corresponding IRQ route for each message. This is not only
  unhandy if the MSI messages are generated on the fly by user space;
  IRQ routes are also a limited resource that user space has to manage
  carefully.
 
  By providing a direct injection path, we can both avoid using up limited
  resources and simplify the necessary steps for user land. The API
  already provides a channel (flags) to revoke an injected but not yet
  delivered message which will become important for in-kernel MSI-X vector
  masking support.
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  
  I would love to see how you envision extending this to add the masking
  support at least at the API level, not necessarily the supporting code.
  
  It would seem hard to use flags field for that since MSIX mask is per
  device per vector, not per message.
  Which gets us back to resource per vector which userspace has to manage
  ...
  
  interrupt remapping is also per device, so it isn't any easier
  with this API.
 
 Yes, we will need an additional field to associate the message with its
 source device. Could be a PCI address or a handle (like the one assigned
 devices get) returned on MSI-X kernel region setup. We will need a flag
 to declare that address/handle valid, also to tell apart platform MSI
 messages (e.g. coming from HPET on x86).

I have not thought about remapping a lot yet:
HPET interrupts are not subject to remapping?

 I see no obstacles ATM that
 prevent doing that on top of this API, do you?
 
 Jan

For masking, I think I do. We need to maintain the pending bit
and the io notifiers in kernel, per vector.
An MSI injected with just an address/data pair, without
vector/device info, can't be masked properly.

We get back to maintaining some handle per vector, right?

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-21 Thread Jan Kiszka
On 2011-10-21 14:04, Michael S. Tsirkin wrote:
 On Fri, Oct 21, 2011 at 01:51:15PM +0200, Jan Kiszka wrote:
 On 2011-10-21 13:06, Michael S. Tsirkin wrote:
 On Fri, Oct 21, 2011 at 11:19:19AM +0200, Jan Kiszka wrote:
 Currently, MSI messages can only be injected to in-kernel irqchips by
 defining a corresponding IRQ route for each message. This is not only
 unhandy if the MSI messages are generated on the fly by user space;
 IRQ routes are also a limited resource that user space has to manage
 carefully.

 By providing a direct injection path, we can both avoid using up limited
 resources and simplify the necessary steps for user land. The API
 already provides a channel (flags) to revoke an injected but not yet
 delivered message which will become important for in-kernel MSI-X vector
 masking support.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 I would love to see how you envision extending this to add the masking
 support at least at the API level, not necessarily the supporting code.

 It would seem hard to use flags field for that since MSIX mask is per
 device per vector, not per message.
 Which gets us back to resource per vector which userspace has to manage
 ...

 interrupt remapping is also per device, so it isn't any easier
 with this API.

 Yes, we will need an additional field to associate the message with its
 source device. Could be a PCI address or a handle (like the one assigned
 devices get) returned on MSI-X kernel region setup. We will need a flag
 to declare that address/handle valid, also to tell apart platform MSI
 messages (e.g. coming from HPET on x86).
 
 I have not thought about remapping a lot yet:
 HPET interrupts are not subject to remapping?

Looks like it is, at least on VT-d: The related VT-d document knows two
non-PCI source IDs, namely legacy pin interrupts and other MSIs. So we
may want a more generic source ID that, for MSI-X in-kernel masking, can
then be associated with a device vector for which we accelerate mask
management.

 
 I see no obstacles ATM that
 prevent doing that on top of this API, do you?

 Jan
 
 For masking, I think I do. We need to maintain the pending bit
 and the io notifiers in kernel, per vector.
 An MSI injected with just an address/data pair, without
 vector/device info, can't be masked properly.
 
 We get back to maintaining some handle per vector, right?

First of all, the common case for in-kernel MSI-X mask management will
be MSI sources that are _not_ injected as address-data pair from user
space but come from in-kernel sources (irqfd or host IRQs, ie. assigned
devices). In contrast, this API here is targeting MSI messages generated
in the hypervisor process (ie. current QEMU device emulation).

Still, the new interface should allow for injecting the other vectors as
well without requiring additional coordination of an in-kernel MSI-X
page vs. user space's view on it. For that reason we need a per vector
handle for that special case. But that will naturally derive from
defining a generic MSI-X in-kernel mask management API. You will have to
specify which device shall be accelerated and how many vectors it has
(at maximum). So a directly injected MSI message for those devices will
have to specify that source tuple (device, vector), but only in that
special case.

Maybe I will sit down now and create a draft for a MSI-X mask
acceleration API. That may help feeling better about this proposal. :)
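
As a purely illustrative teaser of what such a draft could contain (every name below is invented; this is not an existing or proposed KVM interface):

#include <linux/types.h>

/* Register a device's MSI-X table/PBA region for in-kernel mask handling. */
struct kvm_msix_mmio {
        __u32 dev_handle;   /* caller-chosen handle identifying the device */
        __u32 max_vectors;  /* upper bound on the device's MSI-X vectors */
        __u64 table_addr;   /* guest-physical address of the vector table */
        __u64 pba_addr;     /* guest-physical address of the PBA */
        __u32 flags;
        __u32 pad[5];
};

/* A directly injected MSI for such a device would then name its source. */
struct kvm_msi_source {
        __u32 dev_handle;   /* matches kvm_msix_mmio.dev_handle above */
        __u16 vector;       /* index into that device's MSI-X table */
        __u16 flags;
};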

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux