Re: [PATCH v9 0/2] kvm: level irqfd support

2012-09-17 Thread Alex Williamson
On Wed, 2012-08-22 at 11:25 +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote:
  On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
   On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
Here's the much anticipated re-write of support for level irqfds.  As
Michael suggested, I've rolled the eoi/ack notification fd into
KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
be objections to associating this specifically with an EOI or an ACK,
I've name this OADN or On Ack, De-assert  Notify.

Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
the reason is it's racy.  Objects to track OADNs are made dynamically,
we look through existing ones for a match under spinlock and setup a
new one if there's no match.  On teardown, we can remove the OADN from
the list under lock, but that same lock prevents us from de-assigning
the IRQ ACK notifier or waiting for an RCU grace period.  We must make
sure that any unused GSI is de-asserted, but the above means it's
possible that another OADN has been created for this source ID/GSI
and de-asserting the GSI could lead to breakage.
   
   I do not see it. What breakage? Could you give an example please?
   
   
   I think what you are saying is last deassign must clear
   since otherwise we never will clear.
   I agree it is either that or delay deassign until ack.
   
   Can it be as simple as this (after all rcu etc dances)?
 lock irqfds
 if no oadns
 set level to 0
 unlock irqfds
   ?
  
  lock irqfds
  remove irqfd from oadn list
  if no oadns
  remove oadn
  set gsi 0
  unlock
  lock irqfds
  new oadn
  unlock irqfds
  
   EOI 
ack notify new 
  oadn
de-assert gsi
notify new oadn
   re-assert irqfd
ack notify old 
  oadn
de-assert gsi
notify old oadn
  
  synchronize_rcu
  
  kvm_unregister_irq_ack_notifier
 
  So, because the unregister is removed from the final clear and because
  we share an IRQ source ID there's a window where we can have two oadns
  registered for the same GSI.  The new one will de-assert and notify
  while the old one has an empty list to notify, but still de-asserts.  We
  can therefore de-assert w/o notify.
  
  By using a new source ID, we separate the two so users of the new oadn
  can't race the old and we can cleanly free the old source ID,
  de-asserting it.
 
 Need to think about it some more but is the problem two
 ack notifiers for the same gsi?

yes

 In that case, how about we add __kvm_unregister_irq_ack_notifier
 with no locking, and do most of the above under
 kvm-irq_lock?

Converting locks makes me nervous, but I'll give it a shot.  I don't
know how easy/possible it is though.  I know in previous iterations I
tried to make something similar to irqfd use a mutex and couldn't, but I
don't remember the details.

 With one change: it is better not to call synchronize_rcu
 under irq lock, I think we can safely move it to after
 __kvm_unregister_irq_ack_notifier.

Yep, that makes the interface pretty ugly though as we then have two
separate, but dependent steps.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-09-05 Thread Avi Kivity
On 08/21/2012 10:28 PM, Alex Williamson wrote:
 Here's the much anticipated re-write of support for level irqfds.  As
 Michael suggested, I've rolled the eoi/ack notification fd into
 KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
 be objections to associating this specifically with an EOI or an ACK,
 I've name this OADN or On Ack, De-assert  Notify.

Perhaps we can call it resample, since this is what it is doing.
irqfd tells kvm that the line as been asserted, resample tells the
caller to check again.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-09-05 Thread Michael S. Tsirkin
On Wed, Sep 05, 2012 at 05:42:38PM +0300, Avi Kivity wrote:
 On 08/21/2012 10:28 PM, Alex Williamson wrote:
  Here's the much anticipated re-write of support for level irqfds.  As
  Michael suggested, I've rolled the eoi/ack notification fd into
  KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
  be objections to associating this specifically with an EOI or an ACK,
  I've name this OADN or On Ack, De-assert  Notify.
 
 Perhaps we can call it resample, since this is what it is doing.
 irqfd tells kvm that the line as been asserted, resample tells the
 caller to check again.

Sounds good.

 -- 
 error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-08-22 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote:
 On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
  On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
   Here's the much anticipated re-write of support for level irqfds.  As
   Michael suggested, I've rolled the eoi/ack notification fd into
   KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
   be objections to associating this specifically with an EOI or an ACK,
   I've name this OADN or On Ack, De-assert  Notify.
   
   Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
   since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
   Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
   the reason is it's racy.  Objects to track OADNs are made dynamically,
   we look through existing ones for a match under spinlock and setup a
   new one if there's no match.  On teardown, we can remove the OADN from
   the list under lock, but that same lock prevents us from de-assigning
   the IRQ ACK notifier or waiting for an RCU grace period.  We must make
   sure that any unused GSI is de-asserted, but the above means it's
   possible that another OADN has been created for this source ID/GSI
   and de-asserting the GSI could lead to breakage.
  
  I do not see it. What breakage? Could you give an example please?
  
  
  I think what you are saying is last deassign must clear
  since otherwise we never will clear.
  I agree it is either that or delay deassign until ack.
  
  Can it be as simple as this (after all rcu etc dances)?
  lock irqfds
  if no oadns
  set level to 0
  unlock irqfds
  ?
 
 lock irqfds
 remove irqfd from oadn list
 if no oadns
 remove oadn
 set gsi 0
 unlock
 lock irqfds
 new oadn
 unlock irqfds
 
  EOI 
   ack notify new oadn
   de-assert gsi
   notify new oadn
  re-assert irqfd
   ack notify old oadn
   de-assert gsi
   notify old oadn
 
 synchronize_rcu
 
 kvm_unregister_irq_ack_notifier

 So, because the unregister is removed from the final clear and because
 we share an IRQ source ID there's a window where we can have two oadns
 registered for the same GSI.  The new one will de-assert and notify
 while the old one has an empty list to notify, but still de-asserts.  We
 can therefore de-assert w/o notify.
 
 By using a new source ID, we separate the two so users of the new oadn
 can't race the old and we can cleanly free the old source ID,
 de-asserting it.

Need to think about it some more but is the problem two
ack notifiers for the same gsi?

In that case, how about we add __kvm_unregister_irq_ack_notifier
with no locking, and do most of the above under
kvm-irq_lock?

With one change: it is better not to call synchronize_rcu
under irq lock, I think we can safely move it to after
__kvm_unregister_irq_ack_notifier.


Instead each OADN
   object gets it's own source ID, but these are all shared by users
   of the same GSI.  So for PCI devices, we might have up to 4 IRQ
   source IDs allocated.
   
   Michael had also suggested avoiding reference counting and using
   list_empty for this OADN object.  Unfortunately, that doesn't work
   for similar reasons.  We want to release the OADN object underlock,
   preventing others from re-using it on the free path, but in order
   to have lock-less de-assert  notify we use RCU, meaning we can't
   trust list_empty until after an RCU grace period, which must be
   done outside of spinlocks.
  
  confused. list empty on assign/deassing would be under lock
  so no need for grace periods to trust it.
  what am I missing?
  
  But if you like kref more that is OK too.
 
 Maybe I'm misinterpreting this:
 
 include/linux/rculist.h:
 /**
  * list_del_rcu - deletes entry from list without re-initialization
  * @entry: the element to delete from the list.
  *
  * Note: list_empty() on entry does not return true after this,
  * the entry is in an undefined state. It is useful for RCU based
  * lockfree traversal.
 
 If I can trust list_empty on oadn-irqfds, which maybe I can re-reading
 it again, then we can drop the kref.  Thanks,
 
 Alex

I think you are - *the entry you deleted* is not empty.
The list itself naturally is, or so it seems from code.

static inline void list_del_rcu(struct list_head *entry)
{
__list_del_entry(entry);
entry-prev = LIST_POISON2;
}

No?

 
   If there are suggestions how we can handle these better, please
   make them, but I think this compromise is race-free and still
   manages to 

Re: [PATCH v9 0/2] kvm: level irqfd support

2012-08-22 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
 Here's the much anticipated re-write of support for level irqfds.  As
 Michael suggested, I've rolled the eoi/ack notification fd into
 KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
 be objections to associating this specifically with an EOI or an ACK,
 I've name this OADN or On Ack, De-assert  Notify.

Kid wouldn't let me sleep in the night so I've been thinking of better
names :).

I think a non abbreviated deassert_on_ack is a better name.
Yes e.g. irqfd is an abbreviation too, but it is
actually just a composition of irq and fd which are
both standard, familiar abbreviations.

If you have really set your heart on using an abbreviation,
it is better to not put two vowels together.
ADN (ack deasserts and notifies or auto deassert and notify)
is a bit better in that respect, though it does look a bit like
a miss-spelled AND which is unfortunate.

But obviously, it's all not terribly important.

 Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
 since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
 Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
 the reason is it's racy.  Objects to track OADNs are made dynamically,
 we look through existing ones for a match under spinlock and setup a
 new one if there's no match.  On teardown, we can remove the OADN from
 the list under lock, but that same lock prevents us from de-assigning
 the IRQ ACK notifier or waiting for an RCU grace period.  We must make
 sure that any unused GSI is de-asserted, but the above means it's
 possible that another OADN has been created for this source ID/GSI
 and de-asserting the GSI could lead to breakage.  Instead each OADN
 object gets it's own source ID, but these are all shared by users
 of the same GSI.  So for PCI devices, we might have up to 4 IRQ
 source IDs allocated.
 
 Michael had also suggested avoiding reference counting and using
 list_empty for this OADN object.  Unfortunately, that doesn't work
 for similar reasons.  We want to release the OADN object underlock,
 preventing others from re-using it on the free path, but in order
 to have lock-less de-assert  notify we use RCU, meaning we can't
 trust list_empty until after an RCU grace period, which must be
 done outside of spinlocks.
 
 If there are suggestions how we can handle these better, please
 make them, but I think this compromise is race-free and still
 manages to make allocation of IRQ source IDs mostly a non-issue
 for device assignment limits.  Thanks,
 
 Alex
 
 ---
 
 Alex Williamson (2):
   kvm: On Ack, De-assert  Notify KVM_IRQFD extension
   kvm: Use a reserved IRQ source ID for irqfd
 
 
  Documentation/virtual/kvm/api.txt |   13 ++
  arch/x86/kvm/x86.c|4 +
  include/linux/kvm.h   |7 +
  include/linux/kvm_host.h  |2 
  virt/kvm/eventfd.c|  199 
 -
  5 files changed, 218 insertions(+), 7 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v9 0/2] kvm: level irqfd support

2012-08-21 Thread Alex Williamson
Here's the much anticipated re-write of support for level irqfds.  As
Michael suggested, I've rolled the eoi/ack notification fd into
KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
be objections to associating this specifically with an EOI or an ACK,
I've name this OADN or On Ack, De-assert  Notify.

Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
the reason is it's racy.  Objects to track OADNs are made dynamically,
we look through existing ones for a match under spinlock and setup a
new one if there's no match.  On teardown, we can remove the OADN from
the list under lock, but that same lock prevents us from de-assigning
the IRQ ACK notifier or waiting for an RCU grace period.  We must make
sure that any unused GSI is de-asserted, but the above means it's
possible that another OADN has been created for this source ID/GSI
and de-asserting the GSI could lead to breakage.  Instead each OADN
object gets it's own source ID, but these are all shared by users
of the same GSI.  So for PCI devices, we might have up to 4 IRQ
source IDs allocated.

Michael had also suggested avoiding reference counting and using
list_empty for this OADN object.  Unfortunately, that doesn't work
for similar reasons.  We want to release the OADN object underlock,
preventing others from re-using it on the free path, but in order
to have lock-less de-assert  notify we use RCU, meaning we can't
trust list_empty until after an RCU grace period, which must be
done outside of spinlocks.

If there are suggestions how we can handle these better, please
make them, but I think this compromise is race-free and still
manages to make allocation of IRQ source IDs mostly a non-issue
for device assignment limits.  Thanks,

Alex

---

Alex Williamson (2):
  kvm: On Ack, De-assert  Notify KVM_IRQFD extension
  kvm: Use a reserved IRQ source ID for irqfd


 Documentation/virtual/kvm/api.txt |   13 ++
 arch/x86/kvm/x86.c|4 +
 include/linux/kvm.h   |7 +
 include/linux/kvm_host.h  |2 
 virt/kvm/eventfd.c|  199 -
 5 files changed, 218 insertions(+), 7 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
 Here's the much anticipated re-write of support for level irqfds.  As
 Michael suggested, I've rolled the eoi/ack notification fd into
 KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
 be objections to associating this specifically with an EOI or an ACK,
 I've name this OADN or On Ack, De-assert  Notify.
 
 Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
 since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
 Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
 the reason is it's racy.  Objects to track OADNs are made dynamically,
 we look through existing ones for a match under spinlock and setup a
 new one if there's no match.  On teardown, we can remove the OADN from
 the list under lock, but that same lock prevents us from de-assigning
 the IRQ ACK notifier or waiting for an RCU grace period.  We must make
 sure that any unused GSI is de-asserted, but the above means it's
 possible that another OADN has been created for this source ID/GSI
 and de-asserting the GSI could lead to breakage.

I do not see it. What breakage? Could you give an example please?


I think what you are saying is last deassign must clear
since otherwise we never will clear.
I agree it is either that or delay deassign until ack.

Can it be as simple as this (after all rcu etc dances)?
lock irqfds
if no oadns
set level to 0
unlock irqfds
?



  Instead each OADN
 object gets it's own source ID, but these are all shared by users
 of the same GSI.  So for PCI devices, we might have up to 4 IRQ
 source IDs allocated.
 
 Michael had also suggested avoiding reference counting and using
 list_empty for this OADN object.  Unfortunately, that doesn't work
 for similar reasons.  We want to release the OADN object underlock,
 preventing others from re-using it on the free path, but in order
 to have lock-less de-assert  notify we use RCU, meaning we can't
 trust list_empty until after an RCU grace period, which must be
 done outside of spinlocks.

confused. list empty on assign/deassing would be under lock
so no need for grace periods to trust it.
what am I missing?

But if you like kref more that is OK too.

 If there are suggestions how we can handle these better, please
 make them, but I think this compromise is race-free and still
 manages to make allocation of IRQ source IDs mostly a non-issue
 for device assignment limits.  Thanks,
 
 Alex
 
 ---
 
 Alex Williamson (2):
   kvm: On Ack, De-assert  Notify KVM_IRQFD extension
   kvm: Use a reserved IRQ source ID for irqfd
 
 
  Documentation/virtual/kvm/api.txt |   13 ++
  arch/x86/kvm/x86.c|4 +
  include/linux/kvm.h   |7 +
  include/linux/kvm_host.h  |2 
  virt/kvm/eventfd.c|  199 
 -
  5 files changed, 218 insertions(+), 7 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-08-21 Thread Alex Williamson
On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
  Here's the much anticipated re-write of support for level irqfds.  As
  Michael suggested, I've rolled the eoi/ack notification fd into
  KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
  be objections to associating this specifically with an EOI or an ACK,
  I've name this OADN or On Ack, De-assert  Notify.
  
  Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
  since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
  Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
  the reason is it's racy.  Objects to track OADNs are made dynamically,
  we look through existing ones for a match under spinlock and setup a
  new one if there's no match.  On teardown, we can remove the OADN from
  the list under lock, but that same lock prevents us from de-assigning
  the IRQ ACK notifier or waiting for an RCU grace period.  We must make
  sure that any unused GSI is de-asserted, but the above means it's
  possible that another OADN has been created for this source ID/GSI
  and de-asserting the GSI could lead to breakage.
 
 I do not see it. What breakage? Could you give an example please?
 
 
 I think what you are saying is last deassign must clear
 since otherwise we never will clear.
 I agree it is either that or delay deassign until ack.
 
 Can it be as simple as this (after all rcu etc dances)?
   lock irqfds
   if no oadns
   set level to 0
   unlock irqfds
 ?

lock irqfds
remove irqfd from oadn list
if no oadns
remove oadn
set gsi 0
unlock
lock irqfds
new oadn
unlock irqfds

 EOI 
  ack notify new oadn
  de-assert gsi
  notify new oadn
 re-assert irqfd
  ack notify old oadn
  de-assert gsi
  notify old oadn

synchronize_rcu

kvm_unregister_irq_ack_notifier

So, because the unregister is removed from the final clear and because
we share an IRQ source ID there's a window where we can have two oadns
registered for the same GSI.  The new one will de-assert and notify
while the old one has an empty list to notify, but still de-asserts.  We
can therefore de-assert w/o notify.

By using a new source ID, we separate the two so users of the new oadn
can't race the old and we can cleanly free the old source ID,
de-asserting it.

   Instead each OADN
  object gets it's own source ID, but these are all shared by users
  of the same GSI.  So for PCI devices, we might have up to 4 IRQ
  source IDs allocated.
  
  Michael had also suggested avoiding reference counting and using
  list_empty for this OADN object.  Unfortunately, that doesn't work
  for similar reasons.  We want to release the OADN object underlock,
  preventing others from re-using it on the free path, but in order
  to have lock-less de-assert  notify we use RCU, meaning we can't
  trust list_empty until after an RCU grace period, which must be
  done outside of spinlocks.
 
 confused. list empty on assign/deassing would be under lock
 so no need for grace periods to trust it.
 what am I missing?
 
 But if you like kref more that is OK too.

Maybe I'm misinterpreting this:

include/linux/rculist.h:
/**
 * list_del_rcu - deletes entry from list without re-initialization
 * @entry: the element to delete from the list.
 *
 * Note: list_empty() on entry does not return true after this,
 * the entry is in an undefined state. It is useful for RCU based
 * lockfree traversal.

If I can trust list_empty on oadn-irqfds, which maybe I can re-reading
it again, then we can drop the kref.  Thanks,

Alex


  If there are suggestions how we can handle these better, please
  make them, but I think this compromise is race-free and still
  manages to make allocation of IRQ source IDs mostly a non-issue
  for device assignment limits.  Thanks,
  
  Alex
  
  ---
  
  Alex Williamson (2):
kvm: On Ack, De-assert  Notify KVM_IRQFD extension
kvm: Use a reserved IRQ source ID for irqfd
  
  
   Documentation/virtual/kvm/api.txt |   13 ++
   arch/x86/kvm/x86.c|4 +
   include/linux/kvm.h   |7 +
   include/linux/kvm_host.h  |2 
   virt/kvm/eventfd.c|  199 
  -
   5 files changed, 218 insertions(+), 7 deletions(-)



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html