Re: [PATCH v9 0/2] kvm: level irqfd support

2012-09-17 Thread Alex Williamson
On Wed, 2012-08-22 at 11:25 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote:
> > On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > > > Here's the much anticipated re-write of support for level irqfds.  As
> > > > Michael suggested, I've rolled the eoi/ack notification fd into
> > > > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > > > be objections to associating this specifically with an EOI or an ACK,
> > > > I've name this OADN or "On Ack, De-assert & Notify".
> > > > 
> > > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > > > the reason is it's racy.  Objects to track OADNs are made dynamically,
> > > > we look through existing ones for a match under spinlock and setup a
> > > > new one if there's no match.  On teardown, we can remove the OADN from
> > > > the list under lock, but that same lock prevents us from de-assigning
> > > > the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> > > > sure that any unused GSI is de-asserted, but the above means it's
> > > > possible that another OADN has been created for this source ID/GSI
> > > > and de-asserting the GSI could lead to breakage.
> > > 
> > > I do not see it. What breakage? Could you give an example please?
> > > 
> > > 
> > > I think what you are saying is last deassign must clear
> > > since otherwise we never will clear.
> > > I agree it is either that or delay deassign until ack.
> > > 
> > > Can it be as simple as this (after all rcu etc dances)?
> > >   lock irqfds
> > >   if no oadns
> > >   set level to 0
> > >   unlock irqfds
> > > ?
> > 
> > lock irqfds
> > remove irqfd from oadn list
> > if no oadns
> > remove oadn
> > set gsi 0
> > unlock
> > lock irqfds
> > new oadn
> > unlock irqfds
> > 
> > >> EOI 
> >   ack notify new 
> > oadn
> >   de-assert gsi
> >   notify new oadn
> > >> re-assert irqfd
> >   ack notify old 
> > oadn
> >   de-assert gsi
> >   notify old oadn
> > 
> > synchronize_rcu
> > 
> > kvm_unregister_irq_ack_notifier
> >
> > So, because the unregister is removed from the final clear and because
> > we share an IRQ source ID there's a window where we can have two oadns
> > registered for the same GSI.  The new one will de-assert and notify
> > while the old one has an empty list to notify, but still de-asserts.  We
> > can therefore de-assert w/o notify.
> > 
> > By using a new source ID, we separate the two so users of the new oadn
> > can't race the old and we can cleanly free the old source ID,
> > de-asserting it.
> 
> Need to think about it some more but is the problem two
> ack notifiers for the same gsi?

yes

> In that case, how about we add __kvm_unregister_irq_ack_notifier
> with no locking, and do most of the above under
> kvm->irq_lock?

Converting locks makes me nervous, but I'll give it a shot.  I don't
know how easy/possible it is though.  I know in previous iterations I
tried to make something similar to irqfd use a mutex and couldn't, but I
don't remember the details.

> With one change: it is better not to call synchronize_rcu
> under irq lock, I think we can safely move it to after
> __kvm_unregister_irq_ack_notifier.

Yep, that makes the interface pretty ugly though as we then have two
separate, but dependent steps.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-09-05 Thread Michael S. Tsirkin
On Wed, Sep 05, 2012 at 05:42:38PM +0300, Avi Kivity wrote:
> On 08/21/2012 10:28 PM, Alex Williamson wrote:
> > Here's the much anticipated re-write of support for level irqfds.  As
> > Michael suggested, I've rolled the eoi/ack notification fd into
> > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > be objections to associating this specifically with an EOI or an ACK,
> > I've name this OADN or "On Ack, De-assert & Notify".
> 
> Perhaps we can call it "resample", since this is what it is doing.
> irqfd tells kvm that the line as been asserted, resample tells the
> caller to check again.

Sounds good.

> -- 
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-09-05 Thread Avi Kivity
On 08/21/2012 10:28 PM, Alex Williamson wrote:
> Here's the much anticipated re-write of support for level irqfds.  As
> Michael suggested, I've rolled the eoi/ack notification fd into
> KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> be objections to associating this specifically with an EOI or an ACK,
> I've name this OADN or "On Ack, De-assert & Notify".

Perhaps we can call it "resample", since this is what it is doing.
irqfd tells kvm that the line as been asserted, resample tells the
caller to check again.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-08-22 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> Here's the much anticipated re-write of support for level irqfds.  As
> Michael suggested, I've rolled the eoi/ack notification fd into
> KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> be objections to associating this specifically with an EOI or an ACK,
> I've name this OADN or "On Ack, De-assert & Notify".

Kid wouldn't let me sleep in the night so I've been thinking of better
names :).

I think a non abbreviated deassert_on_ack is a better name.
Yes e.g. irqfd is an abbreviation too, but it is
actually just a composition of irq and fd which are
both standard, familiar abbreviations.

If you have really set your heart on using an abbreviation,
it is better to not put two vowels together.
ADN (ack deasserts and notifies or auto deassert and notify)
is a bit better in that respect, though it does look a bit like
a miss-spelled AND which is unfortunate.

But obviously, it's all not terribly important.

> Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> the reason is it's racy.  Objects to track OADNs are made dynamically,
> we look through existing ones for a match under spinlock and setup a
> new one if there's no match.  On teardown, we can remove the OADN from
> the list under lock, but that same lock prevents us from de-assigning
> the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> sure that any unused GSI is de-asserted, but the above means it's
> possible that another OADN has been created for this source ID/GSI
> and de-asserting the GSI could lead to breakage.  Instead each OADN
> object gets it's own source ID, but these are all shared by users
> of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> source IDs allocated.
> 
> Michael had also suggested avoiding reference counting and using
> list_empty for this OADN object.  Unfortunately, that doesn't work
> for similar reasons.  We want to release the OADN object underlock,
> preventing others from re-using it on the free path, but in order
> to have lock-less de-assert & notify we use RCU, meaning we can't
> trust list_empty until after an RCU grace period, which must be
> done outside of spinlocks.
> 
> If there are suggestions how we can handle these better, please
> make them, but I think this compromise is race-free and still
> manages to make allocation of IRQ source IDs mostly a non-issue
> for device assignment limits.  Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (2):
>   kvm: On Ack, De-assert & Notify KVM_IRQFD extension
>   kvm: Use a reserved IRQ source ID for irqfd
> 
> 
>  Documentation/virtual/kvm/api.txt |   13 ++
>  arch/x86/kvm/x86.c|4 +
>  include/linux/kvm.h   |7 +
>  include/linux/kvm_host.h  |2 
>  virt/kvm/eventfd.c|  199 
> -
>  5 files changed, 218 insertions(+), 7 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 0/2] kvm: level irqfd support

2012-08-22 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote:
> On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > > Here's the much anticipated re-write of support for level irqfds.  As
> > > Michael suggested, I've rolled the eoi/ack notification fd into
> > > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > > be objections to associating this specifically with an EOI or an ACK,
> > > I've name this OADN or "On Ack, De-assert & Notify".
> > > 
> > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > > the reason is it's racy.  Objects to track OADNs are made dynamically,
> > > we look through existing ones for a match under spinlock and setup a
> > > new one if there's no match.  On teardown, we can remove the OADN from
> > > the list under lock, but that same lock prevents us from de-assigning
> > > the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> > > sure that any unused GSI is de-asserted, but the above means it's
> > > possible that another OADN has been created for this source ID/GSI
> > > and de-asserting the GSI could lead to breakage.
> > 
> > I do not see it. What breakage? Could you give an example please?
> > 
> > 
> > I think what you are saying is last deassign must clear
> > since otherwise we never will clear.
> > I agree it is either that or delay deassign until ack.
> > 
> > Can it be as simple as this (after all rcu etc dances)?
> > lock irqfds
> > if no oadns
> > set level to 0
> > unlock irqfds
> > ?
> 
> lock irqfds
> remove irqfd from oadn list
> if no oadns
> remove oadn
> set gsi 0
> unlock
> lock irqfds
> new oadn
> unlock irqfds
> 
> >> EOI 
>   ack notify new oadn
>   de-assert gsi
>   notify new oadn
> >> re-assert irqfd
>   ack notify old oadn
>   de-assert gsi
>   notify old oadn
> 
> synchronize_rcu
> 
> kvm_unregister_irq_ack_notifier
>
> So, because the unregister is removed from the final clear and because
> we share an IRQ source ID there's a window where we can have two oadns
> registered for the same GSI.  The new one will de-assert and notify
> while the old one has an empty list to notify, but still de-asserts.  We
> can therefore de-assert w/o notify.
> 
> By using a new source ID, we separate the two so users of the new oadn
> can't race the old and we can cleanly free the old source ID,
> de-asserting it.

Need to think about it some more but is the problem two
ack notifiers for the same gsi?

In that case, how about we add __kvm_unregister_irq_ack_notifier
with no locking, and do most of the above under
kvm->irq_lock?

With one change: it is better not to call synchronize_rcu
under irq lock, I think we can safely move it to after
__kvm_unregister_irq_ack_notifier.


> > >  Instead each OADN
> > > object gets it's own source ID, but these are all shared by users
> > > of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> > > source IDs allocated.
> > > 
> > > Michael had also suggested avoiding reference counting and using
> > > list_empty for this OADN object.  Unfortunately, that doesn't work
> > > for similar reasons.  We want to release the OADN object underlock,
> > > preventing others from re-using it on the free path, but in order
> > > to have lock-less de-assert & notify we use RCU, meaning we can't
> > > trust list_empty until after an RCU grace period, which must be
> > > done outside of spinlocks.
> > 
> > confused. list empty on assign/deassing would be under lock
> > so no need for grace periods to trust it.
> > what am I missing?
> > 
> > But if you like kref more that is OK too.
> 
> Maybe I'm misinterpreting this:
> 
> include/linux/rculist.h:
> /**
>  * list_del_rcu - deletes entry from list without re-initialization
>  * @entry: the element to delete from the list.
>  *
>  * Note: list_empty() on entry does not return true after this,
>  * the entry is in an undefined state. It is useful for RCU based
>  * lockfree traversal.
> 
> If I can trust list_empty on oadn->irqfds, which maybe I can re-reading
> it again, then we can drop the kref.  Thanks,
> 
> Alex

I think you are - *the entry you deleted* is not empty.
The list itself naturally is, or so it seems from code.

static inline void list_del_rcu(struct list_head *entry)
{
__list_del_entry(entry);
 

Re: [PATCH v9 0/2] kvm: level irqfd support

2012-08-21 Thread Alex Williamson
On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > Here's the much anticipated re-write of support for level irqfds.  As
> > Michael suggested, I've rolled the eoi/ack notification fd into
> > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > be objections to associating this specifically with an EOI or an ACK,
> > I've name this OADN or "On Ack, De-assert & Notify".
> > 
> > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > the reason is it's racy.  Objects to track OADNs are made dynamically,
> > we look through existing ones for a match under spinlock and setup a
> > new one if there's no match.  On teardown, we can remove the OADN from
> > the list under lock, but that same lock prevents us from de-assigning
> > the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> > sure that any unused GSI is de-asserted, but the above means it's
> > possible that another OADN has been created for this source ID/GSI
> > and de-asserting the GSI could lead to breakage.
> 
> I do not see it. What breakage? Could you give an example please?
> 
> 
> I think what you are saying is last deassign must clear
> since otherwise we never will clear.
> I agree it is either that or delay deassign until ack.
> 
> Can it be as simple as this (after all rcu etc dances)?
>   lock irqfds
>   if no oadns
>   set level to 0
>   unlock irqfds
> ?

lock irqfds
remove irqfd from oadn list
if no oadns
remove oadn
set gsi 0
unlock
lock irqfds
new oadn
unlock irqfds

>> EOI 
  ack notify new oadn
  de-assert gsi
  notify new oadn
>> re-assert irqfd
  ack notify old oadn
  de-assert gsi
  notify old oadn

synchronize_rcu

kvm_unregister_irq_ack_notifier

So, because the unregister is removed from the final clear and because
we share an IRQ source ID there's a window where we can have two oadns
registered for the same GSI.  The new one will de-assert and notify
while the old one has an empty list to notify, but still de-asserts.  We
can therefore de-assert w/o notify.

By using a new source ID, we separate the two so users of the new oadn
can't race the old and we can cleanly free the old source ID,
de-asserting it.

> >  Instead each OADN
> > object gets it's own source ID, but these are all shared by users
> > of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> > source IDs allocated.
> > 
> > Michael had also suggested avoiding reference counting and using
> > list_empty for this OADN object.  Unfortunately, that doesn't work
> > for similar reasons.  We want to release the OADN object underlock,
> > preventing others from re-using it on the free path, but in order
> > to have lock-less de-assert & notify we use RCU, meaning we can't
> > trust list_empty until after an RCU grace period, which must be
> > done outside of spinlocks.
> 
> confused. list empty on assign/deassing would be under lock
> so no need for grace periods to trust it.
> what am I missing?
> 
> But if you like kref more that is OK too.

Maybe I'm misinterpreting this:

include/linux/rculist.h:
/**
 * list_del_rcu - deletes entry from list without re-initialization
 * @entry: the element to delete from the list.
 *
 * Note: list_empty() on entry does not return true after this,
 * the entry is in an undefined state. It is useful for RCU based
 * lockfree traversal.

If I can trust list_empty on oadn->irqfds, which maybe I can re-reading
it again, then we can drop the kref.  Thanks,

Alex


> > If there are suggestions how we can handle these better, please
> > make them, but I think this compromise is race-free and still
> > manages to make allocation of IRQ source IDs mostly a non-issue
> > for device assignment limits.  Thanks,
> > 
> > Alex
> > 
> > ---
> > 
> > Alex Williamson (2):
> >   kvm: On Ack, De-assert & Notify KVM_IRQFD extension
> >   kvm: Use a reserved IRQ source ID for irqfd
> > 
> > 
> >  Documentation/virtual/kvm/api.txt |   13 ++
> >  arch/x86/kvm/x86.c|4 +
> >  include/linux/kvm.h   |7 +
> >  include/linux/kvm_host.h  |2 
> >  virt/kvm/eventfd.c|  199 
> > -
> >  5 files changed, 218 insertions(+), 7 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe kvm"

Re: [PATCH v9 0/2] kvm: level irqfd support

2012-08-21 Thread Michael S. Tsirkin
On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> Here's the much anticipated re-write of support for level irqfds.  As
> Michael suggested, I've rolled the eoi/ack notification fd into
> KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> be objections to associating this specifically with an EOI or an ACK,
> I've name this OADN or "On Ack, De-assert & Notify".
> 
> Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> the reason is it's racy.  Objects to track OADNs are made dynamically,
> we look through existing ones for a match under spinlock and setup a
> new one if there's no match.  On teardown, we can remove the OADN from
> the list under lock, but that same lock prevents us from de-assigning
> the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> sure that any unused GSI is de-asserted, but the above means it's
> possible that another OADN has been created for this source ID/GSI
> and de-asserting the GSI could lead to breakage.

I do not see it. What breakage? Could you give an example please?


I think what you are saying is last deassign must clear
since otherwise we never will clear.
I agree it is either that or delay deassign until ack.

Can it be as simple as this (after all rcu etc dances)?
lock irqfds
if no oadns
set level to 0
unlock irqfds
?



>  Instead each OADN
> object gets it's own source ID, but these are all shared by users
> of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> source IDs allocated.
> 
> Michael had also suggested avoiding reference counting and using
> list_empty for this OADN object.  Unfortunately, that doesn't work
> for similar reasons.  We want to release the OADN object underlock,
> preventing others from re-using it on the free path, but in order
> to have lock-less de-assert & notify we use RCU, meaning we can't
> trust list_empty until after an RCU grace period, which must be
> done outside of spinlocks.

confused. list empty on assign/deassing would be under lock
so no need for grace periods to trust it.
what am I missing?

But if you like kref more that is OK too.

> If there are suggestions how we can handle these better, please
> make them, but I think this compromise is race-free and still
> manages to make allocation of IRQ source IDs mostly a non-issue
> for device assignment limits.  Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (2):
>   kvm: On Ack, De-assert & Notify KVM_IRQFD extension
>   kvm: Use a reserved IRQ source ID for irqfd
> 
> 
>  Documentation/virtual/kvm/api.txt |   13 ++
>  arch/x86/kvm/x86.c|4 +
>  include/linux/kvm.h   |7 +
>  include/linux/kvm_host.h  |2 
>  virt/kvm/eventfd.c|  199 
> -
>  5 files changed, 218 insertions(+), 7 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html