Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-12 Thread Davide Libenzi
On Mon, 11 May 2009, Gregory Haskins wrote:

> Davide Libenzi wrote:
> > On Wed, 6 May 2009, Gregory Haskins wrote:
> >
> >   
> >> If there isn't any more feedback on the series from Al, Avi,
> >> etc...please formally submit your eventfd patch so this series is
> >> available for Avi to pull in for inclusion when/if he deems it fit.
> >> 
> >
> > Did you decide you will be using those bits?
> >   
> 
> Hi Davide,
>   It appears that we will not end up needing them since the new version
> I am about to push goes back to creating the eventfd in userspace to
> begin with, per Avi's request.

That looks like a good idea.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-11 Thread Gregory Haskins
Davide Libenzi wrote:
> On Wed, 6 May 2009, Gregory Haskins wrote:
>
>   
>> If there isn't any more feedback on the series from Al, Avi,
>> etc...please formally submit your eventfd patch so this series is
>> available for Avi to pull in for inclusion when/if he deems it fit.
>> 
>
> Did you decide you will be using those bits?
>   

Hi Davide,
  It appears that we will not end up needing them since the new version
I am about to push goes back to creating the eventfd in userspace to
begin with, per Avi's request.  Regardless, thank you kindly for your
help here, and also for your poll suggestion which we will definitely be
using.

Regards,
-Greg




signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-08 Thread Davide Libenzi
On Wed, 6 May 2009, Gregory Haskins wrote:

> If there isn't any more feedback on the series from Al, Avi,
> etc...please formally submit your eventfd patch so this series is
> available for Avi to pull in for inclusion when/if he deems it fit.

Did you decide you will be using those bits?


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-08 Thread Avi Kivity

Davide Libenzi wrote:

On Thu, 7 May 2009, Avi Kivity wrote:

  

What's your take on adding irq context safe callbacks to irqfd?

To give some background here, we would like to use eventfd as a generic
connector between components, so the components do not know about each other.
So far eventfd successfully abstracts among components in the same process, in
different processes, and in the kernel.

eventfd_signal() can be safely called from irq context, and will wake up a
waiting task.  But in some cases, if the consumer is in the kernel, it may be
able to consume the event from irq context, saving a context switch.

So, will you consider patches adding this capability to eventfd?



Since I received this one after your ACK on the capability of eventfd of 
triggering callbacks, I assume we're clear on this point, aren't we?
  


We are.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Davide Libenzi
On Thu, 7 May 2009, Avi Kivity wrote:

> What's your take on adding irq context safe callbacks to irqfd?
> 
> To give some background here, we would like to use eventfd as a generic
> connector between components, so the components do not know about each other.
> So far eventfd successfully abstracts among components in the same process, in
> different processes, and in the kernel.
> 
> eventfd_signal() can be safely called from irq context, and will wake up a
> waiting task.  But in some cases, if the consumer is in the kernel, it may be
> able to consume the event from irq context, saving a context switch.
> 
> So, will you consider patches adding this capability to eventfd?

Since I received this one after your ACK on the capability of eventfd of 
triggering callbacks, I assume we're clear on this point, aren't we?


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Avi Kivity

Davide Libenzi wrote:

On Thu, 7 May 2009, Avi Kivity wrote:

  

Davide Libenzi wrote:

  
  

What's your take on adding irq context safe callbacks to irqfd?

To give some background here, we would like to use eventfd as a generic
connector between components, so the components do not know about each
other.
So far eventfd successfully abstracts among components in the same
process, in
different processes, and in the kernel.

eventfd_signal() can be safely called from irq context, and will wake up a
waiting task.  But in some cases, if the consumer is in the kernel, it may
be
able to consume the event from irq context, saving a context switch.

So, will you consider patches adding this capability to eventfd?



Maybe I got lost in the thread, but inside the kernel we have callback-based
wakeup since long time. This is what epoll uses, when hooking into the file*
f_op->poll() subsystem.
Did you mean something else?
  
  

Do you mean wait_queue_t::func?



Yes, it is since long time ago that a wakeup in Linux can lead either to a 
real wakeup (in scheduler terms), or to a simple function call.

Isn't that enough?

  


It's perfect.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Davide Libenzi
On Thu, 7 May 2009, Avi Kivity wrote:

> Davide Libenzi wrote:
> >   
> > > What's your take on adding irq context safe callbacks to irqfd?
> > > 
> > > To give some background here, we would like to use eventfd as a generic
> > > connector between components, so the components do not know about each
> > > other.
> > > So far eventfd successfully abstracts among components in the same
> > > process, in
> > > different processes, and in the kernel.
> > > 
> > > eventfd_signal() can be safely called from irq context, and will wake up a
> > > waiting task.  But in some cases, if the consumer is in the kernel, it may
> > > be
> > > able to consume the event from irq context, saving a context switch.
> > > 
> > > So, will you consider patches adding this capability to eventfd?
> > > 
> > 
> > Maybe I got lost in the thread, but inside the kernel we have callback-based
> > wakeup since long time. This is what epoll uses, when hooking into the file*
> > f_op->poll() subsystem.
> > Did you mean something else?
> >   
> 
> Do you mean wait_queue_t::func?

Yes, it is since long time ago that a wakeup in Linux can lead either to a 
real wakeup (in scheduler terms), or to a simple function call.
Isn't that enough?


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Avi Kivity

Davide Libenzi wrote:
  

What's your take on adding irq context safe callbacks to irqfd?

To give some background here, we would like to use eventfd as a generic
connector between components, so the components do not know about each other.
So far eventfd successfully abstracts among components in the same process, in
different processes, and in the kernel.

eventfd_signal() can be safely called from irq context, and will wake up a
waiting task.  But in some cases, if the consumer is in the kernel, it may be
able to consume the event from irq context, saving a context switch.

So, will you consider patches adding this capability to eventfd?



Maybe I got lost in the thread, but inside the kernel we have 
callback-based wakeup since long time. This is what epoll uses, when 
hooking into the file* f_op->poll() subsystem.

Did you mean something else?
  


Do you mean wait_queue_t::func?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Avi Kivity

Gregory Haskins wrote:

 


This is my preferred option.  For a virtio-net-server in the kernel,
we'd service its eventfd in qemu, raising and lowering the pci
interrupt in the traditional way.

But we'd still need to know when to lower the interrupt.  How?



IIUC, isn't that  usually device/subsystem specific, and out of scope of
the GSI delivery vehicle?  For instance, most devices I have seen with
level ints have a register in their device register namespace for acking
the int.  


Yes it is.


As an aside, this is what causes some of the grief in dealing
with shared interrupts like KVM pass-through and/or threaded-isrs:  
There isn't a standardized way to ACK them.
  


So we'd need a side channel to tell userspace to lower the irq.  Another 
eventfd likely.


Note we don't support device assignment for devices with shared interrupts.


You may also see some generalization of masking/acking in things like
the MSI-X table.  But again, this would be out of scope of the general
GSI delivery path IIUC.

I understand that there is a feedback mechanism in the ioapic model for
calling back on acknowledgment of the interrupt.  But I am not sure what
is how the real hardware works normally, and therefore I am not
convinced that is something we need to feed all the way back (i.e. via
irqfd or whatever).  In the interest of full disclosure, its been a few
years since I studied the xAPIC docs, so I might be out to lunch on that
assertion. ;)
  


Right, that ack thing is completely internal, used for catching up on 
time drift, and for shutting down level triggered assigned interrupts.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>> One thing I was thinking here was that I could create a flag for the
>> kvm_irqfd() function for something like "KVM_IRQFD_MODE_CLEAR".  This
>> flag when specified at creation time will cause the event to execute a
>> clear operation instead of a set when triggered.  That way, the default
>> mode is an edge-triggered set.  The non-default mode is to trigger a
>> clear.  Level-triggered ints could therefore create two irqfds, one for
>> raising, the other for clearing.
>>   
>
> That's my second choice option.
>
>> An alternative is to abandon the use of eventfd, and allow the irqfd to
>> be a first-class anon-fd.  The parameters passed to the write/signal()
>> function could then indicate the desired level.  The disadvantage would
>> be that it would not be compatible with eventfd, so we would need to
>> decide if the tradeoff is worth it.
>>   
>
> I would really like to keep using eventfd.  Which is why I asked
> Davide about the prospects of direct callbacks (vs wakeups).

I saw that request.  That would be ideal.

>
>> OTOH, I suspect level triggered interrupts will be primarily in the
>> legacy domain, so perhaps we do not need to worry about it too much.
>> Therefore, another option is that we *could* simply set the stake in the
>> ground that legacy/level cannot use irqfd.
>>   
>
> This is my preferred option.  For a virtio-net-server in the kernel,
> we'd service its eventfd in qemu, raising and lowering the pci
> interrupt in the traditional way.
>
> But we'd still need to know when to lower the interrupt.  How?

IIUC, isn't that  usually device/subsystem specific, and out of scope of
the GSI delivery vehicle?  For instance, most devices I have seen with
level ints have a register in their device register namespace for acking
the int.  As an aside, this is what causes some of the grief in dealing
with shared interrupts like KVM pass-through and/or threaded-isrs:  
There isn't a standardized way to ACK them.

You may also see some generalization of masking/acking in things like
the MSI-X table.  But again, this would be out of scope of the general
GSI delivery path IIUC.

I understand that there is a feedback mechanism in the ioapic model for
calling back on acknowledgment of the interrupt.  But I am not sure what
is how the real hardware works normally, and therefore I am not
convinced that is something we need to feed all the way back (i.e. via
irqfd or whatever).  In the interest of full disclosure, its been a few
years since I studied the xAPIC docs, so I might be out to lunch on that
assertion. ;)

-Greg





signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Davide Libenzi
On Thu, 7 May 2009, Avi Kivity wrote:

> What's your take on adding irq context safe callbacks to irqfd?
> 
> To give some background here, we would like to use eventfd as a generic
> connector between components, so the components do not know about each other.
> So far eventfd successfully abstracts among components in the same process, in
> different processes, and in the kernel.
> 
> eventfd_signal() can be safely called from irq context, and will wake up a
> waiting task.  But in some cases, if the consumer is in the kernel, it may be
> able to consume the event from irq context, saving a context switch.
> 
> So, will you consider patches adding this capability to eventfd?

Maybe I got lost in the thread, but inside the kernel we have 
callback-based wakeup since long time. This is what epoll uses, when 
hooking into the file* f_op->poll() subsystem.
Did you mean something else?


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Avi Kivity

Gregory Haskins wrote:

One thing I was thinking here was that I could create a flag for the
kvm_irqfd() function for something like "KVM_IRQFD_MODE_CLEAR".  This
flag when specified at creation time will cause the event to execute a
clear operation instead of a set when triggered.  That way, the default
mode is an edge-triggered set.  The non-default mode is to trigger a
clear.  Level-triggered ints could therefore create two irqfds, one for
raising, the other for clearing.
  


That's my second choice option.


An alternative is to abandon the use of eventfd, and allow the irqfd to
be a first-class anon-fd.  The parameters passed to the write/signal()
function could then indicate the desired level.  The disadvantage would
be that it would not be compatible with eventfd, so we would need to
decide if the tradeoff is worth it.
  


I would really like to keep using eventfd.  Which is why I asked Davide 
about the prospects of direct callbacks (vs wakeups).



OTOH, I suspect level triggered interrupts will be primarily in the
legacy domain, so perhaps we do not need to worry about it too much. 
Therefore, another option is that we *could* simply set the stake in the

ground that legacy/level cannot use irqfd.
  


This is my preferred option.  For a virtio-net-server in the kernel, 
we'd service its eventfd in qemu, raising and lowering the pci interrupt 
in the traditional way.


But we'd still need to know when to lower the interrupt.  How?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Gregory Haskins
Marcelo Tosatti wrote:
> On Thu, May 07, 2009 at 12:48:21PM +0300, Avi Kivity wrote:
>   
>> Davide Libenzi wrote:
>> 
>>> On Wed, 6 May 2009, Gregory Haskins wrote:
>>>
>>>   
>>>   
 I think we are ok in this regard (at least in v5) without the 
 callback. kvm holds irqfd, which holds eventfd.  In a normal 
 situation, we will
 have eventfd with 2 references.  If userspace closes the eventfd, it
 will drop 1 of the 2 eventfd file references, but the object should
 remain intact as long as kvm still holds it as well.  When the kvm-fd is
 released, we will then decouple from the eventfd->wqh and drop the last
 fput(), officially freeing it.

 Likewise, if kvm is closed before the eventfd, we will simply decouple
 from the wqh and fput(eventfd), leaving the last reference held by
 userspace until it closes as well.

 Let me know if you see any holes in that.
 
 
>>> Looks OK, modulo my knowledge of KVM internals.
>>>   
>>>   
>> What's your take on adding irq context safe callbacks to irqfd?
>>
>> To give some background here, we would like to use eventfd as a generic  
>> connector between components, so the components do not know about each  
>> other.  So far eventfd successfully abstracts among components in the  
>> same process, in different processes, and in the kernel.
>>
>> eventfd_signal() can be safely called from irq context, and will wake up  
>> a waiting task.  But in some cases, if the consumer is in the kernel, it  
>> may be able to consume the event from irq context, saving a context 
>> switch.
>>
>> So, will you consider patches adding this capability to eventfd?
>> 
>
> (pasting from a separate thread)
>
>   
>> That's my thinking.  PCI interrupts don't work because we need to do  
>> some hacky stuff in there, but MSI should.  Oh, and we could improve
>> UIO  
>> support for interrupts when using MSI, since there's no need to  
>> acknowledge the interrupt.
>> 
>
> Ok, so for INTx assigned devices all you need to do on the ACK handler
> is to re-enable the host interrupt (and set the guest interrupt line to
> low).
>
> Right now the ack comes through a kvm internal irq ack callback.
>
> AFAICS there is no mechanism in irqfd for ACK notification, and
> interrupt injection is edge triggered.
>
> So for PCI INTx assigned devices (or any INTx level), you'd want to keep
> the guest interrupt high, with some way to notify the ACK.
>
> Avi mentioned a separate irqfd to notify the ACK. For assigned devices,
> you could register a fd wakeup function in that fd, which replaces the
> current irq ACK callback?
>   

One thing I was thinking here was that I could create a flag for the
kvm_irqfd() function for something like "KVM_IRQFD_MODE_CLEAR".  This
flag when specified at creation time will cause the event to execute a
clear operation instead of a set when triggered.  That way, the default
mode is an edge-triggered set.  The non-default mode is to trigger a
clear.  Level-triggered ints could therefore create two irqfds, one for
raising, the other for clearing.

An alternative is to abandon the use of eventfd, and allow the irqfd to
be a first-class anon-fd.  The parameters passed to the write/signal()
function could then indicate the desired level.  The disadvantage would
be that it would not be compatible with eventfd, so we would need to
decide if the tradeoff is worth it.

OTOH, I suspect level triggered interrupts will be primarily in the
legacy domain, so perhaps we do not need to worry about it too much. 
Therefore, another option is that we *could* simply set the stake in the
ground that legacy/level cannot use irqfd.

Thoughts?

-Greg




signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Marcelo Tosatti
On Thu, May 07, 2009 at 12:48:21PM +0300, Avi Kivity wrote:
> Davide Libenzi wrote:
>> On Wed, 6 May 2009, Gregory Haskins wrote:
>>
>>   
>>> I think we are ok in this regard (at least in v5) without the 
>>> callback. kvm holds irqfd, which holds eventfd.  In a normal 
>>> situation, we will
>>> have eventfd with 2 references.  If userspace closes the eventfd, it
>>> will drop 1 of the 2 eventfd file references, but the object should
>>> remain intact as long as kvm still holds it as well.  When the kvm-fd is
>>> released, we will then decouple from the eventfd->wqh and drop the last
>>> fput(), officially freeing it.
>>>
>>> Likewise, if kvm is closed before the eventfd, we will simply decouple
>>> from the wqh and fput(eventfd), leaving the last reference held by
>>> userspace until it closes as well.
>>>
>>> Let me know if you see any holes in that.
>>> 
>>
>> Looks OK, modulo my knowledge of KVM internals.
>>   
>
> What's your take on adding irq context safe callbacks to irqfd?
>
> To give some background here, we would like to use eventfd as a generic  
> connector between components, so the components do not know about each  
> other.  So far eventfd successfully abstracts among components in the  
> same process, in different processes, and in the kernel.
>
> eventfd_signal() can be safely called from irq context, and will wake up  
> a waiting task.  But in some cases, if the consumer is in the kernel, it  
> may be able to consume the event from irq context, saving a context 
> switch.
>
> So, will you consider patches adding this capability to eventfd?

(pasting from a separate thread)

> That's my thinking.  PCI interrupts don't work because we need to do  
> some hacky stuff in there, but MSI should.  Oh, and we could improve
> UIO  
> support for interrupts when using MSI, since there's no need to  
> acknowledge the interrupt.

Ok, so for INTx assigned devices all you need to do on the ACK handler
is to re-enable the host interrupt (and set the guest interrupt line to
low).

Right now the ack comes through a kvm internal irq ack callback.

AFAICS there is no mechanism in irqfd for ACK notification, and
interrupt injection is edge triggered.

So for PCI INTx assigned devices (or any INTx level), you'd want to keep
the guest interrupt high, with some way to notify the ACK.

Avi mentioned a separate irqfd to notify the ACK. For assigned devices,
you could register a fd wakeup function in that fd, which replaces the
current irq ACK callback?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-07 Thread Avi Kivity

Davide Libenzi wrote:

On Wed, 6 May 2009, Gregory Haskins wrote:

  
I think we are ok in this regard (at least in v5) without the callback. 
kvm holds irqfd, which holds eventfd.  In a normal situation, we will

have eventfd with 2 references.  If userspace closes the eventfd, it
will drop 1 of the 2 eventfd file references, but the object should
remain intact as long as kvm still holds it as well.  When the kvm-fd is
released, we will then decouple from the eventfd->wqh and drop the last
fput(), officially freeing it.

Likewise, if kvm is closed before the eventfd, we will simply decouple
from the wqh and fput(eventfd), leaving the last reference held by
userspace until it closes as well.

Let me know if you see any holes in that.



Looks OK, modulo my knowledge of KVM internals.
  


What's your take on adding irq context safe callbacks to irqfd?

To give some background here, we would like to use eventfd as a generic 
connector between components, so the components do not know about each 
other.  So far eventfd successfully abstracts among components in the 
same process, in different processes, and in the kernel.


eventfd_signal() can be safely called from irq context, and will wake up 
a waiting task.  But in some cases, if the consumer is in the kernel, it 
may be able to consume the event from irq context, saving a context switch.


So, will you consider patches adding this capability to eventfd?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-06 Thread Gregory Haskins
Davide Libenzi wrote:
> On Wed, 6 May 2009, Gregory Haskins wrote:
>
>   
>> I think we are ok in this regard (at least in v5) without the callback. 
>> kvm holds irqfd, which holds eventfd.  In a normal situation, we will
>> have eventfd with 2 references.  If userspace closes the eventfd, it
>> will drop 1 of the 2 eventfd file references, but the object should
>> remain intact as long as kvm still holds it as well.  When the kvm-fd is
>> released, we will then decouple from the eventfd->wqh and drop the last
>> fput(), officially freeing it.
>>
>> Likewise, if kvm is closed before the eventfd, we will simply decouple
>> from the wqh and fput(eventfd), leaving the last reference held by
>> userspace until it closes as well.
>>
>> Let me know if you see any holes in that.
>> 
>
> Looks OK, modulo my knowledge of KVM internals.
>   

Thanks Davide,

If there isn't any more feedback on the series from Al, Avi,
etc...please formally submit your eventfd patch so this series is
available for Avi to pull in for inclusion when/if he deems it fit.

Thanks again,
-Greg




signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-06 Thread Davide Libenzi
On Wed, 6 May 2009, Gregory Haskins wrote:

> I think we are ok in this regard (at least in v5) without the callback. 
> kvm holds irqfd, which holds eventfd.  In a normal situation, we will
> have eventfd with 2 references.  If userspace closes the eventfd, it
> will drop 1 of the 2 eventfd file references, but the object should
> remain intact as long as kvm still holds it as well.  When the kvm-fd is
> released, we will then decouple from the eventfd->wqh and drop the last
> fput(), officially freeing it.
> 
> Likewise, if kvm is closed before the eventfd, we will simply decouple
> from the wqh and fput(eventfd), leaving the last reference held by
> userspace until it closes as well.
> 
> Let me know if you see any holes in that.

Looks OK, modulo my knowledge of KVM internals.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-06 Thread Gregory Haskins
Davide Libenzi wrote:
> On Wed, 6 May 2009, Gregory Haskins wrote:
>
>   
>> Al, Davide,
>>
>> Gregory Haskins wrote:
>> 
>>> +
>>> +int
>>> +kvm_irqfd(struct kvm *kvm, int gsi, int flags)
>>> +{
>>> +   struct _irqfd *irqfd;
>>> +   struct file *file = NULL;
>>> +   int fd = -1;
>>> +   int ret;
>>> +
>>> +   irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>>> +   if (!irqfd)
>>> +   return -ENOMEM;
>>> +
>>> +   irqfd->kvm = kvm;
>>> +   irqfd->gsi = gsi;
>>> +   INIT_LIST_HEAD(&irqfd->list);
>>> +   INIT_WORK(&irqfd->work, irqfd_inject);
>>> +
>>> +   /*
>>> +* We re-use eventfd for irqfd, and therefore will embed the eventfd
>>> +* lifetime in the irqfd.
>>> +*/
>>> +   file = eventfd_file_create(0, 0);
>>> +   if (IS_ERR(file)) {
>>> +   ret = PTR_ERR(file);
>>> +   goto fail;
>>> +   }
>>> +
>>> +   irqfd->file = file;
>>> +
>>> +   /*
>>> +* Install our own custom wake-up handling so we are notified via
>>> +* a callback whenever someone signals the underlying eventfd
>>> +*/
>>> +   init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
>>> +   init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
>>> +
>>> +   ret = file->f_op->poll(file, &irqfd->pt);
>>> +   if (ret < 0)
>>> +   goto fail;
>>> +
>>> +   mutex_lock(&kvm->lock);
>>> +   list_add_tail(&irqfd->list, &kvm->irqfds);
>>> +   mutex_unlock(&kvm->lock);
>>> +
>>> +   ret = get_unused_fd();
>>> +   if (ret < 0)
>>> +   goto fail;
>>> +
>>> +   fd = ret;
>>> +
>>> +   fd_install(fd, file);
>>>   
>>>   
>> Can you comment on whether this function needs to take an additional
>> reference on file here?  (one for the one held by userspace/fd, and the
>> other for irqfd->file)  My instinct is telling me this may be currently
>> broken, but I am not sure.
>> 
>
> I don't know exactly how it is used, but looks broken.
Yeah, I think it is.  I addressed this in v5 so please review that
version if you have a moment.

>  If the fd is 
> returned to userspace, and userspace closes the fd, you will leak the 
> irqfd*. If you do an extra fget(), you will never know if the userspace 
> closed the last-1 instance of the eventfd file*.
> What is likely needed, is add a callback to eventfd_file_create(), so that 
> the eventfd can call your callback on ->release, and give you a chance to 
> perform proper cleanup of the irqfd*.
>   

I think we are ok in this regard (at least in v5) without the callback. 
kvm holds irqfd, which holds eventfd.  In a normal situation, we will
have eventfd with 2 references.  If userspace closes the eventfd, it
will drop 1 of the 2 eventfd file references, but the object should
remain intact as long as kvm still holds it as well.  When the kvm-fd is
released, we will then decouple from the eventfd->wqh and drop the last
fput(), officially freeing it.

Likewise, if kvm is closed before the eventfd, we will simply decouple
from the wqh and fput(eventfd), leaving the last reference held by
userspace until it closes as well.

Let me know if you see any holes in that.

Thanks Davide,
-Greg




signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-06 Thread Davide Libenzi
On Wed, 6 May 2009, Gregory Haskins wrote:

> Al, Davide,
> 
> Gregory Haskins wrote:
> > +
> > +int
> > +kvm_irqfd(struct kvm *kvm, int gsi, int flags)
> > +{
> > +   struct _irqfd *irqfd;
> > +   struct file *file = NULL;
> > +   int fd = -1;
> > +   int ret;
> > +
> > +   irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> > +   if (!irqfd)
> > +   return -ENOMEM;
> > +
> > +   irqfd->kvm = kvm;
> > +   irqfd->gsi = gsi;
> > +   INIT_LIST_HEAD(&irqfd->list);
> > +   INIT_WORK(&irqfd->work, irqfd_inject);
> > +
> > +   /*
> > +* We re-use eventfd for irqfd, and therefore will embed the eventfd
> > +* lifetime in the irqfd.
> > +*/
> > +   file = eventfd_file_create(0, 0);
> > +   if (IS_ERR(file)) {
> > +   ret = PTR_ERR(file);
> > +   goto fail;
> > +   }
> > +
> > +   irqfd->file = file;
> > +
> > +   /*
> > +* Install our own custom wake-up handling so we are notified via
> > +* a callback whenever someone signals the underlying eventfd
> > +*/
> > +   init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> > +   init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> > +
> > +   ret = file->f_op->poll(file, &irqfd->pt);
> > +   if (ret < 0)
> > +   goto fail;
> > +
> > +   mutex_lock(&kvm->lock);
> > +   list_add_tail(&irqfd->list, &kvm->irqfds);
> > +   mutex_unlock(&kvm->lock);
> > +
> > +   ret = get_unused_fd();
> > +   if (ret < 0)
> > +   goto fail;
> > +
> > +   fd = ret;
> > +
> > +   fd_install(fd, file);
> >   
> 
> Can you comment on whether this function needs to take an additional
> reference on file here?  (one for the one held by userspace/fd, and the
> other for irqfd->file)  My instinct is telling me this may be currently
> broken, but I am not sure.

I don't know exactly how it is used, but looks broken. If the fd is 
returned to userspace, and userspace closes the fd, you will leak the 
irqfd*. If you do an extra fget(), you will never know if the userspace 
closed the last-1 instance of the eventfd file*.
What is likely needed, is add a callback to eventfd_file_create(), so that 
the eventfd can call your callback on ->release, and give you a chance to 
perform proper cleanup of the irqfd*.



- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-06 Thread Gregory Haskins
Al, Davide,

Gregory Haskins wrote:
> +
> +int
> +kvm_irqfd(struct kvm *kvm, int gsi, int flags)
> +{
> + struct _irqfd *irqfd;
> + struct file *file = NULL;
> + int fd = -1;
> + int ret;
> +
> + irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> + if (!irqfd)
> + return -ENOMEM;
> +
> + irqfd->kvm = kvm;
> + irqfd->gsi = gsi;
> + INIT_LIST_HEAD(&irqfd->list);
> + INIT_WORK(&irqfd->work, irqfd_inject);
> +
> + /*
> +  * We re-use eventfd for irqfd, and therefore will embed the eventfd
> +  * lifetime in the irqfd.
> +  */
> + file = eventfd_file_create(0, 0);
> + if (IS_ERR(file)) {
> + ret = PTR_ERR(file);
> + goto fail;
> + }
> +
> + irqfd->file = file;
> +
> + /*
> +  * Install our own custom wake-up handling so we are notified via
> +  * a callback whenever someone signals the underlying eventfd
> +  */
> + init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> + init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> +
> + ret = file->f_op->poll(file, &irqfd->pt);
> + if (ret < 0)
> + goto fail;
> +
> + mutex_lock(&kvm->lock);
> + list_add_tail(&irqfd->list, &kvm->irqfds);
> + mutex_unlock(&kvm->lock);
> +
> + ret = get_unused_fd();
> + if (ret < 0)
> + goto fail;
> +
> + fd = ret;
> +
> + fd_install(fd, file);
>   

Can you comment on whether this function needs to take an additional
reference on file here?  (one for the one held by userspace/fd, and the
other for irqfd->file)  My instinct is telling me this may be currently
broken, but I am not sure.

-Greg





signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-05 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>> Avi Kivity wrote:
>>  
>>> Gregory Haskins wrote:
>>>
>>>
 +int
 +kvm_irqfd(struct kvm *kvm, int gsi, int flags)
 +{
 +struct _irqfd *irqfd;
 +struct file *file = NULL;
 +int fd = -1;
 +int ret;
 +
 +irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
 +if (!irqfd)
 +return -ENOMEM;
 +
 +irqfd->kvm = kvm;
 
>>> You need to increase the refcount on struct kvm here.  Otherwise evil
>>> userspace will create an irqfd, close the vm and vcpu fds, and inject
>>> an interrupt.
>>> 
>>
>> I just reviewed the code in prep for v5, and now I remember why I didnt
>> take a reference:  I designed it the opposite direction:  the vm-fd owns
>> a reference to the irqfd, and will decouple the kvm context from the
>> eventfd on shutdown  (see kvm_irqfd_release()).   I still need to spin a
>> v5 regardless in order to add the padding as previously discussed.  But
>> let me know if you still see any holes in light of this alternate object
>> lifetime approach I am using.
>>   
>
> Right, irqfd_release works.  But I think refcounting is simpler, since
> we already kvm_get_kvm() and kvm_put_kvm(), and you wouldn't need the
> irqfd list.  On the other hand, I'm not sure you get a callback from
> eventfd on close(), so refcounting may not be implementable.

;)

>
> Drat, irqfd_release doesn't work.  You reference kvm->lock in
> irqfd_inject without taking any locks.

I *think* this is ok, tho.  I remove myself from the waitq, and then
flush any potentially scheduled deferred work before returning.  This
all happens synchronously to the vm_release() code when the vm-fd is
bring dropped, but before we actually release the struct kvm*. 
Therefore, I think kvm->lock is guaranteed to remain valid for the
duration of the irqfd_release(), and we guarantee it wont be accessed
after the irqfd_release() completes.  Or do you have a different concern?

On this topic of proper ref counts, though

I wonder if I need an extra fget() in there.  I presume that the
evenfd_file_create() returns with only a single reference, which
presumably I am handing one to userspace, and one to the irqfd which
is broken.  Or does fd_install() bump that for me (doesnt look like
it)?  Al, Davide, any comments?

>
> btw, there's still your original idea of creating the eventfd in
> userspace and passing it down.  That would be workable if we can see a
> way to both signal the eventfd and get called back in irq context. 
> Maybe that's preferable to what we're doing here, but we need to see
> how it would work.

We can do that, but I don't see it as changing the general problem
here.  However, I think if you find that the above comments about the
kvm->lock w.r.t. irqfd_release() are ok, we don't need to worry about
it.  If you prefer the userspace allocation of eventfd() for other
reasons, we can easily go back to that model as well...but its not
strictly necessary for this particular issue iiuc.

-Greg



signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-05 Thread Avi Kivity

Gregory Haskins wrote:

Avi Kivity wrote:
  

Gregory Haskins wrote:



+int
+kvm_irqfd(struct kvm *kvm, int gsi, int flags)
+{
+struct _irqfd *irqfd;
+struct file *file = NULL;
+int fd = -1;
+int ret;
+
+irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
+if (!irqfd)
+return -ENOMEM;
+
+irqfd->kvm = kvm;
  
  

You need to increase the refcount on struct kvm here.  Otherwise evil
userspace will create an irqfd, close the vm and vcpu fds, and inject
an interrupt.



I just reviewed the code in prep for v5, and now I remember why I didnt
take a reference:  I designed it the opposite direction:  the vm-fd owns
a reference to the irqfd, and will decouple the kvm context from the
eventfd on shutdown  (see kvm_irqfd_release()).   I still need to spin a
v5 regardless in order to add the padding as previously discussed.  But
let me know if you still see any holes in light of this alternate object
lifetime approach I am using.
  


Right, irqfd_release works.  But I think refcounting is simpler, since 
we already kvm_get_kvm() and kvm_put_kvm(), and you wouldn't need the 
irqfd list.  On the other hand, I'm not sure you get a callback from 
eventfd on close(), so refcounting may not be implementable.


Drat, irqfd_release doesn't work.  You reference kvm->lock in 
irqfd_inject without taking any locks.


btw, there's still your original idea of creating the eventfd in 
userspace and passing it down.  That would be workable if we can see a 
way to both signal the eventfd and get called back in irq context.  
Maybe that's preferable to what we're doing here, but we need to see how 
it would work.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-05 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>
>> +int
>> +kvm_irqfd(struct kvm *kvm, int gsi, int flags)
>> +{
>> +struct _irqfd *irqfd;
>> +struct file *file = NULL;
>> +int fd = -1;
>> +int ret;
>> +
>> +irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>> +if (!irqfd)
>> +return -ENOMEM;
>> +
>> +irqfd->kvm = kvm;
>>   
>
> You need to increase the refcount on struct kvm here.  Otherwise evil
> userspace will create an irqfd, close the vm and vcpu fds, and inject
> an interrupt.

I just reviewed the code in prep for v5, and now I remember why I didnt
take a reference:  I designed it the opposite direction:  the vm-fd owns
a reference to the irqfd, and will decouple the kvm context from the
eventfd on shutdown  (see kvm_irqfd_release()).   I still need to spin a
v5 regardless in order to add the padding as previously discussed.  But
let me know if you still see any holes in light of this alternate object
lifetime approach I am using.

-Greg




signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-05 Thread Avi Kivity

Gregory Haskins wrote:
 
+struct kvm_irqfd {

+__u32 gsi;
+__u32 flags;
+};
+
  
  

Please add some reserved space here.



Ack.  Any rule of thumb here?  How about a "__u8 pad[16]" ?
  


I'd round it up so the whole thing is 32 bytes (not that it matters).

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-05 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>> KVM provides a complete virtual system environment for guests, including
>> support for injecting interrupts modeled after the real
>> exception/interrupt
>> facilities present on the native platform (such as the IDT on x86).
>> Virtual interrupts can come from a variety of sources (emulated devices,
>> pass-through devices, etc) but all must be injected to the guest via
>> the KVM infrastructure.  This patch adds a new mechanism to inject a
>> specific
>> interrupt to a guest using a decoupled eventfd mechnanism:  Any legal
>> signal
>> on the irqfd (using eventfd semantics from either userspace or
>> kernel) will
>> translate into an injected interrupt in the guest at the next available
>> interrupt window.
>>
>>  
>> +struct kvm_irqfd {
>> +__u32 gsi;
>> +__u32 flags;
>> +};
>> +
>>   
>
> Please add some reserved space here.

Ack.  Any rule of thumb here?  How about a "__u8 pad[16]" ?

>
>> +int
>> +kvm_irqfd(struct kvm *kvm, int gsi, int flags)
>> +{
>> +struct _irqfd *irqfd;
>> +struct file *file = NULL;
>> +int fd = -1;
>> +int ret;
>> +
>> +irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
>> +if (!irqfd)
>> +return -ENOMEM;
>> +
>> +irqfd->kvm = kvm;
>>   
>
> You need to increase the refcount on struct kvm here.  Otherwise evil
> userspace will create an irqfd, close the vm and vcpu fds, and inject
> an interrupt.

Good catch.  Will fix.

Thanks Avi,
-Greg




signature.asc
Description: OpenPGP digital signature


Re: [KVM PATCH v4 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-05 Thread Avi Kivity

Gregory Haskins wrote:

KVM provides a complete virtual system environment for guests, including
support for injecting interrupts modeled after the real exception/interrupt
facilities present on the native platform (such as the IDT on x86).
Virtual interrupts can come from a variety of sources (emulated devices,
pass-through devices, etc) but all must be injected to the guest via
the KVM infrastructure.  This patch adds a new mechanism to inject a specific
interrupt to a guest using a decoupled eventfd mechnanism:  Any legal signal
on the irqfd (using eventfd semantics from either userspace or kernel) will
translate into an injected interrupt in the guest at the next available
interrupt window.

 
+struct kvm_irqfd {

+   __u32 gsi;
+   __u32 flags;
+};
+
  


Please add some reserved space here.


+int
+kvm_irqfd(struct kvm *kvm, int gsi, int flags)
+{
+   struct _irqfd *irqfd;
+   struct file *file = NULL;
+   int fd = -1;
+   int ret;
+
+   irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
+   if (!irqfd)
+   return -ENOMEM;
+
+   irqfd->kvm = kvm;
  


You need to increase the refcount on struct kvm here.  Otherwise evil 
userspace will create an irqfd, close the vm and vcpu fds, and inject an 
interrupt.


Otherwise, looks good.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html