Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-20 Thread Avi Kivity
Davide Libenzi wrote:
> On Thu, 17 May 2007, Avi Kivity wrote:
>
>   
>> Davide Libenzi wrote:
>> 
>>> On Wed, 16 May 2007, Avi Kivity wrote:
>>>
>>>   
>>>   
 IMO doing eventfd_fget() asap is best.  I much prefer refcounted pointers
 to
 handles in the kernel: it's easier to see what things point to, and there
 is
 to context needed for dereferencing.
 
 
>>> There are concerns (from Al and Christoph) about file lifetime tracking of
>>> an fd passed down to the kernel.
>>>   
>>>   
>> What concerns are these?
>>
>> I have my own concerns about bare fds, which can change their backing file,
>> and which are dependent on the context.  The kernel appears to be moving away
>> from handles to pointers, as seen by the pid -> task_struck/struct pid
>> conversion.
>> 
>
> The concern is that by keeping a reference to the file* you may end up 
> with a file that has no more userspace links. Although the file_count()==1 
> will tell you that you've the only reference left, this relies on you 
> doing the check. I really don't know how this will be used in KVM, so I 
> can't really say how well these concerns applies.
>
>   

That's perfectly fine for kvm (and as far as I can see, any other 
application: the user losing their eventfd reference is equivalent to 
the user ignoring any events, which should be survivable by the kernel).

>
>
>   
>>> Avi, how about you go the other way around? I expose you something like:
>>>
>>> long eventfd_create(unsigned int count, void (*release)(void *),
>>> void *priv);
>>>
>>> This returns you an eventfd (or error), and lets you install a callback to
>>> be used when the file_operations->release is called (basically, userspace
>>> closed the last file instance).
>>> Can KVM pass back an fd to userspace instead of the other way around?
>>>   
>>>   
>> That was my original thought, but when I saw your proposal I preferred that 
>> as
>> more flexible.
>>
>> If we go this way, I prefer to have a struct file * and do away with the
>> callback, but that brings us back to the file lifetime concerns.
>> 
>
> Doing the above requires some care too. Access to your copy of the file* 
> must be protected by a lock (you can sleep from inside the callback, so a 
> mutex is fine too). I'll sketch you some code:
>
>
> /* Defined in linux/eventfd.h */
> struct eventfd_relcb {
>   struct list_head lnk;
>   void (*proc)(struct eventfd_relcb *);
> };
>
> /* Your context data */
> struct your_ctx {
>   whatever_lock your_lock;
>   ...
>   struct eventfd_relcb rcb;
>   struct file *evfile;
> };
>
> /* Your eventfd release callback */
> void rcb_callback(struct eventfd_relcb *rcb) {
>   struct your_ctx *c = container_of(rcb, struct your_ctx, rcb);
>
>   whatever_lock_lock(&c->your_lock);
>   ...
>   c->evfile = NULL;
>   whatever_lock_unlock(&c->your_lock);
> }
>
> /* Your notify userspace function */
> void notify_userspace(struct your_ctx *c) {
>
>   whatever_lock_lock(&c->your_lock);
>   if (c->evfile != NULL)
>   eventfd_signal(c->evfile, 1);
>   whatever_lock_unlock(&c->your_lock);
> }
>
> /* Your eventfd create/setup function (modulo error checks) */
> void setup_eventfd(struct your_ctx *c) {
>   int fd;
>
>   c->rcb.proc = rcb_callback;
>   fd = eventfd_create(0, &c->rcb);
>   c->evfile = eventfd_fget(fd);
>   /* Then return fd to userspace in some way */
>   ...
> }
>
>
>
> If, by any chance, you need to detach yourself from the eventfd:
>
> void detach_eventfd(struct your_ctx *c) {
>
>   whatever_lock_lock(&c->your_lock);
>   if (c->evfile != NULL) {
>   eventfd_del_release_cb(c->evfile, &c->rcb);
>   /* And, optionally ... */
>   eventfd_set_shutdown(c->evfile);
>   }
>   whatever_lock_unlock(&c->your_lock);
> }
>
>
> The eventfd_set_shutdown() function will make a POLLIN signaled to 
> userspace, and a read(2) on the eventfd will return 0 (like peer 
> disconnected sockets). Then userspace will know that no more eventfs will 
> show up in there.
> Tentative patch below (builds, not tested yet).
>
>
>   

Looks like a lot of code for what was supposed to be a code reduction... 
I think I'll look at Gregory's notification-free suggestion instead.


-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-17 Thread Davide Libenzi
On Thu, 17 May 2007, Davide Libenzi wrote:

> /* Your eventfd create/setup function (modulo error checks) */
> void setup_eventfd(struct your_ctx *c) {
>   int fd;
> 
>   c->rcb.proc = rcb_callback;
>   fd = eventfd_create(0, &c->rcb);
>   c->evfile = eventfd_fget(fd);

+   fput(c->evfile);

>   /* Then return fd to userspace in some way */
>   ...
> }

You need an fput() there, to bring f_count back to 1, of course ;)


- Davide



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-17 Thread Davide Libenzi
On Thu, 17 May 2007, Avi Kivity wrote:

> Davide Libenzi wrote:
> > On Wed, 16 May 2007, Avi Kivity wrote:
> > 
> >   
> > > IMO doing eventfd_fget() asap is best.  I much prefer refcounted pointers
> > > to
> > > handles in the kernel: it's easier to see what things point to, and there
> > > is
> > > to context needed for dereferencing.
> > > 
> > 
> > There are concerns (from Al and Christoph) about file lifetime tracking of
> > an fd passed down to the kernel.
> >   
> 
> What concerns are these?
> 
> I have my own concerns about bare fds, which can change their backing file,
> and which are dependent on the context.  The kernel appears to be moving away
> from handles to pointers, as seen by the pid -> task_struck/struct pid
> conversion.

The concern is that by keeping a reference to the file* you may end up 
with a file that has no more userspace links. Although the file_count()==1 
will tell you that you've the only reference left, this relies on you 
doing the check. I really don't know how this will be used in KVM, so I 
can't really say how well these concerns applies.




> > Avi, how about you go the other way around? I expose you something like:
> > 
> > long eventfd_create(unsigned int count, void (*release)(void *),
> > void *priv);
> > 
> > This returns you an eventfd (or error), and lets you install a callback to
> > be used when the file_operations->release is called (basically, userspace
> > closed the last file instance).
> > Can KVM pass back an fd to userspace instead of the other way around?
> >   
> 
> That was my original thought, but when I saw your proposal I preferred that as
> more flexible.
> 
> If we go this way, I prefer to have a struct file * and do away with the
> callback, but that brings us back to the file lifetime concerns.

Doing the above requires some care too. Access to your copy of the file* 
must be protected by a lock (you can sleep from inside the callback, so a 
mutex is fine too). I'll sketch you some code:


/* Defined in linux/eventfd.h */
struct eventfd_relcb {
struct list_head lnk;
void (*proc)(struct eventfd_relcb *);
};

/* Your context data */
struct your_ctx {
whatever_lock your_lock;
...
struct eventfd_relcb rcb;
struct file *evfile;
};

/* Your eventfd release callback */
void rcb_callback(struct eventfd_relcb *rcb) {
struct your_ctx *c = container_of(rcb, struct your_ctx, rcb);

whatever_lock_lock(&c->your_lock);
...
c->evfile = NULL;
whatever_lock_unlock(&c->your_lock);
}

/* Your notify userspace function */
void notify_userspace(struct your_ctx *c) {

whatever_lock_lock(&c->your_lock);
if (c->evfile != NULL)
eventfd_signal(c->evfile, 1);
whatever_lock_unlock(&c->your_lock);
}

/* Your eventfd create/setup function (modulo error checks) */
void setup_eventfd(struct your_ctx *c) {
int fd;

c->rcb.proc = rcb_callback;
fd = eventfd_create(0, &c->rcb);
c->evfile = eventfd_fget(fd);
/* Then return fd to userspace in some way */
...
}



If, by any chance, you need to detach yourself from the eventfd:

void detach_eventfd(struct your_ctx *c) {

whatever_lock_lock(&c->your_lock);
if (c->evfile != NULL) {
eventfd_del_release_cb(c->evfile, &c->rcb);
/* And, optionally ... */
eventfd_set_shutdown(c->evfile);
}
whatever_lock_unlock(&c->your_lock);
}


The eventfd_set_shutdown() function will make a POLLIN signaled to 
userspace, and a read(2) on the eventfd will return 0 (like peer 
disconnected sockets). Then userspace will know that no more eventfs will 
show up in there.
Tentative patch below (builds, not tested yet).




- Davide



Index: linux-2.6.22-rc1-git1.mod/fs/eventfd.c
===
--- linux-2.6.22-rc1-git1.mod.orig/fs/eventfd.c 2007-05-16 15:10:26.0 
-0700
+++ linux-2.6.22-rc1-git1.mod/fs/eventfd.c  2007-05-17 10:53:38.0 
-0700
@@ -16,9 +16,15 @@
 #include 
 #include 
 
+#define EVENTFD_FLG_SHUTDOWN   (1 << 0)
+
+#define EVENTFD_FILE(f)((f)->f_op == &eventfd_fops)
+
 struct eventfd_ctx {
spinlock_t lock;
wait_queue_head_t wqh;
+   unsigned long flags;
+
/*
 * Every time that a write(2) is performed on an eventfd, the
 * value of the __u64 being written is added to "count" and a
@@ -28,37 +34,72 @@
 * issue a wakeup.
 */
__u64 count;
+
+   /*
+* List of callbacks to be invoked when the eventfd file
+* in going away.
+*/
+   struct list_head rcb_list;
 };
 
-/*
- * Adds "n" to the eventfd counter "count". Returns "n" in case of
- * success, or a value lower then "n" in case of coutner overflow.
- * This function is supposed to be called by the kernel in paths
- * that do not allow s

Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-17 Thread Avi Kivity
Davide Libenzi wrote:
> On Wed, 16 May 2007, Avi Kivity wrote:
>
>   
>> IMO doing eventfd_fget() asap is best.  I much prefer refcounted pointers to
>> handles in the kernel: it's easier to see what things point to, and there is
>> to context needed for dereferencing.
>> 
>
> There are concerns (from Al and Christoph) about file lifetime tracking of 
> an fd passed down to the kernel.
>   

What concerns are these?

I have my own concerns about bare fds, which can change their backing 
file, and which are dependent on the context.  The kernel appears to be 
moving away from handles to pointers, as seen by the pid -> 
task_struck/struct pid conversion.

> Avi, how about you go the other way around? I expose you something like:
>
> long eventfd_create(unsigned int count, void (*release)(void *),
> void *priv);
>
> This returns you an eventfd (or error), and lets you install a callback to 
> be used when the file_operations->release is called (basically, userspace 
> closed the last file instance).
> Can KVM pass back an fd to userspace instead of the other way around?
>   

That was my original thought, but when I saw your proposal I preferred 
that as more flexible.

If we go this way, I prefer to have a struct file * and do away with the 
callback, but that brings us back to the file lifetime concerns.

-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-16 Thread Davide Libenzi
On Wed, 16 May 2007, Avi Kivity wrote:

> IMO doing eventfd_fget() asap is best.  I much prefer refcounted pointers to
> handles in the kernel: it's easier to see what things point to, and there is
> to context needed for dereferencing.

There are concerns (from Al and Christoph) about file lifetime tracking of 
an fd passed down to the kernel.
Avi, how about you go the other way around? I expose you something like:

long eventfd_create(unsigned int count, void (*release)(void *),
void *priv);

This returns you an eventfd (or error), and lets you install a callback to 
be used when the file_operations->release is called (basically, userspace 
closed the last file instance).
Can KVM pass back an fd to userspace instead of the other way around?



- Davide



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-16 Thread Avi Kivity
Davide Libenzi wrote:
> On Tue, 15 May 2007, Christoph Hellwig wrote:
>
>   
>> On Tue, May 15, 2007 at 12:18:17PM -0400, Gregory Haskins wrote:
>> 
>> On Tue, May 15, 2007 at 11:40 AM, in message
>> 
>>> <[EMAIL PROTECTED]>, Davide Libenzi
>>> <[EMAIL PROTECTED]> wrote: 
>>>   
 I don't know how critical is the path where you will be doing check. The 
 eventfd_fget() is pretty fast, so if you're not looking at a performance 
 critical path, I'd suggest that. Otherwise you can do an early 
 eventfd_get, and keep the file*. If you have no the ways to know if the 
 userspace disconnected, an atomic_read(&file- >f_count)==1 will tell you 
 that you're the only owner of the file* (that is, userspace closed the 
 eventfd descriptor). I'd give preference to the former option though.
 
>>> Thanks for the insight, Davide.  It sounds like we can probably stay with 
>>> the way I have it for now, and keep the file->f_count idea in our back 
>>> pocket should a performance problem arise.
>>>   
>> accessing file->f_count is not allowed for drivers.  It took us quite a
>> bit of effort to clean up all users a while ago, and it turned out most
>> of them were rather buggy.
>> 
>
> Right, you could use the file_count() macro :)
> Seriuosly, if this (doing an eventfd_fget+fput) becomes a problem, I can
> have the check done in eventfd_signal() and return a proper error code.
>
>   

IMO doing eventfd_fget() asap is best.  I much prefer refcounted 
pointers to handles in the kernel: it's easier to see what things point 
to, and there is to context needed for dereferencing.

-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-16 Thread Avi Kivity
Anthony Liguori wrote:
> Gregory Haskins wrote:
>   
>>> While KVM will inevitably start requiring newer kernel versions, do we 
>>> really need to do it right now?  Perhaps we could add dummy eventfd_* 
>>> functions to the module compat header?  Then at least older kernels will 
>>> continue to work with in- kernel APIC disabled.
>>>
>>> 
>>>   
>> The plan as Avi and I agreed to (IIUC) is to support eventfd via the 
>> extern-compat header.  The same could be said for HRT.  I don't thing either 
>> one will be particularly pleasant to support in this fashion, but this is 
>> how he wants to do it.  (Recall that I had abstracted the HRT in earlier 
>> versions which he requested to be removed, so I didnt bother with the 
>> eventfd interface).  I agree that it will keep the kvm core code cleaner 
>> instead of having all these custom abstractions, so I see his point there.  
>> I'm just not looking forward to the compat work ;)
>>
>> That being said: Perhaps you just came up with a good compromise.  The code 
>> is already structured to support both the old way and the new way.  We could 
>> have the in-kernel modes disabled at compile time on older kernels.  Thats 
>> probably a much better solution then trying to get HRT and eventfd emulated 
>> on, say, 2.6.9 ;)
>>
>> The implication here is that I don't plan on supporting new features via the 
>> old-way.  For instance, SMP features would only work with in-kernel 
>> emulation.  If we go down this route, it automatically means that older 
>> kernels will not have any other related advanced features, not just the 
>> performance/features currently offered.  Perhaps this is acceptable also?  
>> I'm not sure.
>>   
>> 
>
> I don't know.  I'm less concerned about long term implications than I am 
> with short term ones.  My primary concern was having new versions of KVM 
> stop working on older kernels.  If SMP requires in-kernel APIC, then 
> provided that UP kernels still work, if someone is sufficiently 
> motivated to get SMP working on older kernels, they can implement the 
> eventfd emulation.
>
>   

I'm not worried about eventfd -- we can just ship a 
conditionally-compiled copy of the code with the external module (we 
also need a non-syscall interface, as modules can't provide syscalls, 
but that's easily workaroundable).  hrtimers is more difficult, but 
there's really no choice -- we can't #ifdef the core code for that.

I really would like to see smp working without kernel apic, as pure qemu 
supports it.  It would of course require newer modules.

-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-16 Thread Avi Kivity
Davide Libenzi wrote:
> On Tue, 15 May 2007, Gregory Haskins wrote:
>
>   
> On Tue, May 15, 2007 at  3:45 AM, in message <[EMAIL PROTECTED]>,
>   
>> Avi Kivity <[EMAIL PROTECTED]> wrote: 
>> 
>>> Gregory Haskins wrote:
>>>   
 Signed- off- by: Gregory Haskins <[EMAIL PROTECTED]>
 ---

  drivers/kvm/kvm.h  |1 +
  drivers/kvm/kvm_main.c |   52 
 ++--
  include/linux/kvm.h|1 +
  3 files changed, 48 insertions(+), 6 deletions(- )

 diff -- git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
 index 7b5d5e6..f5731c4 100644
 ---  a/drivers/kvm/kvm.h
 +++ b/drivers/kvm/kvm.h
 @@ - 333,6 +333,7 @@ struct kvm_vcpu_irq {
int  deferred;
struct task_struct  *task;
int  guest_mode;
 +  int  eventfd;
   
 
>>> Best to convert the fd to a filp when you install it.  This avoids the
>>> conversion during runtime and allows you to do error checking earlier.
>>>   
>> That was my initial impression also, but then I realized there was a 
>> problem with that:  Eventfd doesnt appear to have any way to notify 
>> other entities when the fd is closed.  Therefore the filp could be left 
>> dangling in this case.  By using the fd instead, I can validate the 
>> pointer each time I need it.  Perhaps Davide will have a suggestion 
>> here.
>> 
>
> I don't know how critical is the path where you will be doing check. The 
> eventfd_fget() is pretty fast, so if you're not looking at a performance 
> critical path, I'd suggest that. Otherwise you can do an early 
> eventfd_get, and keep the file*. If you have no the ways to know if the 
> userspace disconnected, an atomic_read(&file->f_count)==1 will tell you 
> that you're the only owner of the file* (that is, userspace closed the 
> eventfd descriptor). I'd give preference to the former option though.
>   

We aren't really interested whether userspace is looking or not.  After 
all, it installed the eventfd, so it must be interested in the events.

> Avi, as you may have read from lkml, Andrew prefers the eventfd symbols 
> export to come through the kvm tree, since they do not want to export 
> symbols that so far has no module users.
>
>   

Yes, I'll pick up the patch from lkml.


-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-16 Thread Avi Kivity
Gregory Haskins wrote:
 On Tue, May 15, 2007 at  3:45 AM, in message <[EMAIL PROTECTED]>,
 
> Avi Kivity <[EMAIL PROTECTED]> wrote: 
>   
>> Gregory Haskins wrote:
>> 
>>> Signed- off- by: Gregory Haskins <[EMAIL PROTECTED]>
>>> ---
>>>
>>>  drivers/kvm/kvm.h  |1 +
>>>  drivers/kvm/kvm_main.c |   52 
>>> ++--
>>>  include/linux/kvm.h|1 +
>>>  3 files changed, 48 insertions(+), 6 deletions(- )
>>>
>>> diff -- git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
>>> index 7b5d5e6..f5731c4 100644
>>> ---  a/drivers/kvm/kvm.h
>>> +++ b/drivers/kvm/kvm.h
>>> @@ - 333,6 +333,7 @@ struct kvm_vcpu_irq {
>>> int  deferred;
>>> struct task_struct  *task;
>>> int  guest_mode;
>>> +   int  eventfd;
>>>   
>>>   
>> Best to convert the fd to a filp when you install it.  This avoids the
>> conversion during runtime and allows you to do error checking earlier.
>> 
>
>
> That was my initial impression also, but then I realized there was a problem 
> with that:  Eventfd doesnt appear to have any way to notify other entities 
> when the fd is closed.  Therefore the filp could be left dangling in this 
> case.  By using the fd instead, I can validate the pointer each time I need 
> it.  Perhaps Davide will have a suggestion here.
>   

eventfd_fget() will bump the reference count.

-- 
error compiling committee.c: too many arguments to function


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Anthony Liguori
Gregory Haskins wrote:
>> While KVM will inevitably start requiring newer kernel versions, do we 
>> really need to do it right now?  Perhaps we could add dummy eventfd_* 
>> functions to the module compat header?  Then at least older kernels will 
>> continue to work with in- kernel APIC disabled.
>>
>> 
>
> The plan as Avi and I agreed to (IIUC) is to support eventfd via the 
> extern-compat header.  The same could be said for HRT.  I don't thing either 
> one will be particularly pleasant to support in this fashion, but this is how 
> he wants to do it.  (Recall that I had abstracted the HRT in earlier versions 
> which he requested to be removed, so I didnt bother with the eventfd 
> interface).  I agree that it will keep the kvm core code cleaner instead of 
> having all these custom abstractions, so I see his point there.  I'm just not 
> looking forward to the compat work ;)
>
> That being said: Perhaps you just came up with a good compromise.  The code 
> is already structured to support both the old way and the new way.  We could 
> have the in-kernel modes disabled at compile time on older kernels.  Thats 
> probably a much better solution then trying to get HRT and eventfd emulated 
> on, say, 2.6.9 ;)
>
> The implication here is that I don't plan on supporting new features via the 
> old-way.  For instance, SMP features would only work with in-kernel 
> emulation.  If we go down this route, it automatically means that older 
> kernels will not have any other related advanced features, not just the 
> performance/features currently offered.  Perhaps this is acceptable also?  
> I'm not sure.
>   

I don't know.  I'm less concerned about long term implications than I am 
with short term ones.  My primary concern was having new versions of KVM 
stop working on older kernels.  If SMP requires in-kernel APIC, then 
provided that UP kernels still work, if someone is sufficiently 
motivated to get SMP working on older kernels, they can implement the 
eventfd emulation.

Regards,

Anthony Liguori

> Regards,
> -Greg 
>
>
>
> -
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> ___
> kvm-devel mailing list
> kvm-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>
>   


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Davide Libenzi
On Tue, 15 May 2007, Christoph Hellwig wrote:

> On Tue, May 15, 2007 at 12:18:17PM -0400, Gregory Haskins wrote:
> > >>> On Tue, May 15, 2007 at 11:40 AM, in message
> > <[EMAIL PROTECTED]>, Davide Libenzi
> > <[EMAIL PROTECTED]> wrote: 
> > >
> > > I don't know how critical is the path where you will be doing check. The 
> > > eventfd_fget() is pretty fast, so if you're not looking at a performance 
> > > critical path, I'd suggest that. Otherwise you can do an early 
> > > eventfd_get, and keep the file*. If you have no the ways to know if the 
> > > userspace disconnected, an atomic_read(&file- >f_count)==1 will tell you 
> > > that you're the only owner of the file* (that is, userspace closed the 
> > > eventfd descriptor). I'd give preference to the former option though.
> > 
> > Thanks for the insight, Davide.  It sounds like we can probably stay with 
> > the way I have it for now, and keep the file->f_count idea in our back 
> > pocket should a performance problem arise.
> 
> accessing file->f_count is not allowed for drivers.  It took us quite a
> bit of effort to clean up all users a while ago, and it turned out most
> of them were rather buggy.

Right, you could use the file_count() macro :)
Seriuosly, if this (doing an eventfd_fget+fput) becomes a problem, I can
have the check done in eventfd_signal() and return a proper error code.



- Davide



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Gregory Haskins
>>> On Tue, May 15, 2007 at 12:39 PM, in message <[EMAIL PROTECTED]>,
Anthony Liguori <[EMAIL PROTECTED]> wrote: 
> This patch series depends on eventfd which means that KVM now requires 
> 2.6.22- rc1.

This is understood.

> 
> While KVM will inevitably start requiring newer kernel versions, do we 
> really need to do it right now?  Perhaps we could add dummy eventfd_* 
> functions to the module compat header?  Then at least older kernels will 
> continue to work with in- kernel APIC disabled.
>

The plan as Avi and I agreed to (IIUC) is to support eventfd via the 
extern-compat header.  The same could be said for HRT.  I don't thing either 
one will be particularly pleasant to support in this fashion, but this is how 
he wants to do it.  (Recall that I had abstracted the HRT in earlier versions 
which he requested to be removed, so I didnt bother with the eventfd 
interface).  I agree that it will keep the kvm core code cleaner instead of 
having all these custom abstractions, so I see his point there.  I'm just not 
looking forward to the compat work ;)

That being said: Perhaps you just came up with a good compromise.  The code is 
already structured to support both the old way and the new way.  We could have 
the in-kernel modes disabled at compile time on older kernels.  Thats probably 
a much better solution then trying to get HRT and eventfd emulated on, say, 
2.6.9 ;)

The implication here is that I don't plan on supporting new features via the 
old-way.  For instance, SMP features would only work with in-kernel emulation.  
If we go down this route, it automatically means that older kernels will not 
have any other related advanced features, not just the performance/features 
currently offered.  Perhaps this is acceptable also?  I'm not sure.

Regards,
-Greg 



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Anthony Liguori
This patch series depends on eventfd which means that KVM now requires 
2.6.22-rc1.

While KVM will inevitably start requiring newer kernel versions, do we 
really need to do it right now?  Perhaps we could add dummy eventfd_* 
functions to the module compat header?  Then at least older kernels will 
continue to work with in-kernel APIC disabled.

Regards,

Anthony Liguori

Gregory Haskins wrote:
> Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
> ---
>
>  drivers/kvm/kvm.h  |1 +
>  drivers/kvm/kvm_main.c |   55 
> +++-
>  include/linux/kvm.h|1 +
>  3 files changed, 51 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
> index d5783dc..e1ba066 100644
> --- a/drivers/kvm/kvm.h
> +++ b/drivers/kvm/kvm.h
> @@ -332,6 +332,7 @@ struct kvm_vcpu_irq {
>   int  pending;
>   int  deferred;
>   int  guest_cpu;
> + int  eventfd;
>  };
>  
>  struct kvm_vcpu {
> diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
> index 0c6a62a..7109b66 100644
> --- a/drivers/kvm/kvm_main.c
> +++ b/drivers/kvm/kvm_main.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "x86_emulate.h"
>  #include "segment_descriptor.h"
> @@ -326,6 +327,7 @@ static struct kvm *kvm_create_vm(void)
>   memset(&vcpu->irq, 0, sizeof(vcpu->irq));
>   spin_lock_init(&vcpu->irq.lock);
>   vcpu->irq.deferred = -1;
> + vcpu->irq.eventfd   = -1;
>  
>   vcpu->cpu = -1;
>   vcpu->kvm = kvm;
> @@ -2358,6 +2360,7 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>  {
>   struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
>   int direct_ipi = -1;
> + int eventfd = -1;
>  
>   spin_lock_irq(&vcpu->irq.lock);
>  
> @@ -2379,7 +2382,14 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>*/
>   direct_ipi = vcpu->irq.guest_cpu;
>   BUG_ON(direct_ipi == smp_processor_id());
> - }
> + } else
> + /*
> +  * otherwise, we must assume that we could be
> +  * blocked anywhere, including userspace. Send
> +  * a signal to give everyone a chance to get
> +  * notification
> +  */
> + eventfd = vcpu->irq.eventfd;
>   }
>   }
>  
> @@ -2401,6 +2411,12 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>   smp_call_function_single(direct_ipi,
>kvm_vcpu_guest_intr,
>vcpu, 0, 0);
> +
> + if (eventfd != -1) {
> + struct file *filp = eventfd_fget(eventfd);
> + if (!IS_ERR(filp))
> + eventfd_signal(filp, 1);
> + }
>  }
>  
>  static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
> @@ -2584,6 +2600,17 @@ static int kvm_vcpu_ioctl_set_fpu(struct kvm_vcpu 
> *vcpu, struct kvm_fpu *fpu)
>   return 0;
>  }
>  
> +static int kvm_vcpu_ioctl_set_eventfd(struct kvm_vcpu *vcpu, int fd)
> +{
> + if (IS_ERR(eventfd_fget(fd)))
> + return -EINVAL;
> +
> + vcpu->irq.eventfd = fd;
> + smp_wmb();
> +
> + return 0;
> +}
> +
>  static long kvm_vcpu_ioctl(struct file *filp,
>  unsigned int ioctl, unsigned long arg)
>  {
> @@ -2753,6 +2780,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
>   r = 0;
>   break;
>   }
> + case KVM_SET_EVENTFD: {
> + int eventfd = (long)argp;
> +
> + r = kvm_vcpu_ioctl_set_eventfd(vcpu, eventfd);
> + if (r)
> + goto out;
> + r = 0;
> + break;
> + }
>   default:
>   ;
>   }
> @@ -2937,12 +2973,19 @@ static long kvm_dev_ioctl(struct file *filp,
>   r = 0;
>   break;
>   }
> - case KVM_CHECK_EXTENSION:
> - /*
> -  * No extensions defined at present.
> -  */
> - r = 0;
> + case KVM_CHECK_EXTENSION: {
> + int ext = (long)argp;
> +
> + switch (ext) {
> + case KVM_SET_EVENTFD:
> + r = 1;
> + break;
> + default:
> + r = 0;
> + break;
> + }
>   break;
> + }
>   case KVM_GET_VCPU_MMAP_SIZE:
>   r = -EINVAL;
>   if (arg)
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index e6edca8..f13ec8c 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -300,5 +300,6 @@ struct kvm_signal_mask {
>  #define KVM_SET_SI

Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Gregory Haskins
>>> On Tue, May 15, 2007 at 12:22 PM, in message
<[EMAIL PROTECTED]>, Christoph Hellwig <[EMAIL PROTECTED]>
wrote: 
> On Tue, May 15, 2007 at 12:18:17PM - 0400, Gregory Haskins wrote:
>> >>> On Tue, May 15, 2007 at 11:40 AM, in message
>> <[EMAIL PROTECTED]>, Davide Libenzi
>> <[EMAIL PROTECTED]> wrote: 
>> >
>> > I don't know how critical is the path where you will be doing check. The 
>> > eventfd_fget() is pretty fast, so if you're not looking at a performance 
>> > critical path, I'd suggest that. Otherwise you can do an early 
>> > eventfd_get, and keep the file*. If you have no the ways to know if the 
>> > userspace disconnected, an atomic_read(&file-  >f_count)==1 will tell you 
>> > that you're the only owner of the file* (that is, userspace closed the 
>> > eventfd descriptor). I'd give preference to the former option though.
>> 
>> Thanks for the insight, Davide.  It sounds like we can probably stay with 
> the way I have it for now, and keep the file- >f_count idea in our back 
> pocket 
> should a performance problem arise.
> 
> accessing file- >f_count is not allowed for drivers.  It took us quite a
> bit of effort to clean up all users a while ago, and it turned out most
> of them were rather buggy.

Ack.

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Christoph Hellwig
On Tue, May 15, 2007 at 12:18:17PM -0400, Gregory Haskins wrote:
> >>> On Tue, May 15, 2007 at 11:40 AM, in message
> <[EMAIL PROTECTED]>, Davide Libenzi
> <[EMAIL PROTECTED]> wrote: 
> >
> > I don't know how critical is the path where you will be doing check. The 
> > eventfd_fget() is pretty fast, so if you're not looking at a performance 
> > critical path, I'd suggest that. Otherwise you can do an early 
> > eventfd_get, and keep the file*. If you have no the ways to know if the 
> > userspace disconnected, an atomic_read(&file- >f_count)==1 will tell you 
> > that you're the only owner of the file* (that is, userspace closed the 
> > eventfd descriptor). I'd give preference to the former option though.
> 
> Thanks for the insight, Davide.  It sounds like we can probably stay with the 
> way I have it for now, and keep the file->f_count idea in our back pocket 
> should a performance problem arise.

accessing file->f_count is not allowed for drivers.  It took us quite a
bit of effort to clean up all users a while ago, and it turned out most
of them were rather buggy.


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Gregory Haskins
>>> On Tue, May 15, 2007 at 11:40 AM, in message
<[EMAIL PROTECTED]>, Davide Libenzi
<[EMAIL PROTECTED]> wrote: 
>
> I don't know how critical is the path where you will be doing check. The 
> eventfd_fget() is pretty fast, so if you're not looking at a performance 
> critical path, I'd suggest that. Otherwise you can do an early 
> eventfd_get, and keep the file*. If you have no the ways to know if the 
> userspace disconnected, an atomic_read(&file- >f_count)==1 will tell you 
> that you're the only owner of the file* (that is, userspace closed the 
> eventfd descriptor). I'd give preference to the former option though.

Thanks for the insight, Davide.  It sounds like we can probably stay with the 
way I have it for now, and keep the file->f_count idea in our back pocket 
should a performance problem arise.

> Avi, as you may have read from lkml, Andrew prefers the eventfd symbols 
> export to come through the kvm tree, since they do not want to export 
> symbols that so far has no module users.

I can create a patch and add it to my series for this work.

Thanks again,
-Greg


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Davide Libenzi
On Tue, 15 May 2007, Gregory Haskins wrote:

> >>> On Tue, May 15, 2007 at  3:45 AM, in message <[EMAIL PROTECTED]>,
> Avi Kivity <[EMAIL PROTECTED]> wrote: 
> > Gregory Haskins wrote:
> >> Signed- off- by: Gregory Haskins <[EMAIL PROTECTED]>
> >> ---
> >>
> >>  drivers/kvm/kvm.h  |1 +
> >>  drivers/kvm/kvm_main.c |   52 
> >> ++--
> >>  include/linux/kvm.h|1 +
> >>  3 files changed, 48 insertions(+), 6 deletions(- )
> >>
> >> diff -- git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
> >> index 7b5d5e6..f5731c4 100644
> >> ---  a/drivers/kvm/kvm.h
> >> +++ b/drivers/kvm/kvm.h
> >> @@ - 333,6 +333,7 @@ struct kvm_vcpu_irq {
> >>int  deferred;
> >>struct task_struct  *task;
> >>int  guest_mode;
> >> +  int  eventfd;
> >>   
> > 
> > Best to convert the fd to a filp when you install it.  This avoids the
> > conversion during runtime and allows you to do error checking earlier.
> 
> That was my initial impression also, but then I realized there was a 
> problem with that:  Eventfd doesnt appear to have any way to notify 
> other entities when the fd is closed.  Therefore the filp could be left 
> dangling in this case.  By using the fd instead, I can validate the 
> pointer each time I need it.  Perhaps Davide will have a suggestion 
> here.

I don't know how critical is the path where you will be doing check. The 
eventfd_fget() is pretty fast, so if you're not looking at a performance 
critical path, I'd suggest that. Otherwise you can do an early 
eventfd_get, and keep the file*. If you have no the ways to know if the 
userspace disconnected, an atomic_read(&file->f_count)==1 will tell you 
that you're the only owner of the file* (that is, userspace closed the 
eventfd descriptor). I'd give preference to the former option though.
Avi, as you may have read from lkml, Andrew prefers the eventfd symbols 
export to come through the kvm tree, since they do not want to export 
symbols that so far has no module users.


- Davide



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Gregory Haskins
>>> On Tue, May 15, 2007 at  3:45 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote: 
> Gregory Haskins wrote:
>> Signed- off- by: Gregory Haskins <[EMAIL PROTECTED]>
>> ---
>>
>>  drivers/kvm/kvm.h  |1 +
>>  drivers/kvm/kvm_main.c |   52 
>> ++--
>>  include/linux/kvm.h|1 +
>>  3 files changed, 48 insertions(+), 6 deletions(- )
>>
>> diff -- git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
>> index 7b5d5e6..f5731c4 100644
>> ---  a/drivers/kvm/kvm.h
>> +++ b/drivers/kvm/kvm.h
>> @@ - 333,6 +333,7 @@ struct kvm_vcpu_irq {
>>  int  deferred;
>>  struct task_struct  *task;
>>  int  guest_mode;
>> +int  eventfd;
>>   
> 
> Best to convert the fd to a filp when you install it.  This avoids the
> conversion during runtime and allows you to do error checking earlier.


That was my initial impression also, but then I realized there was a problem 
with that:  Eventfd doesnt appear to have any way to notify other entities when 
the fd is closed.  Therefore the filp could be left dangling in this case.  By 
using the fd instead, I can validate the pointer each time I need it.  Perhaps 
Davide will have a suggestion here.

Regardless, I do like the idea of checking the fd on assignment.  I will put an 
eventfd_fget() check into the ioctl too just to catch it early.  I might also 
add logic to invalidate irq.eventfd if the eventfd_fget() ever fails at runtime 
because userspace neglects to explicitly unregister (so we don't keep checking 
an invalid fd over and over).

-Greg

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 5/9] KVM: Adds ability to signal userspace using a file-descriptor

2007-05-15 Thread Avi Kivity
Gregory Haskins wrote:
> Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
> ---
>
>  drivers/kvm/kvm.h  |1 +
>  drivers/kvm/kvm_main.c |   52 
> ++--
>  include/linux/kvm.h|1 +
>  3 files changed, 48 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
> index 7b5d5e6..f5731c4 100644
> --- a/drivers/kvm/kvm.h
> +++ b/drivers/kvm/kvm.h
> @@ -333,6 +333,7 @@ struct kvm_vcpu_irq {
>   int  deferred;
>   struct task_struct  *task;
>   int  guest_mode;
> + int  eventfd;
>   

Best to convert the fd to a filp when you install it.  This avoids the
conversion during runtime and allows you to do error checking earlier.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel