Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Al Viro
On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
 + /* We re-use eventfd for irqfd */
 + fd = sys_eventfd2(0, 0);
 + if (fd  0) {
 + ret = fd;
 + goto fail;
 + }
 +
 + /* We maintain a reference to eventfd for the irqfd lifetime */
 + file = eventfd_fget(fd);
 + if (IS_ERR(file)) {
 + ret = PTR_ERR(file);
 + goto fail;
 + }
 +
 + irqfd-file = file;

This is just plain wrong.  You have no promise whatsoever that caller of
that sucker won't race with e.g. dup2().  IOW, you can't assume that
file will be of the expected kind.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
  

This allows an eventfd to be registered as an irq source with a guest.  Any
signaling operation on the eventfd (via userspace or kernel) will inject
the registered GSI at the next available window.

Signed-off-by: Gregory Haskins ghask...@novell.com



If we ever want to use this with e.g. MSI-X emulation in guest, and want
to be stricly compliant to MSI-X, we'll need a way for guest to mask
interrupts, and for host to report that a masked interrupt is pending.
Ideally, all this will be doable with a couple of mmapped pages to avoid
vmexits/system calls.

  


We could do this in two ways:

- move msix entry emulation into the kernel
- require the device to support replacing its irqfd, and juggle it like so:
  - guest disables msi
  - replace device model fd with eventfd belonging to us
  - when the device fires its eventfd, set the irq pending bit
  - guest enables msi
   - if the pending bit is set, fire the interrupt?
   - replace device model fd with the real irqfd

I'm leaning towards the latter, though it's not an easy call.


+static void
+irqfd_inject(struct work_struct *work)
+{
+   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
+   struct kvm *kvm = irqfd-kvm;
+
+   mutex_lock(kvm-lock);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
+   mutex_unlock(kvm-lock);



This will do weird stuff (deliver the irq twice) if the irq is
MSI/MSI-X. I know this was discussed already and is a temporary
shortcut, but maybe add a comment that we really want kvm_toggle_irq,
so that we won't forget?
  


If so, that's a bug.  MSI should ignore kvm_set_irq(..., 0).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Davide Libenzi
On Sun, 3 May 2009, Al Viro wrote:

 On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
  +   /* We re-use eventfd for irqfd */
  +   fd = sys_eventfd2(0, 0);
  +   if (fd  0) {
  +   ret = fd;
  +   goto fail;
  +   }
  +
  +   /* We maintain a reference to eventfd for the irqfd lifetime */
  +   file = eventfd_fget(fd);
  +   if (IS_ERR(file)) {
  +   ret = PTR_ERR(file);
  +   goto fail;
  +   }
  +
  +   irqfd-file = file;
 
 This is just plain wrong.  You have no promise whatsoever that caller of
 that sucker won't race with e.g. dup2().  IOW, you can't assume that
 file will be of the expected kind.

The eventfd_fget() checks for the file_operations pointer, before 
returning the file*, and fails if the fd in not an eventfd. Or you have 
other concerns?



- Davide


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Al Viro
On Sun, May 03, 2009 at 11:07:26AM -0700, Davide Libenzi wrote:
 On Sun, 3 May 2009, Al Viro wrote:
  On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
   + /* We re-use eventfd for irqfd */
   + fd = sys_eventfd2(0, 0);
   + if (fd  0) {
   + ret = fd;
   + goto fail;
   + }
   +
   + /* We maintain a reference to eventfd for the irqfd lifetime */
   + file = eventfd_fget(fd);
   + if (IS_ERR(file)) {
   + ret = PTR_ERR(file);
   + goto fail;
   + }
   +
   + irqfd-file = file;
  
  This is just plain wrong.  You have no promise whatsoever that caller of
  that sucker won't race with e.g. dup2().  IOW, you can't assume that
  file will be of the expected kind.
 
 The eventfd_fget() checks for the file_operations pointer, before 
 returning the file*, and fails if the fd in not an eventfd. Or you have 
 other concerns?

OK, but... it's still wrong.  Descriptor numbers are purely for interaction
with userland; using them that way violates very general race-prevention
rules, even if you do paper over the worst of consequences with check in
eventfd_fget().

General rules:
* descriptor you've generated is fit only for return to userland;
* descriptor you've got from userland is fit only for *single*
fget() or equivalent, unless you are one of the core syscalls manipulating
the descriptor table itself (dup2, etc.)
* once file is installed in descriptor table, you'd better be past
the last failure exit; sys_close() on cleanup path is not acceptable.
That's what reserving descriptors is for.

IOW, the sane solution would be to export something that returns your
struct file *.  And stop playing with fd completely.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Michael S. Tsirkin
On Sun, May 03, 2009 at 07:59:40PM +0300, Avi Kivity wrote:
 Michael S. Tsirkin wrote:
 On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
   
 This allows an eventfd to be registered as an irq source with a guest.  Any
 signaling operation on the eventfd (via userspace or kernel) will inject
 the registered GSI at the next available window.

 Signed-off-by: Gregory Haskins ghask...@novell.com
 

 If we ever want to use this with e.g. MSI-X emulation in guest, and want
 to be stricly compliant to MSI-X, we'll need a way for guest to mask
 interrupts, and for host to report that a masked interrupt is pending.
 Ideally, all this will be doable with a couple of mmapped pages to avoid
 vmexits/system calls.

   

 We could do this in two ways:

 - move msix entry emulation into the kernel

It's not too bad IMO: MSIX is just a table with a list
of vectors, you check the mask bit on each interrupt,
if masked mark vector pending and poll until unmasked.

 - require the device to support replacing its irqfd, and juggle it like so:
   - guest disables msi
   - replace device model fd with eventfd belonging to us
   - when the device fires its eventfd, set the irq pending bit
   - guest enables msi
- if the pending bit is set, fire the interrupt?
- replace device model fd with the real irqfd

Looks like a lot of code. No?

 I'm leaning towards the latter, though it's not an easy call.

Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
which will block/unblock guest from getting interrupt on this irq,
whatever the source.  Interrupts are queued in kernel while masked. A
third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs.
qemu would call these ioctls when guest edits the MSIX vector control or
reads the pending bit array.

 +static void
 +irqfd_inject(struct work_struct *work)
 +{
 +   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
 +   struct kvm *kvm = irqfd-kvm;
 +
 +   mutex_lock(kvm-lock);
 +   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
 +   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
 +   mutex_unlock(kvm-lock);
 

 This will do weird stuff (deliver the irq twice) if the irq is
 MSI/MSI-X. I know this was discussed already and is a temporary
 shortcut, but maybe add a comment that we really want kvm_toggle_irq,
 so that we won't forget?
   

 If so, that's a bug.  MSI should ignore kvm_set_irq(..., 0).

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Sun, May 03, 2009 at 07:59:40PM +0300, Avi Kivity wrote:
  

Michael S. Tsirkin wrote:


On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
  
  

This allows an eventfd to be registered as an irq source with a guest.  Any
signaling operation on the eventfd (via userspace or kernel) will inject
the registered GSI at the next available window.

Signed-off-by: Gregory Haskins ghask...@novell.com



If we ever want to use this with e.g. MSI-X emulation in guest, and want
to be stricly compliant to MSI-X, we'll need a way for guest to mask
interrupts, and for host to report that a masked interrupt is pending.
Ideally, all this will be doable with a couple of mmapped pages to avoid
vmexits/system calls.

  
  

We could do this in two ways:

- move msix entry emulation into the kernel



It's not too bad IMO: MSIX is just a table with a list
of vectors, you check the mask bit on each interrupt,
if masked mark vector pending and poll until unmasked.
  


Right, but it's more and more core, and more and more bugs.  Bugs in the 
kernel are more expensive to fix and get to users.


  

- require the device to support replacing its irqfd, and juggle it like so:
  - guest disables msi
  - replace device model fd with eventfd belonging to us
  - when the device fires its eventfd, set the irq pending bit
  - guest enables msi
   - if the pending bit is set, fire the interrupt?
   - replace device model fd with the real irqfd



Looks like a lot of code. No?
  


We'll need exactly the same code if we do it in the kernel.  The only 
addition is replacing the fd.



I'm leaning towards the latter, though it's not an easy call.



Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
which will block/unblock guest from getting interrupt on this irq,
whatever the source.  Interrupts are queued in kernel while masked. A
third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs.
qemu would call these ioctls when guest edits the MSIX vector control or
reads the pending bit array.
  


I think this is the best option.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Michael S. Tsirkin
On Sun, May 03, 2009 at 10:17:16PM +0300, Avi Kivity wrote:
 Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
 which will block/unblock guest from getting interrupt on this irq,
 whatever the source.  Interrupts are queued in kernel while masked. A
 third ioctl KVM_PENDING_IRQS will return the status for a set if IRQs.
 qemu would call these ioctls when guest edits the MSIX vector control or
 reads the pending bit array.
   

 I think this is the best option.

Sounds good.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Davide Libenzi
On Sun, 3 May 2009, Al Viro wrote:

 IOW, the sane solution would be to export something that returns your
 struct file *.  And stop playing with fd completely.

This builds but it's not tested at all.

- Make all the work of the old anon_inode_getfd(), done by a new 
  anon_inode_getfile(), with anon_inode_getfd() using its services

- Make all the work done by sys_eventfd(), done by a new 
  eventfd_file_create() (which in turn uses anon_inode_getfile()), with 
  sys_eventfd() using its services

IRQfd can use eventfd_file_create(), fget(), get_unused_fd_flags() and 
fd_install() just before returning.
Is that what you had in mind?



- Davide


---
 fs/anon_inodes.c|   68 +---
 fs/eventfd.c|   44 +---
 include/linux/anon_inodes.h |3 +
 include/linux/eventfd.h |6 +++
 4 files changed, 92 insertions(+), 29 deletions(-)

Index: linux-2.6.mod/fs/anon_inodes.c
===
--- linux-2.6.mod.orig/fs/anon_inodes.c 2009-05-03 12:21:09.0 -0700
+++ linux-2.6.mod/fs/anon_inodes.c  2009-05-03 12:54:02.0 -0700
@@ -64,28 +64,24 @@ static const struct dentry_operations an
  *
  * Creates a new file by hooking it on a single inode. This is useful for files
  * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfd() will share a single inode,
+ * All the files created with anon_inode_getfile() will share a single inode,
  * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns new descriptor or -error.
+ * setup.  Returns the newly created file* or error.
  */
-int anon_inode_getfd(const char *name, const struct file_operations *fops,
-void *priv, int flags)
+struct file *anon_inode_getfile(const char *name,
+   const struct file_operations *fops,
+   void *priv, int flags)
 {
struct qstr this;
struct dentry *dentry;
struct file *file;
-   int error, fd;
+   int error;
 
if (IS_ERR(anon_inode_inode))
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
if (fops-owner  !try_module_get(fops-owner))
-   return -ENOENT;
-
-   error = get_unused_fd_flags(flags);
-   if (error  0)
-   goto err_module;
-   fd = error;
+   return ERR_PTR(-ENOENT);
 
/*
 * Link the inode to a directory entry by creating a unique name
@@ -97,7 +93,7 @@ int anon_inode_getfd(const char *name, c
this.hash = 0;
dentry = d_alloc(anon_inode_mnt-mnt_sb-s_root, this);
if (!dentry)
-   goto err_put_unused_fd;
+   goto err_module;
 
/*
 * We know the anon_inode inode count is always greater than zero,
@@ -123,16 +119,54 @@ int anon_inode_getfd(const char *name, c
file-f_version = 0;
file-private_data = priv;
 
+   return file;
+
+err_dput:
+   dput(dentry);
+err_module:
+   module_put(fops-owner);
+   return ERR_PTR(error);
+}
+EXPORT_SYMBOL_GPL(anon_inode_getfile);
+
+/**
+ * anon_inode_getfd - creates a new file instance by hooking it up to an
+ *anonymous inode, and a dentry that describe the class
+ *of the file
+ *
+ * @name:[in]name of the class of the new file
+ * @fops:[in]file operations for the new file
+ * @priv:[in]private data for the new file (will be file's 
private_data)
+ * @flags:   [in]flags
+ *
+ * Creates a new file by hooking it on a single inode. This is useful for files
+ * that do not need to have a full-fledged inode in order to operate correctly.
+ * All the files created with anon_inode_getfd() will share a single inode,
+ * hence saving memory and avoiding code duplication for the file/inode/dentry
+ * setup.  Returns new descriptor or -error.
+ */
+int anon_inode_getfd(const char *name, const struct file_operations *fops,
+void *priv, int flags)
+{
+   int error, fd;
+   struct file *file;
+
+   error = get_unused_fd_flags(flags);
+   if (error  0)
+   return error;
+   fd = error;
+
+   file = anon_inode_getfile(name, fops, priv, flags);
+   if (IS_ERR(file)) {
+   error = PTR_ERR(file);
+   goto err_put_unused_fd;
+   }
fd_install(fd, file);
 
return fd;
 
-err_dput:
-   dput(dentry);
 err_put_unused_fd:
put_unused_fd(fd);
-err_module:
-   module_put(fops-owner);
return error;
 }
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
Index: linux-2.6.mod/fs/eventfd.c
===
--- linux-2.6.mod.orig/fs/eventfd.c 2009-05-03 12:21:09.0 -0700
+++ linux-2.6.mod/fs/eventfd.c  2009-05-03 

Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-04-30 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
 This allows an eventfd to be registered as an irq source with a guest.  Any
 signaling operation on the eventfd (via userspace or kernel) will inject
 the registered GSI at the next available window.
 
 Signed-off-by: Gregory Haskins ghask...@novell.com

If we ever want to use this with e.g. MSI-X emulation in guest, and want
to be stricly compliant to MSI-X, we'll need a way for guest to mask
interrupts, and for host to report that a masked interrupt is pending.
Ideally, all this will be doable with a couple of mmapped pages to avoid
vmexits/system calls.

 +static void
 +irqfd_inject(struct work_struct *work)
 +{
 + struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
 + struct kvm *kvm = irqfd-kvm;
 +
 + mutex_lock(kvm-lock);
 + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
 + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
 + mutex_unlock(kvm-lock);

This will do weird stuff (deliver the irq twice) if the irq is
MSI/MSI-X. I know this was discussed already and is a temporary
shortcut, but maybe add a comment that we really want kvm_toggle_irq,
so that we won't forget?

 +}
 +

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html