On Fri, 19 Jun 2009, Gregory Haskins wrote: > > In the POLLIN event, you schedule_work(&irqfd->inject) and there are no > > races there AFAICS (you basically do not care of anything eventfd memory > > related at all). > > For POLLHUP, you do: > > > > spin_lock(irqfd->slock); > > if (irqfd->wqh) > > schedule_work(&irqfd->inject); > > irqfd->wqh = NULL; > > spin_unlock(irqfd->slock); > > > > In your work function you notice the POLLHUP condition and take proper > > action (dunno what it is in your case). > > In your kvm_irqfd_release() function: > > > > spin_lock(irqfd->slock); > > if (irqfd->wqh) > > remove_wait_queue(irqfd->wqh, &irqfd->wait); > > irqfd->wqh = NULL; > > spin_unlock(irqfd->slock); > > > > Any races in there? > > > > Yes, for one you have an ABBA deadlock on the irqfd->slock and wqh->lock > (recall wqh has to be locked to fix that other race I mentioned).
Yep, true. How about we go in little steps? How about the one below? When you register your poll callback, you get a reference with eventfd_refget(). At that point we have no more worries about the eventfd memory (namely "wqh") going away. Symmetrically, you use eventfd_refput() once you done with it. So we basically de-couple the internal VFS refcount, from the eventfd context memory refcount. Where are the non-solveable races on the IRQfd side, of an approach like this one? > Yes, understood. What I was trying to gently say is that the one-liner > proposal alone is still broken afaict. However, if there is another > solution that works that you like better than 133-liner I posted, I am > more than willing to help analyze it. In the end, I only care that this > is fixed. Is it so? What I noticed here, is that you went from "Detailed Mode" (in the emails where you were pushing the new bits), to "Hint Mode" (as below) when it was time to see if there were simpler solutions to the problem. > (As a hint, I think I fixed 4-5 races with these patches, so there are a > few others still lurking as well) Not knowing the details of IRQfd, hinting only is not going to help anything in this case. - Davide --- fs/eventfd.c | 37 ++++++++++++++++++++++++++++++++++++- include/linux/eventfd.h | 6 ++++++ 2 files changed, 42 insertions(+), 1 deletion(-) Index: linux-2.6.mod/fs/eventfd.c =================================================================== --- linux-2.6.mod.orig/fs/eventfd.c 2009-06-20 13:08:22.000000000 -0700 +++ linux-2.6.mod/fs/eventfd.c 2009-06-20 14:00:23.000000000 -0700 @@ -17,8 +17,10 @@ #include <linux/eventfd.h> #include <linux/syscalls.h> #include <linux/module.h> +#include <linux/kref.h> struct eventfd_ctx { + struct kref kref; wait_queue_head_t wqh; /* * Every time that a write(2) is performed on an eventfd, the @@ -59,9 +61,19 @@ int eventfd_signal(struct file *file, in } EXPORT_SYMBOL_GPL(eventfd_signal); +static void eventfd_free(struct kref *kref) +{ + struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref); + + kfree(ctx); +} + static int eventfd_release(struct inode *inode, struct file *file) { - kfree(file->private_data); + struct eventfd_ctx *ctx = file->private_data; + + wake_up_poll(&ctx->wqh, POLLHUP); + kref_put(&ctx->kref, eventfd_free); return 0; } @@ -201,6 +213,28 @@ struct file *eventfd_fget(int fd) } EXPORT_SYMBOL_GPL(eventfd_fget); +int eventfd_refget(struct file *file) +{ + struct eventfd_ctx *ctx = file->private_data; + + if (file->f_op != &eventfd_fops) + return -EINVAL; + kref_get(&ctx->kref); + return 0; +} +EXPORT_SYMBOL_GPL(eventfd_refget); + +int eventfd_refput(struct file *file) +{ + struct eventfd_ctx *ctx = file->private_data; + + if (file->f_op != &eventfd_fops) + return -EINVAL; + kref_put(&ctx->kref, eventfd_free); + return 0; +} +EXPORT_SYMBOL_GPL(eventfd_refput); + SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags) { int fd; @@ -217,6 +251,7 @@ SYSCALL_DEFINE2(eventfd2, unsigned int, if (!ctx) return -ENOMEM; + kref_init(&ctx->kref); init_waitqueue_head(&ctx->wqh); ctx->count = count; ctx->flags = flags; Index: linux-2.6.mod/include/linux/eventfd.h =================================================================== --- linux-2.6.mod.orig/include/linux/eventfd.h 2009-06-20 13:08:22.000000000 -0700 +++ linux-2.6.mod/include/linux/eventfd.h 2009-06-20 13:59:09.000000000 -0700 @@ -29,12 +29,18 @@ struct file *eventfd_fget(int fd); int eventfd_signal(struct file *file, int n); +int eventfd_refget(struct file *file); +int eventfd_refput(struct file *file); #else /* CONFIG_EVENTFD */ #define eventfd_fget(fd) ERR_PTR(-ENOSYS) static inline int eventfd_signal(struct file *file, int n) { return 0; } +static inline int eventfd_refget(struct file *file) +{ return -ENOSYS; } +static inline int eventfd_refput(struct file *file) +{ return -ENOSYS; } #endif /* CONFIG_EVENTFD */ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html