On Thu, Jan 07, 2010 at 04:26:24PM -0800, Davide Libenzi wrote:
> On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:
> 
> > Sure, I was trying to be as brief as possible, here's a detailed summary.
> > 
> > Description of the system (MSI emulation in KVM):
> > 
> > KVM supports an ioctl to assign/deassign an eventfd file to an interrupt
> > message in the guest OS.  When this eventfd is signalled, the interrupt
> > message is sent.
> > This assignment is done from qemu system emulator.
> > 
> > eventfd is signalled from device emulation in another thread in
> > userspace or from kernel, which talks with guest OS through another
> > eventfd and shared memory (possibility of out of process was discussed
> > but never got implemented yet).
> > 
> > Note: it's okay to delay messages from correctness point of view, but
> > generally this is latency-sensitive path. If multiple identical messages
> > are requested, it's okay to send a single last message, but missing a
> > message altogether causes deadlocks.  Sending a message when none were
> > requested might in theory cause crashes, in practice doing this causes
> > performance degradation.
> > 
> > Another KVM feature is interrupt masking: guest OS requests that we
> > stop sending some interrupt message, possibly modifies the mapping,
> > and then re-enables this message. This needs to be done without
> > involving the device that might keep requesting events:
> > while masked, message is marked "pending", and guest might test
> > the pending status.
> > 
> > We can implement masking in system emulator in userspace, by using
> > assign/deassign ioctls: when message is masked, we simply deassign all
> > eventfd, and when it is unmasked, we assign them back.
> > 
> > Here's some code to illustrate how this all works: assign/deassign code
> > in kernel looks like the following:
> > 
> > 
> > this is called to unmask interrupt
> > 
> > static int
> > kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> > {
> >     struct _irqfd *irqfd, *tmp;
> >     struct file *file = NULL;
> >     struct eventfd_ctx *eventfd = NULL;
> >     int ret;
> >     unsigned int events;
> > 
> >     irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> > 
> > ...
> > 
> >     file = eventfd_fget(fd);
> >     if (IS_ERR(file)) {
> >             ret = PTR_ERR(file);
> >             goto fail;
> >     }
> > 
> >     eventfd = eventfd_ctx_fileget(file);
> >     if (IS_ERR(eventfd)) {
> >             ret = PTR_ERR(eventfd);
> >             goto fail;
> >     }
> > 
> >     irqfd->eventfd = eventfd;
> > 
> >     /*
> >      * Install our own custom wake-up handling so we are notified via
> >      * a callback whenever someone signals the underlying eventfd
> >      */
> >     init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> >     init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> > 
> >     spin_lock_irq(&kvm->irqfds.lock);
> > 
> >     events = file->f_op->poll(file, &irqfd->pt);
> > 
> >     list_add_tail(&irqfd->list, &kvm->irqfds.items);
> >     spin_unlock_irq(&kvm->irqfds.lock);
> > 
> > A.
> >     /*
> >      * Check if there was an event already pending on the eventfd
> >      * before we registered, and trigger it as if we didn't miss it.
> >      */
> >     if (events & POLLIN)
> >             schedule_work(&irqfd->inject);
> > 
> >     /*
> >      * do not drop the file until the irqfd is fully initialized, otherwise
> >      * we might race against the POLLHUP
> >      */
> >     fput(file);
> > 
> >     return 0;
> > 
> > fail:
> >     ...
> > }
> 
> What if you do (under proper irqfd locking) something like:
> 
>       eventfd_ctx_read(ctx, 1, &cnt);
>       if (irqfd->cnt != cnt) {
>               irqfd->cnt = cnt;
>               schedule_work(&irqfd->inject);
>       }
> 
> 
> 
> 
> > And deactivation deep down does this (from irqfd_cleanup_wq workqueue,
> > so this is not under the spinlock):
> > 
> >         /*
> >          * Synchronize with the wait-queue and unhook ourselves to
> >          * prevent further events.
> >          */
> > B.
> >         remove_wait_queue(irqfd->wqh, &irqfd->wait);
> > 
> >     ....
> > 
> >         /*
> >          * It is now safe to release the object's resources
> >          */
> >         eventfd_ctx_put(irqfd->eventfd);
> >         kfree(irqfd);
> 
> And:
> 
>       eventfd_ctx_read(ctx, 1, &irqfd->cnt);


->

>       remove_wait_queue(irqfd->wqh, &irqfd->wait);
> 
> 
> 
> 
> - Davide

Yes, this is exactly what I wanted to do.  So, here's the issue: if an
event is signalled at point ->, that is, after eventfd_ctx_read but before
remove_wait_queue, then we inject the interrupt but the counter is left
non-zero, so when we unmask we inject another, spurious interrupt.
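
To make the "counter left non-zero" part concrete, here is a userspace
sketch using the ordinary eventfd(2) interface rather than the in-kernel
eventfd_ctx_* calls.  The helper names (efd_signal, efd_drain,
efd_pending) are made up for illustration; only eventfd/read/write/poll
are real.  It shows that a signal arriving after the counter was read
leaves the fd readable again, which is what the POLLIN check at unmask
time would then see.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: signal the eventfd once, as a device would. */
static void efd_signal(int efd)
{
    uint64_t one = 1;
    ssize_t n = write(efd, &one, sizeof(one));

    assert(n == (ssize_t)sizeof(one));
    (void)n;
}

/*
 * Hypothetical helper: read and reset the counter.  Returns 0 when the
 * counter was already zero (a non-blocking read fails with EAGAIN).
 */
static uint64_t efd_drain(int efd)
{
    uint64_t cnt = 0;

    if (read(efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
        return 0;
    return cnt;
}

/* Hypothetical helper: true if the counter is non-zero (POLLIN set). */
static bool efd_pending(int efd)
{
    struct pollfd p = { .fd = efd, .events = POLLIN };

    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN) != 0;
}
```

With an fd from eventfd(0, EFD_NONBLOCK): two efd_signal() calls
followed by efd_drain() return a single count of 2 and clear POLLIN; one
more efd_signal() after that makes efd_pending() true again.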

This is why I wanted eventfd_ctx_read not to take the wait queue head
lock itself: then I could do:

        spin_lock_irqsave(&ctx->wqh.lock, flags);
        eventfd_ctx_read(ctx, 1, &irqfd->cnt);
        __remove_wait_queue(irqfd->wqh, &irqfd->wait);
        spin_unlock_irqrestore(&ctx->wqh.lock, flags);
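
For illustration only, a small userspace model of the two orderings.
This is not kernel code: struct model_ctx and the model_* functions are
hypothetical, a pthread mutex stands in for ctx->wqh.lock, and the
"wakeup callback" is reduced to bumping a counter of injected
interrupts.  The racy variant drops the lock between reading the counter
and unhooking; the atomic variant does both in one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an eventfd_ctx with one irqfd hooked on it. */
struct model_ctx {
    pthread_mutex_t lock;  /* plays the role of ctx->wqh.lock */
    uint64_t count;        /* eventfd counter */
    bool hooked;           /* irqfd wait entry is on the wait queue */
    unsigned int injected; /* interrupts injected by the wakeup callback */
};

static void model_init(struct model_ctx *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->count = 0;
    c->hooked = true;
    c->injected = 0;
}

/* Signal: bump the counter and, if still hooked, run the wakeup callback. */
static void model_signal(struct model_ctx *c)
{
    pthread_mutex_lock(&c->lock);
    c->count++;
    if (c->hooked)
        c->injected++;
    pthread_mutex_unlock(&c->lock);
}

/*
 * Racy mask: read-and-clear and unhook are separate critical sections.
 * signal_in_window simulates an event landing at the "->" point.
 */
static uint64_t model_mask_racy(struct model_ctx *c, bool signal_in_window)
{
    uint64_t cnt;

    assert(c->hooked);
    pthread_mutex_lock(&c->lock);
    cnt = c->count;
    c->count = 0;
    pthread_mutex_unlock(&c->lock);

    if (signal_in_window)
        model_signal(c);   /* injected AND left in the counter */

    pthread_mutex_lock(&c->lock);
    c->hooked = false;
    pthread_mutex_unlock(&c->lock);
    return cnt;
}

/* Atomic mask: read-and-clear and unhook under a single lock hold. */
static uint64_t model_mask_atomic(struct model_ctx *c)
{
    uint64_t cnt;

    assert(c->hooked);
    pthread_mutex_lock(&c->lock);
    cnt = c->count;
    c->count = 0;
    c->hooked = false;
    pthread_mutex_unlock(&c->lock);
    return cnt;
}
```

In the racy variant an event in the window ends up both injected and
counted, so the unmask path would inject it a second time.  In the
atomic variant a signal can only land before the critical section
(injected, then cleared from the counter) or after it (counted, not
injected), so each event is delivered once.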


-- 
MST