From: Zach Brown <[EMAIL PROTECTED]>
Date: Thu, 27 Jul 2006 12:18:42 -0700

[ I kept this thread around in my inbox because I wanted to give it
  some deep thought, so sorry for replying to old bits... ]

> So as the kernel generates events in the ring it only produces an event
> if the ownership field says that userspace has consumed it and in doing
> so it sets the ownership field to tell userspace that an event is
> waiting.  userspace and the kernel now each follow their index around
> the ring as the ownership field lets them produce or consume the event
> at their index.  Can someone tell me if the cache coherence costs of
> this are extreme?  I'm hoping they're not.

No need for an owner field; we can use something like a VJ netchannel
data structure for this.  The kernel only writes to the producer index
and the user only writes to the consumer index.

> So, great, glibc can now find pending events very quickly if they're
> waiting in the ring and can fall back to the collection syscall if it
> wants to wait and the ring is empty.  If it consumes events via the
> syscall it increases its ring index by the number the syscall returned.

If we do a ring buffer, I do not think events should be obtainable via
a syscall at all.  Rather, I think this system call should be purely
"sleep until ring is not empty".

This is actually reasonably simple stuff to implement, as Evgeniy has
tried to explain.  Events in kevent live on a ready list when they have
triggered.  Existence on a list determines the state, and I think this
design, btw, invalidates some of the arguments against using netlink
that Ulrich mentions in his paper.  If netlink socket queuing fails,
then the kevent simply stays on the ready list until it can be
successfully published to the user.

I am not advocating netlink at all for this, as the ring buffer idea is
much better.

The ring buffer size, as Evgeniy also tried to describe, is bounded
purely by the number of registered events.
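
A minimal userspace sketch of the two-index scheme described above, assuming a power-of-two ring and C11 atomics. Everything here is illustrative: struct sketch_event is a placeholder (the real struct ukevent layout is not shown in this thread), RING_SLOTS is an arbitrary size, and evt_ring_produce() merely stands in for the kernel side. The names evt_ring_empty() and evt_ring_consume() are borrowed from this mail, but their signatures are a guess. The point is only that each side stores to exactly one index, so no per-slot owner field is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u  /* assumed size, must be a power of two */
static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0, "RING_SLOTS must be a power of two");

/* Placeholder event type, NOT the real struct ukevent. */
struct sketch_event {
    uint32_t id;
    uint32_t flags;
};

struct evt_ring {
    _Atomic uint32_t producer;  /* stored only by the producer ("kernel") */
    _Atomic uint32_t consumer;  /* stored only by the consumer ("user")   */
    struct sketch_event slots[RING_SLOTS];
};

static void evt_ring_init(struct evt_ring *r)
{
    atomic_init(&r->producer, 0);
    atomic_init(&r->consumer, 0);
}

/* Producer side. Returns false when the ring is full; in the design
 * discussed here the event would simply stay on the ready list. */
static bool evt_ring_produce(struct evt_ring *r, const struct sketch_event *ev)
{
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_acquire);

    if (prod - cons == RING_SLOTS)
        return false;

    r->slots[prod & (RING_SLOTS - 1)] = *ev;
    /* Release: slot contents become visible before the new index. */
    atomic_store_explicit(&r->producer, prod + 1, memory_order_release);
    return true;
}

/* Consumer side: true when no published event is pending. */
static bool evt_ring_empty(struct evt_ring *r)
{
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_acquire);
    return prod == cons;
}

/* Consumer side: copies one event out and advances the consumer index.
 * Returns false if the ring was empty. */
static bool evt_ring_consume(struct evt_ring *r, struct sketch_event *out)
{
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_acquire);

    if (prod == cons)
        return false;

    *out = r->slots[cons & (RING_SLOTS - 1)];
    /* Release: the slot is handed back only after it has been copied. */
    atomic_store_explicit(&r->consumer, cons + 1, memory_order_release);
    return true;
}
```

The indices run freely and are masked only when indexing a slot, so "full" is producer - consumer == RING_SLOTS and "empty" is producer == consumer, with unsigned wraparound handled by the subtraction. In a real shared mapping the kernel could not trust the user-written consumer index and would have to validate it.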

So the event loop of an application might look something like this:

	struct ukevent cur_event;
	struct timeval timeo;

	setup_timeout(&timeo);
	for (;;) {
		int err;

		/* Dispatch events for as long as dequeue succeeds. */
		while (!(err = ukevent_dequeue(evt_fd, evt_ring,
					       &cur_event, &timeo))) {
			struct my_event_object *o =
				event_to_object(&cur_event);

			o->dispatch(o, &cur_event);
			setup_timeout(&timeo);	/* re-arm after each event */
		}

		if (err == -ETIMEDOUT)
			timeout_processing();
		else
			event_error_processing(err);
	}

ukevent_dequeue() is perhaps some GLIBC-implemented routine which does
something like this, where event_p and timeo_p are the caller's
&cur_event and &timeo:

	int err;

	for (;;) {
		/* Fast path: take an event from the ring, no syscall. */
		if (!evt_ring_empty(evt_ring)) {
			struct ukevent *p = evt_ring_consume(evt_ring);

			memcpy(event_p, p, sizeof(struct ukevent));
			return 0;
		}

		/* Ring is empty: sleep until it is not, or until timeout. */
		err = kevent_wait(evt_fd, timeo_p);
		if (err < 0)
			break;
	}
	return err;

These are just some stupid ideas... we could also choose to expose the
ring buffer layout directly to the user event loop and let it perform
the dequeue operation and kevent_wait() calls directly.  I don't see
why we should not allow that.