Re: [take19 0/4] kevent: Generic event handling mechanism.
Evgeniy Polyakov wrote: One can set number of events before the syscall and do not remove them after syscall. It can be updated if there is need for that. Nobody doubts that it is possible. But it is a) potentially much expensive and b) an alien concept to have the signal mask to set during the wait call implicitly. Conceptually it doesn't even make sense. This is no event to wait for. It a parameter for the specific wait call, just like the timeout. And I fortunately haven't seen you proposing to pass the timeout value implicitly. Not good enough? It does exactly what it is supposed to do. What can there be not good enough? Not to move signals into special case of events. If poll() can not work with them it does not mean, that they need to be specified as additional syscall parameter, instead change poll() to work with them, which can be easily done with kevents. You still seem to be completely missing the point. The signal mask is no event to wait for. It has nothing to do with this that ppoll() takes the signal mask as a parameter. The signal mask is a parameter for the wait call just like the timeout, not more and not less. Do not mix warm and soft - waiting for some period is not equal to syscall timeout. Waiting is possible with timer kevent user (although only relative timeout, can be changed to support both, not a big problem). That's what I'm saying all the time. Of course it can be supported. But for this the timeout parameter must be a timespec pointer. Whatever you could possibly mean by do not mix warm and soft I cannot possibly imagine. Fact is that both relative and absolute timeouts are useful. And that for absolute timeouts the change of the clock has to be taken into account. I'm quite sure that absolute timeouts are very usefull, but not as in the case of waiting for syscall completeness. In any way, kevent can be extended to support absolute timeouts in it's timer notifications. That's not the same. 
If you argue that then the syscall should have no timeout parameter at all. Fact is that setting up a timer is not for free. Since the timeout is used all the time having a timeout parameter is the right answer. And if you do this then do it right just like every other syscall other than poll: use a timespec object. This gives flexibility without measurable cost. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Mon, Oct 16, 2006 at 02:59:48AM -0700, Ulrich Drepper ([EMAIL PROTECTED]) wrote: Evgeniy Polyakov wrote: One can set number of events before the syscall and do not remove them after syscall. It can be updated if there is need for that. Nobody doubts that it is possible. But it is a) potentially much expensive and b) an alien concept to have the signal mask to set during the wait call implicitly. Conceptually it doesn't even make sense. This is no event to wait for. It a parameter for the specific wait call, just like the timeout. And I fortunately haven't seen you proposing to pass the timeout value implicitly. Because timeout has it's meaning for syscall processing, but signals are completely separated objects. Why do you want to allow to queue signals _and_ add 'temporal' signal mask for syscall? Just use one way - queue them all. Not good enough? It does exactly what it is supposed to do. What can there be not good enough? Not to move signals into special case of events. If poll() can not work with them it does not mean, that they need to be specified as additional syscall parameter, instead change poll() to work with them, which can be easily done with kevents. You still seem to be completely missing the point. The signal mask is no event to wait for. It has nothing to do with this that ppoll() takes the signal mask as a parameter. The signal mask is a parameter for the wait call just like the timeout, not more and not less. That's where we have different opinioins (among others places :) - I do not agree that signals are parameters for syscall, I insist that is is usual events. ppoll() shows us that there is no difference between signal reported as usual user - syscall returns and we can check if something was changed (signal was delivered or even was fired), it does not differ from the case when syscall returns and we check what event it reports first - ready signal or some other event. 
Do not mix warm and soft - waiting for some period is not equal to syscall timeout. Waiting is possible with timer kevent user (although only relative timeout, can be changed to support both, not a big problem). That's what I'm saying all the time. Of course it can be supported. But for this the timeout parameter must be a timespec pointer. Whatever you could possibly mean by do not mix warm and soft I cannot possibly imagine. Fact is that both relative and absolute timeouts are useful. And that for absolute timeouts the change of the clock has to be taken into account. They are usefull for special waiting, but not for waiting when syscall is called. The former is supported by timer notifications, the latter - by syscall parameter. We can add support for absolute timer notifications as addon to relative ones. But using there timeval structure is not accessible, since it has different sizes on different arches, so there will be problems with 32/64 arches like x86_64. Instead it is possible to use u32/u32 structure for sec/nsec, like what is used for relative timeouts. I'm quite sure that absolute timeouts are very usefull, but not as in the case of waiting for syscall completeness. In any way, kevent can be extended to support absolute timeouts in it's timer notifications. That's not the same. If you argue that then the syscall should have no timeout parameter at all. Fact is that setting up a timer is not for free. Since the timeout is used all the time having a timeout parameter is the right answer. And if you do this then do it right just like every other syscall other than poll: use a timespec object. This gives flexibility without measurable cost. It does not introduce any flexibility, since syscall does not have a parameter to specify absolute or relative timeout has been provided. That's one. 
I do argue that syscall must have timout parameter, since it is related to syscall behaviour but not to events syscall is working with - which is completely different things: syscall must be interrupted after some time to allow to fail operation or perform other tasks, but timer event can be fired in any time in the future, syscall should not care about underlaying events. That's two. You say every other syscall other than poll - but even aio_suspend() and friends use relative timeouts (although glibc converts them into absolute to be used with pthread_cond_timedwait), so why do you propose to use wariable sized structure (even if it is transferred almost for free in syscall) instead of usual timeout specified in seconds/nanoseconds/anything? That's three. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
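The fixed-width layout argued for above can be sketched as follows. This is only an illustration of the idea of a u32/u32 seconds/nanoseconds pair whose size does not depend on the architecture; the names `struct fixed_timeout` and `fixed_timeout_from_timespec()` are hypothetical and are not taken from the kevent patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical fixed-width timeout: 8 bytes on both 32-bit and 64-bit ABIs.
 * Not the actual kevent structure; for illustration only. */
struct fixed_timeout {
    uint32_t sec;   /* whole seconds */
    uint32_t nsec;  /* nanoseconds, 0..999999999 */
};

/* Convert a relative struct timespec into the fixed-width form.
 * Returns 0 on success, -1 if the value is negative, not normalized,
 * or does not fit into 32 bits of seconds. */
static int fixed_timeout_from_timespec(const struct timespec *ts,
                                       struct fixed_timeout *out)
{
    assert(ts != NULL && out != NULL);
    if (ts->tv_sec < 0 || ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000L)
        return -1;
    if ((uint64_t)ts->tv_sec > UINT32_MAX)
        return -1;
    out->sec = (uint32_t)ts->tv_sec;
    out->nsec = (uint32_t)ts->tv_nsec;
    return 0;
}
```

The trade-off discussed in the thread is visible here: the struct is the same size everywhere, but userland code that keeps its times in a timespec has to convert on every call.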
Re: [take19 0/4] kevent: Generic event handling mechanism.
Evgeniy Polyakov wrote:
> In context you have cut, one updated signal mask between calls to event delivery mechanism (using for example signal()), so it has exactly the same price.

No, it does not. If the signal mask is recomputed by the program for each new wait call then you have a lot more work to do when the signal mask is implicitly specified.

> I created it just because I think that POSIX workaround to add signals into the syscall parameters is not good enough.

Not good enough? It does exactly what it is supposed to do. What can there be not good enough?

> You again cut my explanation on why just pure timeout is used. We start a syscall, which can block forever, so we want to limit its time, and we add special parameter to show how long this syscall should run. Timeout is not about how long we should sleep (which indeed can be absolute), but how long syscall should run - which is related to the time syscall started.

I know very well what a timeout is. But the way the timeout can be specified can vary. It is often useful (as for select, poll) to specify relative timeouts. But there are equally useful uses where the timeout is needed at a specific point in time. Without a syscall interface which can have an absolute timeout parameter we'd have to write as a poor approximation at userlevel

   clock_gettime (CLOCK_REALTIME, &ts);
   struct timespec rel;
   rel.tv_sec = abstmo.tv_sec - ts.tv_sec;
   rel.tv_nsec = abstmo.tv_nsec - ts.tv_nsec;
   if (rel.tv_nsec < 0) {
     rel.tv_nsec += 1000000000;
     --rel.tv_sec;
   }
   if (rel.tv_sec < 0)
     inttmo = -1;  // or whatever is used for return immediately
   else
     inttmo = rel.tv_sec * UINT64_C(1000000000) + rel.tv_nsec;

   wait(..., inttmo, ...)

Not only is this much more expensive to do at userlevel, it is also inadequate because calls to settimeofday() do not cause a recomputation of the timeout.

See Ingo's RT futex stuff as an example for a kernel interface which does it right.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Wed, Oct 04, 2006 at 10:20:44AM -0700, Ulrich Drepper ([EMAIL PROTECTED]) wrote: Evgeniy Polyakov wrote: It is completely possible to do what you describe without special syscall parameters. First of all, I don't see how this is efficiently possible. The mask might change from call to call. And you can add/remove signal events using existing kevent api between calls. Second, hasn't it sunk in that inventing new ways to pass parameters is bad? Programmers don't want to learn new ways for every new interface. Reuse is good! And creating special cases for usual events is bad. There is unified way to deal with events in kevent - add/remove/modify/wait on them, signals are just usual events. This applies to the signal mask here. But there is another parameter falling into that category and I meant to mention it before: the timeout value. All other calls except poll and especially all modern interfaces use a timespec pointer. This is the way times are kept in userland code. Don't try to force people to do something else. Using a timespec also has the advantage that we can add an absolute timeout value mode (optional) instead of the relative timeout value. In this context, we should/must be able to specify which clock the timeout is for (not as part of the wait call, but another control operation perhaps). It's important to distinguish between CLOCK_REALTIME and CLOCK_MONOTONE. Both have their use. I think you wanted to say, that 'all event mechanism except the most commonly used poll/select/epoll use timespec'. I designed it to be similar to poll(), it is really good interface. Nature of the waiting is to wait for some time, so I put there that 'some time'. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [take19 0/4] kevent: Generic event handling mechanism.
Evgeniy Polyakov wrote:
> And you can add/remove signal events using existing kevent api between calls.

That's far more expensive than using a mask under control of the program.

> And creating special cases for usual events is bad. There is unified way to deal with events in kevent - add/remove/modify/wait on them, signals are just usual events.

How can this be unified? The installment of the temporary signal mask is unlike the handling of signals for the purpose of reporting them through the signal queue. It's equally completely new functionality. Don't kid yourself in thinking that because this is signal stuff, too, you're unifying something. The way this signal mask is used has nothing whatsoever to do with delivering signals via the event queue. For the latter the signals always must be blocked (similar to sigwait's requirement).

As a result it means you want to introduce a new mechanism for the event queue instead of using the well known and often used method of optionally passing a signal mask to the syscall. That's just insane.

> I think you wanted to say, that 'all event mechanism except the most commonly used poll/select/epoll use timespec'.

Get your facts straight. select uses timeval which is just the predecessor of timespec. And epoll is just (badly) designed after poll. Fact is therefore that poll plus its spawn is the only interface using such a timeout method.

> I designed it to be similar to poll(), it is really good interface.

Not many people agree. All the interfaces designed (not derived) in the last years take a timespec parameter.

Plus, you chose to ignore all the nice things using a timespec allows you, like absolute timeout modes etc. See the clock_nanosleep() interface for a way this can be useful.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
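For readers unfamiliar with the clock_nanosleep() interface mentioned above, here is a minimal sketch of an absolute-deadline sleep using a timespec. The helper names `timespec_add()` and `sleep_until_deadline()` are made up for this illustration; only clock_gettime(), clock_nanosleep(), CLOCK_MONOTONIC and TIMER_ABSTIME are the standard POSIX interface.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Add a relative interval to a base time, keeping tv_nsec in [0, 1e9).
 * Hypothetical helper, both inputs must already be normalized. */
static struct timespec timespec_add(struct timespec base, struct timespec rel)
{
    struct timespec out;

    assert(base.tv_nsec >= 0 && base.tv_nsec < 1000000000L);
    assert(rel.tv_nsec >= 0 && rel.tv_nsec < 1000000000L);
    out.tv_sec = base.tv_sec + rel.tv_sec;
    out.tv_nsec = base.tv_nsec + rel.tv_nsec;
    if (out.tv_nsec >= 1000000000L) {
        out.tv_nsec -= 1000000000L;
        ++out.tv_sec;
    }
    return out;
}

/* Sleep until (now + rel) on CLOCK_MONOTONIC.
 * Returns 0 on success or an errno value on failure. */
static int sleep_until_deadline(struct timespec rel)
{
    struct timespec now, deadline;
    int rc;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return errno;
    deadline = timespec_add(now, rel);

    /* With TIMER_ABSTIME the deadline is not recomputed after a signal
     * interrupts the sleep; the same absolute value is simply reused. */
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (rc == EINTR);
    return rc;
}
```

Because the deadline is absolute, restarting after EINTR does not accumulate drift, which is the property a relative millisecond or nanosecond count cannot give without extra userlevel arithmetic.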
Re: [take19 0/4] kevent: Generic event handling mechanism.
On 9/22/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> The only two things missed in patchset after his suggestions are new POSIX-like interface, which I personally consider as very inconvenient,

This means you really do not know at all what this is about. We already have these interfaces. Several of them and there will likely be more. These are interfaces for functionality which needs the new event notification. There is *NO* reason whatsoever to not make this
Re: [take19 0/4] kevent: Generic event handling mechanism.
[Bah, sent too early]

On 9/22/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> The only two things missed in patchset after his suggestions are new POSIX-like interface, which I personally consider as very inconvenient,

This means you really do not know at all what this is about. We already have these interfaces. Several of them and there will likely be more. These are interfaces for functionality which needs the new event notification. There is *NO* reason whatsoever to not add this extension and instead invent new interfaces to have notification sent to the event queue.
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Tue, Oct 03, 2006 at 11:09:15PM -0700, Ulrich Drepper ([EMAIL PROTECTED]) wrote: On 9/22/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote: The only two things missed in patchset after his suggestions are new POSIX-like interface, which I personally consider as very unconvenient, This means you really do not know at all what this is about. We already have these interfaces. Several of them and there will likely be more. These are interfaces for functionality which needs the new event notification. There is *NO* reason whatsoever to not make this It looks I'm a bit puzzled... Let me clarify my position about it. Kevent as a generic event handling mechanism should not knwo about how events were added. It was designed to be quite flexible, so one could add events from essentially any possible situation. One of the most common situations is userspace requests - they are added through set of created syscalls. There can exists tons of quadrillions of any other interfaces, I even specially created helper function for kernel subsystems (existing and new ones) which might want to create events using own syscalls and parameters. For example network AIO work that way - it has own syscalls, which parses parameters, creates ukevent structure and pass them into kevent core, which in turn calls appropriate callbacks back to network AIO. Everyone can add new interfaces in any way he likes, it would quite silly to created new subsystem which would required strick API and failed to work with different set of interfaces. So from my point of view, problem is not in case that 'we need only this API', but 'why new API is better that old one'. It is possible to create new API, which will add events from _existing_ syscalls, it is just one function call from given syscall, I completely agree with it. I'm just objecting against removing existing interface in favour of new one. 
People who need the POSIX timer API: feel free to call kevent_user_add_ukevent() from your favorite posix_timer_create(). Those who need signal queueing can do it even in the signal() syscall - the kevent callback for that subsystem can, for example, update the process' signal mask and add kevents.

-- 
	Evgeniy Polyakov
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Tue, Oct 03, 2006 at 11:10:51PM -0700, Ulrich Drepper ([EMAIL PROTECTED]) wrote:
> [Bah, sent too early]
>
> On 9/22/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> > The only two things missed in patchset after his suggestions are new POSIX-like interface, which I personally consider as very inconvenient,
>
> This means you really do not know at all what this is about. We already have these interfaces. Several of them and there will likely be more. These are interfaces for functionality which needs the new event notification. There is *NO* reason whatsoever to not add this extension and instead invent new interfaces to have notification sent to the event queue.

As I described in the previous e-mail, there are completely _no_ limitations on interfaces - it is possible to queue events from any place, no matter if it is a new interface (which I prefer to use) or any old one, which is more convenient for someone. There is a special helper function for that. One can check the network AIO implementation to see how it was done in practice - network AIO has its own syscalls (aio_send(), aio_recv() and aio_sendfile()), which create a kevent queue and put their own events there; it is completely transparent for userspace, which does not even know that network AIO is based on kevent.

-- 
	Evgeniy Polyakov
Re: [take19 0/4] kevent: Generic event handling mechanism.
Evgeniy Polyakov wrote:
> When we enter sys_ppoll() we specify needed signals as syscall parameter, with kevents we will add them into the queue.

No, this is not sufficient as I said in the last mail. Why do you completely ignore what others say? The code which depends on the signal does not have to have access to the event queue. If a library sets up an interrupt handler then it expects the signal to be delivered this way. In such situations ppoll etc. allow the signal to be generally blocked and enabled only and *ATOMICALLY* around the delays. This is not possible with the current wait interface. We need this signal mask interface and the appropriate setup code.

Being able to get signal notifications does not mean this is always the way it can and must happen.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
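The pattern described above, a signal that is blocked in general and enabled atomically only for the duration of the wait, looks roughly like this with the existing ppoll() call. The wrapper name `wait_readable_allowing()` is hypothetical; ppoll(), sigprocmask() and sigdelset() are the real interfaces. The sketch assumes a single-threaded caller that has already blocked `sig` (a threaded program would use pthread_sigmask() instead).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* ppoll() is a Linux/GNU extension */
#endif
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Wait until fd is readable or the timeout expires. The signal `sig`
 * is unblocked only while the kernel is waiting; the previous mask is
 * restored before ppoll() returns. Returns what ppoll() returns:
 * number of ready descriptors, 0 on timeout, -1 on error (errno is
 * EINTR if a handler for an unblocked signal ran). */
static int wait_readable_allowing(int fd, int sig,
                                  const struct timespec *timeout)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    sigset_t during_wait;

    assert(sig > 0);

    /* Fetch the current mask without changing it ... */
    if (sigprocmask(SIG_SETMASK, NULL, &during_wait) != 0)
        return -1;
    /* ... and remove the one signal that may interrupt the wait. */
    if (sigdelset(&during_wait, sig) != 0)
        return -1;

    return ppoll(&pfd, 1, timeout, &during_wait);
}
```

The library that installed the handler needs no access to the descriptor set or to any event queue, which is the decoupling argued for in the message above.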
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Wed, Oct 04, 2006 at 12:33:25AM -0700, Ulrich Drepper ([EMAIL PROTECTED]) wrote: Evgeniy Polyakov wrote: When we enter sys_ppoll() we specify needed signals as syscall parameter, with kevents we will add them into the queue. No, this is not sufficient as I said in the last mail. Why do you completely ignore what others say. The code which depends on the signal does not have to have access to the event queue. If a library sets up an interrupt handler then it expect the signal to be delivered this way. In such situations ppoll etc allow the signal to be generally blocked and enabled only and *ATOMICALLY* around the delays. This is not possible with the current wait interface. We need this signal mask interfaces and the appropriate setup code. Being able to get signal notifications does not mean this is always the way it can and must happen. It is completely possible to do what you describe without special syscall parameters. Just add interesting signals to the queue (and optionally block them globally) and wait on that queue. When signal's event is generated and appropriate kevent is removed, that signal will be restored in global signal mask (there are appropriate enqueue/dequeue callbacks which can perform operations on signal mask for given process). My main consern is to not add special cases for something in generic code, especially when gneric code can easily handle that situations. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [take19 0/4] kevent: Generic event handling mechanism.
Evgeniy Polyakov wrote:
> It is completely possible to do what you describe without special syscall parameters.

First of all, I don't see how this is efficiently possible. The mask might change from call to call.

Second, hasn't it sunk in that inventing new ways to pass parameters is bad? Programmers don't want to learn new ways for every new interface. Reuse is good!

This applies to the signal mask here. But there is another parameter falling into that category and I meant to mention it before: the timeout value. All other calls except poll and especially all modern interfaces use a timespec pointer. This is the way times are kept in userland code. Don't try to force people to do something else.

Using a timespec also has the advantage that we can add an absolute timeout value mode (optional) instead of the relative timeout value.

In this context, we should/must be able to specify which clock the timeout is for (not as part of the wait call, but another control operation perhaps). It's important to distinguish between CLOCK_REALTIME and CLOCK_MONOTONIC. Both have their use.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [take19 0/4] kevent: Generic event handling mechanism.
On 9/27/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> I have been told in private what signal masks are about - just to wait until either signal or given condition is ready, but in that case just add additional kevent user like AIO complete or network notification and wait until either requested events are ready or signal is triggered.

No, this won't work. Yes, I want signal notification as part of the event handling. But there are situations when this is not suitable. Only if the signal is expected in the same code using the event handling can you do this. But this is not always possible. Especially when the signal handling code is used in other parts of the code than the event handling. E.g., signal handling in a library, event handling in the main code. You cannot assume that all the code is completely integrated.
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Tue, Oct 03, 2006 at 09:50:09PM -0700, Ulrich Drepper ([EMAIL PROTECTED]) wrote:
> On 9/27/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> > I have been told in private what signal masks are about - just to wait until either signal or given condition is ready, but in that case just add additional kevent user like AIO complete or network notification and wait until either requested events are ready or signal is triggered.
>
> No, this won't work. Yes, I want signal notification as part of the event handling. But there are situations when this is not suitable. Only if the signal is expected in the same code using the event handling can you do this. But this is not always possible. Especially when the signal handling code is used in other parts of the code than the event handling. E.g., signal handling in a library, event handling in the main code. You cannot assume that all the code is completely integrated.

Signals can still be delivered in the usual way too. When we enter sys_ppoll() we specify needed signals as a syscall parameter; with kevents we will add them into the queue.

-- 
	Evgeniy Polyakov
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Wed, Sep 20, 2006 at 01:35:47PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> Generic event handling mechanism.
>
> Consider for inclusion.

I have been told in private what signal masks are about - just to wait until either signal or given condition is ready, but in that case just add additional kevent user like AIO complete or network notification and wait until either requested events are ready or signal is triggered.

-- 
	Evgeniy Polyakov
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Fri, Sep 22, 2006 at 12:22:07PM -0700, Andrew Morton wrote:
> On Wed, 20 Sep 2006 13:35:47 +0400 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> > Generic event handling mechanism.
> >
> > Consider for inclusion.
>
> Ulrich's objections sounded substantial, and afaik remain largely unresolved. How do we sort this out?

I haven't seen any of Ulrich's points (which mostly are a large subset of my objections) being addressed.
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Tue, Sep 26, 2006 at 04:54:16PM +0100, Christoph Hellwig ([EMAIL PROTECTED]) wrote:
> > > Generic event handling mechanism.
> > >
> > > Consider for inclusion.
> >
> > Ulrich's objections sounded substantial, and afaik remain largely unresolved. How do we sort this out?
>
> I haven't seen any of Ulrich's points (which mostly are a large subset of my objections) being addressed.

Could you please be more specific? As far as I can see I addressed all suggestions made by Christoph and am still waiting for comments about my points after my reply to Ulrich's.

-- 
	Evgeniy Polyakov
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Wed, 20 Sep 2006 13:35:47 +0400 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> Generic event handling mechanism.
>
> Consider for inclusion.

Ulrich's objections sounded substantial, and afaik remain largely unresolved. How do we sort this out?
Re: [take19 0/4] kevent: Generic event handling mechanism.
On Fri, Sep 22, 2006 at 12:22:07PM -0700, Andrew Morton ([EMAIL PROTECTED]) wrote:
> On Wed, 20 Sep 2006 13:35:47 +0400 Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> > Generic event handling mechanism.
> >
> > Consider for inclusion.
>
> Ulrich's objections sounded substantial, and afaik remain largely unresolved. How do we sort this out?

There are no objections, but a request for an additional interface. The only two things missed in the patchset after his suggestions are a new POSIX-like interface, which I personally consider very inconvenient, but in any case it can be implemented as an addon, and the signal mask change, but Ulrich has not answered how it differs from blocking in userspace and then calling the appropriate syscall. I expect the difference is only in the reduced number of syscalls.

-- 
	Evgeniy Polyakov
[take19 0/4] kevent: Generic event handling mechanism.
Generic event handling mechanism. Consider for inclusion. Changes from 'take18' patchset: * use __init instead of __devinit * removed 'default N' from config for user statistic * removed kevent_user_fini() since kevent can not be unloaded * use KERN_INFO for statistic output Changes from 'take17' patchset: * Use RB tree instead of hash table. At least for a web sever, frequency of addition/deletion of new kevent is comparable with number of search access, i.e. most of the time events are added, accesed only couple of times and then removed, so it justifies RB tree usage over AVL tree, since the latter does have much slower deletion time (max O(log(N)) compared to 3 ops), although faster search time (1.44*O(log(N)) vs. 2*O(log(N))). So for kevents I use RB tree for now and later, when my AVL tree implementation is ready, it will be possible to compare them. * Changed readiness check for socket notifications. With both above changes it is possible to achieve more than 3380 req/second compared to 2200, sometimes 2500 req/second for epoll() for trivial web-server and httperf client on the same hardware. It is possible that above kevent limit is due to maximum allowed kevents in a time limit, which is 4096 events. Changes from 'take16' patchset: * misc cleanups (__read_mostly, const ...) * created special macro which is used for mmap size (number of pages) calculation * export kevent_socket_notify(), since it is used in network protocols which can be built as modules (IPv6 for example) Changes from 'take15' patchset: * converted kevent_timer to high-resolution timers, this forces timer API update at http://linux-net.osdl.org/index.php/Kevent * use struct ukevent* instead of void * in syscalls (documentation has been updated) * added warning in kevent_add_ukevent() if ring has broken index (for testing) Changes from 'take14' patchset: * added kevent_wait() This syscall waits until either timeout expires or at least one event becomes ready. 
It also commits that @num events from @start are processed by userspace and thus can be be removed or rearmed (depending on it's flags). It can be used for commit events read by userspace through mmap interface. Example userspace code (evtest.c) can be found on project's homepage. * added socket notifications (send/recv/accept) Changes from 'take13' patchset: * do not get lock aroung user data check in __kevent_search() * fail early if there were no registered callbacks for given type of kevent * trailing whitespace cleanup Changes from 'take12' patchset: * remove non-chardev interface for initialization * use pointer to kevent_mring instead of unsigned longs * use aligned 64bit type in raw user data (can be used by high-res timer if needed) * simplified enqueue/dequeue callbacks and kevent initialization * use nanoseconds for timeout * put number of milliseconds into timer's return data * move some definitions into user-visible header * removed filenames from comments Changes from 'take11' patchset: * include missing headers into patchset * some trivial code cleanups (use goto instead of if/else games and so on) * some whitespace cleanups * check for ready_callback() callback before main loop which should save us some ticks Changes from 'take10' patchset: * removed non-existent prototypes * added helper function for kevent_registered_callbacks * fixed 80 lines comments issues * added shared between userspace and kernelspace header instead of embedd them in one * core restructuring to remove forward declarations * s o m e w h i t e s p a c e c o d y n g s t y l e c l e a n u p * use vm_insert_page() instead of remap_pfn_range() Changes from 'take9' patchset: * fixed -nopage method Changes from 'take8' patchset: * fixed mmap release bug * use module_init() instead of late_initcall() * use better structures for timer notifications Changes from 'take7' patchset: * new mmap interface (not tested, waiting for other changes to be acked) - use nopage() method to 
dynamically substitue pages - allocate new page for events only when new added kevent requres it - do not use ugly index dereferencing, use structure instead - reduced amount of data in the ring (id and flags), maximum 12 pages on x86 per kevent fd Changes from 'take6' patchset: * a lot of comments! * do not use list poisoning for detection of the fact, that entry is in the list * return number of ready kevents even if copy*user() fails * strict check for number of kevents in syscall * use ARRAY_SIZE for array size calculation * changed superblock magic number * use SLAB_PANIC instead of direct panic() call * changed -E* return values * a lot of small cleanups and indent fixes Changes from 'take5' patchset: * removed compilation warnings about unused wariables when lockdep is not