Linus Torvalds wrote:
> Here's a suggested "good" interface that would certainly be easy to
> implement, and very easy to use, with none of the scalability issues that
> many interfaces have.
I think everyone should take a timeout and look at Solaris 8's /dev/poll
interface. This discussion is reinventing the wheel, the lever, and the
inclined plane.
http://docs.sun.com/ab2/coll.40.6/REFMAN7/@Ab2PageView/55123
I think this is a lot cleaner than your approach:
* it doesn't add extra syscalls
* it handles multiple event queues, and does so without ugliness.
All the per-queue state is held in /dev/poll's "struct file"
instance
* in your method you have a per-process queue - but under what clone()
conditions is it shared among siblings? Here the user has a choice -
they can share the open("/dev/poll") file descriptor or not using
any of the existing means. Granted, they also would probably want
to arrange to share the fd's being polled in order to make this
useful.
* No new fields in task_struct
A few simple improvements can be made to the Sun model, though:
* The fact that the fd of /dev/poll can't be used for poll(2) or select(2)
is ugly. Sure, you probably don't want an open instance of /dev/poll
being added to another /dev/poll, but being able to call "select" on
them would be really useful:
1. Imagine a server that has to process connections from both
high-priority and low-priority clients - and that requests from
the high-priority ones always take precedence. With this
interface you could easily have two open instances of /dev/poll
and then call select(2) on them. This ability just falls
out naturally from the interface.
2. Some libraries are internally driven by select(2) loops (I think
Xlib is like this, IIRC). If you have a lot of fd's you want to
watch, this means you must take the hit of calling select(2) on
all of them. If you could just pass in a fd for /dev/poll,
problem solved.
* I think the fact that you add events via write(2) but retrieve them
via ioctl(2) is an ugly asymmetry. From what I can see, the entire
motivation for using ioctl as opposed to read(2) is to allow the user
to specify a timeout. If you could use poll(2)/select(2) on /dev/poll
this issue would be moot (see above).
* It would be nice if the interface were flexible enough to report
items _other_ than "activity on fd" in the future. Maybe SYSV IPC?
itimers? directory update notification? It seems that every time
UNIX adds a mechanism of waiting for events, we spoil it by not
making it flexible enough to wait on everything you might want.
Let's not repeat the mistake with a new interface.
* The "struct pollfd" and "struct dvpoll" should also include a 64-bit
opaque quantity supplied by userland which is returned with each event
on that fd. This would save the program from having to look up
which connection context goes with each fd - the kernel could just
give you the pointer to the structure. Not having this capability isn't
a burden for programs dealing with a small number of fd's (since they
can just have a lookup table) but if you potentially have 10000's of
connections it may be undesirable to allocate an array for them all.
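To make the opaque-quantity idea concrete, here is a rough userland sketch. Everything in it is an assumption for illustration: "struct pollfd_cookie", its "user_data" field, and the two helpers do not exist in Solaris or Linux. It only shows how a program could stash a connection pointer at registration time and get it straight back from the returned event, with no fd-indexed table.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>

/* Hypothetical: a pollfd-like record extended with a userland cookie.
 * Neither this struct nor the user_data field exists in any real kernel. */
struct pollfd_cookie {
    int      fd;
    short    events;
    short    revents;
    uint64_t user_data;   /* opaque; the kernel would hand it back unchanged */
};

/* Application-side per-connection state (illustrative only). */
struct connection {
    int  fd;
    long bytes_seen;
};

/* Build the record a program would submit when registering interest,
 * storing the connection pointer in the cookie. */
static struct pollfd_cookie make_interest(struct connection *c, short events)
{
    struct pollfd_cookie p = { c->fd, events, 0, (uint64_t)(uintptr_t)c };
    return p;
}

/* Recover the connection directly from a returned event. */
static struct connection *event_to_connection(const struct pollfd_cookie *ev)
{
    return (struct connection *)(uintptr_t)ev->user_data;
}
```

With something like this, the event loop would call event_to_connection() on each returned record instead of indexing an array by fd.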
The Linux patch of /dev/poll implements mmap'ing of the in-kernel poll
table... I don't think this is a good idea. First, the user just wants to
be able to add events and dequeue them - both linear operations. Second,
why should the kernel be forced to maintain a fixed-size linear list of
events we're looking for... this seems mindlessly inefficient. Why not
just pull a "struct pollfd" out of slab each time a new event is listened
for?
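As a rough picture of the per-interest allocation I have in mind, here is a userland sketch. It is not kernel code: malloc()/free() stand in for a slab cache, and the names "struct interest", "interest_add" and "interest_remove" are made up for illustration. The point is only that the set of watched fd's grows and shrinks one node at a time, with no fixed-size table.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>

/* One node per watched fd; malloc() stands in for a kernel slab cache. */
struct interest {
    struct pollfd    pfd;
    struct interest *next;
};

struct interest_set {
    struct interest *head;
    size_t           count;
};

/* Add interest in (fd, events). Returns 0 on success, -1 if allocation fails. */
static int interest_add(struct interest_set *s, int fd, short events)
{
    struct interest *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->pfd.fd = fd;
    n->pfd.events = events;
    n->pfd.revents = 0;
    n->next = s->head;      /* push on the front of the list */
    s->head = n;
    s->count++;
    return 0;
}

/* Drop interest in fd. Returns 0 if it was present, -1 otherwise. */
static int interest_remove(struct interest_set *s, int fd)
{
    for (struct interest **pp = &s->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->pfd.fd == fd) {
            struct interest *dead = *pp;
            *pp = dead->next;   /* unlink, then release the node */
            free(dead);
            s->count--;
            return 0;
        }
    }
    return -1;
}
```

Removal here is a linear scan, which is fine for a sketch; a real implementation would want an indexed lookup.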
My unresolved concerns:
* Is this substantially better than the already existing rtsig+poll
solution? Enough to make implementing it worthwhile?
* How do we quickly do the "struct file" -> "struct pollfd" conversion
each time an event happens? Especially if there are multiple /dev/poll
instances open in the current process. Probably each "struct file"
would need a pointer to the instance of /dev/poll which would have
some b-tree variant (or maybe a hash table). The limitation would
be that a single fd couldn't be polled for events by two different
/dev/poll instances, even for different events. This is probably
good for sanity's sake anyway.
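A userland sketch of the lookup described in the last point, using the hash-table option. All names are hypothetical: "struct file_sim" stands in for the kernel's "struct file", "dp_owner" is the back-pointer I'm proposing, and the bucket count is arbitrary. It also shows the one-instance-per-file limitation falling out of the single back-pointer.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DP_BUCKETS 64   /* arbitrary size for the sketch */

struct dp_instance;

/* Stand-in for the kernel's "struct file"; only the back-pointer matters. */
struct file_sim {
    struct dp_instance *dp_owner;   /* hypothetical new field */
};

struct dp_entry {
    struct file_sim *file;
    struct pollfd    pfd;
    struct dp_entry *next;          /* bucket chain */
};

/* One open instance of /dev/poll: a hash table keyed by file pointer. */
struct dp_instance {
    struct dp_entry *bucket[DP_BUCKETS];
};

static size_t dp_hash(const struct file_sim *f)
{
    /* Shift away low bits that tend to be zero because of alignment. */
    return (size_t)(((uintptr_t)f >> 4) % DP_BUCKETS);
}

/* Register f with dp. Returns -EBUSY if f already belongs to any instance. */
static int dp_attach(struct dp_instance *dp, struct file_sim *f,
                     int fd, short events)
{
    if (f->dp_owner != NULL)
        return -EBUSY;
    struct dp_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -ENOMEM;
    e->file = f;
    e->pfd.fd = fd;
    e->pfd.events = events;
    e->pfd.revents = 0;
    size_t h = dp_hash(f);
    e->next = dp->bucket[h];
    dp->bucket[h] = e;
    f->dp_owner = dp;
    return 0;
}

/* Called when f becomes ready: follow the back-pointer, then walk one chain. */
static struct pollfd *dp_lookup(struct file_sim *f)
{
    struct dp_instance *dp = f->dp_owner;
    if (dp == NULL)
        return NULL;
    for (struct dp_entry *e = dp->bucket[dp_hash(f)]; e != NULL; e = e->next)
        if (e->file == f)
            return &e->pfd;
    return NULL;
}
```

Attaching the same file to a second instance fails with -EBUSY here, which is exactly the limitation mentioned above.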
-Mitch