Re: PROBLEM: Failure to deliver SIGCHLD
Michael Harris wrote:
>
> [2.] The problem occurs in a forking server similar in function to
> inetd. The server employs a very simple SIGCHLD handler that loops on
> wait(2), until all zombie processes have been collected. For no
> immediately apparent reason, the parent process behaves as if it no
> longer receives SIGCHLD. Manually sending the signal has no effect.

Sounds like a blocked signal.

> [6.] This is the code for the signal handler in the server application.
>
>     void reaper_man (int signum)
>     {
>         int stat;
>         while ( waitpid(-1, &stat, WNOHANG) > 0 );
>     }
>
>     signal (SIGCHLD, reaper_man); /* from main() */
>
> I dare say it contains no bugs (famous last words)

It does - it clobbers errno :-)

My suggestions: use sigaction with defined restart/mask/etc. behaviour
instead of signal. Save and restore errno in the signal handler. Make
sure SIGCHLD isn't blocked.

But if your only interest is to get rid of the zombies, the simplest
solution would be to set SIGCHLD to ignore.

Ciao, ET.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
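A minimal sketch of those three suggestions, assuming a POSIX system: the handler is installed with sigaction() (SA_RESTART so interrupted slow system calls are restarted, SA_NOCLDSTOP so merely stopped children don't trigger it), it saves and restores errno, and SIGCHLD is explicitly unblocked in case the process inherited a mask that blocks it. The names reaper, install_reaper and children_reaped are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Number of children collected so far; only the handler writes it. */
static volatile sig_atomic_t children_reaped = 0;

/* SIGCHLD handler: collect every exited child without blocking.
 * waitpid() sets errno (ECHILD once no children are left), so the
 * errno of the interrupted code is saved and restored. */
static void reaper(int signum)
{
    int saved_errno = errno;

    (void)signum;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        children_reaped++;
    errno = saved_errno;
}

/* Install the handler with well-defined semantics and make sure
 * SIGCHLD is not blocked (the signal mask is inherited across
 * fork() and exec). Returns 0 on success, -1 on error. */
int install_reaper(void)
{
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reaper;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}
```

Call install_reaper() once in main() before the first fork().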
Re: [PATCH] Bad rounding in timeval_to_jiffies [was: Re: Odd Timer behavior in 2.6 vs 2.4 (1 extra tick)]
On Thu, 21 Apr 2005, Chris Friesen wrote:
>
> Does mainline have a high precision monotonic wallclock that is not
> affected by time-of-day changes? Something like "nano/micro seconds
> since boot"?

On newer kernels with the POSIX timers (I think 2.6 - not sure though)
there's clock_gettime(CLOCK_MONOTONIC, ...).

Linus Torvalds wrote:
>
> Getting "approximate uptime" really really _really_ fast
> might be useful for some things, but I don't know how many.

I bet most users of gettimeofday actually want a strictly monotonically
increasing clock where the actual base time is irrelevant. Just strace
some apps - those issuing hundreds or thousands of gettimeofday calls
are most likely in this class. Those that only call gettimeofday once
or twice are the ones that really want the wall clock time.

How often does the kernel use jiffies (the monotonic clock) and how
often xtime (the wall clock)?

Ciao, ET.
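For illustration, a small helper around clock_gettime(CLOCK_MONOTONIC, ...) that returns nanoseconds since an unspecified starting point. The name monotonic_ns is hypothetical. CLOCK_MONOTONIC does not jump when the wall clock is set, although NTP may still slew its rate; with older glibc versions clock_gettime() also needs -lrt at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds on the monotonic clock. The base is arbitrary, so only
 * differences between two readings are meaningful. Unlike
 * gettimeofday(), the value does not jump when the time of day is set.
 * Returns -1 if the clock is unavailable. */
int64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}
```

A typical use is measuring an interval: take one reading before and one after, and subtract.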
Re: problem with select() - 2.4.5
Thomas Speck wrote:
>
>     tio.c_cflag = baud | CLOCAL;

How about adding CREAD? Without it the receiver is disabled and incoming
characters are discarded, so select() never reports the port as
readable.

Ciao, ET.
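A sketch of a raw 8N1 port setup with the receiver enabled via CREAD, under stated assumptions: the port is already open (e.g. with O_RDWR | O_NOCTTY), and the speed is set with cfsetispeed()/cfsetospeed() rather than OR-ed into c_cflag, which only happens to work on Linux because of how it encodes the B* constants. The helper names init_raw_termios and setup_serial are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <termios.h>

/* Fill *tio for a raw 8N1 line at the given speed: no input/output
 * processing, no echo, reads return as soon as one byte is available.
 * CREAD enables the receiver; CLOCAL ignores modem control lines. */
void init_raw_termios(struct termios *tio, speed_t baud)
{
    memset(tio, 0, sizeof(*tio));
    tio->c_cflag = CS8 | CLOCAL | CREAD;
    tio->c_cc[VMIN] = 1;
    tio->c_cc[VTIME] = 0;
    cfsetispeed(tio, baud);
    cfsetospeed(tio, baud);
}

/* Apply the settings to an already opened port, discarding stale input.
 * Returns 0 on success, -1 on error. */
int setup_serial(int fd, speed_t baud)
{
    struct termios tio;

    init_raw_termios(&tio, baud);
    if (tcflush(fd, TCIFLUSH) == -1)
        return -1;
    return tcsetattr(fd, TCSANOW, &tio);
}
```

Usage would be along the lines of setup_serial(fd, B9600) right after opening the device.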
Re: symlink_prefix
Alexander Viro wrote:
>
> On Thu, 7 Jun 2001, Edgar Toernig wrote:
>
> > Alexander Viro wrote:
> > > ...
> > >     dir = open("/usr/local", O_DIRECTORY);
> > >     /* error handling */
> > >     new_mount(dir, MNT_SET, fs_fd);  /* closes dir and fs_fd */
> >
> > Do you really want to start using fds instead of strings for tree
> > modifying commands (link, unlink, symlink, rename, mount and umount)?
> > Even if it were possible in the new_mount case it wouldn't have the
> > atomic lookup+act nature of the old mount. And then, _I_ would
> > prefer a uniform interface for tree management commands - strings.
>
> You have exactly the same atomicity warranties. That is to say, none.
> Mountpoint can be renamed between the lookup and mounting.

OK. I thought mounting was an atomic operation (though normally that's
not required). Hmm... but judging by your last batch of VFS patches sent
to lkml, you expect mount to become a much more frequently used call ;-)
Maybe it would be better to have stricter rules for mount if, e.g.,
each login performs a dozen of them...

> Moreover, even after mount(2) you can rename() parent of mountpoint. On
> all Unices I've seen (well, aside of v7 which didn't have rename(2)).
> So if you rely on anything of that kind - you are screwed. Portably
> screwed, at that.

I was thinking more of a rename of, e.g., "/usr/local" between the open
and the new_mount call. I guess an unlink("/usr/local") after the open
will make the new_mount fail. Btw, what happens in this case of two
concurrent mounts?

    fd1 = open("/foo");
    fd2 = open("/foo");
    new_mount(fd1, ...);
    new_mount(fd2, ...);   // or vice versa, first fd2 then fd1

>[...] but even if your argument makes sense, it only makes sense for
> "dir" argument. "device" is nothing but a filesystem-specific option.

Sure. I only meant the "dir" argument.

Maybe I just have an uneasy feeling about such a change because it
exposes and depends on internal implementation details of the kernel
(the dcache). On other systems it's normally not possible to associate
a unique name with a file descriptor. Newer Linux versions may support
this for directories due to the dcache (I'm not sure that's really
always the case). And this calling convention for new_mount would be
the first one to make this visible in user space, and it would depend
on this feature. That may limit future changes to the kernel's VFS
implementation (maybe someone really adds some kind of hard-linked
directories, or something else that makes it impossible to get a
unique name for a directory fd).

Ciao, ET.
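Viro's point that an fd-based call has no better atomicity than a name-based one can be illustrated with ordinary calls (new_mount() itself was only a proposal and is not used here): a directory fd keeps referring to the same object across rename(), while the old name stops resolving. A minimal sketch; the helper name fd_follows_rename is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open directory `oldpath`, rename it to `newpath`, then check whether
 * the still-open fd refers to the object now reachable as `newpath`.
 * Returns 1 if it does, 0 if not, -1 on error. */
int fd_follows_rename(const char *oldpath, const char *newpath)
{
    struct stat by_fd, by_name;
    int result = -1;
    int fd = open(oldpath, O_RDONLY | O_DIRECTORY);

    if (fd == -1)
        return -1;
    if (rename(oldpath, newpath) == 0 &&
        fstat(fd, &by_fd) == 0 && stat(newpath, &by_name) == 0)
        result = by_fd.st_dev == by_name.st_dev &&
                 by_fd.st_ino == by_name.st_ino;
    close(fd);
    return result;
}
```

Any fd-taking mount call would therefore act on whatever the directory is called at the moment of the call, not on the name used for the lookup.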
Re: symlink_prefix
Alexander Viro wrote:
> ...
>     dir = open("/usr/local", O_DIRECTORY);
>     /* error handling */
>     new_mount(dir, MNT_SET, fs_fd);  /* closes dir and fs_fd */

Do you really want to start using fds instead of strings for
tree-modifying commands (link, unlink, symlink, rename, mount and
umount)? Even if it were possible in the new_mount case, it wouldn't
have the atomic lookup+act nature of the old mount. And then, _I_ would
prefer a uniform interface for tree management commands: strings.

Ciao, ET.
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Daniel Phillips wrote:
>
> It won't, the open for "." is handled in the VFS, not the filesystem -
> it will open the directory. (Without needing to be told it's a
> directory via O_DIRECTORY.) If you do open("magicdev") you'll get the
> device, because that's handled by magicdevfs.

You really mean that "magicdev" is a directory and

    open("magicdev/.", O_RDONLY);
    open("magicdev", O_RDONLY);

would both succeed but open different objects?

> I'm not claiming there isn't breakage somewhere,

you break UNIX fundamentals. But I'm quite relieved now because I'm
pretty sure that something like that will never go into the kernel.

Ciao, ET.
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Daniel Phillips wrote:
>
> Oops, oh wait, there's already another open point: your breakage
> examples both rely on opening ".". You're right, "." should always be
> a directory and I believe that's enforced by the VFS. So we don't have
> an example of breakage yet.

That's just because I used a simple "ls". But it doesn't make a
difference. The magicdevs _are_ directories, and

    chdir("magicdev");
    open(".", O_RDONLY);

shouldn't open the device.

Ciao, ET.
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Daniel Phillips wrote:
> On Wednesday 23 May 2001 06:19, Edgar Toernig wrote:
> > Daniel Phillips wrote:
> > > On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> > > > On Mon, 21 May 2001, Daniel Phillips wrote:
> > > > > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > > > > What I'd like to see:
> > > > > >
> > > > > > - An interface for registering an array of related devices
> > > > > > (almost always two: raw and ctl) and their legacy device
> > > > > > numbers with a single userspace callout that does whatever
> > > > > > /dev/ creation needs to be done. Thus, naming and permissions
> > > > > > live in user space. No "device node is also a directory"
> > > > > > weirdness...
> > > > >
> > > > > Could you be specific about what is weird about it?
> > > >
> > > > *boggle*
> > > >
> > > >[general sense of unease]
> >
> > I fully agree with Oliver. It's an abomination.
>
> We are, or at least, I am, investigating this question purely on
> technical grounds - name calling is a noop.

Right. But sometimes new ideas raise these kinds of feelings ;)

> > > It's going to be marked 'd', it's a directory, not a file.
> >
> > Aha. So you lose the S_ISCHR/BLK attribute.
>
> Readdir fills in a directory type, so ls sees it as a directory and does
> the right thing. On the other hand, we know we're on a device
> filesystem so we will next open the name as a regular file, and find
> ISCHR or ISBLK: good.

??? The kernel may know it, but does the app? Or do you really want to
return different stat data from stat(2) and fstat(2)?

These flags are currently used by archive/backup programs. They're a
hint that these files are not regular files and shouldn't be opened for
reading. Having a 'd' would mean that these programs really try to
enter the directory and save its contents. I don't know what happens to
your "special" files in that case ;-)

> The rule for this filesystem is: if you open with O_DIRECTORY then
> directory operations are permitted, nothing else. If you open without
> O_DIRECTORY then directory operations are forbidden (as
> usual) and normal device semantics apply.

As usual? I think you've just changed the rules for O_DIRECTORY. Up to
now it's only a flag that tells open to fail if the name does not refer
to a directory. Nothing else. It was introduced to remove a race
condition in user-space applications. In particular, it is optional:
everything works the same whether you give the flag or not (except for
the race avoidance, of course). And there are a lot of programs that do
not use O_DIRECTORY (it's a Linux-private flag, not even mentioned in
POSIX). Every program that does

    fd = open(foo, O_RDONLY);
    fchdir(fd);
    x = opendir(".");

will break. And that is POSIX-conforming. And I know that there are
programs that use this when recursively scanning directories (it avoids
name mangling and repeated name lookups of the directory on later stat
calls).

> > Directories are not allowed to be read from/written to. The VFS may
> > support it, but it's not (current) UNIX.
>
> Here, we obey this rule: if you open it with O_DIRECTORY then you
> can't read from or write to it.

IMHO you've just invented opendir(2).

> Nothing breaks here, ls works as it always did.
>
> This is what ls does:
>
> open("foobar", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
> fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> fcntl64(0x3, 0x2, 0x1, 0x2) = -1 ENOSYS (Function not implemented)
> fcntl(3, F_SETFD, FD_CLOEXEC) = 0
> brk(0x805b000) = 0x805b000
> getdents64(0x3, 0x8058270, 0x1000, 0x26) = -1 ENOSYS (Function not implemented)
> getdents(3, /* 2 entries */, 2980) = 28
> getdents(3, /* 0 entries */, 2980) = 0
> close(3) = 0
>
> Note that ls doesn't do anything as inconvenient as opening
> foobar as a normal file first, expecting that operation to fail.

Well, your ls does not work "as it always did". Here's an strace of the
ls on my libc5 system:

    open(".", O_RDONLY) = 3
    fcntl(3, F_SETFD, FD_CLOEXEC) = 0
    getdents(3, /* 64 entries */, 4096) = 1216
    getdents(3, /* 9 entries */, 4096) = 168
    getdents(3, /* 0 entries */, 4096) = 0
    close(3) = 0

And my find(1) does:

    open(".", O_RDONLY) = 3
    [scan all dirs]
    fchdir(3)
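A sketch of the POSIX-conforming pattern the mail says would break, assuming an ordinary directory: open it with plain O_RDONLY (no O_DIRECTORY), fchdir() into it and list it with opendir("."). Like the find(1) trace, it saves the current directory with open(".") and returns with fchdir(). The function name count_entries is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Count the entries of `path`, excluding "." and "..".
 * Returns the count, or -1 on error. */
int count_entries(const char *path)
{
    int n = -1;
    int here = open(".", O_RDONLY);      /* remember the current dir */
    int fd;

    if (here == -1)
        return -1;
    fd = open(path, O_RDONLY);           /* note: no O_DIRECTORY */
    if (fd != -1 && fchdir(fd) == 0) {
        DIR *d = opendir(".");
        if (d != NULL) {
            struct dirent *e;
            n = 0;
            while ((e = readdir(d)) != NULL)
                if (strcmp(e->d_name, ".") != 0 &&
                    strcmp(e->d_name, "..") != 0)
                    n++;
            closedir(d);
        }
        if (fchdir(here) == -1)          /* go back, as find(1) does */
            n = -1;
    }
    if (fd != -1)
        close(fd);
    close(here);
    return n;
}
```

If open(path, O_RDONLY) on a "magicdev" directory opened the device instead, the fchdir() here would fail and the scan would silently skip that directory.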
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Daniel Phillips wrote:
>
> On Tuesday 22 May 2001 17:24, Oliver Xymoron wrote:
> > On Mon, 21 May 2001, Daniel Phillips wrote:
> > > On Monday 21 May 2001 19:16, Oliver Xymoron wrote:
> > > > What I'd like to see:
> > > >
> > > > - An interface for registering an array of related devices
> > > > (almost always two: raw and ctl) and their legacy device numbers
> > > > with a single userspace callout that does whatever /dev/ creation
> > > > needs to be done. Thus, naming and permissions live in user
> > > > space. No "device node is also a directory" weirdness...
> > >
> > > Could you be specific about what is weird about it?
> >
> > *boggle*
> >
> >[general sense of unease]

I fully agree with Oliver. It's an abomination.

> > I don't think it's likely to be even workable. Just consider the
> > directory entry for a moment - is it going to be marked d or [cb]?
>
> It's going to be marked 'd', it's a directory, not a file.

Aha. So you lose the S_ISCHR/BLK attribute.

> > If it doesn't have the directory bit set, Midnight commander won't
> > let me look at it, and I wouldn't blame cd or ls for complaining. If it
> > does have the 'd' bit set, I wouldn't blame cp, tar, find, or a
> > million other programs if they did the wrong thing. They've had 30
> > years to expect that files aren't directories. They're going to act
> > weird.
>
> No problem, it's a directory.

Directories are not allowed to be read from/written to. The VFS may
support it, but it's not (current) UNIX.

> > Linus has been kicking this idea around for a couple years now and
> > it's still a cute solution looking for a problem. It just doesn't
> > belong in UNIX.
>
> Hmm, ok, do we still have any *technical* reasons?

So with your definition, I have a fs-object that is marked as a
directory, but opening it opens a device. Pretty nice. How am I supposed
to list its contents? open+readdir? But the open has nasty side effects.
So you have a directory that you are not allowed to list (because of
the possible side effects) but that you are allowed to read from, write
to, and maybe even issue ioctls to? And you call that sane???

IMO the whole idea of arguments following the device name is junk
(incl. a "/ctrl"). Just think about the implications of the original
"/dev/ttyS0/19200" suggestion. It sounds nice and tempting. But which
programs will benefit? Which get confused? What will be cleaned up?
After some thought you'll find out that it's useless ;-)

And with special "ctrl" devices (e.g. /dev/ttyS0 and /dev/ttyS0ctrl):
this _may_ work for some kinds of devices. But serial ports are one
example where it simply will _not_. It requires that you know the name
of the device, and for ttys this is often not the case. Even if you
manage to get some name for stdin, for example - am I now supposed to
simply append "ctrl" to that name to get a control channel??? Dangerous,
to say the least. If I'm lucky I only get an EPERM...

Ciao, ET.
Re: F_CTRLFD (was Re: Why side-effects on open(2) are evil.)
Alexander Viro wrote:
>
> On Sun, 20 May 2001, Edgar Toernig wrote:
>
> > IMHO any scheme that requires a special name to perform ioctl like
> > functions will not work. Often you don't know the name of the
> > device you're talking to and then you're lost.
>
> ls -l /proc/self/fd/

Oh come on. You made most of the VFS and should know better. Since when
has it always been possible to get a "usable" name for an fd??? The
ls -l will give me "deleted", "socket", "...". If I try to access the
name given by procfs I may get EPERM, etc. And then, it's pretty strange
to append a "ctl" to some arbitrary name and get a control device for
that name??? No. Using names is __wrong__!

> [not going to happen:]
> 1) sys_ioctl() going away from syscall table.

I would never suggest that.

> 2) semi-automatic conversion of existing applications.

Same. Much too dangerous.

> To hell with
> the way we are finding descriptor, we need to deal with arguments themselves.
> And no extra logics in libc will help - the whole problem is that ioctls
> have rather irregular arguments.

Don Quijote II.? ;-)

IMHO any similarly powerful (and versatile) interface will see the same
problems. Enforcing a read/write-like interface (and rejecting drivers
that pass pointers through this interface) may give you some knowledge
about the kernel/userspace communication. But the data that flows
around will become the same mess that is present with the current
ioctl. Every driver invents its own set of commands, its own rules of
argument parsing, ... Maybe it's no longer strange binary data but
readable ASCII strings, but that's all. Look at how many different
"styles" of /proc files there are.

> What we need is "make it sane", not "inherit as many things from the
> old API as possible". And obvious first target is Linux-specific
> device ioctls, simply because they have fewer programs using them.

You can impose some rules, like "must support" commands, how arguments
are encoded, how errors are reported and so on. But I wouldn't like to
see an SNMP-like mess...

IMHO what's needed is a definition of "sane" in this context. Trying to
limit the kind of actions performed by ioctls is not "sane"; people will
just fall back to the old ioctl. "Sane" could be: network transparent,
architecture independent, usable with generic tools and non-C-like
languages.

Ciao, ET.
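On Linux the point about /proc/self/fd can be checked directly. The sketch below (the helper name fd_proc_name is hypothetical) reads the symlink procfs reports for a descriptor: for a socket it yields something like "socket:[inode]", and for an unlinked file the path carries a " (deleted)" suffix, neither of which can be reopened as an ordinary path.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: copy the "name" procfs reports for fd into buf
 * (NUL-terminated). Returns its length, or -1 on error. */
ssize_t fd_proc_name(int fd, char *buf, size_t size)
{
    char link[64];
    ssize_t n;

    if (size == 0)
        return -1;
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    n = readlink(link, buf, size - 1);  /* readlink() doesn't terminate */
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

Appending "ctl" to strings like these, as a name-based control scheme would require, cannot produce anything meaningful.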
Re: F_CTRLFD (was Re: Why side-effects on open(2) are evil.)
Alexander Viro wrote:
>
> For the latter, though,
> we need to write commands into files and here your miscdevices (or procfs
> files, or /dev/foo/ctl - whatever) is needed.

IMHO any scheme that requires a special name to perform ioctl-like
functions will not work. Often you don't know the name of the device
you're talking to, and then you're lost.

So, if you want an additional communication channel to a device, why
not introduce an fcntl or a new system call for it?

    ctrlfd = fcntl(fd, F_CTRLFD);
    ctrlfd = openctrl(fd);          /* alternative */

That way you can always get access to the control channel and use
regular read/write for communication [1].

To make it more versatile, you may want to extend the shell syntax,
i.e. a '@' in redirection operators gets the control fd:

    echo "eject" >@/dev/cdrom
    { echo "b19200,onlcr" >@1 ; echo "Hello World!" ; } >/dev/ttyS0

Yes, it requires support in user-space apps, but it doesn't mess around
with the file namespace. It's too precious to sacrifice ;-)

I don't know how much infrastructure in the kernel is required for
this - i.e. add readctrl/writectrl methods or create virtual
inodes/devices on the fly? There are more capable people than me to
judge that...

Ciao, ET.

[1] If you want, you can even allow this flag as an open mode to open
the ctrl channel without opening the device.
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Nitpicking: a system call without side effects would be pretty useless.

Alexander Viro wrote:
> A lot of stuff relies on the fact that close(open(foo, O_RDONLY)) is a
> no-op. Breaking that assumption is a Bad Thing(tm).

That assumption is totally bogus. Even for regular files you have side
effects (atime); for anything else they're unpredictable.

Ciao, ET.
Re: Wow! Is memory ever cheap!
Larry McVoy wrote:
>
> Let's review: ECC is nice, but it doesn't solve all data corruption
> problems. Applications which do their own end to end data integrity
> checks will catch many more error cases than what ECC catches.

I think you have the wrong idea about why ECC is there. ECC deals with
the inherent shortcomings of DRAM. DRAMs are not perfect: there is a
certain probability that they lose a bit. Normally this probability is
low enough to live with.

Let's say you have a system with 1 MByte, and let's say the probability
of a single-bit error is around 1 error in 100 years. Good enough. Now
put 1 GByte in the system: with 1024 times the memory you get about 10
errors per year. Maybe good enough for a Windows box, but not
acceptable for your server. So you put in ECC to bring this rate back
to reasonable numbers. ECC corrects single-bit errors; you only have to
deal with double-bit errors, and the chance of those is much, much
lower.

Sure, it doesn't solve all data corruption problems - only simple errors
in DRAMs. But it keeps systems with huge amounts of RAM up and running
much longer. And btw, your integrity checks over data will not protect
against a corrupted kernel or application...

Ciao, ET.

PS: Just let your app run long enough. I'm sure it will detect a
checksum error some day ;-)
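The scaling in the mail is plain arithmetic, assuming a constant per-bit error rate so that the expected number of errors grows linearly with capacity. With the mail's illustrative figure of one error per 100 years for 1 MByte:

```latex
\lambda_{1\,\mathrm{GB}}
  = \frac{1024\ \mathrm{MB}}{1\ \mathrm{MB}} \cdot \lambda_{1\,\mathrm{MB}}
  = 1024 \cdot \frac{1\ \text{error}}{100\ \text{years}}
  \approx 10.2\ \text{errors per year}
```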
Re: Real Time Traffic Flow Measurement - anybody working on it?
Hi,

Michael Clark wrote:
>
> An obvious kernel improvement for userspace meters like NeTraMet would
> be to give libpcap's pcap_read a kernel interface that can return more
> than one packet at a time (the libpcap interface has this capability).

It's already there - the turbo packet interface (PACKET_RX_RING sockopt).
Very nice and fast: packets are transferred directly into mmapped memory.

> An additional feature for network devices that could support it (not
> sure if this is feasible) would be to switch to an 'interrupt when
> packet buffer full' when in promiscuous mode.

With the RX_RING you can poll a memory location in the mmapped memory
to detect whether there are new packets. You basically only perform a
system call (poll/select) when there's nothing more to do.

Ciao, ET.
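A sketch of the user-space side of this polling scheme for the TPACKET_V1 ring layout, assuming the ring has already been set up with setsockopt(PACKET_RX_RING) and mmap(): a frame belongs to user space while TP_STATUS_USER is set in its tp_status, and is handed back by writing TP_STATUS_KERNEL. The struct and function names are illustrative, and production code also needs memory barriers around the status accesses.

```c
#define _POSIX_C_SOURCE 200809L
#include <linux/if_packet.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>

/* Geometry of an mmapped PACKET_RX_RING (TPACKET_V1 layout). */
struct rx_ring {
    uint8_t *base;            /* address returned by mmap() */
    unsigned int block_size;  /* tp_block_size */
    unsigned int frame_size;  /* tp_frame_size */
    unsigned int frame_nr;    /* tp_frame_nr */
};

/* Header of frame i. Frames never straddle blocks, so find the block
 * first, then the slot inside it. */
struct tpacket_hdr *rx_frame(const struct rx_ring *r, unsigned int i)
{
    unsigned int per_block = r->block_size / r->frame_size;

    return (struct tpacket_hdr *)(r->base
        + (size_t)(i / per_block) * r->block_size
        + (size_t)(i % per_block) * r->frame_size);
}

/* Has the kernel handed frame i to user space? No system call needed. */
int rx_ready(const struct rx_ring *r, unsigned int i)
{
    volatile struct tpacket_hdr *h = rx_frame(r, i);

    return (h->tp_status & TP_STATUS_USER) != 0;
}

/* Give frame i back to the kernel once the packet has been processed. */
void rx_release(const struct rx_ring *r, unsigned int i)
{
    volatile struct tpacket_hdr *h = rx_frame(r, i);

    h->tp_status = TP_STATUS_KERNEL;
}

/* Only enter the kernel (poll) when the next frame is not ready yet.
 * Returns 1 if frame i is ready, 0 if not, -1 on poll() error. */
int rx_wait(int fd, const struct rx_ring *r, unsigned int i, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    if (rx_ready(r, i))
        return 1;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    return n > 0 ? rx_ready(r, i) : n;
}
```

A receive loop would walk i = 0, 1, ... modulo frame_nr, processing and releasing frames while rx_ready() is true, and calling rx_wait() only when it is not.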
Re: CONFIG_PACKET_MMAP help
Hi,

[EMAIL PROTECTED] wrote:
>
> 1. for tp_frame_size, I dont want to truncate any data on ethernet, I
> need 1514 bytes, is this the best way to do it and not waste space?
>
> static const int TURBO_FRAME_SIZE=
>   TPACKET_ALIGN(TPACKET_ALIGN(sizeof(tpacket_hdr)) +
>   TPACKET_ALIGN(sizeof(struct sockaddr_ll)+ETH_HLEN) + 1500);

Looks OK. Maybe use max(ETH_HLEN,16) instead of ETH_HLEN? (The kernel
reserves at least 16 bytes for the link-layer header.) The frame size
calculation is really strange...

> 2. what is tp_block_nr for? I dont understand it, I just set it to 1
> and make tp_block_size big enough for all the frames I need, so its
> just one contiguous space, all I need is about a megabyte I think.

Better go the other way around: set tp_block_size to PAGE_SIZE and
tp_block_nr accordingly. tp_block_size is the amount of contiguous
physical memory the kernel tries to allocate per block, and anything
above PAGE_SIZE is likely to fail. For you that would mean only 2
packets per 4k page. You could start with bigger (power of 2) block
sizes and go down to smaller ones if it fails (ENOMEM) [1].

Btw, there's an implicit limit on tp_block_nr. The vector to manage the
blocks is kmalloc'ed and may not be larger than 128 KB, giving max 32768
blocks. Hmm... moment... seems there's a similar limit for tp_frame_nr
(max 32768 frames). I'm pretty sure _that_ limit was not there when I
worked with this during 2.3. Not so nice on gigabit Ethernet :-(

> 3. is this the general approach for the api?
> [...]

Looks OK too.

> if (tp->status == 0) poll() for pollin on the socket /* is there a
> race here? */

No race.

> 4. what does the copy threshold setsockopt tuning accomplish? doesnt it always
> have to copy anyway, to the mmaped area?

I haven't used it myself. Reading the sources, it does something
different. As far as I can see, when it's active and a packet has been
truncated to the frame size, the packet is additionally stored in the
socket's receive queue, to be fetched by a normal read/recv. You are
notified about this by the TP_STATUS_COPY bit. So it seems to mean:
copy to the socket if the threshold (frame size) is exceeded.

Ciao, ET.

[1] The PACKET_RX_RING sockopt accepts all block sizes that are a
multiple of PAGE_SIZE but always allocates a power-of-2-sized chunk. So
using non-power-of-2 sizes will waste locked kernel memory.
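A sketch of the sizing advice, assuming TPACKET_V1 (the default), a SOCK_RAW packet socket on Ethernet, and tp_reserve == 0. The frame size follows the kernel's rule of padding the link-layer header to at least 16 bytes, and the request uses page-sized blocks with tp_frame_nr rounded up to whole blocks, since the kernel rejects a tp_frame_nr that isn't exactly (tp_block_size / tp_frame_size) * tp_block_nr. The macro and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <string.h>

/* Offset of the MAC header inside a frame: the link-layer header is
 * padded to at least 16 bytes after TPACKET_HDRLEN, so for Ethernet
 * (ETH_HLEN = 14) the network header lands on an aligned offset. */
#define RX_MACOFF (TPACKET_ALIGN(TPACKET_HDRLEN + 16) - ETH_HLEN)

/* Smallest aligned frame that holds a full 1514-byte Ethernet frame. */
#define RX_FRAME_SIZE TPACKET_ALIGN(RX_MACOFF + ETH_FRAME_LEN)

/* Fill *req for at least `want` frames using one page per block.
 * Returns -1 if the frame size is unusable for this block size. */
int rx_ring_req(struct tpacket_req *req, unsigned int page_size,
                unsigned int frame_size, unsigned int want)
{
    unsigned int per_block;

    if (frame_size == 0 || frame_size % TPACKET_ALIGNMENT != 0)
        return -1;
    per_block = page_size / frame_size;
    if (per_block == 0)
        return -1;

    memset(req, 0, sizeof(*req));
    req->tp_block_size = page_size;
    req->tp_frame_size = frame_size;
    req->tp_block_nr = (want + per_block - 1) / per_block;
    req->tp_frame_nr = req->tp_block_nr * per_block;
    return 0;
}
```

In a real program page_size would come from sysconf(_SC_PAGESIZE), and the filled request is passed to setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)). With 4 KB pages this gives the 2 frames per block mentioned in the mail.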
Re: PROBLEM: select() on TCP socket sleeps for 1 tick even if data available
Michael Lindner wrote:
>[...]
>     send(s, ".", 1, 0);
>[...]
>     while (select(r+1, &readfds, 0, 0, 0) > 0) {
>[...]
>[select returns only after about 1 HZ]

Ever heard of Nagle? (If not, there's a long thread about it on the
mailing list *g*)

It's not the select that waits. It's a delay in the TCP send path,
waiting for more data. Try disabling it:

    int f = 1;
    setsockopt(s, SOL_TCP, TCP_NODELAY, &f, sizeof(f));

Ciao, ET.
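The same fix wrapped in a small helper, as a sketch; the name disable_nagle is made up, and it uses IPPROTO_TCP, the portable spelling of Linux's SOL_TCP level.

```c
#define _POSIX_C_SOURCE 200809L
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable the Nagle algorithm on a TCP socket so that small writes,
 * such as a 1-byte send(), are transmitted immediately instead of being
 * held back while earlier data is still unacknowledged.
 * Returns 0 on success, -1 on error. */
int disable_nagle(int s)
{
    int one = 1;

    return setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```

Call it on the sending socket right after socket()/connect() or accept().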
Re: Linux's implementation of poll() not scalable?
Linus Torvalds wrote:
>
> The point they disagree is when the event gets removed from the event
> queue. For edge triggered, this one is trivial: when a get_events() thing
> happens and moves it into user land. This is basically a one-liner, and it
> is local to get_events() and needs absolutely no help from anybody else.
> So obviously event removal is _very_ simple for edge-triggered events -
> the INTACK basically removes the event (and also re-arms the trigger
> logic: which is different from most interrupt controllers, so the analogy
> falls down here).

And IMHO here's the problem. The events are no longer events. They are
just hints saying: "after the previous get_events() something has
happened". You don't know if you've already handled this event. There's
no synchronization between what the app does and the triggering of
'hints'.

For example your waitpid-loop: you get the event and start the
waitpid-loop. While you're processing, another process dies. You handle
it too (still in the loop). But a new 'hint' has already been
registered, so on the next get_events() you'll be notified again. I just
hope every event generator has a WNOHANG flag...

It's even possible that you can't perform some actions without
triggering hints, even though the conditions will already be gone
before the next get_events(). That may generate a lot of bogus hints.
At the very least, the current semantics of, for example, "POLL_IN on
fd was signaled, so I may read without blocking" get lost.

Maybe (I don't know about the kernel side) it makes sense to check in
the kernel whether the events to be returned to user space are still
valid. User space has to do it anyway... But that way you get a more
level-based design ;)

Another thing: while toying with cooperative user-space multithreading
I found it much more versatile to have a req_type/req_data tuple in the
request structure (e.g. READ/, TIMEOUT/, WAKEUP/).

Ciao, ET.
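A sketch of how an application copes with such stale hints, assuming a non-blocking fd: the handler drains the source until it reports "nothing more" (EAGAIN, the read() analogue of waitpid()'s WNOHANG), so a hint for data an earlier pass already consumed simply results in an empty pass. The function name drain_fd is hypothetical and the data is discarded instead of processed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Handle one readiness hint for fd, which must have O_NONBLOCK set.
 * Reads until the fd is empty; an empty pass is harmless.
 * Returns the number of bytes consumed, or -1 on error. */
long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;                 /* real code would process buf */
            continue;
        }
        if (n == 0)
            return total;               /* EOF */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* drained: wait for next hint */
        return -1;
    }
}
```

Calling it a second time after a bogus hint just returns 0, mirroring a waitpid(..., WNOHANG) loop that finds no more children.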