Lutz Vieweg <[EMAIL PROTECTED]> wrote:
>
> I'm currently investigating the following problem, which seems to indicate
> a misbehaviour of the kernel:
>
> A server software we implemented is sporadically "hanging" in a select()
> call since we upgraded from kernel 2.4 to (currently) 2.6.9 (we have to wait
> for 2.6.12 before we can upgrade again due to the
> shared-mem-not-dumped-into-core-files problem addressed there).
>
> What's suspicious is that whenever we attach with gdb to such a hanging
> process, we can see that a pipe, whose file-descriptor is definitely
> included in the fd_set "readfds" (and "n" is also high enough), has a byte
> in it available for reading - and just leaving gdb again is enough to let
> the server continue just fine.
>
> We are using that pipe, which is known only to the same one process, to cause
> select() to return immediately if a signal (SIGUSR1) has been delivered to the
> process (by another process). There's a signal handler installed that does
> nothing but a (non-blocking) write of 1 byte to the writing end of the pipe.
>
> This mechanism worked fine before kernel 2.6, and it is still working in
> 99.99% of the cases, but under heavy load, every few hours, we'll see the
> hanging select() as mentioned above.
>
> I noticed a recent thread at lkml about poll() and pipes, but that seems to
> address a different issue, where there are more events reported than
> occurred. What we see is quite the opposite: we want select() to return on
> that pipe becoming readable...
>
> Any ideas?
> Any hints on what to do to investigate the problem further?
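
For reference, the wake-up mechanism described in the quoted message (often
called the self-pipe trick) can be sketched roughly as follows. This is a
minimal illustration in C, not the original server's code: the names wake_fd,
on_sigusr1, wakeup_pipe_install and wakeup_pipe_wait are hypothetical, and
retrying select() on EINTR is an assumption about how the caller handles
interruption.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical names throughout; the original server's code is not shown. */
static int wake_fd[2] = { -1, -1 };   /* [0] = read end, [1] = write end */

/* The handler does nothing but a non-blocking one-byte write to the pipe. */
static void on_sigusr1(int signo)
{
    int saved_errno = errno;          /* write() may clobber errno */
    char byte = 1;

    (void)signo;
    if (write(wake_fd[1], &byte, 1) < 0) {
        /* EAGAIN on a full pipe is fine: a wake-up is already pending. */
    }
    errno = saved_errno;
}

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create the pipe and install the SIGUSR1 handler. Returns 0 on success. */
int wakeup_pipe_install(void)
{
    struct sigaction sa;

    if (pipe(wake_fd) < 0)
        return -1;
    if (set_nonblocking(wake_fd[0]) < 0 || set_nonblocking(wake_fd[1]) < 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Wait until the pipe becomes readable or the timeout expires.
 * Returns 1 if woken through the pipe, 0 on timeout, -1 on error. */
int wakeup_pipe_wait(int timeout_sec)
{
    for (;;) {
        fd_set readfds;
        struct timeval tv;
        char buf[64];
        int rc;

        FD_ZERO(&readfds);
        FD_SET(wake_fd[0], &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        /* "n" must be the highest descriptor in the set plus one. */
        rc = select(wake_fd[0] + 1, &readfds, NULL, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;                 /* interrupted by the signal: retry */
        if (rc <= 0)
            return rc;                /* timeout (0) or real error (-1) */

        if (FD_ISSET(wake_fd[0], &readfds)) {
            /* Drain pending bytes so the next select() blocks again. */
            while (read(wake_fd[0], buf, sizeof buf) > 0)
                ;
            return 1;
        }
    }
}
```

A caller would run wakeup_pipe_install() once at startup and then call
wakeup_pipe_wait() from its main loop; a real server would add its other
descriptors to readfds as well.
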

Could you at least test 2.6.12-rc1? Otherwise we might be looking for a bug
which isn't there.