Lutz Vieweg <[EMAIL PROTECTED]> wrote:
>
> I'm currently investigating the following problem, which seems to indicate
> a misbehaviour of the kernel:
> 
> A server software we implemented is sporadically "hanging" in a select()
> call since we upgraded from kernel 2.4 to (currently) 2.6.9 (we have to wait
> for 2.6.12 before we can upgrade again due to the shared-mem-not-dumped-into-
> core-files problem addressed there).
> 
> What's suspicious is that whenever we attach with gdb to such a hanging process,
> we can see that a pipe, whose file-descriptor is definitely included in the
> fd_set "readfds" (and "n" is also high enough) has a byte in it available for
> reading - and just leaving gdb again is enough to let the server continue just
> fine.
> 
> We are using that pipe, which is known only to the same one process, to cause
> select() to return immediately if a signal (SIGUSR1) had been delivered to the
> process (by another process), there's a signal handler installed that does
> nothing but a (non-blocking) write of 1 byte to the writing end of the pipe.
> 
> This mechanism worked fine before kernel 2.6, and it is still working in 99.99% of
> the cases, but under heavy load, every few hours, we'll see the hanging select()
> as mentioned above.
> 
> I noticed a recent thread at lkml about poll() and pipes, but that seems to address a
> different issue, where there are more events reported than occurred. What we
> see is quite the opposite: we want select() to return on that pipe becoming readable...
> 
> Any ideas?
> Any hints on what to do to investigate the problem further?

Could you at least test 2.6.12-rc1?  Otherwise we might be looking for a
bug which isn't there.
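
For anyone who wants to reproduce this on another kernel, the mechanism
described above (a SIGUSR1 handler doing a non-blocking 1-byte write to a
pipe whose read end is in select()'s readfds) might be sketched roughly as
below. This is only a minimal sketch, not the reporter's actual code: the
names wake_pipe, wake_pipe_init() and wait_for_wakeup() are made up for
illustration, and the timeout is an assumption added so that a missed
wakeup shows up as a return value of 0 instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical names: [0] is the read end, [1] is the write end. */
static int wake_pipe[2] = { -1, -1 };

/* Handler does nothing but a non-blocking 1-byte write, preserving errno. */
static void on_sigusr1(int signo)
{
    int saved_errno = errno;
    char byte = 1;
    (void)signo;
    if (write(wake_pipe[1], &byte, 1) == -1) {
        /* A full pipe (EAGAIN) is fine: a wakeup is already pending. */
    }
    errno = saved_errno;
}

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create the pipe and install the SIGUSR1 handler. Returns 0 on success. */
int wake_pipe_init(void)
{
    struct sigaction sa;

    if (pipe(wake_pipe) == -1)
        return -1;
    if (set_nonblocking(wake_pipe[0]) == -1 ||
        set_nonblocking(wake_pipe[1]) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Wait in select() for up to timeout_ms milliseconds.
 * Returns 1 if the pipe became readable, 0 on timeout, -1 on error. */
int wait_for_wakeup(int timeout_ms)
{
    assert(wake_pipe[0] >= 0);

    for (;;) {
        fd_set readfds;
        struct timeval tv;
        char buf[64];
        int rc;

        FD_ZERO(&readfds);
        FD_SET(wake_pipe[0], &readfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000L;

        rc = select(wake_pipe[0] + 1, &readfds, NULL, NULL, &tv);
        if (rc == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by the signal: retry */
            return -1;
        }
        if (rc == 0)
            return 0;       /* timed out with nothing readable */

        /* Drain all pending wakeup bytes (read end is non-blocking). */
        while (read(wake_pipe[0], buf, sizeof buf) > 0)
            ;
        return 1;
    }
}
```

A test program could call wake_pipe_init() once, have a second process
send SIGUSR1 in a loop under load, and log every case where
wait_for_wakeup() returns 0 while a byte is still sitting in the pipe.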
