Andrew Morton wrote:
Lutz Vieweg <[EMAIL PROTECTED]> wrote:

I'm currently investigating the following problem, which seems to indicate
a misbehaviour of the kernel:

A server program we implemented has been sporadically "hanging" in a select()
call since we upgraded from kernel 2.4 to (currently) 2.6.9 (we have to wait
for 2.6.12 before we can upgrade again, due to the
shared-mem-not-dumped-into-core-files problem addressed there).
...
Any ideas?
Any hints on what to do to investigate the problem further?


Could you at least test 2.6.12-rc1?  Otherwise we might be looking for a
bug which isn't there.

We'll do that, but it will take some time: the server requirements are such that we cannot easily set up yet another instance. We don't have that many 32GB-RAM 4-way Opterons :-)


Jim Nance wrote:
We are using that pipe, which is known only to that one process, to
make select() return immediately if a signal (SIGUSR1) has been
delivered to the process (by another process). A signal handler is
installed that does nothing but a (non-blocking) write of 1 byte to the
writing end of the pipe.


I'm not sure if this is what is causing your problem, but shouldn't you
be doing a blocking write?  It may be that the pipe is not writable
at the moment the signal arrives.  I think that could cause the symptoms
you describe.

If the pipe wasn't writable when the signal handler tried to write a byte, that would mean there were already N (probably 4096) bytes in the pipe, which would cause the select() to fall through anyway. The pipe is not meant to count signal deliveries, only to contain "something" whenever there has been a reason to fall through the select().
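
For illustration, here is a minimal sketch of the arrangement described above. It is not the actual server code: the names wake_fd, on_sigusr1, wake_pipe_init, wake_pipe_wait, wake_pipe_fill and wake_pipe_drain are made up for this example, and the timeout parameter exists only so the behaviour can be checked without blocking forever. It shows a handler doing a non-blocking 1-byte write, and a helper that fills the pipe completely so one can check that select() still reports the read end as readable when the handler's write fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical names: [0] is the read end watched by select(), [1] the write end. */
static int wake_fd[2];

/* Signal handler: non-blocking 1-byte write. A failure (EAGAIN, pipe full)
 * is ignored on purpose: a full pipe is already readable. */
static void on_sigusr1(int signo)
{
    int saved_errno = errno;
    char byte = 1;
    (void)signo;
    if (write(wake_fd[1], &byte, 1) < 0) {
        /* nothing to do: select() will fall through anyway */
    }
    errno = saved_errno;
}

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create the pipe, make both ends non-blocking, install the handler. */
static int wake_pipe_init(void)
{
    struct sigaction sa;
    if (pipe(wake_fd) < 0)
        return -1;
    if (set_nonblocking(wake_fd[0]) < 0 || set_nonblocking(wake_fd[1]) < 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Wait up to timeout_ms for the read end to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wake_pipe_wait(long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;
    do {
        FD_ZERO(&rfds);
        FD_SET(wake_fd[0], &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        rc = select(wake_fd[0] + 1, &rfds, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Fill the pipe until a non-blocking write fails; returns bytes written. */
static long wake_pipe_fill(void)
{
    char chunk[4096];
    long total = 0;
    ssize_t n;
    memset(chunk, 1, sizeof chunk);
    while ((n = write(wake_fd[1], chunk, sizeof chunk)) > 0)
        total += n;
    return total;
}

/* Discard everything currently in the pipe. */
static void wake_pipe_drain(void)
{
    char buf[4096];
    while (read(wake_fd[0], buf, sizeof buf) > 0) {
        /* keep reading until the pipe is empty (EAGAIN) */
    }
}
```

In a real event loop the read end would simply be added to the fd set that select() already watches, and drained after each wake-up. Since the handler only needs the pipe to be non-empty, a blocking write would gain nothing here and could stall the process inside the handler if the pipe were ever full.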

Regards,

Lutz Vieweg


