> As far as I can tell, this is a problem between qmail-remote and the kernel.
Correct.
> This is happening on multiple operating systems, so that leads me to believe
> that this is not an OS bug.
But many OSes share TCP/IP implementations, or share the same
misinterpretations of the protocol. Many coders of TCP/IP stacks look
at other implementations to work out what to do. There is a *lot* of
commonality between OSes in this regard. E.g., the Linux crowd and the
FreeBSD crowd regularly refer to each other's implementations to decide
how to do something (or not do something, as the case may be).
> ** To find out a bit more about what a "stuck" qmail-remote is doing, you
> ** may want to ktrace it and show us the output. Find the process id of the
> ** stuck qmail-remote and then as root go: ktrace -p thepid
> **
> ** Leave that running for at least an hour and show us the output. Yes, I
> ** mean at least an hour.
> **
>
> Ok, I meant to come back in an hour and stop the trace, but after running
> ktrace for 9 hours (while I slept), the resulting ktrace.out file is exactly
> 0 bytes in length. Would you like me to send a copy? <g>
It's a bummer that ktrace is like that on FreeBSD. It doesn't show the
*current* system call that the process is sitting on. By contrast,
truss on Solaris does this nicely...
You can conclude, though, that qmail-remote wasn't sitting on the
select(): that has a timeout, so the trace should show the system
calls associated with the reading loop. If it's not sitting on the
select(), what is it sitting on? If it's the read(), well, how could
that be if select() said the read would not block?
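
For anyone following along, the pattern I mean looks roughly like the
sketch below. This is NOT qmail's actual source, just a minimal
illustration of a select()-with-timeout guarding a read(). The name
timed_read and its return convention are my own assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not taken from qmail): wait up to timeout_secs
 * for fd to become readable, then read from it.
 * Returns bytes read, 0 on EOF, or -1 on error.
 * A timeout is reported as -1 with errno set to ETIMEDOUT. */
ssize_t timed_read(int fd, char *buf, size_t len, int timeout_secs)
{
    fd_set rfds;
    struct timeval tv;
    int ready;

    assert(buf != NULL);

    for (;;) {
        /* select() may modify both the set and the timeout,
         * so rebuild them on every attempt. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;

        ready = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ready == -1 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        break;
    }

    if (ready == -1)
        return -1;              /* select() itself failed */
    if (ready == 0) {
        errno = ETIMEDOUT;      /* nothing arrived within the timeout */
        return -1;
    }

    /* select() reported fd readable, so this read() is the call that
     * is not expected to block. */
    return read(fd, buf, len);
}
```

With a loop like this, a process waiting in select() returns from it
every time the timeout expires, and that return is what a trace would
record. An empty trace over many hours is what makes the select()
explanation unlikely.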
Regards.