One of the user-space daemons in our product (www.snapix.com) receives Streams messages using getmsg and starts Linux processes (using fork, fork, execv). Under stress (starting tens of processes per second) we have seen that getmsg sometimes returns -1 (as expected) but with errno set to ECHILD rather than EAGAIN.
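For illustration, here is a minimal sketch of how a caller might classify a failed getmsg so that the unexpected ECHILD is handled the same way as EAGAIN. The helper name getmsg_would_block is hypothetical and is not part of LiS or of the daemon. It takes the return code and a copy of errno captured immediately after the call, so it makes no assumptions about how getmsg itself is invoked.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of LiS): report whether a failed
 * getmsg() should be treated as "no message available yet".
 *
 * rc  - the value returned by getmsg()
 * err - the value of errno saved immediately after the call
 *
 * EAGAIN is the expected "would block" indication. ECHILD is accepted
 * as well purely as a workaround for the behaviour seen under stress;
 * it is not a documented getmsg() error.
 */
static bool getmsg_would_block(int rc, int err)
{
    if (rc != -1)
        return false;           /* the call did not fail */

    return err == EAGAIN || err == ECHILD;
}
```

A caller would save errno straight after getmsg returns (for example `int err = errno;`) and pass that copy in, so that any later library call cannot overwrite the value being tested.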
I am mainly using LiS 2.14.2 but the problem has also been seen with 2.14.4. The problem has been seen on the following versions of Linux:

- SuSE 8.0 (2.4.18-64GB-SMP) dual CPU
- SuSE 8.0 (2.4.18-4GB)
- Redhat 7.3 (2.4.18-3)
- Redhat 7.2 (2.4.7-10)

The problem is most common on SMP systems.

The symptoms are like those that you might get if you had a multi-threaded application that you did not compile with a multi-threaded O/S library so you did not get the multi-threaded version of errno. However my daemon is not multi-threaded and the programs it is starting are not multi-threaded.

I have been able to work round the problem by treating ECHILD as equivalent to EAGAIN, however it would be good to get to the bottom of the problem. It may be a Linux rather than LiS issue.

I have taken an strace of the failure (attached). Line 128 shows the failing getmsg (SYS_188), however strace thinks it is returning EAGAIN.

____________________________________________________
Richard Hilditch
SNAP-IX Group
Data Connection Ltd.
Tel: +44 208 366 1177    Mail: [EMAIL PROTECTED]
Fax: +44 208 367 8501    Web: http://www.dataconnection.com

<<strace.txt.gz>>
