One of the user-space daemons in our product (www.snapix.com) receives
STREAMS messages using getmsg and starts Linux processes (using fork, fork,
execv). Under stress (starting tens of processes per second) we have seen
that getmsg sometimes returns -1 (as expected) but with errno set to
ECHILD rather than EAGAIN.
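
For readers unfamiliar with the "fork, fork, execv" sequence, here is a
minimal sketch of the usual double-fork idiom. It is an assumption about what
the daemon does, not its actual code; the helper name spawn_detached and its
return convention are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `path` as a grandchild so the daemon does not
 * have to reap it. Returns 0 once the intermediate child has exited
 * cleanly, -1 on failure. */
static int spawn_detached(const char *path, char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* First child: fork again and exit at once, so the grandchild
         * is re-parented away from the daemon. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {
            execv(path, argv);
            _exit(127);          /* only reached if execv failed */
        }
        _exit(0);
    }

    /* Daemon: reap the short-lived first child, retrying on EINTR. */
    int status;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Note that in this sketch a failed execv in the grandchild is not reported
back to the daemon; only the intermediate child's exit status is checked.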

I am mainly using LiS 2.14.2 but the problem has also been seen with 2.14.4.
The problem has been seen on the following versions of Linux:
- SuSE 8.0 (2.4.18-64GB-SMP) dual CPU
- SuSE 8.0 (2.4.18-4GB)
- Red Hat 7.3 (2.4.18-3)
- Red Hat 7.2 (2.4.7-10).
The problem is most common on SMP systems.

The symptoms are like those you might get if you had a multi-threaded
application that was not compiled against a multi-threaded O/S library, so
that it did not get the per-thread version of errno. However, my daemon is
not multi-threaded and the programs it is starting are not multi-threaded.

I have been able to work round the problem by treating ECHILD as equivalent
to EAGAIN. However, it would be good to get to the bottom of the problem. It
may be a Linux rather than LiS issue.
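
The workaround amounts to something like the sketch below. The helper name
getmsg_no_data is hypothetical, and the getmsg call in the comment uses
placeholder variables; only the errno comparison reflects what I actually do.

```c
#include <errno.h>

/* Hypothetical helper: returns nonzero if the errno value from a failed
 * getmsg() should be treated as "no message available yet".
 * ECHILD is accepted purely as a workaround for the behaviour described
 * above; it is not a documented getmsg error. */
static int getmsg_no_data(int err)
{
    return err == EAGAIN || err == ECHILD;
}

/* Usage sketch (fd, ctl, dat and flags are placeholders):
 *
 *   if (getmsg(fd, &ctl, &dat, &flags) < 0) {
 *       int err = errno;   // copy errno before any other call can change it
 *       if (getmsg_no_data(err)) {
 *           // nothing to read yet; go back to waiting
 *       } else {
 *           // genuine failure
 *       }
 *   }
 */
```

Copying errno into a local variable straight after the failing call keeps
the check independent of anything that runs afterwards.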

I have taken an strace of the failure (attached). Line 128 shows the failing
getmsg (SYS_188); however, strace reports it as returning EAGAIN.

____________________________________________________
Richard Hilditch
SNAP-IX Group
Data Connection Ltd.
Tel:    +44  208 366 1177       Mail:   [EMAIL PROTECTED]
Fax:    +44  208 367 8501       Web:    http://www.dataconnection.com

 <<strace.txt.gz>> 

