On 11 Sep 2008, at 02:12, Paul Johnson wrote:

I'm looking for a little help in solving a problem which has me stumped
and couldn't think of anywhere better to come.  That's not the problem
by the way, but I'll take answers to that as well.

I have about 210 named pipes (FIFOs) and three processes which are
running a select over a third of the pipes each, and then calling
sysread on the pipe before writing out the data to log files.

This has been working well in production for almost two years handling
many GB of data daily.

Recently, another thirty or so pipes have been added to this group and
very occasionally I am noticing a problem whereby select will indicate
that a pipe is ready for reading and sysread will attempt to read from
the pipe, but there is actually nothing there to be read, and so the
sysread call hangs waiting for input.

Reproducing this problem is difficult, but I currently have the system
in such a state.  The pipe on which the sysread call is waiting is one
of the new pipes.

I can only think of four possible explanations here:

 1.  My code is broken.  I don't think this is the case but don't want
     to rule it out.

 2.  Some other process has read the data in between the select returning
     and the sysread being called.  lsof shows no unexpected processes
     accessing the pipe at the moment and no one should have been on the
     system to have run cat or anything.  last shows nothing suspicious.

 3.  Perl's select is broken.

 4.  The OS is broken.

Is my assumption correct that if select tells you there is something to
be read then there should be something there to be read?  Can anyone
think of any other possibilities?

What is curious to me is that the process writing to the named pipe is
hung.  Is the pipe locked somehow until the sysread call has returned?

Unless I can think of anything better to do, tomorrow I will try to send
some data to the named pipe that is being read to see if that will allow
the sysread to return.  If it does, I should be able to tell whether any
data has been lost from the named pipe, which might indicate that
another process had read it.

I am running perl-5.8.8 on Solaris 8.  The program writing to the named
pipe is a Java program which is writing to STDOUT.  That program has been
called using system by a Perl wrapper which has reopened STDOUT to the
named pipe.  The program reading from the named pipe is using PERLIO.

I'm open to any hints, suggestions or solutions.


This reminds me of an issue I found with select/sysread on Solaris too,
although it turned out to be a misunderstanding on my part of Perl's
sysread semantics compared to the read system call.  I think it had
something to do with what happens when a pipe is closed unexpectedly.
You might review the docs on sysread and select, but I'm sure you've
done that already.
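
For reference, the sysread return values can be sketched roughly as
below.  This is only an illustration, not your code: it uses an
anonymous pipe instead of a FIFO, and classify_sysread is a made-up
helper name.  sysread returns undef on error, 0 at end-of-file (once
every writer has closed the pipe) and a byte count otherwise.  select
also reports a pipe at end-of-file as readable, so a "ready" handle
does not always mean there is data.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a sysread return value.
# undef is an error, 0 is end-of-file, anything else is a byte count.
sub classify_sysread {
    my ($n) = @_;
    return 'error' unless defined $n;
    return 'eof'   if $n == 0;
    return 'data';
}

# An anonymous pipe stands in for the named pipe (assumption).
pipe(my $reader, my $writer) or die "pipe failed: $!";
my $buf = '';

syswrite($writer, "hello\n") or die "syswrite failed: $!";
my $n = sysread($reader, $buf, 4096);
print classify_sysread($n), ": $n bytes\n";    # data: 6 bytes

close $writer;    # no writers left, so the next read reports end-of-file
$n = sysread($reader, $buf, 4096);
print classify_sysread($n), "\n";              # eof
```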

The Perl select docs also suggest using the O_NONBLOCK flag for the
case you're describing.
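
A minimal sketch of that idea, assuming an anonymous pipe in place of
one of your FIFOs; set_nonblocking and read_once are hypothetical
helper names, not anything from your program.  With O_NONBLOCK set, a
sysread on a handle that turns out to be empty fails with EAGAIN
instead of hanging, so the loop can simply go back to select.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use IO::Select;

# Hypothetical helper: put a handle into non-blocking mode.
sub set_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0) or die "F_GETFL failed: $!";
    fcntl($fh, F_SETFL, $flags | O_NONBLOCK) or die "F_SETFL failed: $!";
    return;
}

# Hypothetical helper: read once without ever blocking.
# Returns 'data', 'eof', or 'retry' when nothing was there after all.
sub read_once {
    my ($fh, $bufref) = @_;
    my $n = sysread($fh, $$bufref, 65536);
    return 'data'  if $n;
    return 'eof'   if defined $n;
    return 'retry' if $!{EAGAIN} || $!{EWOULDBLOCK};
    die "sysread failed: $!";
}

pipe(my $reader, my $writer) or die "pipe failed: $!";
set_nonblocking($reader);
my $sel = IO::Select->new($reader);
my $buf = '';

# Nothing has been written yet: a blocking sysread would hang here.
print read_once($reader, \$buf), "\n";          # retry

syswrite($writer, "log line\n") or die "syswrite failed: $!";
for my $fh ($sel->can_read(1)) {
    print read_once($fh, \$buf), ": $buf";      # data: log line
}
```

For a real FIFO the flag could instead be passed at open time, for
example via sysopen with O_RDONLY | O_NONBLOCK.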

Sorry, but that's all I can offer without doing any serious research.

- Mark

