Re: [OT] select and sysread problem on solaris

2008-09-11 Thread Paul Johnson
On Thu, Sep 11, 2008 at 02:36:55AM +0100, Andy Armstrong wrote:
 On 11 Sep 2008, at 02:12, Paul Johnson wrote:
 Is my assumption correct that if select tells you there is something to
 be read then there should be something there to be read?  Can anyone
 think of any other possibilities?


 210 + ~30 = ~240 - which is getting close to 255. Since select uses a fixed 
 bit field to represent filenos there's an upper limit on the filenos you 
 can select on. What does this print?

 #include <sys/select.h>
 #include <stdio.h>

 int main( void ) {
     printf( "%d\n", FD_SETSIZE );
     return 0;
 }

Hi Andy,

Thanks for thinking about this.

The output is 1024, but I'm not convinced the problem lies in this area.

The numbers I gave are a little inaccurate (I was trying to remember off
the top of my head early in the morning).  There were previously 343
named pipes being monitored, and there will now be about 60 more.  These
are split between three processes, so each one will be selecting over
about 135 pipes.

As you see, this should be covered and, in addition, I am using
PERLIO=perlio so that Perl's own IO implementation is being used,
allowing 1024 files to be opened per process, rather than the 256 which
would be allowed with Solaris' stdio implementation.

We have just moved from under 128 files per select to just over, but I
don't think this is the problem.  Additionally, the system will seem to
work fine for many GB of data before this problem is (seemingly
randomly) triggered.

Thanks again for your hint.  Do you (or anyone else) have any more?  (I
know you'd all rather talk about macs and mobiles.)

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net


Re: [OT] select and sysread problem on solaris

2008-09-11 Thread Mark Blackman

On 11 Sep 2008, at 02:12, Paul Johnson wrote:

I'm looking for a little help in solving a problem which has me stumped
and couldn't think of anywhere better to come.  That's not the problem
by the way, but I'll take answers to that as well.

I have about 210 named pipes (FIFOs) and three processes which are
running a select over a third of the pipes each, and then calling
sysread on the pipe before writing out the data to log files.

This has been working well in production for almost two years handling
many GB of data daily.

Recently, another thirty or so pipes have been added to this group and
very occasionally I am noticing a problem whereby select will indicate
that a pipe is ready for reading and sysread will attempt to read from
the pipe, but there is actually nothing there to be read, and so the
sysread call hangs waiting for input.

Reproducing this problem is difficult, but I currently have the system
in such a state.  The pipe on which the sysread call is waiting is one
of the new pipes.

I can only think of four possible explanations here:

 1.  My code is broken.  I don't think this is the case but don't want
     to rule it out.

 2.  Some other process has read the data in between the select returning
     and the sysread being called.  lsof shows no unexpected processes
     accessing the pipe at the moment and no one should have been on the
     system to have run cat or anything.  last shows nothing suspicious.

 3.  Perl's select is broken.

 4.  The OS is broken.

Is my assumption correct that if select tells you there is something to
be read then there should be something there to be read?  Can anyone
think of any other possibilities?

What is curious to me is that the process writing to the named pipe is
hung.  Is the pipe locked somehow until the sysread call has returned?

Unless I can think of anything better to do, tomorrow I will try to send
some data to the named pipe that is being read to see if that will allow
the sysread to return.  If it does, I should be able to tell whether any
data has been lost from the named pipe, which might indicate that
another process had read it.

I am running perl-5.8.8 on Solaris 8.  The program writing to the named
pipe is a Java program which is writing to STDOUT.  That program has been
called using system by a Perl wrapper which has reopened STDOUT to the
named pipe.  The program reading from the named pipe is using PERLIO.

I'm open to any hints, suggestions or solutions.



This reminds me of an issue I found with select/sysread on Solaris too,
although it turned out it was a misunderstanding on my part of the Perl
sysread semantics compared to the read system call. It was something
to do with what happened when a pipe was closed unexpectedly, I think.
You might review the docs on sysread and select, but I'm sure you've
done that already.

The Perl select docs also suggest you use the O_NONBLOCK flag for the
case you're referring to as well.
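
For reference, the closed-pipe case can be sketched at the system-call level in C. This is only an illustration under stated assumptions: the names is_readable and read_once are made up here, and nothing below is taken from the code under discussion. On POSIX systems select() reports a pipe as readable once every writer has closed it, and the read() that follows returns 0 instead of blocking. Perl's sysread likewise returns 0 at end of file and undef on error.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: 1 if select() reports fd readable within
 * timeout_ms, 0 if it does not, -1 on error. fd must be < FD_SETSIZE. */
int is_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Hypothetical classification of a single read() after select(). */
enum read_result { GOT_DATA, GOT_EOF, GOT_ERROR };

enum read_result read_once(int fd, char *buf, size_t len, ssize_t *nread)
{
    *nread = read(fd, buf, len);
    if (*nread > 0)
        return GOT_DATA;
    if (*nread == 0)
        return GOT_EOF;   /* every writer has closed its end */
    return GOT_ERROR;     /* caller should inspect errno */
}
```

A reader loop that treats GOT_EOF as "remove this descriptor from the set, or reopen the FIFO" avoids spinning on a descriptor that select() will keep reporting as ready.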

Sorry, but that's all I can offer without doing any serious research.

- Mark




Re: [OT] select and sysread problem on solaris

2008-09-11 Thread Nigel Hamilton
Hi Paul,



 As you see, this should be covered and, in addition, I am using
 PERLIO=perlio so that Perl's own IO implementation is being used,
 allowing 1024 files to be opened per process, rather than the 256 which
 would be allowed with Solaris' stdio implementation.


Maybe you're suffering from buffering[1] between the two IO implementations.


Have you tried selecting STDOUT and flushing it? Maybe it is blocking on
some left over data? Something like: $| = 1 on the select-ed filehandle will
flush it. One other thing to check is the bytesize and character encoding of
things you are reading off the network - just to make sure there are no left
over bytes in the pipes. Also are you doing slurping reads? Make sure
nothing has messed with the end of line $/ characters.

If you are tracing what's happening sometimes your own trace writes can
bizarrely interact with IO buffers etc. I had to debug a similar problem and
when I put the trace in it blocked and when I removed the trace it worked!?
Try turning tracing off and see if it makes a difference.

A final suggestion would be to not mix IO layers. Good luck - this sounds
like a nasty one. ;-)

Nige

[1] http://perl.plover.com/FAQs/Buffering.html
[2] Network Programming with Perl is a brilliant book for this sort of
thing


Re: [OT] select and sysread problem on solaris

2008-09-11 Thread Mark Overmeer
* Paul Johnson ([EMAIL PROTECTED]) [080911 09:20]:
 On Thu, Sep 11, 2008 at 02:36:55AM +0100, Andy Armstrong wrote:
  printf( "%d\n", FD_SETSIZE );
 
 The output is 1024, but I'm not convinced the problem lies in this area.

See http://blogs.sun.com/elving/entry/too_many_open_files
Do you have perl compiled as 32-bit or 64-bit?  It wouldn't surprise
me if Solaris lied to Perl about the max number of file descriptors.

 We have just moved from under 128 files per select to just over, but I
 don't think this is the problem.  Additionally, the system will seem to
 work fine for many GB of data before this problem is (seemingly
 randomly) triggered.

Clients address your service via sockets which count as file-descriptors
as well.  It is not only a limit on what you can pass with select():
the whole sum of pipes and sockets over all threads within one process
can not exceed 256.  (At least, that is my interpretation of the docs)
-- 
Regards,
   MarkOv


   Mark Overmeer MSc          MARKOV Solutions
   [EMAIL PROTECTED]          [EMAIL PROTECTED]
   http://Mark.Overmeer.net   http://solutions.overmeer.net



Re: [OT] select and sysread problem on solaris

2008-09-11 Thread Dirk Koopman

Mark Blackman wrote:

On 11 Sep 2008, at 02:12, Paul Johnson wrote:



Recently, another thirty or so pipes have been added to this group and
very occasionally I am noticing a problem whereby select will indicate
that a pipe is ready for reading and sysread will attempt to read from
the pipe, but there is actually nothing there to be read, and so the
sysread call hangs waiting for input.





the perl select docs also suggest you use the O_NONBLOCK flag for the
case you're referring to as well.



select(), on any platform, *may* return an indication that there is data
to read when there isn't. Therefore using blocking reads with select()
*will* fail, at some point, in the manner that you describe. The busier
the system, the more likely it is to occur.


Any tutorial on the use of select() should really mandate the use of
O_NONBLOCK so that one can capture the EAGAIN/EWOULDBLOCK/EINPROGRESS
error(s) and then ignore them. If your sysread returns undef, then check
for these errors in $! and just carry on, otherwise signal EOF in the
normal way.
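
The pattern described above can be sketched in C roughly as follows. It is a minimal illustration, not anyone's production code: set_nonblocking, read_ready and the -2 return code are invented for the example. The descriptor is switched to O_NONBLOCK once, and a read that finds nothing is reported as "try again later" instead of hanging. In Perl the corresponding signal is sysread returning undef with $!{EAGAIN} (or $!{EWOULDBLOCK}) set.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read after select() said "ready".
 * Returns: >0  bytes read
 *           0  end of file
 *          -1  a real error (see errno)
 *          -2  nothing to read after all (code made up for this sketch) */
ssize_t read_ready(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;   /* spurious readiness: go back to select() */
    return n;
}
```

With this in place the worst outcome of a false "ready" from select() is one wasted system call, not a reader stuck on a single pipe while the others fill up.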


Dirk




Re: [OT] select and sysread problem on solaris

2008-09-11 Thread Peter Corlett
On Thu, Sep 11, 2008 at 12:23:39PM +0200, Mark Overmeer wrote:
[...]
 Clients address your service via sockets which count as file-descriptors
 as well. It is not only a limit on what you can pass with select(): the
 whole sum of pipes and sockets over all threads within one process can not
 exceed 256. (At least, that is my interpretation of the docs)

The clients are presumably in a different process, otherwise why would one
bother with IPC in the first place?

The system-wide limit on open file descriptors is possibly worth checking
though.
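
As a rough way to check the limits that apply to one process, something like the following C sketch could be compiled on the box in question. The function names are made up for illustration. Note that getrlimit() reports the per-process limit, which is not the same thing as a system-wide limit, and FD_SETSIZE is a compile-time constant that may differ between 32-bit and 64-bit builds.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Fetch the soft and hard per-process limits on open descriptors.
 * Returns 0 on success, -1 on error. */
int open_file_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print the limits next to FD_SETSIZE so they can be compared. */
int report_fd_limits(FILE *out)
{
    rlim_t soft, hard;
    if (open_file_limits(&soft, &hard) != 0)
        return -1;
    fprintf(out, "RLIMIT_NOFILE soft=%llu hard=%llu FD_SETSIZE=%d\n",
            (unsigned long long)soft, (unsigned long long)hard,
            (int)FD_SETSIZE);
    return 0;
}
```

Calling report_fd_limits(stdout) from the same environment the reader processes run in (same user, same shell limits) gives the numbers that actually apply to them.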



Re: [OT] select and sysread problem on solaris

2008-09-11 Thread Andy Wardley

Paul Johnson wrote:

Recently, another thirty or so pipes have been added to this group and
very occasionally I am noticing a problem whereby select will indicate
that a pipe is ready for reading and sysread will attempt to read from
the pipe, but there is actually nothing there to be read, and so the
sysread call hangs waiting for input.


Could it be a deferred signal?  See perldoc perlipc for more info.
From some code I wrote:

while (1) {
    $client = $server->accept() || do {
        # accept() can fail in Perl 5.7.3 and later thanks
        # to safe signals which can interrupt an accept()
        # so we detect this and ignore it
        next if $!{EINTR};
        last;
    };
    # handle $client request
}

It's not using select(), but it could be a manifestation of the same
issue.
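
The same retry-on-EINTR idea applied to a read looks roughly like this in C. It is a sketch only: read_retry is a made-up name, and whether interrupted calls have anything to do with the hang described is not established.

```c
#include <errno.h>
#include <unistd.h>

/* read() that is restarted when a signal interrupts it before any
 * data has been transferred. Other errors are returned unchanged. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

A select() call can be interrupted in the same way, so a loop built around it would need the same check on its return value.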

HTH
A


Re: [OT] select and sysread problem on solaris

2008-09-11 Thread Mark Overmeer
* Dirk Koopman ([EMAIL PROTECTED]) [080911 10:25]:
 Any tutorial on the use of select() should really mandate the use of 
 O_NONBLOCK so that one can capture the EAGAIN/EWOULDBLOCK/EINPROGRESS 
 error(s) and then ignore them. If your sysread returns undef, then check 
 for these errors in $! and just carry on, otherwise signal EOF in the 
 normal way.

IO::Multiplex (by coincidence also discussed for other reasons on
perl5-porters today) is a nice example implementation for this problem.
-- 
   MarkOv


   Mark Overmeer MSc          MARKOV Solutions
   [EMAIL PROTECTED]          [EMAIL PROTECTED]
   http://Mark.Overmeer.net   http://solutions.overmeer.net



[OT] select and sysread problem on solaris

2008-09-10 Thread Paul Johnson
I'm looking for a little help in solving a problem which has me stumped
and couldn't think of anywhere better to come.  That's not the problem
by the way, but I'll take answers to that as well.

I have about 210 named pipes (FIFOs) and three processes which are
running a select over a third of the pipes each, and then calling
sysread on the pipe before writing out the data to log files.

This has been working well in production for almost two years handling
many GB of data daily.

Recently, another thirty or so pipes have been added to this group and
very occasionally I am noticing a problem whereby select will indicate
that a pipe is ready for reading and sysread will attempt to read from
the pipe, but there is actually nothing there to be read, and so the
sysread call hangs waiting for input.

Reproducing this problem is difficult, but I currently have the system
in such a state.  The pipe on which the sysread call is waiting is one
of the new pipes.

I can only think of four possible explanations here:

 1.  My code is broken.  I don't think this is the case but don't want
 to rule it out.

 2.  Some other process has read the data in between the select returning
 and the sysread being called.  lsof shows no unexpected processes
 accessing the pipe at the moment and no one should have been on the
 system to have run cat or anything.  last shows nothing suspicious.

 3. Perl's select is broken.

 4. The OS is broken.

Is my assumption correct that if select tells you there is something to
be read then there should be something there to be read?  Can anyone
think of any other possibilities?

What is curious to me is that the process writing to the named pipe is
hung.  Is the pipe locked somehow until the sysread call has returned?

Unless I can think of anything better to do, tomorrow I will try to send
some data to the named pipe that is being read to see if that will allow
the sysread to return.  If it does, I should be able to tell whether any
data has been lost from the named pipe, which might indicate that
another process had read it.

I am running perl-5.8.8 on Solaris 8.  The program writing to the named pipe
is a Java program which is writing to STDOUT.  That program has been
called using system by a Perl wrapper which has reopened STDOUT to the
named pipe.  The program reading from the named pipe is using PERLIO.

I'm open to any hints, suggestions or solutions.

Thanks for reading this far.  Unless you just skipped to the bottom.

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net


Re: [OT] select and sysread problem on solaris

2008-09-10 Thread Andy Armstrong

On 11 Sep 2008, at 02:12, Paul Johnson wrote:
Is my assumption correct that if select tells you there is something to
be read then there should be something there to be read?  Can anyone
think of any other possibilities?



210 + ~30 = ~240 - which is getting close to 255. Since select uses a
fixed bit field to represent filenos there's an upper limit on the
filenos you can select on. What does this print?


#include <sys/select.h>
#include <stdio.h>

int main( void ) {
    printf( "%d\n", FD_SETSIZE );
    return 0;
}
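
Because an fd_set is a fixed-size bit field, a descriptor numbered FD_SETSIZE or higher must never be passed to FD_SET. A small guard along these lines makes that explicit (a sketch in C; fits_in_fd_set and safe_fd_set are hypothetical names, not part of any library):

```c
#include <sys/select.h>

/* Returns 1 if fd can be represented in an fd_set, 0 otherwise. */
int fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Add fd to set only when it is in range.
 * Returns 0 on success, -1 if the descriptor was rejected. */
int safe_fd_set(int fd, fd_set *set)
{
    if (!fits_in_fd_set(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

The limit is on the descriptor number itself, not on how many descriptors are in the set, so a process that holds many other files open can reach it while selecting on only a few.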

--
Andy Armstrong, Hexten