Re: [OT] select and sysread problem on solaris
On Thu, Sep 11, 2008 at 02:36:55AM +0100, Andy Armstrong wrote:
> On 11 Sep 2008, at 02:12, Paul Johnson wrote:
> > Is my assumption correct that if select tells you there is something to be read then there should be something there to be read? Can anyone think of any other possibilities?
>
> 210 + ~30 = ~240 - which is getting close to 255. Since select uses a fixed bit field to represent filenos there's an upper limit on the filenos you can select on. What does this print?
>
>     #include <sys/select.h>
>     #include <stdio.h>
>
>     int main( void ) {
>         printf( "%d\n", FD_SETSIZE );
>         return 0;
>     }

Hi Andy,

Thanks for thinking about this. The output is 1024, but I'm not convinced the problem lies in this area.

The numbers I gave are a little inaccurate (I was trying to remember off the top of my head early in the morning). There were previously 343 named pipes being monitored, and there will now be about 60 more. These are split between three processes, so each one will be selecting over about 135 pipes.

As you see, this should be covered and, in addition, I am using PERLIO=perlio so that Perl's own IO implementation is being used, allowing 1024 files to be opened per process, rather than the 256 which would be allowed with Solaris' stdio implementation.

We have just moved from under 128 files per select to just over, but I don't think this is the problem. Additionally, the system will seem to work fine for many GB of data before this problem is (seemingly randomly) triggered.

Thanks again for your hint. Do you (or anyone else) have any more?

(I know you'd all rather talk about macs and mobiles.)

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net
Re: [OT] select and sysread problem on solaris
On 11 Sep 2008, at 02:12, Paul Johnson wrote:
> I'm looking for a little help in solving a problem which has me stumped and couldn't think of anywhere better to come. That's not the problem by the way, but I'll take answers to that as well.
>
> I have about 210 named pipes (FIFOs) and three processes which are running a select over a third of the pipes each, and then calling sysread on the pipe before writing out the data to log files. This has been working well in production for almost two years handling many GB of data daily.
>
> Recently, another thirty or so pipes have been added to this group and very occasionally I am noticing a problem whereby select will indicate that a pipe is ready for reading and sysread will attempt to read from the pipe, but there is actually nothing there to be read, and so the sysread call hangs waiting for input.
>
> Reproducing this problem is difficult, but I currently have the system in such a state. The pipe on which the sysread call is waiting is one of the new pipes.
>
> I can only think of four possible explanations here:
>
> 1. My code is broken. I don't think this is the case but don't want to rule it out.
> 2. Some other process has read the data in between the select returning and the sysread being called. lsof shows no unexpected processes accessing the pipe at the moment and no one should have been on the system to have run cat or anything. last shows nothing suspicious.
> 3. Perl's select is broken.
> 4. The OS is broken.
>
> Is my assumption correct that if select tells you there is something to be read then there should be something there to be read? Can anyone think of any other possibilities?
>
> What is curious to me is that the process writing to the named pipe is hung. Is the pipe locked somehow until the sysread call has returned?
>
> Unless I can think of anything better to do, tomorrow I will try to send some data to the named pipe that is being read to see if that will allow the sysread to return. If it does, I should be able to tell whether any data has been lost from the named pipe, which might indicate that another process had read it.
>
> I am running perl-5.8.8 on Solaris 8. The program writing to the named pipe is a Java program which is writing to STDOUT. That program has been called using system by a Perl wrapper which has reopened STDOUT to the named pipe. The program reading from the named pipe is using PERLIO.
>
> I'm open to any hints, suggestions or solutions.

This reminds me of an issue I found with select/sysread on Solaris too, although it turned out it was a misunderstanding on my part of the Perl sysread semantics compared to the read system call. It was something to do with what happened when a pipe was closed unexpectedly, I think.

You might review the docs on sysread and select, but I'm sure you've done that already. The Perl select docs also suggest you use the O_NONBLOCK flag for the case you're referring to as well.

Sorry, but that's all I can offer without doing any serious research.

- Mark
Re: [OT] select and sysread problem on solaris
Hi Paul,

> As you see, this should be covered and, in addition, I am using PERLIO=perlio so that Perl's own IO implementation is being used, allowing 1024 files to be opened per process, rather than the 256 which would be allowed with Solaris' stdio implementation.

Maybe you're suffering from buffering[1] between the two IO implementations.

Have you tried selecting STDOUT and flushing it? Maybe it is blocking on some left over data? Something like: $| = 1 on the select-ed filehandle will flush it.

One other thing to check is the bytesize and character encoding of things you are reading off the network - just to make sure there are no left over bytes in the pipes.

Also are you doing slurping reads? Make sure nothing has messed with the end of line $/ characters.

If you are tracing what's happening sometimes your own trace writes can bizarrely interact with IO buffers etc. I had to debug a similar problem and when I put the trace in it blocked and when I removed the trace it worked!? Try turning tracing off and see if it makes a difference.

A final suggestion would be to not mix IO layers.

Good luck - this sounds like a nasty one. ;-)

Nige

[1] http://perl.plover.com/FAQs/Buffering.html
[2] Network Programming with Perl is a brilliant book for this sort of thing
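The buffering point can be seen in a few lines of Perl. This is only an illustrative sketch: the helper names readable_now and buffering_demo are made up here, and it uses an anonymous pipe inside one process rather than the named pipes and Java writer from the thread. It shows that data printed to a pipe handle is not visible to select on the reading end until the writer's buffer is flushed.

```perl
use strict;
use warnings;
use IO::Handle;

# True if $fh has data ready right now (select with a zero timeout).
sub readable_now {
    my ($fh) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;
    my $nfound = select($rin, undef, undef, 0);
    return $nfound > 0 ? 1 : 0;
}

# Returns (readable before flushing, readable after enabling autoflush).
sub buffering_demo {
    pipe(my $r, my $w) or die "pipe: $!";

    # A pipe is not a terminal, so this line stays in Perl's output buffer.
    print {$w} "buffered line\n";
    my $before = readable_now($r);

    # Same effect as: select($w); $| = 1; select(STDOUT);
    # Turning autoflush on also flushes what is already buffered.
    $w->autoflush(1);
    my $after = readable_now($r);

    return ($before, $after);
}
```

Calling buffering_demo() should return (0, 1): nothing to read until the writer flushes. Note that sysread itself bypasses Perl's read buffering, so this only concerns the writing side.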
Re: [OT] select and sysread problem on solaris
* Paul Johnson ([EMAIL PROTECTED]) [080911 09:20]:
> On Thu, Sep 11, 2008 at 02:36:55AM +0100, Andy Armstrong wrote:
> > printf( "%d\n", FD_SETSIZE );
>
> The output is 1024, but I'm not convinced the problem lies in this area.

See http://blogs.sun.com/elving/entry/too_many_open_files

Do you have perl compiled as 32bit or 64bit? It wouldn't surprise me if Solaris lied to Perl about the max number of file-descriptors.

> We have just moved from under 128 files per select to just over, but I don't think this is the problem. Additionally, the system will seem to work fine for many GB of data before this problem is (seemingly randomly) triggered.

Clients address your service via sockets which count as file-descriptors as well. It is not only a limit on what you can pass with select(): the whole sum of pipes and sockets over all threads within one process cannot exceed 256. (At least, that is my interpretation of the docs)

-- 
Regards,
MarkOv

Mark Overmeer MSc            MARKOV Solutions
[EMAIL PROTECTED]            [EMAIL PROTECTED]
http://Mark.Overmeer.net     http://solutions.overmeer.net
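One way to check these limits from inside the Perl process itself is sketched below. It is an assumption-laden illustration, not something from the thread: fd_report is a hypothetical helper, and the FD_SETSIZE value has to be passed in by hand (1024 is the figure Paul reported) because Perl does not expose that constant directly. POSIX::sysconf with _SC_OPEN_MAX reports the per-process limit on open descriptors as the running perl sees it.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX);

# Report the per-process descriptor limit and whether the highest
# fileno among @handles still fits under the given FD_SETSIZE.
sub fd_report {
    my ($fd_setsize, @handles) = @_;
    die "no handles given" unless @handles;

    my $open_max = sysconf(_SC_OPEN_MAX);   # undef if the call fails
    my ($highest) = sort { $b <=> $a } map { fileno($_) } @handles;

    return {
        open_max       => $open_max,
        highest_fileno => $highest,
        fits_in_fd_set => ($highest < $fd_setsize ? 1 : 0),
    };
}
```

For example, fd_report(1024, @pipe_handles) could be logged at startup in each of the three reader processes, so that a descriptor creeping towards either limit shows up before anything hangs.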
Re: [OT] select and sysread problem on solaris
Mark Blackman wrote:
> On 11 Sep 2008, at 02:12, Paul Johnson wrote:
> > Recently, another thirty or so pipes have been added to this group and very occasionally I am noticing a problem whereby select will indicate that a pipe is ready for reading and sysread will attempt to read from the pipe, but there is actually nothing there to be read, and so the sysread call hangs waiting for input.
>
> the perl select docs also suggest you use the O_NONBLOCK flag for the case you're referring to as well.

select(), on any platform, *may* return an indication that there is data to read when there isn't. Therefore using blocking reads with select() *will* fail, at some point, in the manner that you describe. The busier the system, the more likely it is to occur.

Any tutorial on the use of select() should really mandate the use of O_NONBLOCK so that one can capture the EAGAIN/EWOULDBLOCK/EINPROGRESS error(s) and then ignore them. If your sysread returns undef, then check for these errors in $! and just carry on, otherwise signal EOF in the normal way.

Dirk
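Dirk's advice can be sketched roughly as follows. This is a minimal sketch, not code from the thread: the helper names set_nonblocking, read_ready and drain_once are made up for illustration, the 64 KB read size is an arbitrary choice, and EINTR is treated as "carry on" as well.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use IO::Select;

# Put a handle into non-blocking mode so that sysread can never hang.
sub set_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0) or die "fcntl F_GETFL: $!";
    fcntl($fh, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";
    return $fh;
}

# Attempt one read, appending to $$buf_ref. Returns:
#   'data'  - bytes were read
#   'retry' - select was wrong (or a signal interrupted us); just carry on
#   'eof'   - sysread returned 0, i.e. end of file
sub read_ready {
    my ($fh, $buf_ref) = @_;
    my $n = sysread($fh, $$buf_ref, 65536, length $$buf_ref);
    if (!defined $n) {
        return 'retry' if $!{EAGAIN} || $!{EWOULDBLOCK} || $!{EINTR};
        die "sysread: $!";
    }
    return $n ? 'data' : 'eof';
}

# One pass over whatever select reports as readable.
sub drain_once {
    my ($sel, $timeout, $on_data) = @_;
    for my $fh ($sel->can_read($timeout)) {
        my $buf = '';
        my $status = read_ready($fh, \$buf);
        if    ($status eq 'data') { $on_data->($fh, $buf) }
        elsif ($status eq 'eof')  { $sel->remove($fh); close $fh }
        # 'retry': nothing to do, the next select will report it again
    }
    return;
}
```

A reader process would call set_nonblocking on every pipe once after opening it, add the handles to an IO::Select object, and then call drain_once in a loop with a callback that writes to the log file.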
Re: [OT] select and sysread problem on solaris
On Thu, Sep 11, 2008 at 12:23:39PM +0200, Mark Overmeer wrote:
[...]
> Clients address your service via sockets which count as file-descriptors as well. It is not only a limit on what you can pass with select(): the whole sum of pipes and sockets over all threads within one process cannot exceed 256. (At least, that is my interpretation of the docs)

The clients are presumably in a different process, otherwise why would one bother with IPC in the first place?

The system-wide limit on open file descriptors is possibly worth checking though.
Re: [OT] select and sysread problem on solaris
Paul Johnson wrote:
> Recently, another thirty or so pipes have been added to this group and very occasionally I am noticing a problem whereby select will indicate that a pipe is ready for reading and sysread will attempt to read from the pipe, but there is actually nothing there to be read, and so the sysread call hangs waiting for input.

Could it be a deferred signal? See perldoc perlipc for more info.

From some code I wrote:

    while (1) {
        $client = $server->accept() || do {
            # accept() can fail in Perl 5.7.3 and later thanks
            # to safe signals which can interrupt an accept()
            # so we detect this and ignore it
            next if $!{EINTR};
            last;
        };

        # handle $client request
    }

It's not using select(), but it could be a manifestation of the same issue.

HTH
A
Re: [OT] select and sysread problem on solaris
* Dirk Koopman ([EMAIL PROTECTED]) [080911 10:25]:
> Any tutorial on the use of select() should really mandate the use of O_NONBLOCK so that one can capture the EAGAIN/EWOULDBLOCK/EINPROGRESS error(s) and then ignore them. If your sysread returns undef, then check for these errors in $! and just carry on, otherwise signal EOF in the normal way.

IO::Multiplex (by coincidence also discussed for other reasons on perl5-porters today) is a nice example implementation for this problem.

-- 
MarkOv

Mark Overmeer MSc            MARKOV Solutions
[EMAIL PROTECTED]            [EMAIL PROTECTED]
http://Mark.Overmeer.net     http://solutions.overmeer.net
[OT] select and sysread problem on solaris
I'm looking for a little help in solving a problem which has me stumped and couldn't think of anywhere better to come. That's not the problem by the way, but I'll take answers to that as well.

I have about 210 named pipes (FIFOs) and three processes which are running a select over a third of the pipes each, and then calling sysread on the pipe before writing out the data to log files. This has been working well in production for almost two years handling many GB of data daily.

Recently, another thirty or so pipes have been added to this group and very occasionally I am noticing a problem whereby select will indicate that a pipe is ready for reading and sysread will attempt to read from the pipe, but there is actually nothing there to be read, and so the sysread call hangs waiting for input.

Reproducing this problem is difficult, but I currently have the system in such a state. The pipe on which the sysread call is waiting is one of the new pipes.

I can only think of four possible explanations here:

1. My code is broken. I don't think this is the case but don't want to rule it out.
2. Some other process has read the data in between the select returning and the sysread being called. lsof shows no unexpected processes accessing the pipe at the moment and no one should have been on the system to have run cat or anything. last shows nothing suspicious.
3. Perl's select is broken.
4. The OS is broken.

Is my assumption correct that if select tells you there is something to be read then there should be something there to be read? Can anyone think of any other possibilities?

What is curious to me is that the process writing to the named pipe is hung. Is the pipe locked somehow until the sysread call has returned?

Unless I can think of anything better to do, tomorrow I will try to send some data to the named pipe that is being read to see if that will allow the sysread to return. If it does, I should be able to tell whether any data has been lost from the named pipe, which might indicate that another process had read it.

I am running perl-5.8.8 on Solaris 8. The program writing to the named pipe is a Java program which is writing to STDOUT. That program has been called using system by a Perl wrapper which has reopened STDOUT to the named pipe. The program reading from the named pipe is using PERLIO.

I'm open to any hints, suggestions or solutions.

Thanks for reading this far. Unless you just skipped to the bottom.

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net
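Explanation 2 above, a second reader draining the pipe between select and sysread, can be reproduced in a single process. The sketch below is purely illustrative and makes no claim that this is what happened on the production system: stale_select_demo is a made-up name, an anonymous pipe stands in for the FIFO, and a duplicated handle stands in for the "other process". The reader is switched to non-blocking mode so that the stale result shows up as an error instead of a hang.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

# Returns (what select reported, what the other reader took,
#          whether our own sysread would have blocked).
sub stale_select_demo {
    pipe(my $reader, my $writer) or die "pipe: $!";

    # A second handle on the same pipe plays the part of another process.
    open(my $other, '<&', $reader) or die "dup: $!";

    syswrite($writer, "log line\n") or die "syswrite: $!";

    # Poll with a zero timeout: is $reader readable?
    my $rout = '';
    vec($rout, fileno($reader), 1) = 1;
    my $nfound = select($rout, undef, undef, 0);

    # The other reader gets there first and empties the pipe.
    sysread($other, my $stolen, 4096);

    # In blocking mode the next sysread would now hang; with O_NONBLOCK
    # it returns undef and sets $! to EAGAIN/EWOULDBLOCK instead.
    my $flags = fcntl($reader, F_GETFL, 0) or die "fcntl F_GETFL: $!";
    fcntl($reader, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";

    my $n = sysread($reader, my $buf, 4096);
    my $would_block =
        !defined($n) && ($!{EAGAIN} || $!{EWOULDBLOCK}) ? 1 : 0;

    return ($nfound, $stolen, $would_block);
}
```

Here select reports one ready handle, the other reader takes the line, and the original reader's sysread is left with nothing to read.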
Re: [OT] select and sysread problem on solaris
On 11 Sep 2008, at 02:12, Paul Johnson wrote:
> Is my assumption correct that if select tells you there is something to be read then there should be something there to be read? Can anyone think of any other possibilities?

210 + ~30 = ~240 - which is getting close to 255. Since select uses a fixed bit field to represent filenos there's an upper limit on the filenos you can select on. What does this print?

    #include <sys/select.h>
    #include <stdio.h>

    int main( void ) {
        printf( "%d\n", FD_SETSIZE );
        return 0;
    }

-- 
Andy Armstrong, Hexten
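On the Perl side, the bit field Andy mentions is the string passed to four-argument select, built one bit per fileno with vec. The sketch below only illustrates that representation; fd_mask and fds_in_mask are hypothetical helper names, and the fileno values in the usage note are arbitrary.

```perl
use strict;
use warnings;

# Build the bit vector that four-argument select() expects from a
# list of file descriptor numbers.
sub fd_mask {
    my @fds = @_;
    my $mask = '';
    vec($mask, $_, 1) = 1 for @fds;
    return $mask;
}

# Decode a mask (for example the one select hands back) into the
# descriptor numbers whose bits are set.
sub fds_in_mask {
    my ($mask) = @_;
    return grep { vec($mask, $_, 1) } 0 .. 8 * length($mask) - 1;
}
```

For instance, fd_mask(3, 240) is a 31-byte string, since bit 240 lives in byte 30, and fds_in_mask on it gives back (3, 240). The Perl string grows with the highest fileno, which is why the C-level FD_SETSIZE is the number worth checking.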