Sigh... replying to my own post. I realized after the fact that I forgot to mention how this is relevant to SIGPIPE.

On Thu, Aug 13, 2009 at 7:30 AM, Mike Gerdts<[email protected]> wrote:
> On Thu, Aug 13, 2009 at 4:23 AM, Brian Ruthven - Sun UK<[email protected]> wrote:
>>
>> Once it's in that state, can you run the following please:
>>
>> # truss -Tgetpid -p <PID>
>> # pstack <PID>
>> # pfiles <PID>
>> # prun <PID>
>>
>> The truss command will stop the process on a getpid call (confirm the S
>> column has a 'T' in it with "/usr/bin/ps -opid,s,comm -p <PID>"), and then
>> the pstack and pfiles will snapshot a point we're interested in. The prun
>> will put the process back in the state it was in so you can kill / svcadm
>> restart it.
>>
>>
>> I found bug 5060500 which describes a possible cause of this. However, the
>> fix for 6537549 added the signal(SIGPIPE, SIG_IGN) to nscd, so this explains
>> why the process doesn't die any more.
>>
>> Both bugs can be seen here:
>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5060500
>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6537549
>
> Perhaps related to this, I've had times where a misbehaving load
> balancer in front of LDAP servers causes the lookup through nscd to
> fail entirely rather than trying any additional configured LDAP
> servers. By fail entirely, I mean that getpw*() fails, causing ssh(1)
> to say "You do not exist, go away!" After several tries, it seems to
> grab the next LDAP server. I would expect that if one LDAP server
> isn't able to respond to queries properly that nscd would try the next
> one before returning a result or error to the caller.
>
> When I used ldapsearch I found that the load balancer was sending a
> TCP RST while ldapsearch was waiting for a response to the bind
> request. Packet traces show that in the successful ldapsearch session,
> the TCP session is concluded in an orderly manner by the client
> sending a packet with FIN set. In contrast, the unsuccessful
> ldapsearch session shows the load balancer sends a packet with RST
> set.

And here's where it's relevant.
The LDAP client was trying to read from the established socket. When it received the RST, that generated a SIGPIPE, which caused the client to die with no useful error message. It seems as though there may be general issues with the way that the LDAP libraries and/or programs built upon them handle unexpected connection termination.

>
> If anyone is interested in more detail, let me know. I think that I
> might be able to reproduce this today.

--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
networking-discuss mailing list
[email protected]
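
For anyone who wants to see how a reset connection surfaces to a client without involving a load balancer, here is a minimal sketch on the loopback interface. It is an illustration only, not what ldapsearch or the LDAP libraries do internally. The function name reset_by_peer_demo is made up for this example. It relies on two general facts: closing a socket with SO_LINGER set to a zero timeout aborts the connection with RST rather than FIN, and Python ignores SIGPIPE by default, so a write to a dead socket shows up as an EPIPE error instead of killing the process. A C program that leaves SIGPIPE at its default disposition would simply be terminated at that write.

```python
import errno
import socket
import struct


def reset_by_peer_demo():
    """Return the errno names a client sees after its peer sends a TCP RST.

    Hypothetical helper for illustration; it is not part of any LDAP tooling.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    client.settimeout(5)  # avoid hanging forever if no RST shows up
    server, _ = listener.accept()

    # SO_LINGER with l_onoff=1 and l_linger=0 makes close() abort the
    # connection, so the peer receives RST instead of an orderly FIN.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    server.close()
    listener.close()

    seen = []
    try:
        client.recv(1024)  # the client is waiting for a reply, as in the trace
    except OSError as exc:
        seen.append(errno.errorcode[exc.errno])
    try:
        client.sendall(b"x")  # a later write on the dead connection
    except OSError as exc:
        seen.append(errno.errorcode[exc.errno])
    client.close()
    return seen


if __name__ == "__main__":
    print(reset_by_peer_demo())
```

Typically the read reports ECONNRESET and it is the later write that reports EPIPE (the error that accompanies SIGPIPE). The exact errno values can differ between platforms, so treat the output as indicative only.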
