Re: [networking-discuss] nscd receives SIGPIPE then consumes 100% CPU

Mike Gerdts Thu, 13 Aug 2009 06:12:26 -0700

Sigh... replying to my own post.  I realized after the fact that I
forgot to mention how this is relevant to SIGPIPE.


On Thu, Aug 13, 2009 at 7:30 AM, Mike Gerdts<[email protected]> wrote:
> On Thu, Aug 13, 2009 at 4:23 AM, Brian Ruthven - Sun
> UK<[email protected]> wrote:
>>
>> Once it's in that state, can you run the following please:
>>
>> # truss -Tgetpid -p <PID>
>> # pstack <PID>
>> # pfiles <PID>
>> # prun <PID>
>>
>> The truss command will stop the process on a getpid call (confirm the S
>> column has a 'T' in it with "/usr/bin/ps -opid,s,comm -p <PID>"), and then
>> the pstack and pfiles will snapshot a point we're interested in. The prun
>> will put the process back in the state it was in so you can kill / svcadm
>> restart it.
>>
>>
>> I found bug 5060500 which describes a possible cause of this. However, the
>> fix for 6537549 added the signal(SIGPIPE, SIG_IGN) to nscd, so this explains
>> why the process doesn't die any more.
>>
>> Both bugs can be seen here:
>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5060500
>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6537549
>
> Perhaps related to this, I've had times where a misbehaving load
> balancer in front of LDAP servers causes the lookup through nscd to
> fail entirely rather than trying any additional configured LDAP
> servers.  By fail entirely, I mean that getpw*() fails, causing ssh(1)
> to say "You do not exist, go away!"  After several tries, it seems to
> grab the next LDAP server.  I would expect that if one LDAP server
> isn't able to respond to queries properly that nscd would try the next
> one before returning a result or error to the caller.
>
> When I used ldapsearch I found that the load balancer was sending a
> TCP RST while ldapsearch was waiting for a response to the bind
> request.  Packet traces show that in the successful ldapsearch session,
> the TCP session is concluded in an orderly manner by the client
> sending a packet with FIN set. In contrast, the unsuccessful
> ldapsearch session shows the load balancer sends a packet with RST
> set.

And here's where it's relevant.  The ldap client was trying to read
from the established socket.  When it received the RST, that generated
a SIGPIPE.  This caused ldapclient to die with no useful error
message.  It seems as though there may be general issues with the way
that the LDAP libraries and/or programs built upon them handle
unexpected connection termination.

>
> If anyone is interested in more detail, let me know.  I think that I
> might be able to reproduce this today.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
networking-discuss mailing list
[email protected]
