>>> John Jasen <[email protected]> wrote on 20.07.2018 at 15:41 in message <[email protected]>:
> On 07/20/2018 04:41 AM, [email protected] wrote:
>> Hi!
>>
>> Stupid question: could it be your load-balancer that had a problem?
>> How does the netstat look like (sockets opened, queued data, etc.?)
>
> I do not believe it to be the load-balancer. They log loss of contact
> with the LDAP servers and drop them from the relay group shortly after
> one of these events start; and when it gets cleaned up, they're added
> back in. I also do not suspect network between the load balancers and
> the LDAP servers.
>
> During such an event, ps -efT will usually show slapd running at full
> thread capacity. Comparing that to threads in cn=monitor is not
> possible, as those ldap searches fail.

I still don't know the internals of slapd, but could it be running out of
worker threads? Do you monitor the threads' activity leading up to the
problem?

> Open sockets does not substantially change until after the event
> subsides. The servers will show 1200-2000 open sockets before an event,
> and drop lower when it clears up -- to quickly scale back up to
> pre-event levels.

Another idea is to run "strace ... -p pid" on the hanging process to see
what it is doing, or maybe even to attach gdb to the process (most useful
if the binary still contains debug info).

> The queues will show data being held until the socket(s) time out.

OK, so it does not look like a problem in the load-balancer.

Regards,
Ulrich
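
P.S. Since cn=monitor cannot be queried during an event, thread activity could be sampled from outside the process instead. The following is only a minimal sketch, assuming Linux with /proc mounted. The helper names thread_states and watch are made up for illustration and are not part of OpenLDAP, and the kernel state letters (R, S, D, ...) only say whether a thread is running or sleeping, not what slapd is doing inside it.

```python
import os
import time
from collections import Counter


def thread_states(pid, proc_root="/proc"):
    """Count the kernel states (R, S, D, ...) of all threads of one process."""
    states = Counter()
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # thread exited between listdir() and open()
        # The state letter is the first field after the parenthesised
        # command name; split on the last ")" because the name itself
        # may contain parentheses.
        states[stat.rsplit(")", 1)[1].split()[0]] += 1
    return states


def watch(pid, interval=5.0, samples=3):
    """Print a timestamped thread-state summary every `interval` seconds."""
    for _ in range(samples):
        states = thread_states(pid)
        print(time.strftime("%H:%M:%S"), sum(states.values()), dict(states))
        time.sleep(interval)
```

Calling watch() with the slapd PID from cron or a shell loop and keeping the output would give a record of how the thread count and states develop before an event. Many threads in state D (uninterruptible sleep) would suggest looking at I/O first, while many in R would make strace or gdb the more interesting next step.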
