Ken Cross wrote:
There is pretty much a one-to-one correspondence between the number of smbd processes open (i.e. connected users) and winbindd file descriptors (per fstat).
Hmm, it may be platform specific. smbd connects to winbindd both directly and via NSS. On HP-UX this consumes two client pipes per smbd, which might be due to libnss_winbind.1 being linked with "-B symbolic": with symbols resolved locally, the two paths every smbd uses don't share a client environment. It's just a guess.
I think forking would be counter-productive since winbindd caches so much stuff. A lot of it's in tdb files, though, so...
Agreed. And as far as 2.2 is concerned, a complete restructuring doesn't make sense any more. One can improve robustness and scalability with fewer modifications.
Frankly, I'm not sure what the smbds are doing with winbindd after authentication.
I have observed that they are looking up uid->sid and gid->sid mappings very frequently. Yes: even when the Windows client seems completely idle (the user has left the desk and is not reading or writing anything), such lookups are triggered at least once per minute for every client, and the frequency increases when a user is working actively. There are also lookups of name->sid, "user in group", and so on, but these are far less frequent. The latter can't be cached easily, but the set of id mappings an smbd comes across can.
There's another discussion looming about closing idle connections to winbindd.
Is this a separate discussion? I would consider them related :)

Andrew Esh wrote:
Better yet: have winbindd fork the same way smbd does, on a per-client basis. Someone should probably figure out what characteristic of the example network caused winbindd to consume so many sockets. Are there really that many requests being queued up at once? Shifting to a forking model would simply consume just as many processes, or more, and processes are limited too.
Winbind's client connections persist as long as the client processes are alive, and are waiting for further requests. Clients never explicitly close the pipe, even if they had only one lookup during their lifetime. This is fine, as it makes clients submitting multiple requests in a row more efficient. And it gives winbindd the chance to store get??ent states in the winbindd connection environment. But if connection consumption gets excessive, why not look around for idle clients, and shut them down? (Note, "idle" does not mean the client is idle, just that it doesn't presently send requests to winbindd.)
We also need to be sure all the requests are making progress. If one gets hung, the client program would probably repeat the request, expending another instance of everything. Are there really 2048 users actively trying to make winbindd requests at the same time?
This is very unlikely, even more so once smbds cache id mappings. And client requests are always processed, even if the client connection has been shut down: the client gets a broken-pipe error on send and retries, opening a new connection, and winbindd shuts down another connection that has been idle for a long time if it is still at the threshold. It is the same as when you restart winbindd while clients are alive: they just reconnect as soon as they have a new request. Very gracefully :)
Perhaps this is the result of a failed NIS request that is very common on the network, which falls through the passwd list in /etc/nsswitch.conf and winds up asking winbindd about the same non-existent user. What is the content of the requests, and is there some way to fix the system so the users don't cause them to be issued at such a high rate? Should they even be forwarded to winbindd at all?
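For illustration, the fall-through in question would come from an nsswitch.conf arrangement along these lines (a typical setup, not taken from the original report):

```
# /etc/nsswitch.conf
passwd: files nis winbind
group:  files nis winbind
```

With this ordering, a user found in neither the local files nor NIS is handed on to winbind on every single lookup, so one misconfigured account can generate a steady stream of doomed winbindd requests.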
In such situations you will most probably see only one client connection carrying many requests (if there is one process failing on many users), or client connections popping up and going away rapidly (if there are many processes failing on one user each). Neither of them is a big problem for file descriptor consumption.
Maybe winbindd is piling up requests as it searches for a domain controller at the head of its "password server" list which is no longer working, or is no longer in DNS. Reorder that list, and winbindd might begin to process requests fast enough to stay ahead of the influx rate.
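For reference, the list in question is the "password server" parameter in smb.conf; hostnames here are hypothetical:

```
[global]
    password server = dc-dead.example.com dc-working.example.com
```

If the servers are given explicitly like this, they are tried in the listed order, so a dead or unresolvable host at the head of the list makes every fresh contact wait out a timeout before falling back to the working one.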
No, winbindd is working happily and rapidly (well, most of the time, and if it isn't permanently kept busy with id mapping lookups :). It's the unused socket file descriptors which pile up. They do not leak, but are presently unused.

Cheers!
Michael