[ Oops, forgot to cc: the list. -Chris ]

Date: Wed, 20 Dec 2006 18:48:19 -0800
From: Chris Dunlap <[EMAIL PROTECTED]>
To: Jeff Squyres <[EMAIL PROTECTED]>
Subject: Re: [munge-users] munged running at 100%
I'm glad you were able to track this down. I hadn't given much thought
to very large passwd/group files, so MUNGE will be better off now that
this is tunable. It also helped expose a similar issue in OSCAR, where
very large passwd/group files weren't handled gracefully.

In the getgrent() loop in _gids_hash_create(), I replaced the call to
getpwnam() with _gids_user_to_uid(), which caches the uid lookup in a
second hash. I was just curious how much of an effect the uid caching
had on very large passwd/group files, especially compared to the 27
minutes that 0.5.6 took according to your initial log.

-Chris

On Wed, 12/20/06 09:00p EST, Jeff Squyres wrote:
>
> On Dec 20, 2006, at 3:15 PM, Chris Dunlap wrote:
>
> > I'll release these changes in 0.5.7 either today or tomorrow.
>
> Awesome.
>
> > Out of curiosity, how long does it take your back-end node to compute
> > the gids map now that I'm caching the uid lookups?
>
> I'm not sure what you're asking me to time here -- the execution of
> the _gids_hash_create() function? I don't see caching going on in
> there, so I suspect you mean something else...
>
> FWIW, I found out that my cluster installer software (OSCAR) was
> overwriting my /etc/group and /etc/passwd files on all my back-end
> nodes every 15 minutes. And it was overwriting them with the output
> of "getent group" / "getent passwd" on my cluster head node. This
> resulted in group/passwd files that contained the original skeleton
> group/passwd entries *and* all the entries from NIS.
>
> getgrent(), therefore, saw the entire group/passwd files *and* all
> the NIS entries. That is, it essentially saw every entry from NIS
> listed twice (once in the file, and once in NIS). Yoinks. I
> performed the following test to check the performance differences:
>
> 0. I disabled OSCAR's refreshing of /etc/group and /etc/passwd.
>
> 1. Wrote a short C program that essentially duplicates the getgrent()
>    loop in gids.c and also reports timing information.
>
> 2. With a "full" /etc/passwd and /etc/group on a back-end compute
>    node (i.e., what OSCAR put there as a result of getent(1)), ran the
>    test program. It completed in 606 seconds.
>
> 3. Replaced /etc/passwd and /etc/group with the skeletal versions that
>    they are supposed to be (i.e., no duplicated entries from NIS) and
>    ran the test program. It completed in 17 seconds.
>
> 4. Additionally, running munged with all the default settings (i.e.,
>    letting it generate the gid map) when /etc/passwd and /etc/group had
>    only the skeletal entries showed significant CPU load for only the
>    first 15-20 seconds of its execution, after which its CPU load
>    dropped to 0. This presumably jibes well with #3; the initial munged
>    load is when it's building the initial gid map. I did not re-run
>    munged with "full" versions of passwd/group, but I assume that, per
>    #2, I'd see heavy load from munged for about 600 seconds. If OSCAR
>    refreshed the /etc/group file, munged's timer would expire a short
>    time later and it would simply turn around and re-create the gid map
>    again. Otherwise, this initial heavy load would be a one-time expense
>    and munged's CPU load would go to 0.
>
> I conclude that having these "wrong" passwd/group files does two things:
>
> 1. Makes the getgrent() process take much longer
> 2. Makes the getgrent() process much more CPU-intensive
>
> #1 is possibly a result of #2, but there are probably other factors
> involved (perhaps when the majority of the entries are in NIS, most
> of the work is blocking on network access, not local CPU processing,
> etc...?).
>
> So -- in short -- I think this whole hullabaloo was due to OSCAR
> inadvertently doing the wrong thing in an NIS environment. Not
> MUNGE's fault at all. I'll be mailing the OSCAR list with details
> shortly...

_______________________________________________
munge-users mailing list
[email protected]
https://mail.gna.org/listinfo/munge-users
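For illustration, the getgrent() timing loop Jeff describes in step 1, combined with the uid caching Chris describes for _gids_user_to_uid(), might look roughly like the C sketch below. It is not MUNGE's actual gids.c code. The fixed-size chained hash, the caching of "no such user" results, and the names user_to_uid_cached(), time_getgrent_loop(), getpwnam_calls and UID_CACHE_BUCKETS are all hypothetical. The idea is that each group member name goes through getpwnam() at most once, so a name that appears in many groups costs one passwd lookup instead of one per occurrence.

```c
/* Hypothetical sketch, not MUNGE's gids.c: time a getgrent() loop that
 * maps every group member name to a uid, caching name -> uid results so
 * each distinct user is resolved via getpwnam() at most once. */
#define _XOPEN_SOURCE 700
#include <grp.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#define UID_CACHE_BUCKETS 4096          /* assumed fixed table size */

struct uid_node {                       /* one cached lookup result */
    char *name;
    uid_t uid;
    int found;                          /* 0 if getpwnam() found no user */
    struct uid_node *next;
};

static struct uid_node *uid_cache[UID_CACHE_BUCKETS];
static unsigned long getpwnam_calls;    /* passwd-database lookups made */

static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;             /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h % UID_CACHE_BUCKETS;
}

/* Hypothetical analogue of _gids_user_to_uid(): returns 0 and sets *uid
 * on success, -1 if the user is unknown. Misses are cached as well. */
static int user_to_uid_cached(const char *name, uid_t *uid)
{
    unsigned long b = hash_name(name);
    struct uid_node *n;

    for (n = uid_cache[b]; n != NULL; n = n->next) {
        if (strcmp(n->name, name) == 0) {       /* cache hit */
            if (!n->found)
                return -1;
            *uid = n->uid;
            return 0;
        }
    }
    struct passwd *pw = getpwnam(name);         /* cache miss: ask NSS */
    getpwnam_calls++;

    n = malloc(sizeof *n);
    if (n == NULL || (n->name = strdup(name)) == NULL) {
        free(n);                                /* out of memory: no caching */
        if (pw == NULL)
            return -1;
        *uid = pw->pw_uid;
        return 0;
    }
    n->found = (pw != NULL);
    n->uid = pw ? pw->pw_uid : 0;
    n->next = uid_cache[b];
    uid_cache[b] = n;
    if (pw == NULL)
        return -1;
    *uid = pw->pw_uid;
    return 0;
}

/* Walk the whole group database, resolving every member name; returns
 * elapsed wall-clock seconds and stores the member count in *members. */
static double time_getgrent_loop(unsigned long *members)
{
    struct timespec t0, t1;
    struct group *gr;
    uid_t uid;

    *members = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    setgrent();
    while ((gr = getgrent()) != NULL) {
        for (char **m = gr->gr_mem; m != NULL && *m != NULL; m++) {
            (*members)++;
            (void) user_to_uid_cached(*m, &uid);
        }
    }
    endgrent();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double) (t1.tv_sec - t0.tv_sec)
         + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A small main() could print the result of time_getgrent_loop() next to getpwnam_calls; comparing that counter with the member count shows how many passwd lookups the cache avoided. The cache only removes repeated name lookups, though: it would not shrink the getgrent() walk itself, which in Jeff's setup was inflated by every NIS group appearing twice.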
