Ah, yeah, my tests had krb5kdc at about 50% of one core (slapd was an additional 15%), but it wasn't completely saturating the machine.
Glad the patch fixed it!

Chris

On 2011/08/09 07:13, Jonathan Reams wrote:
> Chris,
>
> We didn't actually see any problems either until the KDC was under heavy
> load. The unpatched version of 1.9.1 was and still is running on our
> secondary KDC without issue, and we had been using 1.9.1 in testing and
> development for months without issue as well. During the period where we
> saw the performance degradation, the primary KDC handled 467000 distinct
> AS/TGS requests, which means the KDC was handling roughly 43 requests
> per second (not counting lots of retransmits). That is typical of our
> primary production KDC's workload throughout the day, but we don't have
> any other KDC that gets that amount of traffic; by contrast, our
> secondary KDC gets a request once or twice a minute. So it would seem
> the performance problem only really comes into play when the KDC is
> under heavy load.
>
> Jonathan
>
> On Aug 9, 2011, at 4:23 AM, Chris Hecker wrote:
>>
>> Just another data point: I'm not seeing this on my locally built (but
>> not with the attached patch) 1.9.1:
>>
>> real    0m41.409s
>> user    0m3.358s
>> sys     0m3.683s
>> finished round 1
>>
>> real    0m35.036s
>> user    0m3.441s
>> sys     0m3.658s
>> finished round 2
>>
>> real    0m44.344s
>> user    0m3.363s
>> sys     0m3.728s
>> finished round 3
>>
>> real    0m40.930s
>> user    0m3.465s
>> sys     0m3.973s
>> finished round 4
>>
>> I had to reduce the number of inner iterations to 300 because my
>> machine is slow. The variance in the above numbers is because there's
>> a bunch of stuff running on this machine.
>>
>> Chris
>>
>> On 2011/08/08 11:21, Greg Hudson wrote:
>>> On Mon, 2011-08-08 at 11:22 -0400, Jonathan Reams wrote:
>>>> I did some performance testing on our test KDC and was able to
>>>> reproduce the performance issue with 1.9.1.
>>>
>>> I found a regression which would affect these tests, but I'm not sure
>>> it accounts for your global performance issues.
>>>
>>> The KDC in krb5 1.9 isn't supposed to be using an on-disk replay
>>> cache, but due to a bug, it is actually opening and reading a replay
>>> cache for every TGS request, which is significantly less efficient
>>> than the 1.8 behavior (using a replay cache which stays open for the
>>> lifetime of the KDC).
>>>
>>> In a test which runs in under five minutes, this regression produces
>>> visible O(n^2) performance characteristics. This would not
>>> necessarily account for performance degradation over hours, as the
>>> performance drag of the replay cache should become stable after five
>>> minutes. It's possible that the constant drag was enough to cause the
>>> KDC to fall behind on the request load, but it's also possible that
>>> there's a second problem which isn't so easily reproduced.
>>>
>>> I've attached a patch. Note that there is a second, in-memory
>>> "lookaside" cache with O(n^2) performance characteristics in the
>>> short term, which holds queries for up to two minutes. You may see a
>>> slight degradation in performance in test cases due to this. You can
>>> temporarily rebuild the kdc directory with "make clean;
>>> CPPFLAGS=-DNOCACHE" if you want to remove this variable from your
>>> performance tests.
>>>
>>> ________________________________________________
>>> Kerberos mailing list    Kerberos@mit.edu
>>> https://mailman.mit.edu/mailman/listinfo/kerberos
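Jonathan's request-rate figure above is easy to verify with back-of-the-envelope arithmetic; note that the length of the measurement window is an inference from his two numbers, not something stated in the thread:

```shell
# Sanity check of the figures quoted above: 467000 requests at roughly
# 43 requests/second implies a window of about 10860 seconds, i.e.
# roughly 3 hours. (The window length is inferred, not stated.)
echo $((467000 / 43))         # seconds in the window
echo $((467000 / 43 / 3600))  # whole hours (integer division)
```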
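The quadratic behavior Greg describes can be illustrated with a toy model (this is not krb5 code; a temp file stands in for the on-disk replay cache): if every request appends one record and then re-reads the whole file, the total number of records read over n requests is n(n+1)/2.

```shell
# Toy model of the buggy 1.9 behavior: the "replay cache" file is
# reopened and fully re-read on every request instead of staying open.
rcache=$(mktemp)
total=0
for i in $(seq 1 1000); do
    echo "request-$i" >> "$rcache"   # each request appends one record...
    read_now=$(wc -l < "$rcache")    # ...then re-reads the entire file
    total=$((total + read_now))
done
echo "$total"   # 1000*1001/2 = 500500 records read for 1000 requests
rm -f "$rcache"
```

With the fixed behavior (file held open, no full re-read), total work is linear in the number of requests instead.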
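The rebuild command Greg quotes looks truncated (it sets CPPFLAGS but never invokes make a second time); the likely intended invocation, with the src/kdc path being my assumption, is along these lines:

```shell
# Presumed full form of the truncated command in the mail (assumption):
# rebuild the kdc directory with the in-memory lookaside cache
# compiled out, so it doesn't skew the performance measurements.
cd src/kdc                   # kdc subdirectory of the krb5 source tree
make clean
make CPPFLAGS=-DNOCACHE
```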