Ah, yeah, my tests had krb5kdc at about 50% of one core (slapd was an 
additional 15%), but it wasn't completely saturating the machine.

Glad the patch fixed it!

Chris

On 2011/08/09 07:13, Jonathan Reams wrote:
> Chris,
>
> We didn't actually see any problems either until the KDC was under heavy 
> load. The unpatched version of 1.9.1 was, and still is, running on our 
> secondary KDC without issue, and we had been using 1.9.1 in testing and 
> development for months without problems as well. During the period when we 
> saw the performance degradation, the primary KDC handled 467,000 distinct 
> AS/TGS requests, which means it was handling roughly 43 requests per second 
> (not counting lots of retransmits). That is typical of our primary production 
> KDC's workload throughout the day, but we don't have any other KDC that gets 
> that amount of traffic; by contrast, our secondary KDC gets a request once or 
> twice a minute. So it would seem the performance problem only really comes 
> into play when the KDC is under heavy load.
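>
> (For scale: 467,000 requests at roughly 43 per second works out to 
> 467,000 / 43, about 10,860 seconds, or close to three hours of sustained 
> load.)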
>
> Jonathan
>
> On Aug 9, 2011, at 4:23 AM, Chris Hecker wrote:
>
>>
>> Just another data point: I'm not seeing this on my locally built 1.9.1
>> (without the attached patch):
>>
>> real    0m41.409s
>> user    0m3.358s
>> sys     0m3.683s
>> finished round 1
>>
>> real    0m35.036s
>> user    0m3.441s
>> sys     0m3.658s
>> finished round 2
>>
>> real    0m44.344s
>> user    0m3.363s
>> sys     0m3.728s
>> finished round 3
>>
>> real    0m40.930s
>> user    0m3.465s
>> sys     0m3.973s
>> finished round 4
>>
>> I had to reduce the number of inner iterations to 300 because my machine
>> is slow.  The variance in the above numbers is because there's a bunch
>> of stuff running on this machine.
>>
>> Chris
>>
>> On 2011/08/08 11:21, Greg Hudson wrote:
>>> On Mon, 2011-08-08 at 11:22 -0400, Jonathan Reams wrote:
>>>> I did some performance testing on our test KDC and was able to
>>>> reproduce the performance issue with 1.9.1.
>>>
>>> I found a regression which would affect these tests, but I'm not sure it
>>> accounts for your global performance issues.
>>>
>>> The KDC in krb5 1.9 isn't supposed to be using an on-disk replay cache,
>>> but due to a bug, it is actually opening and reading a replay cache for
>>> every TGS request, which is significantly less efficient than the 1.8
>>> behavior (using a replay cache which stays open for the lifetime of the
>>> KDC).
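>>>
>>> A rough sketch of the difference, using hypothetical names (for
>>> illustration only; this is not the actual krb5 code):
>>>
>>>     /* Illustrative types and helpers; not krb5's API. */
>>>     struct request;
>>>     struct rcache;
>>>     struct rcache *rcache_open(const char *path);  /* reads every entry */
>>>     int rcache_check_and_store(struct rcache *rc, struct request *req);
>>>     void rcache_close(struct rcache *rc);
>>>
>>>     /* The 1.9 bug: the on-disk replay cache is opened (re-reading all
>>>      * stored entries) and closed again for every TGS request. */
>>>     void handle_tgs_buggy(struct request *req)
>>>     {
>>>         struct rcache *rc = rcache_open("/var/tmp/kdc_rcache");
>>>         rcache_check_and_store(rc, req);
>>>         rcache_close(rc);
>>>     }
>>>
>>>     /* The 1.8 behavior: one handle, opened at startup and held for
>>>      * the lifetime of the KDC process. */
>>>     static struct rcache *kdc_rc;  /* opened once at startup */
>>>
>>>     void handle_tgs_fixed(struct request *req)
>>>     {
>>>         rcache_check_and_store(kdc_rc, req);
>>>     }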
>>>
>>> In a test which runs in under five minutes, this regression produces
>>> visible O(n^2) performance characteristics.  This would not necessarily
>>> account for performance degradation over hours, as the performance drag
>>> of the replay cache should become stable after five minutes.  It's
>>> possible that the constant drag was enough to cause the KDC to fall
>>> behind on the request load, but it's also possible that there's a second
>>> problem which isn't so easily reproduced.
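>>>
>>> (To spell out the O(n^2) claim: if request i has to re-read the i-1
>>> entries already in the cache before appending its own, the total work
>>> for n requests is 1 + 2 + ... + (n-1) = n(n-1)/2. That quadratic
>>> growth stops once the oldest entries age past the replay window and
>>> the cache size levels off, which is why the drag should become a
>>> constant after about five minutes.)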
>>>
>>> I've attached a patch.  Note that there is a second, in-memory
>>> "lookaside" cache with O(n^2) performance characteristics in the short
>>> term, which holds queries for up to two minutes.  You may see a slight
>>> degradation in performance in test cases due to this.  You can
>>> temporarily rebuild the kdc directory with "make clean; make
>>> CPPFLAGS=-DNOCACHE" if you want to remove this variable from your
>>> performance tests.
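>>>
>>> (Schematically, the lookaside check is a compile-time switch; with
>>> hypothetical names rather than the real code, the lookup path looks
>>> something like this:)
>>>
>>>     #ifndef NOCACHE
>>>         /* Linear scan of replies remembered for up to two minutes;
>>>          * this scan is what gives the short-term O(n^2) behavior.
>>>          * Building with -DNOCACHE compiles it out. */
>>>         if (lookaside_find(req) != NULL)
>>>             return send_cached_reply(req);
>>>     #endif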
>>>
________________________________________________
Kerberos mailing list           Kerberos@mit.edu
https://mailman.mit.edu/mailman/listinfo/kerberos
