Did you use the patch I wrote and posted here and on Bugzilla for bug #18756?
For your first issue, the problem is probably that the code does not check the result of its memory allocations: whether an allocation returns NULL or a valid address, the code carries on the same way.
So perhaps by the time you reach user 376 you have used up all the memory of the ldap_cache; it cannot give any more, but the code continues as if the allocation had succeeded.
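
To illustrate the kind of check that seems to be missing, here is a minimal sketch. It is not the util_ldap code: arena_t, arena_alloc and arena_strdup are hypothetical names, and a simple fixed-size buffer stands in for the shared cache memory. The point is only that a failed allocation is reported to the caller instead of being written through.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size arena standing in for the shared cache memory. */
typedef struct {
    unsigned char *base;  /* start of the memory block */
    size_t size;          /* total bytes available */
    size_t used;          /* bytes already handed out */
} arena_t;

/* Hand out n bytes, or return NULL when the block is exhausted. */
static void *arena_alloc(arena_t *a, size_t n)
{
    if (n > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Copy a string into the arena. Returns NULL on exhaustion so the
 * caller can skip caching the entry instead of crashing. */
static char *arena_strdup(arena_t *a, const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = arena_alloc(a, n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    return copy;
}
```

With a check like this, a caller that gets NULL back can simply answer the request without caching it.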
I am currently working on this problem; it may also be the source of the next problem.
For issue 2, it looks like a bad unlock when you reach the maximum user limit; I will look into this too.
Jess Holle wrote:
I have been using Apache 2.0.47 on Windows with util_ldap and mod_auth_ldap for some time, but never heavily.
I populated an LDAP directory with thousands of users, placed a 25K static HTML page in a directory configured to "require valid-user" against the LDAP in question, and then did a simple test involving GETs for the given page as randomly selected users.
I quickly ran into an issue:
ISSUE 1: On every run Apache would crash when the 376th (distinct) user attempted to authenticate. This occurred at that number regardless of how many requests had been made on behalf of each user. This *appeared* to be due to the size of my particular URLs, DNs, user names, etc., exceeding the memory allocated by 'LDAPSharedCacheSize' before the configured number of cache entries was exceeded. I was using the defaults for all LDAP cache parameters, i.e. I was not specifying anything in my conf files.
I say *appeared* as the crash would occur while trying to allocate memory for a new cache entry. I looked, and none of the code appears to handle the case where the shared memory block is exhausted (e.g. by gracefully not adding an entry to the cache and deallocating any portion of a new entry that has been allocated up to that point -- or by freeing older entries and retrying). I did some quick back-of-the-envelope estimates and decided that the default 100000 bytes may well have been used up. I computed that my entries appeared to be consuming about 267 bytes apiece if this was the case, and increased LDAPSharedCacheSize accordingly for the number of distinct users making requests in the test.
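
The back-of-the-envelope estimate can be written out roughly as follows. This is only a sketch under assumptions: it supposes that the 375 entries cached before the crash filled the default 100000 bytes, it ignores any allocator or hash table overhead, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Integer division, rounded up. */
static size_t div_round_up(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/* Estimated bytes per cache entry, assuming cache_bytes was exhausted
 * once entries_that_fit entries had been stored. */
static size_t bytes_per_entry(size_t cache_bytes, size_t entries_that_fit)
{
    return div_round_up(cache_bytes, entries_that_fit);
}

/* Rough LDAPSharedCacheSize needed for a given number of distinct users. */
static size_t cache_bytes_needed(size_t per_entry, size_t distinct_users)
{
    return per_entry * distinct_users;
}
```

For example, bytes_per_entry(100000, 375) gives 267, and a hypothetical test with 2000 distinct users would then call for cache_bytes_needed(267, 2000), i.e. 534000 bytes, plus whatever safety margin seems prudent.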
Right or wrong, this approach worked great for a single-threaded test. I then tried 8 threads in parallel making requests. This worked great as well.
All of this was with the actual number of distinct users being less than the configured cache sizes, however. I then tried the tests with the number of distinct users larger than the cache sizes, at which point I ran into a more serious issue:
ISSUE 2: Apache would consistently pause significantly on the 1025th distinct user (with the caches set to hold 1024, i.e. the default). Sometimes it would recover, but invariably it would completely deadlock shortly thereafter. Using the status page, it appears that most times there is a deadlock on the first purge attempt, and a deadlock always occurs on a purge request soon thereafter. This issue seemed independent of the cache limit, i.e. I could set the cache entry limit to 5 and Apache would deadlock after a few purges there as well (though a few seemed to occur rather quickly first, whereas at 1024 purges seemed very slow even when they succeeded).
I say Apache was deadlocked in that all the threads that were processing connections were trying to establish a lock on the LDAP cache -- and none appeared to be successfully holding it. From a look at the code I infer that the author is trying to allow many readers with little synchronization but ensure at most one writer (and no readers during this writer's operation), but it does not seem to be quite working that way.
I ran into similar issues to this deadlock with Apache 1.3.28 (EAPI_MM) and auth_ldap 1.6.0 (with a shared cache) on Solaris 8.
I could not elicit any such failure conditions from HP Apache 1.0.06.01 (their latest bundle of 2.0.46), but I've not tried these tests with Apache 2.0.x on other platforms.
Does anyone have any brilliant ideas here?
I'm planning to disable the util_ldap cache in all the problematic cases (i.e. on Windows until knowing the maximum number of users and on Apache 1.3.x as a whole) for now as I don't have that deep of an understanding of the code here -- short as it is -- and may not have the time to attain such an understanding for a while....
--
Jess Holle
