Re: Apache 2.0.51 util_ldap
I'm still wondering if we shouldn't just stick with the local read/write lock on Windows and other single child MPMs (NetWare?) as this should allow better throughput in such cases and yet be safe, right? In fact,

Actually on NetWare this is a non-issue. On NetWare everything is global (memory, locks, etc.), so there is no difference between a global mutex and a local one (other than that previously we were using reader/writer locks rather than mutexes). I would like to use reader/writer locks rather than global mutexes simply for performance reasons, but I'm not sure how we would go about switching between global and local locks anyway. This would require #ifdef'ing the code for particular platforms or MPMs, which isn't a good thing.

Brad

Brad Nicholes
Senior Software Engineer
Novell, Inc., the leading provider of Net business solutions
http://www.novell.com

[EMAIL PROTECTED] Sunday, September 19, 2004 3:43:20 PM

Sorry for the chattiness of my solution process. I've tested and these fixes do apply with the global mutex changes *except* when one disables caches by sizing them all to 0, Apache will crash on the first authentication request when the global mutexes are used! This needs to be fixed!

I've attached a unified diff containing the purge fix and the unassigned variable fix (which as Graham pointed out is already in the 2.1 branch).

I'm still wondering if we shouldn't just stick with the local read/write lock on Windows and other single child MPMs (NetWare?) as this should allow better throughput in such cases and yet be safe, right? In fact, why do we use shared memory on these platforms for the cache? [If I'm just daft here, I apologize.]

-- Jess Holle

Jess Holle wrote:

Here's a fixed LDAP purge routine which works great in my testing (with cache sizes of 8, 100, 1000, and 2150 and 2500 unique user logins repeated 3 times each). [No, I haven't produced a diff as I have pieces of util_ldap from various CVS levels at this point.]

Essentially I added all the logic surrounding 'pp', which is the address of the previous node's 'next' field or of cache->nodes[i] in the case of the first node. [Clearly my C is getting rusty -- this took me a few attempts to get right...]

This fixes the biggest LDAP module issue I'm aware of: hangs and crashes after one or more cache purges.
Re: Apache 2.0.51 util_ldap
Brad Nicholes wrote:

I'm still wondering if we shouldn't just stick with the local read/write lock on Windows and other single child MPMs (NetWare?) as this should allow better throughput in such cases and yet be safe, right? In fact,

Actually on NetWare this is a non-issue. On NetWare everything is global (memory, locks, etc.), so there is no difference between a global mutex and a local one (other than previously we were using reader/writer locks rather than mutexes). I would like to use reader/writer locks rather than global mutexes simply for performance reasons, but I'm not sure how we would go about switching between global and local locks anyway. This would require #ifdef'ing the code for particular platforms or MPM's which isn't a good thing.

On the contrary, if our #ifdef'ing is localized, I believe that doing this on a per-platform or per-MPM (or better yet a #ifdef HAS_MULTIPLE_WORKER_PROCESSES or some such) basis to maximize performance would be worthwhile. To scatter such #ifdef's throughout the whole module would be a maintenance nightmare, of course.

-- Jess Holle
Re: Apache 2.0.51 util_ldap
At 01:33 PM 9/20/2004, Jess Holle wrote:

Brad Nicholes wrote:

I'm still wondering if we shouldn't just stick with the local read/write lock on Windows and other single child MPMs (NetWare?) as this should allow better throughput in such cases and yet be safe, right? In fact,

Actually on NetWare this is a non-issue. On NetWare everything is global (memory, locks, etc.), so there is no difference between a global mutex and a local one (other than previously we were using reader/writer locks rather than mutexes).

It's similar for Win32 - except single process can be implemented as critical sections.

On the contrary, if our #ifdef'ing is localized, I believe that doing this on a per-platform or per-MPM (or better yet a #ifdef HAS_MULTIPLE_WORKER_PROCESSES or some such) basis to maximize performance would be worthwhile.

-1 Veto (not a vote) to test platforms. However, ap_mpm_query() will let you determine if you are running on a single or multi-process MPM, a threaded or non-threaded MPM, etc. If you want to test MPM behavior and make selections based on those characteristics, I'd see no issues with that.

To scatter such #ifdef's throughout the whole module would be a maintenance nightmare, of course.

Exactly. Who two years from now will be able to follow the code? httpd-1.3 was abandoned (effectively replaced) because of main(). At some point, you break down and create two separate modules for different conditions; witness mod_cgi vs. mod_cgid.

Bill
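
[Editorial sketch, not code from the thread.] The runtime selection Bill describes could look roughly like the following. It assumes the number of child processes has already been obtained (in httpd, presumably through ap_mpm_query()) and is passed in as a plain integer so that the example compiles on its own. The names cache_lock_kind and choose_cache_lock are hypothetical. Treating a zero shared cache size as "per-process cache" is also an assumption, taken from the documentation behavior asked about elsewhere in this thread.

```c
#include <stddef.h>

/* Hypothetical: the two locking strategies discussed in the thread. */
typedef enum {
    CACHE_LOCK_LOCAL_RWLOCK,  /* in-process reader/writer lock */
    CACHE_LOCK_GLOBAL_MUTEX   /* cross-process mutex for a shared cache */
} cache_lock_kind;

/*
 * Decide which lock to use from characteristics of the running MPM
 * rather than from platform #ifdefs.
 *
 * max_child_processes: how many child processes may touch the cache
 *                      (assumed to come from an MPM query at startup).
 * shared_cache_bytes:  size of the shared-memory segment; 0 is assumed
 *                      to mean each process keeps a private cache.
 */
cache_lock_kind choose_cache_lock(int max_child_processes,
                                  size_t shared_cache_bytes)
{
    /* A single child process never contends with another process. */
    if (max_child_processes <= 1)
        return CACHE_LOCK_LOCAL_RWLOCK;

    /* No shared segment: every process owns its cache outright. */
    if (shared_cache_bytes == 0)
        return CACHE_LOCK_LOCAL_RWLOCK;

    /* Several processes share one cache: a global mutex is required. */
    return CACHE_LOCK_GLOBAL_MUTEX;
}
```

Keeping the decision in one function is what makes the choice "localized": the rest of the module would only ever see the result, never the conditions behind it.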
Re: Apache 2.0.51 util_ldap
Even after all my patches, I still get a bus error and core dump on the first LDAP authentication request on Solaris 8 with worker MPM and an active shared memory LDAP cache. [This is with iPlanet LDAP SDK 5.08, though I doubt that matters.] I've run out of time to look into this further. Moreover, I am leaning towards very few Apache child processes with many threads. Thus I believe that having each process have its own cache with read/write locks is a better strategy for this arrangement and will just #if out the shared memory and move on. -- Jess Holle
Re: Apache 2.0.51 util_ldap
William A. Rowe, Jr. wrote:

At 01:33 PM 9/20/2004, Jess Holle wrote:

Brad Nicholes wrote:

I'm still wondering if we shouldn't just stick with the local read/write lock on Windows and other single child MPMs (NetWare?) as this should allow better throughput in such cases and yet be safe, right? In fact,

Actually on NetWare this is a non-issue. On NetWare everything is global (memory, locks, etc.), so there is no difference between a global mutex and a local one (other than previously we were using reader/writer locks rather than mutexes).

It's similar for Win32 - except single process can be implemented as critical sections.

On the contrary, if our #ifdef'ing is localized, I believe that doing this on a per-platform or per-MPM (or better yet a #ifdef HAS_MULTIPLE_WORKER_PROCESSES or some such) basis to maximize performance would be worthwhile.

-1 Veto (not a vote) to test platforms. However, ap_mpm_query() will let you determine if you are running on a single or multi-process mpm, a threaded or non-threaded mpm, etc. If you want to test mpm behavior and make selections based on those characteristics, I'd see no issues with that.

Same basic idea, but a (much) better implementation. Sounds great to me.

-- Jess Holle
Re: Apache 2.0.51 util_ldap
Jess Holle wrote:

Okay, the cause of this issue is now clear: util_ald_create_caches() does not set 'newcurl' to anything when any of the caches are null, which they all are when they're sized at zero. The fix is also simple: add an 'else newcurl = NULL;' after the 'if' block in this routine.

Will fix (if nobody else has yet) - thanks for hunting this down.

[This really drives home why I have developed in Java for the last 5 years after spending 7+ years doing C and C++. This issue could not have occurred in Java -- the compiler would have rejected the code. I'm not saying the extra speed, etc., due to Apache being written in C is not nice. Nor do I wish to start some holy war. It's just that the lack of pointer / memory allocation issues, uninitialized variables, and not having to produce one's own APR to deal with platforms make Java a much more productive place for me.]

I am the same - if the job is either time constrained or accuracy constrained, then I stick to Java. C is still king for apps where speed is an issue, but then that's at the cost of your hair sometimes.

Regards, Graham --
Re: Apache 2.0.51 util_ldap
Graham Leggett wrote: Jess Holle wrote: Okay, the cause of this issue is now clear: util_ald_create_caches() does not set 'newcurl' to anything when any of the caches are null, which they all are when they're sized at zero. The fix is also simple: add an 'else newcurl = NULL;' after the 'if' block in this routine. Will fix (if nobody else has yet) - thanks for hunting this down. Just a note: this was enough to fix the problem without the global mutexes present -- I've not tested again with the mutexes present as I want to get to the bottom of the cache overflow crashes first if I can. The fix should still go in anyway as leaving this uninitialized will only lead to awful problems downstream. -- Jess Holle
Re: Apache 2.0.51 util_ldap
Jess Holle wrote: Just a note: this was enough to fix the problem without the global mutexes present -- I've not tested again with the mutexes present as I want to get to the bottom of the cache overflow crashes first if I can. The fix should still go in anyway as leaving this uninitialized will only lead to awful problems downstream. It's already been committed to v2.1, but I see no backport vote yet for v2.0, should I add the backport request? Regards, Graham --
Re: Apache 2.0.51 util_ldap
Graham Leggett wrote: Jess Holle wrote: Just a note: this was enough to fix the problem without the global mutexes present -- I've not tested again with the mutexes present as I want to get to the bottom of the cache overflow crashes first if I can. The fix should still go in anyway as leaving this uninitialized will only lead to awful problems downstream. It's already been committed to v2.1, but I see no backport vote yet for v2.0, should I add the backport request? Yes, please. -- Jess Holle
Re: Apache 2.0.51 util_ldap
I now see what's wrong with the LDAP cache purge -- it does not fix up the 'next' pointers and/or cache->nodes[i] pointers when removing entries -- and thus cannot hope to work. Unfortunately, my fixes for this are still falling short, but I thought I'd pass this along. -- Jess Holle
Re: Apache 2.0.51 util_ldap
Jess Holle wrote:

I now see what's wrong with the LDAP cache purge -- it does not fix up the 'next' pointers and/or cache->nodes[i] pointers when removing entries -- and thus cannot hope to work. Unfortunately, my fixes for this are still falling short, but I thought I'd pass this along.

Looking at the previous fix, some of these problems seem to be fixed in HEAD. There was a batch of changes to the LDAP stuff that depended on v1.0 of APR, and were thus not backported to httpd v2.0. There may be some value in checking the diffs between HEAD and 2.0 to see what changes have been made - I think there are some bugfixes in there that need porting.

Regards, Graham --
Re: Apache 2.0.51 util_ldap
Graham Leggett wrote:

Jess Holle wrote:

I now see what's wrong with the LDAP cache purge -- it does not fix up the 'next' pointers and/or cache->nodes[i] pointers when removing entries -- and thus cannot hope to work. Unfortunately, my fixes for this are still falling short, but I thought I'd pass this along.

Looking at the previous fix, some of these problems seem to be fixed in HEAD. There was a batch of changes to the LDAP stuff that depended on v1.0 of APR, and were thus not backported to httpd v2.0. There may be some value in checking the diffs between HEAD and 2.0 to see what changes have been made - I think there are some bugfixes in there that need porting.

Thanks for the tip. The purge() routine is still not fixed in HEAD, though...

-- Jess Holle
Re: Apache 2.0.51 util_ldap
Another dumb question: On Windows since there is only one child process, wouldn't it make sense to stick with the read/write locks and not move to a global mutex? In the multi-child mpms, the global mutex is obviously required, of course. -- Jess Holle
Re: Apache 2.0.51 util_ldap
Here's a fixed LDAP purge routine which works great in my testing (with cache sizes of 8, 100, 1000, and 2150 and 2500 unique user logins repeated 3 times each). [No, I haven't produced a diff as I have pieces of util_ldap from various CVS levels at this point.]

Essentially I added all the logic surrounding 'pp', which is the address of the previous node's 'next' field or of cache->nodes[i] in the case of the first node. [Clearly my C is getting rusty -- this took me a few attempts to get right...]

This fixes the biggest LDAP module issue I'm aware of: hangs and crashes after one or more cache purges.

-- Jess Holle

    void util_ald_cache_purge(util_ald_cache_t *cache)
    {
        unsigned long i;
        util_cache_node_t *p, *q, **pp;
        apr_time_t t;

        if (!cache)
            return;

        cache->last_purge = apr_time_now();
        cache->npurged = 0;
        cache->numpurges++;

        for (i=0; i < cache->size; ++i) {
            pp = cache->nodes + i;
            p = *pp;
            while (p != NULL) {
                if (p->add_time < cache->marktime) {
                    q = p->next;
                    (*cache->free)(cache, p->payload);
                    util_ald_free(cache, p);
                    cache->numentries--;
                    cache->npurged++;
                    p = *pp = q;
                }
                else {
                    pp = &(p->next);
                    p = *pp;
                }
            }
        }

        t = apr_time_now();
        cache->avg_purgetime =
            ((t - cache->last_purge) + (cache->avg_purgetime * (cache->numpurges-1))) /
            cache->numpurges;
    }

Jess Holle wrote:

Graham Leggett wrote:

Jess Holle wrote:

I now see what's wrong with the LDAP cache purge -- it does not fix up the 'next' pointers and/or cache->nodes[i] pointers when removing entries -- and thus cannot hope to work. Unfortunately, my fixes for this are still falling short, but I thought I'd pass this along.

Looking at the previous fix, some of these problems seem to be fixed in HEAD. There was a batch of changes to the LDAP stuff that depended on v1.0 of APR, and were thus not backported to httpd v2.0. There may be some value in checking the diffs between HEAD and 2.0 to see what changes have been made - I think there are some bugfixes in there that need porting.

Thanks for the tip. The purge() routine is still not fixed in HEAD, though...

-- Jess Holle
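
[Editorial sketch, not code from the thread.] The 'pp' technique in the purge routine above is the usual pointer-to-pointer way of unlinking nodes from a singly linked list. A minimal standalone version follows, assuming a simplified node type with only an add_time and a next field. The names node, push, and purge_chain are hypothetical and stand in for the real util_ldap structures.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a cache hash-bucket node. */
typedef struct node {
    long add_time;
    struct node *next;
} node;

/* Prepend a node to a chain; returns the new head (or the old head
 * unchanged if allocation fails). */
node *push(node *head, long add_time)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->add_time = add_time;
    n->next = head;
    return n;
}

/*
 * Remove every node older than marktime from one chain and return the
 * number removed. 'pp' always holds the address of the link that
 * points at the current node: the chain head at first, then the
 * 'next' field of the last node that was kept.
 */
unsigned long purge_chain(node **head, long marktime)
{
    unsigned long npurged = 0;
    node **pp = head;

    while (*pp != NULL) {
        node *p = *pp;
        if (p->add_time < marktime) {
            *pp = p->next;   /* unlink: whatever pointed at p now skips it */
            free(p);
            npurged++;
        }
        else {
            pp = &p->next;   /* keep p and advance to its 'next' link */
        }
    }
    return npurged;
}
```

Because the head pointer and every 'next' field are updated through the same 'pp', removing the first node of a chain needs no special case, which is exactly the fix-up the original purge loop was missing.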
Re: Apache 2.0.51 util_ldap
Sorry for the chattiness of my solution process. I've tested and these fixes do apply with the global mutex changes *except* when one disables caches by sizing them all to 0, Apache will crash on the first authentication request when the global mutexes are used! This needs to be fixed!

I've attached a unified diff containing the purge fix and the unassigned variable fix (which as Graham pointed out is already in the 2.1 branch).

I'm still wondering if we shouldn't just stick with the local read/write lock on Windows and other single child MPMs (NetWare?) as this should allow better throughput in such cases and yet be safe, right? In fact, why do we use shared memory on these platforms for the cache? [If I'm just daft here, I apologize.]

-- Jess Holle

Jess Holle wrote:

Here's a fixed LDAP purge routine which works great in my testing (with cache sizes of 8, 100, 1000, and 2150 and 2500 unique user logins repeated 3 times each). [No, I haven't produced a diff as I have pieces of util_ldap from various CVS levels at this point.]

Essentially I added all the logic surrounding 'pp', which is the address of the previous node's 'next' field or of cache->nodes[i] in the case of the first node. [Clearly my C is getting rusty -- this took me a few attempts to get right...]

This fixes the biggest LDAP module issue I'm aware of: hangs and crashes after one or more cache purges.
--- util_ldap_cache_mgr.c-2.0.51 2004-09-19 16:28:00.0 -0500
+++ util_ldap_cache_mgr.c 2004-09-19 16:27:56.0 -0500
@@ -173,7 +173,7 @@
 void util_ald_cache_purge(util_ald_cache_t *cache)
 {
     unsigned long i;
-    util_cache_node_t *p, *q;
+    util_cache_node_t *p, *q, **pp;
     apr_time_t t;
     if (!cache)
@@ -184,7 +184,8 @@
     cache->numpurges++;
     for (i=0; i < cache->size; ++i) {
-        p = cache->nodes[i];
+        pp = cache->nodes + i;
+        p = *pp;
         while (p != NULL) {
             if (p->add_time < cache->marktime) {
                 q = p->next;
@@ -192,10 +193,11 @@
                 util_ald_free(cache, p);
                 cache->numentries--;
                 cache->npurged++;
-                p = q;
+                p = *pp = q;
             }
             else {
-                p = p->next;
+                pp = &(p->next);
+                p = *pp;
             }
         }
     }
@@ -252,6 +254,8 @@
         newcurl = util_ald_cache_insert(st->util_ldap_cache, &curl);
     }
+    else
+        newcurl = NULL;
     return newcurl;
 }
Re: Apache 2.0.51 util_ldap
Here's one final patch to fix the global mutex crash when the global mutex is never allocated due to disabled/empty caches.

I would really like some clarity as to whether:

1. We should just stick with the single-process read/write lock for single-worker MPMs. It would really seem so.
2. We should really avoid using shared memory for the LDAP cache for single-worker MPMs. What's it really buy us in this case?

Given the patches from today and answers (and code adjustments as appropriate) to those 2 questions, I will feel much better about the LDAP modules as they now seem pretty stable for a wide range of settings/circumstances -- at least on Windows. I now need to test these on Solaris and AIX...

-- Jess Holle

Jess Holle wrote:

Sorry for the chattiness of my solution process. I've tested and these fixes do apply with the global mutex changes *except* when one disables caches by sizing them all to 0, Apache will crash on the first authentication request when the global mutexes are used! This needs to be fixed!

I've attached a unified diff containing the purge fix and the unassigned variable fix (which as Graham pointed out is already in the 2.1 branch).

I'm still wondering if we shouldn't just stick with the local read/write lock on Windows and other single child MPMs (NetWare?) as this should allow better throughput in such cases and yet be safe, right? In fact, why do we use shared memory on these platforms for the cache? [If I'm just daft here, I apologize.]

-- Jess Holle

Jess Holle wrote:

Here's a fixed LDAP purge routine which works great in my testing (with cache sizes of 8, 100, 1000, and 2150 and 2500 unique user logins repeated 3 times each). [No, I haven't produced a diff as I have pieces of util_ldap from various CVS levels at this point.]

Essentially I added all the logic surrounding 'pp', which is the address of the previous node's 'next' field or of cache->nodes[i] in the case of the first node. [Clearly my C is getting rusty -- this took me a few attempts to get right...]

This fixes the biggest LDAP module issue I'm aware of: hangs and crashes after one or more cache purges.

--- util_ldap_cache_mgr.c-2.0.51 2004-09-19 16:28:00.0 -0500
+++ util_ldap_cache_mgr.c 2004-09-19 16:27:56.0 -0500
@@ -173,7 +173,7 @@
 void util_ald_cache_purge(util_ald_cache_t *cache)
 {
     unsigned long i;
-    util_cache_node_t *p, *q;
+    util_cache_node_t *p, *q, **pp;
     apr_time_t t;
     if (!cache)
@@ -184,7 +184,8 @@
     cache->numpurges++;
     for (i=0; i < cache->size; ++i) {
-        p = cache->nodes[i];
+        pp = cache->nodes + i;
+        p = *pp;
         while (p != NULL) {
             if (p->add_time < cache->marktime) {
                 q = p->next;
@@ -192,10 +193,11 @@
                 util_ald_free(cache, p);
                 cache->numentries--;
                 cache->npurged++;
-                p = q;
+                p = *pp = q;
             }
             else {
-                p = p->next;
+                pp = &(p->next);
+                p = *pp;
             }
         }
     }
@@ -252,6 +254,8 @@
         newcurl = util_ald_cache_insert(st->util_ldap_cache, &curl);
     }
+    else
+        newcurl = NULL;
     return newcurl;
 }
--- util_ldap.c-2.0.51 2004-09-19 17:11:02.0 -0500
+++ util_ldap.c-new 2004-09-19 17:11:06.0 -0500
@@ -89,9 +89,11 @@
 #endif
 #define LDAP_CACHE_LOCK() \
-    apr_global_mutex_lock(st->util_ldap_cache_lock)
+    if (st->util_ldap_cache_lock) \
+      apr_global_mutex_lock(st->util_ldap_cache_lock)
 #define LDAP_CACHE_UNLOCK() \
-    apr_global_mutex_unlock(st->util_ldap_cache_lock)
+    if (st->util_ldap_cache_lock) \
+      apr_global_mutex_unlock(st->util_ldap_cache_lock)
 static void util_ldap_strdup (char **str, const char *newstr)
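
[Editorial sketch, not code from the thread.] A macro that expands to a bare 'if' statement, as the lock macros in the patch above do, can capture a following 'else' at the call site. One common way to guard a possibly-NULL lock without that hazard is the do { ... } while (0) idiom. The sketch below is standalone: fake_mutex, fake_lock, fake_unlock, CACHE_LOCK, CACHE_UNLOCK, and update_if_enabled are all hypothetical stand-ins, not APR or util_ldap names.

```c
#include <stddef.h>

/* Hypothetical stand-in for a global mutex; counts lock calls so the
 * behavior can be observed. */
typedef struct {
    int held;
    int lock_calls;
} fake_mutex;

static void fake_lock(fake_mutex *m)   { m->held = 1; m->lock_calls++; }
static void fake_unlock(fake_mutex *m) { m->held = 0; }

/* NULL-safe lock macros. Wrapping the 'if' in do { } while (0) makes
 * each macro behave as a single statement wherever it is used. */
#define CACHE_LOCK(m)   do { if ((m) != NULL) fake_lock(m);   } while (0)
#define CACHE_UNLOCK(m) do { if ((m) != NULL) fake_unlock(m); } while (0)

/*
 * Increment *counter under the lock when caching is enabled.
 * With a bare 'if (m) fake_lock(m)' macro, the 'else' below would
 * attach to the macro's hidden 'if' instead of 'if (enabled)'.
 */
int update_if_enabled(fake_mutex *m, int enabled, int *counter)
{
    if (enabled)
        CACHE_LOCK(m);
    else
        return 0;

    (*counter)++;
    CACHE_UNLOCK(m);
    return 1;
}
```

When the mutex was never allocated (m is NULL, as with zero-sized caches), both macros become no-ops and the update still proceeds, which matches the intent of the patch.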
Re: Apache 2.0.51 util_ldap
Jess Holle wrote:

Here's one final patch to fix the global mutex crash when the global mutex is never allocated due to disabled/empty caches. I would really like some clarity as to whether:

1. We should just stick with the single-process read/write lock for single-worker MPMs. It would really seem so.
2. We should really avoid using shared memory for the LDAP cache for single-worker MPMs. What's it really buy us in this case?

Related stupid questions:

1. Does setting LDAPSharedCacheSize to 0 disable shared memory but not the cache? [The docs say so.]
2. If so, then wouldn't we want to use per-process read/write locks in this case and global mutexes only when shared memory was actually being used?

-- Jess Holle
Re: Apache 2.0.51 util_ldap
Jess Holle wrote:

Jess Holle wrote:

Here's one final patch to fix the global mutex crash when the global mutex is never allocated due to disabled/empty caches. I would really like some clarity as to whether:

1. We should just stick with the single-process read/write lock for single-worker MPMs. It would really seem so.
2. We should really avoid using shared memory for the LDAP cache for single-worker MPMs. What's it really buy us in this case?

Related stupid questions:

1. Does setting LDAPSharedCacheSize to 0 disable shared memory but not the cache? [The docs say so.]

P.S. The doc says so, but the ldap-status handler provides no information in this case, as if the cache were not active.

-- Jess Holle
Re: Apache 2.0.51 util_ldap
Sorry. One last post...

It seems that at least on Windows if I place

    <Location /server/cache-info>
        SetHandler ldap-status
    </Location>

too early in the configuration Apache crashes on shutdown. Specifically, if I place it in front of my mod_deflate configuration (similar to that in documentation), it will crash on shutdown with:

    apr_rmm_free(apr_rmm_t * 0x0091d898, unsigned int 12583060) line 375
    util_ald_free(util_ald_cache * 0x00c00020, const void * 0x00c00094) line 82 + 19 bytes
    util_ald_destroy_cache(util_ald_cache * 0x6eec8526) line 352
    run_cleanups(cleanup_t * * 0x0026a960) line 1952
    apr_pool_destroy(apr_pool_t * 0x6ff1e82b) line 733
    ap_mpm_run(apr_pool_t * 0x002689d8, apr_pool_t * 0x, server_rec * 0x70a9f1ab) line 1645
    main(int 1890185643, const char * const * 0x8002) line 624 + 8 bytes
    SHLWAPI! 70a9f1ab()
    ff50dc45()

Jess Holle wrote:

Jess Holle wrote:

Jess Holle wrote:

Here's one final patch to fix the global mutex crash when the global mutex is never allocated due to disabled/empty caches. I would really like some clarity as to whether:

1. We should just stick with the single-process read/write lock for single-worker MPMs. It would really seem so.
2. We should really avoid using shared memory for the LDAP cache for single-worker MPMs. What's it really buy us in this case?

Related stupid questions:

1. Does setting LDAPSharedCacheSize to 0 disable shared memory but not the cache? [The docs say so.]

P.S. The doc says so, but the ldap-status handler provides no information in this case, as if the cache were not active.

-- Jess Holle
Re: Apache 2.0.51 util_ldap
I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 2.0.50 and this fixed the hangs and the crash when the cache is disabled by zero-sizing everything. Therefore APR fixes, etc, are not the issue -- util_ldap itself is (as mod_auth_ldap did not change). I now plan to look for source versions between 2.0.50 and 2.0.51 that provide improvements without these regressions. -- Jess Holle Jess Holle wrote: One small correction: When I remove the global mutex stuff I no longer have the case where both the worker and parent processes crash, so that's another improvement on Windows. Unfortunately, I still have the case where Apache hangs, however. -- Jess Holle Jess Holle wrote: Working on a wild hunch, I backed util_ldap source down to right before the global mutex stuff went in -- as that should not be necessary with a single child process anyway, right? This fixed the crash on shutdown -- but that's all. I'm going to try the 2.0.50 util_ldap sources with everything else from 2.0.51 as well. Else I might have to go back to 2.0.50 plus security fixes as you suggest. And that's still not even trying the worker mpm on Solaris -- which at least used to have worse behavior than Windows in this area. -- Jess Holle Jeff Trawick wrote: one possibility is to apply the security patches you need to 2.0.50 see http://apache.towardex.com/httpd/patches/apply_to_2.0.50/ the descriptions of the vulnerabilities at http://httpd.apache.org/ indicate which components are affected; note that CAN-2004-0786 applies to all configurations; I have seen a suggestion that it affects IPv6 setups only, but that is not the case
Re: Apache 2.0.51 util_ldap
Jess Holle wrote: I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 2.0.50 and this fixed the hangs and the crash when the cache is disabled by zero-sizing everything. Therefore APR fixes, etc, are not the issue -- util_ldap itself is (as mod_auth_ldap did not change). Is it possible to get a stack trace from the crash? Regards, Graham --
Re: Apache 2.0.51 util_ldap
Jess Holle wrote: I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 2.0.50 and this fixed the hangs and the crash when the cache is disabled by zero-sizing everything. Therefore APR fixes, etc, are not the issue -- util_ldap itself is (as mod_auth_ldap did not change). I now plan to look for source versions between 2.0.50 and 2.0.51 that provide improvements without these regressions. -- Jess Holle Rolling back to version 1.3.2.11 of util_ldap_cache_mgr.c seems to fix the hang (which seems odd to me...). The crash on startup with 0-sized cache appears to be related to a missing null check in the duplicate entry prevention fixes. I'll look into that more. -- Jess Holle
Re: Apache 2.0.51 util_ldap
Here you go:

    ()
    util_ald_cache_fetch(util_ald_cache * 0x00a02cb8, void * 0x04c6de84) line 358 + 12 bytes
    util_ldap_cache_checkuserid(request_rec * 0x6fb51341, util_ldap_connection_t * 0x00a5cdb0, const char * 0x00a02cf0, const char * 0x00880db0, int 9487736, char * * 0x0002, const char * 0x, const char * 0x04c6def4, const char * * 0x00a5eede, const char * * * 0x04c6dee8) line 785 + 22 bytes
    mod_auth_ldap_check_user_id(request_rec * 0x6ff110bf) line 333
    LIBHTTPD! 6ff110bf()

Note that none of the line numbers quite match as I've added comments in my source. Thus the util_ald_cache_fetch() line is:

    hashval = (*cache->hash)(payload) % cache->size;

While the util_ldap_cache_checkuserid() line is:

    search_nodep = util_ald_cache_fetch(curl->search_cache, &the_search_node);

I was just about to patch around this by checking cache->hash for null and returning null in this case from util_ald_cache_fetch(), but I'm all ears for a better fix. I'm also all ears for a fix to the hang -- perhaps I can cull out a stack dump for that too...

-- Jess Holle

Graham Leggett wrote:

Jess Holle wrote:

I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 2.0.50 and this fixed the hangs and the crash when the cache is disabled by zero-sizing everything. Therefore APR fixes, etc, are not the issue -- util_ldap itself is (as mod_auth_ldap did not change).

Is it possible to get a stack trace from the crash?

Regards, Graham --
Re: Apache 2.0.51 util_ldap
Note the stack trace below was generated with 2.0.51's global mutex changes removed. A crash still occurs with a zero sized cache with the global mutex changes in place, but I believe it is from a null mutex, not a null cache hash function entry.

-- Jess Holle

Jess Holle wrote:

Here you go:

    ()
    util_ald_cache_fetch(util_ald_cache * 0x00a02cb8, void * 0x04c6de84) line 358 + 12 bytes
    util_ldap_cache_checkuserid(request_rec * 0x6fb51341, util_ldap_connection_t * 0x00a5cdb0, const char * 0x00a02cf0, const char * 0x00880db0, int 9487736, char * * 0x0002, const char * 0x, const char * 0x04c6def4, const char * * 0x00a5eede, const char * * * 0x04c6dee8) line 785 + 22 bytes
    mod_auth_ldap_check_user_id(request_rec * 0x6ff110bf) line 333
    LIBHTTPD! 6ff110bf()

Note that none of the line numbers quite match as I've added comments in my source. Thus the util_ald_cache_fetch() line is:

    hashval = (*cache->hash)(payload) % cache->size;

While the util_ldap_cache_checkuserid() line is:

    search_nodep = util_ald_cache_fetch(curl->search_cache, &the_search_node);

I was just about to patch around this by checking cache->hash for null and returning null in this case from util_ald_cache_fetch(), but I'm all ears for a better fix. I'm also all ears for a fix to the hang -- perhaps I can cull out a stack dump for that too...

-- Jess Holle

Graham Leggett wrote:

Jess Holle wrote:

I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 2.0.50 and this fixed the hangs and the crash when the cache is disabled by zero-sizing everything. Therefore APR fixes, etc, are not the issue -- util_ldap itself is (as mod_auth_ldap did not change).

Is it possible to get a stack trace from the crash?

Regards, Graham --
Re: Apache 2.0.51 util_ldap
Jess Holle wrote:

Here you go:

    ()
    util_ald_cache_fetch(util_ald_cache * 0x00a02cb8, void * 0x04c6de84) line 358 + 12 bytes
    util_ldap_cache_checkuserid(request_rec * 0x6fb51341, util_ldap_connection_t * 0x00a5cdb0, const char * 0x00a02cf0, const char * 0x00880db0, int 9487736, char * * 0x0002, const char * 0x, const char * 0x04c6def4, const char * * 0x00a5eede, const char * * * 0x04c6dee8) line 785 + 22 bytes
    mod_auth_ldap_check_user_id(request_rec * 0x6ff110bf) line 333
    LIBHTTPD! 6ff110bf()

Note that none of the line numbers quite match as I've added comments in my source. Thus the util_ald_cache_fetch() line is:

    hashval = (*cache->hash)(payload) % cache->size;

While the util_ldap_cache_checkuserid() line is:

    search_nodep = util_ald_cache_fetch(curl->search_cache, &the_search_node);

I was just about to patch around this by checking cache->hash for null and returning null in this case from util_ald_cache_fetch(), but I'm all ears for a better fix. I'm also all ears for a fix to the hang -- perhaps I can cull out a stack dump for that too...

Silly me - the hash field being null seems to indicate this structure is seriously munged -- working around this one condition just moves along to the next crash.

-- Jess Holle
Re: Apache 2.0.51 util_ldap
Okay, the cause of this issue is now clear: util_ald_create_caches() does not set 'newcurl' to anything when any of the caches are null, which they all are when they're sized at zero. The fix is also simple: add an 'else newcurl = NULL;' after the 'if' block in this routine.

[This really drives home why I have developed in Java for the last 5 years after spending 7+ years doing C and C++. This issue could not have occurred in Java -- the compiler would have rejected the code. I'm not saying the extra speed, etc., due to Apache being written in C is not nice. Nor do I wish to start some holy war. It's just that the lack of pointer / memory allocation issues, uninitialized variables, and not having to produce one's own APR to deal with platforms make Java a much more productive place for me.]

-- Jess Holle

Jess Holle wrote:

Jess Holle wrote:

Here you go:

    ()
    util_ald_cache_fetch(util_ald_cache * 0x00a02cb8, void * 0x04c6de84) line 358 + 12 bytes
    util_ldap_cache_checkuserid(request_rec * 0x6fb51341, util_ldap_connection_t * 0x00a5cdb0, const char * 0x00a02cf0, const char * 0x00880db0, int 9487736, char * * 0x0002, const char * 0x, const char * 0x04c6def4, const char * * 0x00a5eede, const char * * * 0x04c6dee8) line 785 + 22 bytes
    mod_auth_ldap_check_user_id(request_rec * 0x6ff110bf) line 333
    LIBHTTPD! 6ff110bf()

Note that none of the line numbers quite match as I've added comments in my source. Thus the util_ald_cache_fetch() line is:

    hashval = (*cache->hash)(payload) % cache->size;

While the util_ldap_cache_checkuserid() line is:

    search_nodep = util_ald_cache_fetch(curl->search_cache, &the_search_node);

I was just about to patch around this by checking cache->hash for null and returning null in this case from util_ald_cache_fetch(), but I'm all ears for a better fix. I'm also all ears for a fix to the hang -- perhaps I can cull out a stack dump for that too...

Silly me - the hash field being null seems to indicate this structure is seriously munged -- working around this one condition just moves along to the next crash.

-- Jess Holle
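
[Editorial sketch, not code from the thread.] The uninitialized 'newcurl' described above is an instance of a general C pitfall: a pointer assigned only inside an 'if' block is indeterminate on the other path. An alternative to the 'else' branch is to initialize the pointer where it is declared. The example below is standalone and uses hypothetical types (url_node, tiny_cache) and a hypothetical function (create_url_node) in place of the real util_ldap structures.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the URL node and its cache. */
typedef struct {
    const char *url;
} url_node;

typedef struct {
    int enabled;     /* 0 models caches sized at zero */
    url_node slot;   /* single storage slot, enough for the sketch */
} tiny_cache;

/*
 * Return the cached node for url, or NULL when caching is disabled.
 * Initializing newcurl at its declaration gives every path a defined
 * return value, including the one that skips the 'if' block.
 */
url_node *create_url_node(tiny_cache *cache, const char *url)
{
    url_node *newcurl = NULL;

    if (cache != NULL && cache->enabled) {
        cache->slot.url = url;
        newcurl = &cache->slot;
    }

    return newcurl;
}
```

Callers then only need to test the result against NULL; there is no path on which they receive a stray pointer, which is what made the later hash lookup crash look like corrupted data.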
Apache 2.0.51 util_ldap
I'm noticing a number of serious issues with util_ldap in Apache 2.0.51 on Windows:

- If you use what used to be safe "I don't trust the cache" config parameters as follows, you get an immediate crash (due to a null mutex):
    LDAPCacheEntries 0
    LDAPOpCacheEntries 0
    LDAPSharedCacheSize 0
- There are now many cases wherein Apache will *hang* when the number of unique users that have authenticated against LDAP exceeds LDAPCacheEntries. In the *best* case, both the worker and parent process will crash. It used to be that only the worker process would crash -- thus allowing the parent to start a new worker, so the server was not left dead in the water. There are some strange bits here:
    - Using 1 for LDAPCacheEntries, LDAPOpCacheEntries, and LDAPSharedCacheSize allows for a seemingly unlimited number of unique user logins! This is inexplicable. It would seem a nice workaround, but I need to support existing configurations "as is", e.g. the 0,0,0 config above.
    - There are *some* cases where Apache can service many unique authenticated users beyond LDAPCacheEntries, but these are very hard to predict. For example, LDAPCacheEntries of 2150, LDAPOpCacheEntries of 1 [this at least used to cause a crash if this was 0 and LDAPCacheEntries was non-zero], LDAPSharedCacheSize of 865000, and setting LDAPSharedCacheFile allows at least 2500 (my current LDAP data set size) unique authenticated users. Yet if I increase LDAPSharedCacheSize, which should seemingly make no difference, Apache will crash *much* earlier.
- Starting Apache with LDAP caching enabled (e.g. with the configuration in the last bullet) now results in a crash on shutdown in apr_rmm_addr_get() [rmm->base is null]. This occurs even if no requests were made since startup.

Overall, given the security and non-LDAP fixes in 2.0.51, I am now left pondering whether I should try backing the LDAP modules back to 2.0.50 while keeping all other 2.0.51 code. Ideas?

Also, Windows is only the first platform I've tested. I also have to work out Solaris and AIX. Thus if these work better, I may end up keeping the 2.0.51 LDAP code there...

I get the ugly feeling I should have tested all of this earlier in the 2.0.51 cycle, but I was busy at the time.

All in all, LDAP does not appear to be a happy camper on 2.0.51 on Windows.

-- Jess Holle [EMAIL PROTECTED]
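For readers following the 0/0/0 crash described above, which the message attributes to a null mutex: the sketch below shows one way such a crash can be avoided, by treating a lock that was never created as a no-op. It is only an illustration under stated assumptions. It uses POSIX threads rather than APR, and every name in it (demo_ldap_state, demo_state_init, demo_cache_lock, demo_cache_unlock, demo_authenticate) is hypothetical; it is not the util_ldap code and not the fix that was eventually applied.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the module state; NOT the real util_ldap struct. */
typedef struct {
    long cache_entries;       /* 0 means caching is disabled */
    pthread_mutex_t *lock;    /* stays NULL when caching is disabled */
    pthread_mutex_t storage;  /* backing storage used when a lock exists */
} demo_ldap_state;

/* Create the lock only when a cache exists. Returns 0 on success. */
int demo_state_init(demo_ldap_state *st, long cache_entries)
{
    assert(st != NULL);
    st->cache_entries = cache_entries;
    st->lock = NULL;
    if (cache_entries <= 0) {
        return 0;  /* no cache, so no lock is created */
    }
    if (pthread_mutex_init(&st->storage, NULL) != 0) {
        return -1;
    }
    st->lock = &st->storage;
    return 0;
}

/* Lock wrapper: a missing lock is a no-op instead of a NULL dereference. */
int demo_cache_lock(demo_ldap_state *st)
{
    return st->lock ? pthread_mutex_lock(st->lock) : 0;
}

/* Unlock wrapper with the same NULL guard. */
int demo_cache_unlock(demo_ldap_state *st)
{
    return st->lock ? pthread_mutex_unlock(st->lock) : 0;
}

/* Simulated request: returns 1 if the cache was consulted, 0 if it was
 * bypassed because caching is disabled, -1 on a locking error. */
int demo_authenticate(demo_ldap_state *st)
{
    int used_cache = 0;

    if (demo_cache_lock(st) != 0) {
        return -1;
    }
    if (st->cache_entries > 0) {
        used_cache = 1;
    }
    if (demo_cache_unlock(st) != 0) {
        return -1;
    }
    return used_cache;
}
```

With demo_state_init(&st, 0) the lock pointer stays NULL and demo_authenticate() still completes, which is the behaviour a "cache disabled" configuration would need.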
Re: Apache 2.0.51 util_ldap
One possibility is to apply the security patches you need to 2.0.50; see http://apache.towardex.com/httpd/patches/apply_to_2.0.50/

The descriptions of the vulnerabilities at http://httpd.apache.org/ indicate which components are affected. Note that CAN-2004-0786 applies to all configurations; I have seen a suggestion that it affects IPv6 setups only, but that is not the case.
Re: Apache 2.0.51 util_ldap
Working on a wild hunch, I backed util_ldap source down to right before the global mutex stuff went in -- as that should not be necessary with a single child process anyway, right? This fixed the crash on shutdown -- but that's all.

I'm going to try the 2.0.50 util_ldap sources with everything else from 2.0.51 as well. Else I might have to go back to 2.0.50 plus security fixes as you suggest.

And that's still not even trying the worker MPM on Solaris -- which at least used to have worse behavior than Windows in this area.

-- Jess Holle

Jeff Trawick wrote:
one possibility is to apply the security patches you need to 2.0.50 see http://apache.towardex.com/httpd/patches/apply_to_2.0.50/ the descriptions of the vulnerabilities at http://httpd.apache.org/ indicate which components are affected; note that CAN-2004-0786 applies to all configurations; I have seen a suggestion that it affects IPv6 setups only, but that is not the case
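The hunch above rests on the idea that a lock visible across processes is only needed when more than one child process touches the cache. As a rough POSIX analogy (Windows and APR use different primitives), the sketch below picks a mutex's scope from the number of child processes. The helper demo_lock_init and its parameters are hypothetical, not anything in util_ldap; pthread_mutexattr_setpshared and the PTHREAD_PROCESS_* constants are standard POSIX.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: choose the mutex scope from the number of child
 * processes that will touch the cache. Stores the chosen scope in
 * *pshared_out and returns 0 on success, or an errno-style code.
 *
 * Note: a PTHREAD_PROCESS_SHARED mutex only works across processes if the
 * mutex object itself lives in memory shared by those processes; this
 * sketch demonstrates the attribute choice only. */
int demo_lock_init(pthread_mutex_t *m, int child_processes, int *pshared_out)
{
    pthread_mutexattr_t attr;
    int scope = child_processes > 1 ? PTHREAD_PROCESS_SHARED
                                    : PTHREAD_PROCESS_PRIVATE;
    int rc;

    assert(m != NULL && pshared_out != NULL);

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_setpshared(&attr, scope);
    if (rc == 0) {
        rc = pthread_mutexattr_getpshared(&attr, pshared_out);
    }
    if (rc == 0) {
        rc = pthread_mutex_init(m, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With a single child process the threads already share one address space, so the process-private scope is sufficient; the process-shared scope is chosen only when several children are involved.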
Re: Apache 2.0.51 util_ldap
One small correction: When I remove the global mutex stuff I no longer have the case where both the worker and parent processes crash, so that's another improvement on Windows. Unfortunately, I still have the case where Apache hangs, however. -- Jess Holle
Re: Apache 2.0.51 util_ldap
At 08:54 AM 9/17/2004, Jess Holle wrote:
... given the security and non-LDAP fixes in 2.0.51, I am now left pondering whether I should try backing the LDAP modules back to 2.0.50 while keeping all other 2.0.51 code. Ideas? All in all, LDAP does not appear to be a happy camper on 2.0.51 on Windows.

That's an entirely rational solution; the ABI should be strong enough at this point for 2.0.50 ldap to play nicely in your new 2.0.51.

Bill
Re: Apache 2.0.51 util_ldap
William A. Rowe, Jr. wrote:
That's an entirely rational solution, ABI should be strong enough at this point for 2.0.50 ldap to play nicely in your new 2.0.51. Bill

Actually, my plan was to use the 2.0.50 LDAP module *sources* laid on top of the other 2.0.51 sources. I'd hope the ABI is strong enough now, but a good clean compile gives me a nice comfy feeling.

I'm actually hoping to have time to test a variety of code points between and including 2.0.50 and 2.0.51 to nail down a bit better which changes led to which issues...

-- Jess Holle