Re: Apache 2.0.51 util_ldap

2004-09-20 Thread Brad Nicholes
I'm still wondering if we shouldn't just stick with the local read/write
lock on Windows and other single child MPMs (NetWare?) as this should
allow better throughput in such cases and yet be safe, right?  In fact,

Actually on NetWare this is a non-issue.  On NetWare everything is
global (memory, locks, etc.), so there is no difference between a global
mutex and a local one (other than previously we were using reader/writer
locks rather than mutexes).  I would like to use reader/writer locks
rather than global mutexes simply for performance reasons, but I'm not
sure how we would go about switching between global and local locks
anyway.  This would require #ifdef'ing the code for particular platforms
or MPMs, which isn't a good thing.

Brad


Brad Nicholes
Senior Software Engineer
Novell, Inc., the leading provider of Net business solutions
http://www.novell.com 

 [EMAIL PROTECTED] Sunday, September 19, 2004 3:43:20 PM 
Sorry for the chattiness of my solution process.  I've tested and these
fixes do apply with the global mutex changes *except* when one disables
caches by sizing them all to 0, Apache will crash on the first
authentication request when the global mutexes are used!  This needs to
be fixed!

I've attached a unified diff containing the purge fix and the unassigned
variable fix (which as Graham pointed out is already in the 2.1 branch).

I'm still wondering if we shouldn't just stick with the local read/write
lock on Windows and other single child MPMs (NetWare?) as this should
allow better throughput in such cases and yet be safe, right?  In fact,
why do we use shared memory on these platforms for the cache?  [If I'm
just daft here, I apologize.]

--
Jess Holle

Jess Holle wrote:

 Here's a fixed LDAP purge routine which works great in my testing 
 (with cache sizes of 8, 100, 1000, and 2150 and 2500 unique user 
 logins repeated 3 times each).  [No, I haven't produced a diff as I 
 have pieces of util_ldap from various CVS levels at this point.]

 Essentially I added all the logic surrounding 'pp', which is the 
 address of the previous node's 'next' field or of cache->nodes[i] in
 the case of the first node.  [Clearly my C is getting rusty -- this 
 took me a few attempts to get right...]

 This fixes the biggest LDAP module issue I'm aware of: hangs and 
 crashes after one or more cache purges.




Re: Apache 2.0.51 util_ldap

2004-09-20 Thread Jess Holle




Brad Nicholes wrote:

  
I'm still wondering if we shouldn't just stick with the local read/write
lock on Windows and other single child MPMs (NetWare?) as this should
allow better throughput in such cases and yet be safe, right?  In fact,
  
  Actually on NetWare this is a non-issue.  On NetWare everything is
global (memory, locks, etc.), so there is no difference between a global
mutex and a local one (other than previously we were using reader/writer
locks rather than mutexes).  I would like to use reader/writer locks
rather than global mutexes simply for performance reasons, but I'm not
sure how we would go about switching between global and local locks
anyway.  This would require #ifdef'ing the code for particular platforms
or MPM's which isn't a good thing.
  

On the contrary, if our #ifdef'ing is localized, I believe that doing
this on a per-platform or per-MPM (or better yet a #ifdef
HAS_MULTIPLE_WORKER_PROCESSES or some such) basis to maximize
performance would be worthwhile.

To scatter such #ifdef's throughout the whole module would be a
maintenance nightmare, of course.

--
Jess Holle





Re: Apache 2.0.51 util_ldap

2004-09-20 Thread William A. Rowe, Jr.
At 01:33 PM 9/20/2004, Jess Holle wrote:
Brad Nicholes wrote:

I'm still wondering if we shouldn't just stick with the local read/write
lock on Windows and other single child MPMs (NetWare?) as this should
allow better throughput in such cases and yet be safe, right?  In fact,

Actually on NetWare this is a non-issue.  On NetWare everything is
global (memory, locks, etc.), so there is no difference between a global
mutex and a local one (other than previously we were using reader/writer
locks rather than mutexes).

It's similar for Win32 - except single process can be implemented
as critical sections.

On the contrary, if our #ifdef'ing is localized, I believe that doing this on a 
per-platform or per-MPM (or better yet a #ifdef HAS_MULTIPLE_WORKER_PROCESSES or some 
such) basis to maximize performance would be worthwhile.

-1 Veto (not a vote) to test platforms.

However, ap_mpm_query() will let you determine if you are running
on a single or multi-process mpm, a threaded or non-threaded mpm,
etc.  If you want to test mpm behavior and make selections based
on those characteristics, I'd see no issues with that.
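
As a rough illustration of choosing the lock type from MPM characteristics at
runtime rather than with #ifdefs, here is a minimal sketch. It is
self-contained and does not call httpd: the mpm_traits struct, the lock_kind
enum, and choose_cache_lock() are hypothetical names, and the two fields are
assumed to be filled in from ap_mpm_query() results (for example
AP_MPMQ_MAX_DAEMONS and AP_MPMQ_IS_THREADED). What each MPM actually reports
for those queries is not verified here.

```c
/* Hypothetical summary of what ap_mpm_query() might report for the MPM. */
typedef struct {
    int max_daemons;   /* assumed: number of child processes the MPM may run */
    int is_threaded;   /* assumed: nonzero when children run multiple threads */
} mpm_traits;

/* Hypothetical lock strategies for the LDAP cache. */
typedef enum {
    LOCK_NONE,          /* one process, one thread: nothing to guard against */
    LOCK_LOCAL_RWLOCK,  /* one process, many threads: in-process rwlock */
    LOCK_GLOBAL_MUTEX   /* several processes sharing the cache */
} lock_kind;

/* Pick a lock strategy from MPM behavior instead of from the platform. */
static lock_kind choose_cache_lock(const mpm_traits *t)
{
    if (t->max_daemons > 1)
        return LOCK_GLOBAL_MUTEX;
    if (t->is_threaded)
        return LOCK_LOCAL_RWLOCK;
    return LOCK_NONE;
}
```

In a module this decision would presumably be made once at startup, which
keeps the branching in one place instead of scattering it through the code.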

To scatter such #ifdef's throughout the whole module would be a maintenance 
nightmare, of course.

Exactly.  Who two years from now will be able to follow the code?
httpd-1.3 was abandoned (effectively replaced) because of main().

At some point, you break down and create two separate modules for
different conditions, witness mod_cgi vs. mod_cgid.

Bill




Re: Apache 2.0.51 util_ldap

2004-09-20 Thread Jess Holle
Even after all my patches, I still get a bus error and core dump on the 
first LDAP authentication request on Solaris 8 with worker MPM and an 
active shared memory LDAP cache.  [This is with iPlanet LDAP SDK 5.08, 
though I doubt that matters.]

I've run out of time to look into this further.  Moreover, I am leaning 
towards very few Apache child processes with many threads.  Thus I 
believe that having each process have its own cache with read/write 
locks is a better strategy for this arrangement and will just #if out 
the shared memory and move on.

--
Jess Holle


Re: Apache 2.0.51 util_ldap

2004-09-20 Thread Jess Holle




William A. Rowe, Jr. wrote:

  At 01:33 PM 9/20/2004, Jess Holle wrote:
  
  
Brad Nicholes wrote:


  
I'm still wondering if we shouldn't just stick with the local read/write
lock on Windows and other single child MPMs (NetWare?) as this should
allow better throughput in such cases and yet be safe, right?  In fact,

  
  Actually on NetWare this is a non-issue.  On NetWare everything is
global (memory, locks, etc.), so there is no difference between a global
mutex and a local one (other than previously we were using reader/writer
locks rather than mutexes).
  

  
  It's similar for Win32 - except single process can be implemented
as critical sections.
  
  
On the contrary, if our #ifdef'ing is localized, I believe that doing this on a per-platform or per-MPM (or better yet a #ifdef HAS_MULTIPLE_WORKER_PROCESSES or some such) basis to maximize performance would be worthwhile.

  
  -1 Veto (not a vote) to test platforms.

However, ap_mpm_query() will let you determine if you are running
on a single or multi-process mpm, a threaded or non-threaded mpm,
etc.  If you want to test mpm behavior and make selections based
on those characteristics, I'd see no issues with that.
  

Same basic idea, but a (much) better implementation. Sounds great to
me.

--
Jess Holle





Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Graham Leggett
Jess Holle wrote:
Okay, the cause of this issue is now clear:
util_ald_create_caches() does not set 'newcurl' to anything when any
of the caches are null, which they all are when they're sized at zero.
The fix is also simple: add an 'else newcurl = NULL;' after the 'if' 
block in this routine.
Will fix (if nobody else has yet) - thanks for hunting this down.
[This really drives home why I have developed in Java for the last 5 
years after spending 7+ years doing C and C++.  This issue could not 
have occurred in Java -- the compiler would have rejected the issue.  I'm 
not saying the extra speed, etc, due to Apache being written in C is not 
nice.  Nor do I wish to start some holy war.  It's just that the lack 
of pointer / memory allocation issues, uninitialized variables, and not 
having to produce one's own APR to deal with platforms make Java a much 
more productive place for me.]
I am the same - if the job is either time constrained, or accuracy 
constrained, then I stick to Java. C is still king for apps where speed 
is an issue, but then that's at the cost of your hair sometimes.

Regards,
Graham
--


Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle
Graham Leggett wrote:
Jess Holle wrote:
Okay, the cause of this issue is now clear:
util_ald_create_caches() does not set 'newcurl' to anything when any
of the caches are null, which they all are when they're sized at 
zero.

The fix is also simple: add an 'else newcurl = NULL;' after the 'if' 
block in this routine.
Will fix (if nobody else has yet) - thanks for hunting this down.
Just a note: this was enough to fix the problem without the global 
mutexes present -- I've not tested again with the mutexes present as I 
want to get to the bottom of the cache overflow crashes first if I can.

The fix should still go in anyway as leaving this uninitialized will 
only lead to awful problems downstream.

--
Jess Holle


Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Graham Leggett
Jess Holle wrote:
Just a note: this was enough to fix the problem without the global 
mutexes present -- I've not tested again with the mutexes present as I 
want to get to the bottom of the cache overflow crashes first if I can.

The fix should still go in anyway as leaving this uninitialized will 
only lead to awful problems downstream.
It's already been committed to v2.1, but I see no backport vote yet for 
v2.0, should I add the backport request?

Regards,
Graham
--




Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle
Graham Leggett wrote:
Jess Holle wrote:
Just a note: this was enough to fix the problem without the global 
mutexes present -- I've not tested again with the mutexes present as 
I want to get to the bottom of the cache overflow crashes first if I 
can.

The fix should still go in anyway as leaving this uninitialized will 
only lead to awful problems downstream.
It's already been committed to v2.1, but I see no backport vote yet 
for v2.0, should I add the backport request?
Yes, please.
--
Jess Holle


Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle
I now see what's wrong with the LDAP cache purge -- it does not fix up 
the 'next' pointers and/or cache->nodes[i] pointers when removing entries 
-- and thus cannot hope to work.

Unfortunately, my fixes for this are still falling short, but I thought 
I'd pass this along.

--
Jess Holle


Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Graham Leggett
Jess Holle wrote:
I now see what's wrong with the LDAP cache purge -- it does not fix up 
the 'next' pointers and/or cache->nodes[i] pointers when removing entries 
-- and thus cannot hope to work.

Unfortunately, my fixes for this are still falling short, but I thought 
I'd pass this along.
Looking at the previous fix, some of these problems seem to be fixed in 
HEAD. There was a batch of changes to the LDAP stuff that depended on 
v1.0 of APR, and were thus not backported to httpd v2.0.

There may be some value in checking the diffs between HEAD and 2.0 to 
see what changes have been made - I think there are some bugfixes in 
there that need porting.

Regards,
Graham
--




Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle
Graham Leggett wrote:
Jess Holle wrote:
I now see what's wrong with the LDAP cache purge -- it does not fix 
up the 'next' pointers and/or cache->nodes[i] pointers when removing 
entries -- and thus cannot hope to work.

Unfortunately, my fixes for this are still falling short, but I 
thought I'd pass this along.

Looking at the previous fix, some of these problems seem to be fixed 
in HEAD. There was a batch of changes to the LDAP stuff that depended 
on v1.0 of APR, and were thus not backported to httpd v2.0.

There may be some value in checking the diffs between HEAD and 2.0 to 
see what changes have been made - I think there are some bugfixes in 
there that need porting.
Thanks for the tip.
The purge() routine is still not fixed in HEAD, though...
--
Jess Holle


Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle
Another dumb question:
On Windows since there is only one child process, wouldn't it make sense 
to stick with the read/write locks and not move to a global mutex?

In the multi-child mpms, the global mutex is obviously required, of course.
--
Jess Holle


Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle




Here's a fixed LDAP purge routine which works great in my testing (with
cache sizes of 8, 100, 1000, and 2150 and 2500 unique user logins
repeated 3 times each). [No, I haven't produced a diff as I have
pieces of util_ldap from various CVS levels at this point.]

Essentially I added all the logic surrounding 'pp', which is the
address of the previous node's 'next' field or of cache->nodes[i] in
the case of the first node. [Clearly my C is getting rusty -- this took
me a few attempts to get right...]

This fixes the biggest LDAP module issue I'm aware of: hangs and
crashes after one or more cache purges.

--
Jess Holle
void util_ald_cache_purge(util_ald_cache_t *cache)
{
    unsigned long i;
    util_cache_node_t *p, *q, **pp;
    apr_time_t t;

    if (!cache)
        return;

    cache->last_purge = apr_time_now();
    cache->npurged = 0;
    cache->numpurges++;

    for (i=0; i < cache->size; ++i) {
        /* pp always holds the address of the link that points at p */
        pp = cache->nodes + i;
        p = *pp;
        while (p != NULL) {
            if (p->add_time < cache->marktime) {
                q = p->next;
                (*cache->free)(cache, p->payload);
                util_ald_free(cache, p);
                cache->numentries--;
                cache->npurged++;
                p = *pp = q;    /* unlink the freed node from the chain */
            }
            else {
                pp = &(p->next);
                p = *pp;
            }
        }
    }

    t = apr_time_now();
    cache->avg_purgetime = 
        ((t - cache->last_purge) + (cache->avg_purgetime * (cache->numpurges-1))) / 
        cache->numpurges;
}
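
The 'pp' idiom used above can be tried in isolation. The following sketch
applies it to a plain singly linked list. The node type and the push(),
purge_older_than(), and list_length() helpers are hypothetical and only stand
in for the util_ldap structures.

```c
#include <stdlib.h>

/* Hypothetical node type for illustration; not the util_ldap structure. */
typedef struct node {
    long add_time;
    struct node *next;
} node;

/* Prepend a node; returns the new head (or the old head if malloc fails). */
static node *push(node *head, long add_time)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->add_time = add_time;
    n->next = head;
    return n;
}

/* Free every node older than marktime; returns how many were removed. */
static unsigned long purge_older_than(node **head, long marktime)
{
    unsigned long npurged = 0;
    node **pp = head;   /* address of the link that points at the current node */
    node *p;

    while ((p = *pp) != NULL) {
        if (p->add_time < marktime) {
            *pp = p->next;      /* repair the incoming link before freeing */
            free(p);
            npurged++;
        }
        else {
            pp = &p->next;      /* keep the node, advance to its 'next' link */
        }
    }
    return npurged;
}

/* Count the nodes remaining in the list. */
static unsigned long list_length(const node *p)
{
    unsigned long n = 0;
    for (; p != NULL; p = p->next)
        n++;
    return n;
}
```

Because pp points at the head slot for the first node and at the previous
node's 'next' field otherwise, removing the first node needs no special case.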






Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle




Sorry for the chattiness of my solution process. I've tested and these
fixes do apply with the global mutex changes *except* when one disables
caches by sizing them all to 0, Apache will crash on the first
authentication request when the global mutexes are used! This needs to
be fixed!

I've attached a unified diff containing the purge fix and the
unassigned variable fix (which as Graham pointed out is already in the
2.1 branch).

I'm still wondering if we shouldn't just stick with the local
read/write lock on Windows and other single child MPMs (NetWare?) as
this should allow better throughput in such cases and yet be safe,
right? In fact, why do we use shared memory on these platforms for the
cache? [If I'm just daft here, I apologize.]

--
Jess Holle





--- util_ldap_cache_mgr.c-2.0.51	2004-09-19 16:28:00.0 -0500
+++ util_ldap_cache_mgr.c	2004-09-19 16:27:56.0 -0500
@@ -173,7 +173,7 @@
 void util_ald_cache_purge(util_ald_cache_t *cache)
 {
     unsigned long i;
-    util_cache_node_t *p, *q;
+    util_cache_node_t *p, *q, **pp;
     apr_time_t t;
 
     if (!cache)
@@ -184,7 +184,8 @@
     cache->numpurges++;
 
     for (i=0; i < cache->size; ++i) {
-        p = cache->nodes[i];
+        pp = cache->nodes + i;
+        p = *pp;
         while (p != NULL) {
             if (p->add_time < cache->marktime) {
                 q = p->next;
@@ -192,10 +193,11 @@
                 util_ald_free(cache, p);
                 cache->numentries--;
                 cache->npurged++;
-                p = q;
+                p = *pp = q;
             }
             else {
-                p = p->next;
+                pp = &(p->next);
+                p = *pp;
             }
         }
     }
@@ -252,6 +254,8 @@
         newcurl = util_ald_cache_insert(st->util_ldap_cache, curl);
 
     }
+    else
+      newcurl = NULL;
 
     return newcurl;
 }


Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle




Here's one final patch to fix the global mutex crash when the global
mutex is never allocated due to disabled/empty caches.

I would really like some clarity as to whether:

  1. We should just stick with the single-process read/write lock for
     single-worker MPMs. It would really seem so.
  2. We should really avoid using shared memory for the LDAP cache for
     single-worker MPMs. What does it really buy us in this case?

Given the patches from today and answers (and code adjustments as
appropriate) to those 2 questions, I will feel much better about the
LDAP modules as they now seem pretty stable for a wide range of
settings/circumstances -- at least on Windows. I now need to test
these on Solaris and AIX...

--
Jess Holle

  




--- util_ldap.c-2.0.51	2004-09-19 17:11:02.0 -0500
+++ util_ldap.c-new	2004-09-19 17:11:06.0 -0500
@@ -89,9 +89,11 @@
 #endif
 
 #define LDAP_CACHE_LOCK() \
-    apr_global_mutex_lock(st->util_ldap_cache_lock)
+    if (st->util_ldap_cache_lock) \
+      apr_global_mutex_lock(st->util_ldap_cache_lock)
 #define LDAP_CACHE_UNLOCK() \
-    apr_global_mutex_unlock(st->util_ldap_cache_lock)
+    if (st->util_ldap_cache_lock) \
+      apr_global_mutex_unlock(st->util_ldap_cache_lock)
 
 
 static void util_ldap_strdup (char **str, const char *newstr)
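
One caveat with macros shaped like the ones in this patch: a bare `if` inside
a macro can capture an `else` that follows at the call site. A common C idiom
is to wrap the macro body in do { ... } while (0). The sketch below is only an
assumption about how that could look, not what was posted or committed;
demo_mutex, demo_lock(), demo_unlock(), CACHE_LOCK(), CACHE_UNLOCK(), and
guarded_increment() are hypothetical stand-ins for the APR types and the
util_ldap macros.

```c
#include <stddef.h>

/* Hypothetical stand-in for apr_global_mutex_t and its lock/unlock calls. */
typedef struct { int held; } demo_mutex;

static void demo_lock(demo_mutex *m)   { m->held = 1; }
static void demo_unlock(demo_mutex *m) { m->held = 0; }

/* Null-safe lock macros; do/while(0) makes each expand to one statement. */
#define CACHE_LOCK(m)   do { if ((m) != NULL) demo_lock(m);   } while (0)
#define CACHE_UNLOCK(m) do { if ((m) != NULL) demo_unlock(m); } while (0)

/*
 * Increment *counter under the lock when enabled; returns 1 if it ran.
 * With a bare-if macro, the 'else' below would attach to the macro's
 * hidden 'if', so a NULL mutex would make the function return 0 even
 * though 'enabled' is set.
 */
static int guarded_increment(demo_mutex *m, int *counter, int enabled)
{
    if (enabled)
        CACHE_LOCK(m);
    else
        return 0;

    (*counter)++;
    CACHE_UNLOCK(m);
    return 1;
}
```

The null check itself is the same idea as in the patch: when the caches are
disabled and no mutex was ever created, locking becomes a no-op.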


Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle




Jess Holle wrote:

  
  
Here's one final patch to fix the global mutex crash when the global
mutex is never allocated due to disabled/empty caches.
  
I would really like some clarity as to whether:
  
We should just stick with the single-process read/write lock for
single-worker MPMs. It would really seem so.
Whether we should really avoid using shared memory for the LDAP
cache for single-worker MPMs. What's it really buy us in this case?
  

Related stupid questions:

  1. Does setting LDAPSharedCacheSize to 0 disable shared memory but
     not the cache? [The docs say so.]
  2. If so, then wouldn't we want to use per-process read/write locks
     in this case and global mutexes only when shared memory was actually
     being used?


  
  

--
Jess Holle





Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle




Jess Holle wrote:

  
  
Jess Holle wrote:
  


Here's one final patch to fix the global mutex crash when the global
mutex is never allocated due to disabled/empty caches.

I would really like some clarity as to whether:

  We should just stick with the single-process read/write lock for
single-worker MPMs. It would really seem so.
  Whether we should really avoid using shared memory for the LDAP
cache for single-worker MPMs. What's it really buy us in this case?

  
Related stupid questions:
  
Does setting LDAPSharedCacheSize to 0 disable shared memory but
not the cache? [The docs say so.]
  

P.S. The doc says so, but the ldap-status handler provides no
information in this case as if the cache were not active.

--
Jess Holle





Re: Apache 2.0.51 util_ldap

2004-09-19 Thread Jess Holle




Sorry. One last post...

It seems that at least on Windows if I place

<Location /server/cache-info>
    SetHandler ldap-status
</Location>

too early in the configuration Apache crashes on shutdown.
Specifically, if I place it in front of my mod_deflate configuration
(similar to that in documentation), it will crash on shutdown with:
apr_rmm_free(apr_rmm_t * 0x0091d898, unsigned int 12583060) line 375
util_ald_free(util_ald_cache * 0x00c00020, const void * 0x00c00094) line 82 + 19 bytes
util_ald_destroy_cache(util_ald_cache * 0x6eec8526) line 352
run_cleanups(cleanup_t * * 0x0026a960) line 1952
apr_pool_destroy(apr_pool_t * 0x6ff1e82b) line 733
ap_mpm_run(apr_pool_t * 0x002689d8, apr_pool_t * 0x, server_rec * 0x70a9f1ab) line 1645
main(int 1890185643, const char * const * 0x8002) line 624 + 8 bytes
SHLWAPI! 70a9f1ab()
ff50dc45()







Re: Apache 2.0.51 util_ldap

2004-09-18 Thread Jess Holle
I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 2.0.50 
and this fixed the hangs and the crash when the cache is disabled by 
zero-sizing everything.  Therefore APR fixes, etc, are not the issue -- 
util_ldap itself is (as mod_auth_ldap did not change).

I now plan to look for source versions between 2.0.50 and 2.0.51 that 
provide improvements without these regressions.

--
Jess Holle
Jess Holle wrote:
One small correction:
When I remove the global mutex stuff I no longer have the case where 
both the worker and parent processes crash, so that's another 
improvement on Windows.  Unfortunately, I still have the case where 
Apache hangs, however.

--
Jess Holle
Jess Holle wrote:
Working on a wild hunch, I backed util_ldap source down to right 
before the global mutex stuff went in -- as that should not be 
necessary with a single child process anyway, right?

This fixed the crash on shutdown -- but that's all.
I'm going to try the 2.0.50 util_ldap sources with everything else 
from 2.0.51 as well.  Else I might have to go back to 2.0.50 plus 
security fixes as you suggest.

And that's still not even trying the worker mpm on Solaris -- which 
at least used to have worse behavior than Windows in this area.

--
Jess Holle
Jeff Trawick wrote:
one possibility is to apply the security patches you need to 2.0.50
see http://apache.towardex.com/httpd/patches/apply_to_2.0.50/
the descriptions of the vulnerabilities at http://httpd.apache.org/
indicate which components are affected; note that CAN-2004-0786
applies to all configurations; I have seen a suggestion that it
affects IPv6 setups only, but that is not the case
 






Re: Apache 2.0.51 util_ldap

2004-09-18 Thread Graham Leggett
Jess Holle wrote:
I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 2.0.50 
and this fixed the hangs and the crash when the cache is disabled by 
zero-sizing everything.  Therefore APR fixes, etc, are not the issue -- 
util_ldap itself is (as mod_auth_ldap did not change).
Is it possible to get a stack trace from the crash?
Regards,
Graham
--




Re: Apache 2.0.51 util_ldap

2004-09-18 Thread Jess Holle
Jess Holle wrote:
I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 
2.0.50 and this fixed the hangs and the crash when the cache is 
disabled by zero-sizing everything.  Therefore APR fixes, etc, are not 
the issue -- util_ldap itself is (as mod_auth_ldap did not change).

I now plan to look for source versions between 2.0.50 and 2.0.51 that 
provide improvements without these regressions.

--
Jess Holle
Rolling back to version 1.3.2.11 of util_ldap_cache_mgr.c seems to fix 
the hang (which seems odd to me...).

The crash on startup with 0-sized cache appears to be related to a 
missing null check in the duplicate entry prevention fixes.  I'll look 
into that more.

--
Jess Holle


Re: Apache 2.0.51 util_ldap

2004-09-18 Thread Jess Holle




Here you go:
()
util_ald_cache_fetch(util_ald_cache * 0x00a02cb8, void * 0x04c6de84)
line 358 + 12 bytes
util_ldap_cache_checkuserid(request_rec * 0x6fb51341,
util_ldap_connection_t * 0x00a5cdb0, const char * 0x00a02cf0, const
char * 0x00880db0, int 9487736, char * * 0x0002, const char *
0x, const char * 0x04c6def4, const char * * 0x00a5eede, const
char * * * 0x04c6dee8) line 785 + 22 bytes
mod_auth_ldap_check_user_id(request_rec * 0x6ff110bf) line 333
LIBHTTPD! 6ff110bf()

Note that none of the line numbers quite match as I've added comments
in my source. Thus the util_ald_cache_fetch() line is:
    hashval = (*cache->hash)(payload) % cache->size;

While the util_ldap_cache_checkuserid() line is:
    search_nodep = util_ald_cache_fetch(curl->search_cache,
                                        the_search_node);

I was just about to patch around this by checking cache->hash for null
and returning null in this case from util_ald_cache_fetch(), but I'm
all ears for a better fix. I'm also all ears for a fix to the hang --
perhaps I can cull out a stack dump for that too...

--
Jess Holle

Graham Leggett wrote:
Jess Holle wrote:
I reverted to the mod_auth_ldap and util_ldap (aka mod_ldap) from 2.0.50
and this fixed the hangs and the crash when the cache is disabled by
zero-sizing everything. Therefore APR fixes, etc, are not the issue --
util_ldap itself is (as mod_auth_ldap did not change).

Is it possible to get a stack trace from the crash?

Regards,
Graham
--
  






Re: Apache 2.0.51 util_ldap

2004-09-18 Thread Jess Holle




Note the stack trace below was generated with 2.0.51's global mutex
changes removed.

A crash still occurs with a zero sized cache with the global mutex
changes in place, but I believe it is from a null mutex, not a null
cache hash function entry.

--
Jess Holle

  
  






Re: Apache 2.0.51 util_ldap

2004-09-18 Thread Jess Holle




Jess Holle wrote:

  
  
Here you go:
  ()
util_ald_cache_fetch(util_ald_cache * 0x00a02cb8, void * 0x04c6de84)
line 358 + 12 bytes
util_ldap_cache_checkuserid(request_rec * 0x6fb51341,
util_ldap_connection_t * 0x00a5cdb0, const char * 0x00a02cf0, const
char * 0x00880db0, int 9487736, char * * 0x0002, const char *
0x, const char * 0x04c6def4, const char * * 0x00a5eede, const
char * * * 0x04c6dee8) line 785 + 22 bytes
mod_auth_ldap_check_user_id(request_rec * 0x6ff110bf) line 333
LIBHTTPD! 6ff110bf()
  
Note that none of the line numbers quite match as I've added comments
in my source. Thus the util_ald_cache_fetch() line is:
    hashval = (*cache->hash)(payload) % cache->size;

While the util_ldap_cache_checkuserid() line is:
    search_nodep = util_ald_cache_fetch(curl->search_cache,
                                        the_search_node);

I was just about to patch around this by checking cache->hash for null
and returning null in this case from util_ald_cache_fetch(), but I'm
all ears for a better fix. I'm also all ears for a fix to the hang --
perhaps I can cull out a stack dump for that too...

Silly me - the hash field being null seems to indicate this structure
is seriously munged -- just working around this one condition just
moves along to the next crash.

--
Jess Holle





Re: Apache 2.0.51 util_ldap

2004-09-18 Thread Jess Holle




Okay, the cause of this issue is now clear:
util_ald_create_caches() does not set 'newcurl' to anything
when any of the caches are null, which they all are when they're sized
at zero.

The fix is also simple: add an 'else newcurl = NULL;' after the 'if'
block in this routine.
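
A hedged sketch of the same class of fix: giving the result a value at its
declaration so that every path returns something defined. The url_node type
and create_caches_demo() are hypothetical simplifications, not the actual
util_ald_create_caches() code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the URL node the real routine returns. */
typedef struct { int id; } url_node;

/*
 * Returns 'candidate' only when all three caches exist, otherwise NULL.
 * Initializing newcurl up front has the same effect as adding
 * 'else newcurl = NULL;' after the if block.
 */
static url_node *create_caches_demo(url_node *candidate,
                                    const void *search_cache,
                                    const void *compare_cache,
                                    const void *dn_compare_cache)
{
    url_node *newcurl = NULL;   /* defined on every path */

    if (search_cache && compare_cache && dn_compare_cache) {
        newcurl = candidate;
    }

    return newcurl;
}
```

Callers can then test the result for NULL instead of dereferencing whatever
happened to be on the stack.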

[This really drives home why I have developed in Java for the last 5
years after spending 7+ years doing C and C++. This issue could not
have occurred in Java -- the compiler would have rejected the issue.
I'm not saying the extra speed, etc, due to Apache being written in C
is not nice. Nor do I wish to start some holy war. It's just that
the lack of pointer / memory allocation issues, uninitialized
variables, and not having to produce one's own APR to deal with
platforms make Java a much more productive place for me.]

--
Jess Holle






Apache 2.0.51 util_ldap

2004-09-17 Thread Jess Holle




I'm noticing a number of serious issues with util_ldap in Apache 2.0.51
on Windows:

  - If you use what used to be safe "I don't trust the cache" config
    parameters as follows, you get an immediate crash (due to a null mutex).

      LDAPCacheEntries 0
      LDAPOpCacheEntries 0
      LDAPSharedCacheSize 0

  - There are now many cases wherein Apache will *hang* when the
    number of unique users that have authenticated against LDAP exceeds
    LDAPCacheEntries. In the *best* case, both the worker and parent
    process will crash. It used to be that only the worker process would
    crash -- thus allowing the parent to start a new worker and not result
    in the server being dead in the water.
    There are some strange bits here:

      - Using 1 for LDAPCacheEntries, LDAPOpCacheEntries, and
        LDAPSharedCacheSize allows for a seemingly unlimited number of
        unique user logins! This is inexplicable. It would seem a nice
        workaround, but I need to support existing configurations "as is",
        e.g. the 0,0,0 config above.
      - There are *some* cases where Apache can service many unique
        authenticated users beyond LDAPCacheEntries, but they are very
        hard to predict. For example, LDAPCacheEntries of 2150,
        LDAPOpCacheEntries of 1 [this at least used to cause a crash if
        this was 0 and LDAPCacheEntries was non-zero], LDAPSharedCacheSize
        of 865000, and setting LDAPSharedCacheFile allows at least 2500
        (my current LDAP data set size) unique authenticated users. Yet if
        I increase LDAPSharedCacheSize, which should seemingly make no
        difference, Apache will crash *much* earlier.

  - Starting Apache with LDAP caching enabled (e.g. with the
    configuration in the last bullet) now results in a crash on shutdown in
    apr_rmm_addr_get() [rmm->base is null]. This occurs even if no
    requests were made since startup.

Overall, given the security and non-LDAP fixes in 2.0.51, I am now left
pondering whether I should try backing the LDAP modules back to
2.0.50 while keeping all other 2.0.51 code. Ideas? Also, Windows is
only the first platform I've tested. I also have to work out Solaris
and AIX. Thus if these work better, I may end up keeping the 2.0.51
LDAP code there...

I get the ugly feeling I should have tested all of this earlier in the
2.0.51 cycle, but I was busy at the time.

All in all, LDAP does not appear to be a happy camper on 2.0.51 on
Windows.

--
Jess Holle
[EMAIL PROTECTED]





Re: Apache 2.0.51 util_ldap

2004-09-17 Thread Jeff Trawick
one possibility is to apply the security patches you need to 2.0.50

see http://apache.towardex.com/httpd/patches/apply_to_2.0.50/

the descriptions of the vulnerabilities at http://httpd.apache.org/
indicate which components are affected; note that CAN-2004-0786
applies to all configurations; I have seen a suggestion that it
affects IPv6 setups only, but that is not the case


Re: Apache 2.0.51 util_ldap

2004-09-17 Thread Jess Holle
Working on a wild hunch, I backed util_ldap source down to right before 
the global mutex stuff went in -- as that should not be necessary with a 
single child process anyway, right?

This fixed the crash on shutdown -- but that's all.
I'm going to try the 2.0.50 util_ldap sources with everything else from 
2.0.51 as well.  Else I might have to go back to 2.0.50 plus security 
fixes as you suggest.

And that's still not even trying the worker mpm on Solaris -- which at 
least used to have worse behavior than Windows in this area.

--
Jess Holle



Re: Apache 2.0.51 util_ldap

2004-09-17 Thread Jess Holle
One small correction:
When I remove the global mutex stuff I no longer have the case where 
both the worker and parent processes crash, so that's another 
improvement on Windows.  Unfortunately, I still have the case where 
Apache hangs, however.

--
Jess Holle





Re: Apache 2.0.51 util_ldap

2004-09-17 Thread William A. Rowe, Jr.
At 08:54 AM 9/17/2004, Jess Holle wrote:
... given the security and non-LDAP fixes in 2.0.51, I am now left pondering whether 
I should try backing the LDAP modules back to 2.0.50 while keeping all other 
2.0.51 code.  Ideas?

All in all, LDAP does not appear to be a happy camper on 2.0.51 on Windows.

That's an entirely rational solution, ABI should be strong enough
at this point for 2.0.50 ldap to play nicely in your new 2.0.51.

Bill  



Re: Apache 2.0.51 util_ldap

2004-09-17 Thread Jess Holle




Actually, my plan was to use the 2.0.50 LDAP module *sources* laid on
top of the other 2.0.51 sources. I'd hope ABI is strong enough now, but a
good clean compile gives me a nice comfy feeling.

I'm actually hoping to have time to test a variety of code points
between and including 2.0.50 and 2.0.51 to nail down a bit better which
changes led to which issues...

--
Jess Holle