RE: util_ldap [Bug 29217] - Remove references to calloc()and free()
Brad Nicholes wrote: In fact, I don't think that these are shared locks at all #define LDAP_CACHE_LOCK_CREATE(p) \ if (!st-util_ldap_cache_lock) apr_thread_rwlock_create(st-util_ldap_cache_lock, st-pool) which means that in the shared memory cache, it is highly likely that multiple processes could be altering the cache at the same time. True? Since NetWare is multi-threaded only, we never see this problem. This is true. This creates a process-wide mutex, which can only synchronize threads within the same process so multiple processes can alternate the ldap cache at the same time. This was causing segmentation fault on HP-UX. We have fixed this by creating a global mutex lock in util_ldap_cache_init(); if (!st-util_ldap_cache_lock) { lock_file = apr_psprintf(pool,%s.lock,st-cache_file); result = apr_global_mutex_create(st-util_ldap_cache_lock,lock_file,APR_LOCK_PROC_PT HREAD,pool); if (result != APR_SUCCESS) { return result; } We used mutex instead of rwlock because there is no apr* function to create a global rwlock. This change fixed the segmentation fault problem. Chou
RE: util_ldap [Bug 29217] - Remove references to calloc() and free()
Greg wrote: then the compiler may optimize this as: wrlock test if exists again allocate element insert element race is here, prior to filling in element, another thread may read element, and use element that hasn't been filled in yet. fill in element unlock This can not happen because no reader can hold the rdlock while a writer is holding the wrlock no matter how the compiler optimize it.
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
Brad Nicholes wrote: It appears to me that if it doesn't handle low memory situations or it is giving false positives, those are separate issues from pools vs. calloc/free. I still think we need to implement some better monitoring or logging code in the cache_mgr and enhance the cache-status pages so that we can track things like false positives. Maybe tracking the entries by user name and authentication state rather than just the number entries and how often the cache was hit. This is a definite plan. Maybe the real problem is with the locking. In fact just taking a quick scan through the code again, I am seeing something that bothers me in util_ldap_cache_comparedn() if (curl) { /* compare successful - add to the compare cache */ LDAP_CACHE_RDLOCK(); newnode.reqdn = (char *)reqdn; newnode.dn = (char *)dn; util_ald_cache_insert(curl-dn_compare_cache, newnode); LDAP_CACHE_UNLOCK(); } It appears to be acquiring a read lock but then inserts a new node into the cache. Shouldn't it be acquiring a write lock before doing an insert? I would say it does. Another thing I noticed was that it didn't seem to handle the case when it was half way through adding something to the cache, and the attempt to do so failed. I plan to look at the code again in the next day or two to see if I can better figure out what it is doing. Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
The proper logic to add to a cache is wrlock test if exists again add element unlock because there is a race condition in the logic below rdlock test if the element exists race is here, prior to wrlocking, another thread may wrlock-insert promote to wrlock insert unlock At 06:07 PM 6/10/2004, Brad Nicholes wrote: It appears to me that if it doesn't handle low memory situations or it is giving false positives, those are separate issues from pools vs. calloc/free. I still think we need to implement some better monitoring or logging code in the cache_mgr and enhance the cache-status pages so that we can track things like false positives. Maybe tracking the entries by user name and authentication state rather than just the number entries and how often the cache was hit. Maybe the real problem is with the locking. In fact just taking a quick scan through the code again, I am seeing something that bothers me in util_ldap_cache_comparedn() if (curl) { /* compare successful - add to the compare cache */ LDAP_CACHE_RDLOCK(); newnode.reqdn = (char *)reqdn; newnode.dn = (char *)dn; util_ald_cache_insert(curl-dn_compare_cache, newnode); LDAP_CACHE_UNLOCK(); } It appears to be acquiring a read lock but then inserts a new node into the cache. Shouldn't it be acquiring a write lock before doing an insert? Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com Graham Leggett [EMAIL PROTECTED] Thursday, June 10, 2004 4:24:54 PM Brad Nicholes wrote: Do we even know that there is a problem with this code? I haven't seen any memory leaks so far. I would hate to go to all of the work to redesign and rewrite the ldap_cache manager for little to no gain. It does not seem to handle the we ran out of memory while trying to add to the cache case very gracefully. It doesn't crash any more, but I'm experiencing false negatives still, which is the problem I was trying to fix when I started trying to fix the code. Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
At 12:33 PM 6/11/2004, William A. Rowe, Jr. wrote: The proper logic to add to a cache is wrlock test if exists again add element unlock because there is a race condition in the logic below rdlock test if the element exists race is here, prior to wrlocking, another thread may wrlock-insert promote to wrlock insert unlock Wow, how about that. I just (as in within the last hour) read an article in the July issue of DDJ[1] that talked about replacing Singleton (WRowe's second example) with with Double-Checked Locking (WRowe's first example) and the problems that still exist because of compiler optimization. What it boils down to is if this: wrlock test if exists again add element unlock can be broken down as this: wrlock test if exists again allocate element fill in element insert element unlock then the compiler may optimize this as: wrlock test if exists again allocate element insert element race is here, prior to filling in element, another thread may read element, and use element that hasn't been filled in yet. fill in element unlock Does the add element step guarantee that it won't insert the element into the cache until it is fully constructed, no matter what the optimizer does to the code? Of course, this is only Part I of the article, Part II isn't until the August issue. [1] Dr. Dobb's Journal, www.ddj.com (http://www.ddj.com/articles/2004/0407/)
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
After speaking many times about this problem on #apache-modules, Paul Querna said that it would be better to port mod_auth_ldap to mod_authn_ldap, and do module caching auth, for all authentication method. I am sure this way is really better and i agree with him. I know he started this cache module, maybe it could be good to continue this effort and on another hand, port mod_auth_ldap to 2.1. I agree that a generic caching module would be of benefit here even before we port mod_auth_ldap to the 2.1 auth structure. Either way, I also think there is a lot of work to do on the caching code. I am sure that we can take advantage of what has been done in mod_ssl and other places that have to mutex protect shared memory. It is actually working great on NetWare at the moment but then we don't use shared memory and we are multi-threaded only. Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com [EMAIL PROTECTED] Friday, June 11, 2004 4:04:22 PM mod_ssl mutex are totally different. global mutex are used. Actually, the mutex is in the module_conf, so i think when apache fork childs, this mutex is no more valid, and each child will have a value for it. There is also in util_ldap.c, apr_thread_mutex_create(st-mutex, APR_THREAD_MUTEX_DEFAULT, st-pool); This will alloc data for the mutex with apr_pcalloc, and i think it will be only in the current child. I think there is a lot of work to make it work with global mutex. After speaking many times about this problem on #apache-modules, Paul Querna said that it would be better to port mod_auth_ldap to mod_authn_ldap, and do module caching auth, for all authentication method. I am sure this way is really better and i agree with him. I know he started this cache module, maybe it could be good to continue this effort and on another hand, port mod_auth_ldap to 2.1. Regards, Matthieu
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
At 04:41 PM 6/11/2004, Brad Nicholes wrote: I am sure that we can take advantage of what has been done in mod_ssl and other places that have to mutex protect shared memory. It is actually working great on NetWare at the moment but then we don't use shared memory and we are multi-threaded only. No matter. That's why I created the apr_anylock type, since some platforms would benefit from different thread/proc/global lock strategies. Bill
FW: util_ldap [Bug 29217] - Remove references to calloc()and free()
I'm forwarding on behalf of my collegue (as his mails are not reaching the list) -Madhu -Original Message- From: CHOU,TAIR-SHIAN (HP-Cupertino,ex1) [mailto:[EMAIL PROTECTED] Sent: Friday, June 11, 2004 7:04 PM To: Mathihalli, Madhusudan Subject: FW: util_ldap [Bug 29217] - Remove references to calloc()and free() -Original Message- From: Tair-Shian Chou [mailto:[EMAIL PROTECTED] Sent: Friday, June 11, 2004 6:38 PM To: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]' Subject: RE: util_ldap [Bug 29217] - Remove references to calloc()and free() Brad Nicholes wrote: In fact, I don't think that these are shared locks at all #define LDAP_CACHE_LOCK_CREATE(p) \ if (!st-util_ldap_cache_lock) apr_thread_rwlock_create(st-util_ldap_cache_lock, st-pool) which means that in the shared memory cache, it is highly likely that multiple processes could be altering the cache at the same time. True? Since NetWare is multi-threaded only, we never see this problem. This is true. This creates a process-wide mutex, which can only synchronize threads within the same process so multiple processes can alternate the ldap cache at the same time. This was causing segmentation fault on HP-UX. We have fixed this by creating a global mutex lock in util_ldap_cache_init(); if (!st-util_ldap_cache_lock) { lock_file = apr_psprintf(pool,%s.lock,st-cache_file); result = apr_global_mutex_create(st-util_ldap_cache_lock,lock_file,APR _LOCK_PROC_PTHREAD,pool); if (result != APR_SUCCESS) { return result; } We used mutex instead of rwlock because there is no apr* function to create a global rwlock. This change fixed the segmentation fault and other synchronization problems. Chou
FW: util_ldap [Bug 29217] - Remove references to calloc() and free()
Another mail.. -Madhu -Original Message- From: CHOU,TAIR-SHIAN (HP-Cupertino,ex1) [mailto:[EMAIL PROTECTED] Sent: Friday, June 11, 2004 7:05 PM To: Mathihalli, Madhusudan Subject: FW: util_ldap [Bug 29217] - Remove references to calloc() and free() -Original Message- From: Tair-Shian Chou [mailto:[EMAIL PROTECTED] Sent: Friday, June 11, 2004 7:03 PM To: '[EMAIL PROTECTED]' Subject: RE: util_ldap [Bug 29217] - Remove references to calloc() and free() Greg wrote: then the compiler may optimize this as: wrlock test if exists again allocate element insert element race is here, prior to filling in element, another thread may read element, and use element that hasn't been filled in yet. fill in element unlock This can not happen because no reader can hold the rdlock while a writer is holding the wrlock no matter how the compiler optimize it.
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
Brad Nicholes wrote: I guess that is a possibility but I still don't understand what the problem is with using calloc() and free() for the ldap caching code. This seems to be a common thing to do when global memory needs to be allocated and deallocated constantly. To avoid having the memory grow uncontrolably, you have to be able to control it at a much finer level than apr_pool allows you. What I've found in the LDAP code is that it isn't very defensive, most of the code simply assumes the rest of the code worked - it has resulted in me finding all sorts of side problems in the code, but not the real problem - the false negatives the code reports after it has been idle for a long time. Changing malloc and free into something more bulletproof (or at least more robust), like pools and/or the reslist stuff would make the code more resilient, and for me, easier to debug :) Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
I agree that the LDAP code needs to do more error checking and between the work that you did, along with the holes that I plugged previously and the shared memory fixes that Mathieu Estrade did, I think the code is much more robust than it has ever been. Many of our NetWare customers use auth_ldap as the primary means of authentication because it is the easiest way to authenticate against eDirectory. Outside of the shared memory problems, since NetWare doesn't use it, we have run into the same issues that you have seen. But since these latest patches that have gone into 2.0.50-dev, those problems have gone away. At least on NetWare, switching to pools would make the code much more complex. Rather than simply calling calloc and free in the same way that we are calling apr_rmm_calloc() and apr_rmm_free(), we would have to implement essentially the same model using pools and reslists. It seems to me like using a sledge hammer to drive a finishing nail in this instance. Also in the end, it all boils down to malloc and free anyway. As far as debugging goes, I can understand why it might be easier using the pool debug code, but we have never been successful in making the pool debug code work on NetWare. Granted we probably haven't tried real hard mainly because NetWare already has some good memory debugging capabilities built into the OS. If debugging is a problem, I think it might be easier to implement some memory debugging code specifically for the LDAP_cache rather than trying to retrofit it with pools and reslists. Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com [EMAIL PROTECTED] Thursday, June 10, 2004 4:59:30 AM Brad Nicholes wrote: I guess that is a possibility but I still don't understand what the problem is with using calloc() and free() for the ldap caching code. This seems to be a common thing to do when global memory needs to be allocated and deallocated constantly. To avoid having the memory grow uncontrolably, you have to be able to control it at a much finer level than apr_pool allows you. What I've found in the LDAP code is that it isn't very defensive, most of the code simply assumes the rest of the code worked - it has resulted in me finding all sorts of side problems in the code, but not the real problem - the false negatives the code reports after it has been idle for a long time. Changing malloc and free into something more bulletproof (or at least more robust), like pools and/or the reslist stuff would make the code more resilient, and for me, easier to debug :) Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
Brad Nicholes wrote: At least on NetWare, switching to pools would make the code much more complex. Rather than simply calling calloc and free in the same way that we are calling apr_rmm_calloc() and apr_rmm_free(), we would have to implement essentially the same model using pools and reslists. It seems to me like using a sledge hammer to drive a finishing nail in this instance. Also in the end, it all boils down to malloc and free anyway. As far as debugging goes, I can understand why it might be easier using the pool debug code, but we have never been successful in making the pool debug code work on NetWare. Granted we probably haven't tried real hard mainly because NetWare already has some good memory debugging capabilities built into the OS. If debugging is a problem, I think it might be easier to implement some memory debugging code specifically for the LDAP_cache rather than trying to retrofit it with pools and reslists. The theory is that the pools code is already hopefully been pre-debugged, you can allocate memory from a pool, and be reasonably sure that the problems of freeing the memory is handled for you. (If this is not the case, it won't be an LDAP bug, but an Apache wide bug). The only real issue really is worrying about the scope of the pool. Thinking further about this, we could use one pool per cache entry. To delete that cache entry just means to destroy the pool. No more need to walk the cache entry and delete each buffer one by one, and no room to make mistakes. No more chance that somebody adds a field to the cache entry, and then forgets the code to free the cache entry. Creation of the pool would only be done on creation of the cache entry, which in turn is done only on the first time this user is authenticated, all further requests being cached, so it doesn't seem to be expensive either, unless someone with a better understanding of the internals of pools would be able to say whether this idea is good or bad. Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
I have considered using a pool per entry in other caching code that I have written just so that I could have much finer control over allocating and freeing the memory. But in the end what it really comes down to is malloc and free and if that is what you really want, then why not just use malloc and free. Pools just add a layer of memory management between your application and the actual allocation that may be of no use. You have simply replaced malloc() with apr_pool_create() and free() with apr_pool_destroy(). If you forget to call apr_pool_destroy(), you have a memory leak in the same way as you would if you forget to call free(). When the application shuts down, the pool management will clean up the memory but then so will the process management of the OS. Outside of some debugging help during development, converting to pools and reslists is just adding a lot of unneeded overhead. The advantage to using memory pools is in situations where you need a pool of memory that can created, pieced out during a specific operation and then completely cleared once the operation is complete and reused without having to recreate it. Global caches just don't lend themselves to this model. The pieces can come can go, but you can never really clear the whole thing out and reuse it because the operation never actually ends. Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com Graham Leggett [EMAIL PROTECTED] Thursday, June 10, 2004 9:46:16 AM Brad Nicholes wrote: At least on NetWare, switching to pools would make the code much more complex. Rather than simply calling calloc and free in the same way that we are calling apr_rmm_calloc() and apr_rmm_free(), we would have to implement essentially the same model using pools and reslists. It seems to me like using a sledge hammer to drive a finishing nail in this instance. Also in the end, it all boils down to malloc and free anyway. As far as debugging goes, I can understand why it might be easier using the pool debug code, but we have never been successful in making the pool debug code work on NetWare. Granted we probably haven't tried real hard mainly because NetWare already has some good memory debugging capabilities built into the OS. If debugging is a problem, I think it might be easier to implement some memory debugging code specifically for the LDAP_cache rather than trying to retrofit it with pools and reslists. The theory is that the pools code is already hopefully been pre-debugged, you can allocate memory from a pool, and be reasonably sure that the problems of freeing the memory is handled for you. (If this is not the case, it won't be an LDAP bug, but an Apache wide bug). The only real issue really is worrying about the scope of the pool. Thinking further about this, we could use one pool per cache entry. To delete that cache entry just means to destroy the pool. No more need to walk the cache entry and delete each buffer one by one, and no room to make mistakes. No more chance that somebody adds a field to the cache entry, and then forgets the code to free the cache entry. Creation of the pool would only be done on creation of the cache entry, which in turn is done only on the first time this user is authenticated, all further requests being cached, so it doesn't seem to be expensive either, unless someone with a better understanding of the internals of pools would be able to say whether this idea is good or bad. Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
Brad Nicholes wrote: Do we even know that there is a problem with this code? I haven't seen any memory leaks so far. I would hate to go to all of the work to redesign and rewrite the ldap_cache manager for little to no gain. It does not seem to handle the we ran out of memory while trying to add to the cache case very gracefully. It doesn't crash any more, but I'm experiencing false negatives still, which is the problem I was trying to fix when I started trying to fix the code. Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
It appears to me that if it doesn't handle low memory situations or it is giving false positives, those are separate issues from pools vs. calloc/free. I still think we need to implement some better monitoring or logging code in the cache_mgr and enhance the cache-status pages so that we can track things like false positives. Maybe tracking the entries by user name and authentication state rather than just the number entries and how often the cache was hit. Maybe the real problem is with the locking. In fact just taking a quick scan through the code again, I am seeing something that bothers me in util_ldap_cache_comparedn() if (curl) { /* compare successful - add to the compare cache */ LDAP_CACHE_RDLOCK(); newnode.reqdn = (char *)reqdn; newnode.dn = (char *)dn; util_ald_cache_insert(curl-dn_compare_cache, newnode); LDAP_CACHE_UNLOCK(); } It appears to be acquiring a read lock but then inserts a new node into the cache. Shouldn't it be acquiring a write lock before doing an insert? Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com Graham Leggett [EMAIL PROTECTED] Thursday, June 10, 2004 4:24:54 PM Brad Nicholes wrote: Do we even know that there is a problem with this code? I haven't seen any memory leaks so far. I would hate to go to all of the work to redesign and rewrite the ldap_cache manager for little to no gain. It does not seem to handle the we ran out of memory while trying to add to the cache case very gracefully. It doesn't crash any more, but I'm experiencing false negatives still, which is the problem I was trying to fix when I started trying to fix the code. Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
In fact, I don't think that these are shared locks at all #define LDAP_CACHE_LOCK_CREATE(p) \ if (!st-util_ldap_cache_lock) apr_thread_rwlock_create(st-util_ldap_cache_lock, st-pool) which means that in the shared memory cache, it is highly likely that multiple processes could be altering the cache at the same time. True? Since NetWare is multi-threaded only, we never see this problem. Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com [EMAIL PROTECTED] Thursday, June 10, 2004 5:07:52 PM It appears to me that if it doesn't handle low memory situations or it is giving false positives, those are separate issues from pools vs. calloc/free. I still think we need to implement some better monitoring or logging code in the cache_mgr and enhance the cache-status pages so that we can track things like false positives. Maybe tracking the entries by user name and authentication state rather than just the number entries and how often the cache was hit. Maybe the real problem is with the locking. In fact just taking a quick scan through the code again, I am seeing something that bothers me in util_ldap_cache_comparedn() if (curl) { /* compare successful - add to the compare cache */ LDAP_CACHE_RDLOCK(); newnode.reqdn = (char *)reqdn; newnode.dn = (char *)dn; util_ald_cache_insert(curl-dn_compare_cache, newnode); LDAP_CACHE_UNLOCK(); } It appears to be acquiring a read lock but then inserts a new node into the cache. Shouldn't it be acquiring a write lock before doing an insert? Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com Graham Leggett [EMAIL PROTECTED] Thursday, June 10, 2004 4:24:54 PM Brad Nicholes wrote: Do we even know that there is a problem with this code? I haven't seen any memory leaks so far. I would hate to go to all of the work to redesign and rewrite the ldap_cache manager for little to no gain. It does not seem to handle the we ran out of memory while trying to add to the cache case very gracefully. It doesn't crash any more, but I'm experiencing false negatives still, which is the problem I was trying to fix when I started trying to fix the code. Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
Brad Nicholes wrote: But if you are allocating memory for cache entries that are constantly expiring and being purged, the pool will continue to grow until the server is restarted. The pool would end up with stale memory that the system has no way of reclaiming outside of restarting the server. NetWare doesn't have the concept of a child config pool since there are no child processes and therefore no need to use shared memory. Simply restarting a child process is not an option. On NetWare it is all or nothing. Apache is either up and running or not. If you tried to shutdown the process to reclaim memory, you lose the web server. Hmmm... Could apr_reslist_* help here? Perhaps if the memory was allocated from a pool which was cleaned up periodically using apr_reslist_* (where cleaned up could mean duplicate all fresh cache entries in the pool to a new pool, and trash the old pool). Regards, Graham --
Re: util_ldap [Bug 29217] - Remove references to calloc() and free()
I guess that is a possibility but I still don't understand what the problem is with using calloc() and free() for the ldap caching code. This seems to be a common thing to do when global memory needs to be allocated and deallocated constantly. To avoid having the memory grow uncontrolably, you have to be able to control it at a much finer level than apr_pool allows you. Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com Graham Leggett [EMAIL PROTECTED] Wednesday, June 09, 2004 5:51:47 PM Brad Nicholes wrote: But if you are allocating memory for cache entries that are constantly expiring and being purged, the pool will continue to grow until the server is restarted. The pool would end up with stale memory that the system has no way of reclaiming outside of restarting the server. NetWare doesn't have the concept of a child config pool since there are no child processes and therefore no need to use shared memory. Simply restarting a child process is not an option. On NetWare it is all or nothing. Apache is either up and running or not. If you tried to shutdown the process to reclaim memory, you lose the web server. Hmmm... Could apr_reslist_* help here? Perhaps if the memory was allocated from a pool which was cleaned up periodically using apr_reslist_* (where cleaned up could mean duplicate all fresh cache entries in the pool to a new pool, and trash the old pool). Regards, Graham --