We've been having an intermittent crash using the Sun LDAP C SDK for some time, and more recently switched to the Mozilla 5.1.7 C SDK in an unsuccessful attempt to alleviate it.
After some digging, it appears the problem is with the simple_bindifnot_s call, which is used whenever LDAP_OPT_RECONNECT is used (this is in the 5.1.7 codebase, the code has been rearranged slightly in 6.0.x but the same problem exists). In that call, we check if the connection is alive, if not, we call: nsldapi_free_connection( ld, ld->ld_defconn, NULL, NULL, 1, 0 ); This call basically frees the LDAPConn structure. Usually, the structure is only freed when its refcount drops to zero, since it may be in use (pointed to) by places other than the man ld_conns linked list (such as requests that are in flight). The issue with this particular version of the call, however, is the second last parameter '1' - this indicates "force kill" the connection. This makes the free_connection code ignore the refcount and kill the connection regardless. I believe this causes the crash, because the connection may still be referenced by requests in progress, and they have no indication that the associate memory is free. In particular, I see the crash inside nsldapi_connection_lost_nolock at this line: ( lr->lr_conn != NULL && lr->lr_conn->lconn_sb == sb ) Here, we deference the connection associated with the request - however, the memory associated with the connection may already have been freed, as per above. Usually, this doesn't even cause a crash, since that memory page is typically still assigned to the process, so the comparison will just fail and we continue on our merry little way. However, should the OS decide the reclaim the associated page, you may be a segfault/access violation, and I believe that's what occurs in our case - the pointer we are crashing on is close to but just outside the list of segments currently allocated to the process. I'd like to submit a patch for this issue, but I'd like advice on the correct way to fix it. Suggestion 1 is to simple remove the "force" option on the connection_free call - why are we using it this context? Is there some reason to ignore the refcount here? This fix is easy and seems "correct" to me. Suggestion 2, if there is some reason to keep this force behavior, is to add code in the force case that locks the request list and iterates though it looking for references to the connection to be freed and removing them (NULLing them out) - thus we won't try to access them in the connection lost code, above. Any guidance? _______________________________________________ dev-tech-ldap mailing list [email protected] https://lists.mozilla.org/listinfo/dev-tech-ldap
