We've been having an intermittent crash using the Sun LDAP C SDK for
some time, and more recently switched to the Mozilla 5.1.7 C SDK in an
unsuccessful attempt to alleviate it.

After some digging, it appears the problem is with the
simple_bindifnot_s call, which is used whenever LDAP_OPT_RECONNECT is
used (this is in the 5.1.7 codebase, the code has been rearranged
slightly in 6.0.x but the same problem exists).  In that call, we
check if the connection is alive, if not, we call:

nsldapi_free_connection( ld, ld->ld_defconn, NULL, NULL, 1, 0 );

This call basically frees the LDAPConn structure.  Usually, the
structure is only freed when its refcount drops to zero, since it may
be in use (pointed to) by places other than the man ld_conns linked
list (such as requests that are in flight).  The issue with this
particular version of the call, however, is the second last parameter
'1' - this indicates "force kill" the connection.  This makes the
free_connection code ignore the refcount and kill the connection
regardless.

I believe this causes the crash, because the connection may still be
referenced by requests in progress, and they have no indication that
the associate memory is free.  In particular, I see the crash inside
nsldapi_connection_lost_nolock at this line:

( lr->lr_conn != NULL && lr->lr_conn->lconn_sb == sb )

Here, we deference the connection associated with the request -
however, the memory associated with the connection may already have
been freed, as per above.  Usually, this doesn't even cause a crash,
since that memory page is typically still assigned to the process, so
the comparison will just fail and we continue on our merry little
way.  However, should the OS decide the reclaim the associated page,
you may be a segfault/access violation, and I believe that's what
occurs in our case - the pointer we are crashing on is close to but
just outside the list of segments currently allocated to the process.

I'd like to submit a patch for this issue, but I'd like advice on the
correct way to fix it.  Suggestion 1 is to simple remove the "force"
option on the connection_free call - why are we using it this
context?  Is there some reason to ignore the refcount here?  This fix
is easy and seems "correct" to me.  Suggestion 2, if there is some
reason to keep this force behavior, is to add code in the force case
that locks the request list and iterates though it looking for
references to the connection to be freed and removing them (NULLing
them out) - thus we won't try to access them in the connection lost
code, above.

Any guidance?

_______________________________________________
dev-tech-ldap mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-ldap

Reply via email to