This is from yesterday.  Apparently I only sent this to Chris rather than
the list, as I didn't see it on the List.  Apologies if you've already seen
it.  It's relevant to my posting of a few minutes ago.
--------------------
The problem appears to be (from visual inspection - I didn't try to watch
thread_hash in a debugger) that ERR_remove_state (crypto/err/err.c)
interrogates thread_hash without a Lock.   I'm guessing that at the top of
the function, thread_hash is non-null, but since the function isn't Locking
its access to thread_hash, by the time it gets around to calling lh_delete,
thread_hash is, in fact, NULL, and we blow up in lh_delete.

Here's the original code, with a proposed fix below:

void ERR_remove_state(unsigned long pid)
        {
        ERR_STATE *p,tmp;
        
        if (thread_hash == NULL)         
                return;
        if (pid == 0)
                pid=(unsigned long)CRYPTO_thread_id();
        tmp.pid=pid;
        CRYPTO_w_lock(CRYPTO_LOCK_ERR);
        p=(ERR_STATE *)lh_delete(thread_hash,&tmp);
        if (lh_num_items(thread_hash) == 0)
                {
                /* make sure we don't leak memory */
                lh_free(thread_hash);
                thread_hash = NULL;
                }
        CRYPTO_w_unlock(CRYPTO_LOCK_ERR);
        
        if (p != NULL) ERR_STATE_free(p);
        }

I would propose this change:

void ERR_remove_state(unsigned long pid)
        {
        ERR_STATE *p,tmp;
        
        CRYPTO_w_lock(CRYPTO_LOCK_ERR);
        if (thread_hash)
                {
                        if (pid == 0)
                                pid=(unsigned long)CRYPTO_thread_id();
                        tmp.pid=pid;
                        p=(ERR_STATE *)lh_delete(thread_hash,&tmp);
                        if (lh_num_items(thread_hash) == 0)
                                {
                                /* make sure we don't leak memory */
                                lh_free(thread_hash);
                                thread_hash = NULL;
                                }
                        }
        CRYPTO_w_unlock(CRYPTO_LOCK_ERR);
        
        if (p != NULL) ERR_STATE_free(p);
        }

Does anyone agree or disagree that the first version is wrong, and is the
source of my crashes, and that the second version solves the problem?

Bill Rebey


-----Original Message-----
From:   Chris Zimman [mailto:[EMAIL PROTECTED]]
Sent:   Monday, July 24, 2000 12:34 PM
To:     Bill Rebey
Subject:        Re: Bus Errors


>       
>               int iErr = ERR_get_error ();
>               ERR_error_string (iErr, buf);
>               ERR_reason_error_string (iErr);
>               ERR_remove_state (0);
> 
> The call to ERR_remove_state (0) is the one that sporadically causes bus
> errors.  I am using locks and locking callbacks, and they all seem to be
> working just fine. If I take out the ERR_remove_state (0); call, things
> appear to work just fine and my program never crashes. 

> [1] lh_delete(0x0, 0xe200f8f4, 0xb51f0, 0x28e, 0xe200fa38, 0xdac18), at
> 0x55738
>   [2] ERR_remove_state(0x0, 0xcf000, 0x0, 0xb51f0, 0x0, 0x0), at 0x56fc4

As is indicated by your stack trace, lh_delete() is being passed a NULL
pointer which would cause it to crash almost immediately after going into 
lh_delete(), since there's an access via it:

crypto/lhash/lhash.c:223

        lh->error = 0;

What's interesting is that thread_hash is a static global in
crypto/err/err.c -- I'm missing where this is getting set to NULL as
there's a check for this in ERR_remove_state() and there seem to be
appropriate locks around all functions that modify this.

Can you set a watch to see where thread_hash is being set to NULL?  This
would probably provide some further insight.

--Chris

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]

Reply via email to