hi, when modules linked against libssl are dlopen()ed and dlclose()d 
multiple times during the process lifetime, we hit a memory corruption: 
the hash table built in memory for the error strings keeps pointers to 
SSL strings in the address range previously occupied by libssl. The 
exact behaviour depends on which other shared objects are loaded in 
between, and when, but in general the hash table can end up referencing 
memory that no longer holds those strings. Verified with 0.9.8e, 0.9.8o 
and 1.0.0a. For 1.0.0a, a modification of the attached program might be 
needed; details are in the code.

        we hit the problem on OpenSolaris where Apache started to crash 
when loading mod_ssl. mod_ssl loads all built-in engines, and on Solaris 
we end up in the PKCS#11 engine, which loads libpkcs11.so.1; other 
modules are then dlopen()ed and some of them are linked against libssl. 
We believe, however, that this is a generic problem and that other 
systems could be hit by it sooner or later.

        the attached program simulates the problem by calling dlopen() 
directly on libssl and can be built on FreeBSD (we used 7.2) and 
OpenSolaris. It also contains instructions on how to reproduce the 
problem on Linux (modifications are needed depending on the distro). To 
expose the problem we mmap() some anonymous memory with PROT_NONE right 
after dlclose() on libssl but before dlopen() is called on it again. 
The next SSL_load_error_strings() then crashes:

$ gcc -g -lcrypto ssl-dlopen-crash.c 
$ ./a.out 
Opening libssl.so...
Initializing with SSL_load_error_strings...
Closing libssl.so...
Opening libssl.so...
Initializing with SSL_load_error_strings...
Segmentation Fault (core dumped)

$ mdb core
mdb: core file data for mapping at fe84f000 not saved: Bad address
Loading modules: [ libuutil.so.1 ld.so.1 ]
> $c
libcrypto.so.0.9.8`err_cmp+6(fecb70b0, 8047758, 8047748, feeda26e)
libcrypto.so.0.9.8`getrn+0x7d(8061558, 8047758, 804770c, 110)
libcrypto.so.0.9.8`lh_retrieve+0x1e(8061558, 8047758, fef4b408, 12a)
libcrypto.so.0.9.8`int_err_get_item+0x51(8047758, fef69000, 8047768, feedd808)
libcrypto.so.0.9.8`ERR_func_error_string+0x47(14064000, fef69000, 8047788, fe82cff8)
libssl.so.0.9.8`ERR_load_SSL_strings+0x23(8051270)
libssl.so.0.9.8`SSL_load_error_strings+0x1d(fe822330, feac0b80, 80477cc, 80511a9, fed11112, 80511ba)
testssl+0x83(fed11112, 80511ba, 80612ec, 80477ac, 80477cc, 80511fa)
main+0x5f(1, 80477f0, 80477f8)
_start+0x80(1, 8047900, 0, 8047908, 8047921, 8047943)

        we think that the proper fix is to make a copy of those strings 
when building the hash table. Calling ERR_free_strings() is not a 
solution since it frees all the strings, including those needed by 
libcrypto. A new function in libssl's fini section that removes only the 
SSL strings would probably also solve the problem.
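
        to illustrate both ideas, here is a minimal sketch of a toy 
string table. It is not OpenSSL code: err_register_copy(), err_lookup() 
and err_unload_lib() are hypothetical names, a linked list stands in 
for the real hash table, and we assume the library number sits in the 
top 8 bits of the error code (as 0x14 does in the 14064000 argument in 
the trace above). The table owns a private copy of every string, so the 
registering library's memory may disappear, and one library's entries 
can be dropped without touching the others:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the error string table. */
struct err_entry {
        unsigned long code;     /* assumed: library number in top 8 bits */
        char *text;             /* private copy owned by the table */
        struct err_entry *next;
};

/* A plain linked list stands in for the real hash table. */
static struct err_entry *err_table;

#define ERR_LIB_OF(code)        (((code) >> 24) & 0xffUL)

/* Register a string by copying it; the caller's memory may go away later. */
int
err_register_copy(unsigned long code, const char *text)
{
        struct err_entry *e;
        size_t len = strlen(text) + 1;

        if ((e = malloc(sizeof (*e))) == NULL)
                return (-1);
        if ((e->text = malloc(len)) == NULL) {
                free(e);
                return (-1);
        }
        memcpy(e->text, text, len);
        e->code = code;
        e->next = err_table;
        err_table = e;
        return (0);
}

/* Return the stored string, or NULL if the code is not registered. */
const char *
err_lookup(unsigned long code)
{
        struct err_entry *e;

        for (e = err_table; e != NULL; e = e->next) {
                if (e->code == code)
                        return (e->text);
        }
        return (NULL);
}

/* Drop only the entries of one library, as a fini function could do. */
void
err_unload_lib(unsigned long lib)
{
        struct err_entry **pp = &err_table, *e;

        while ((e = *pp) != NULL) {
                if (ERR_LIB_OF(e->code) == lib) {
                        *pp = e->next;
                        free(e->text);
                        free(e);
                } else {
                        pp = &e->next;
                }
        }
}
```

        with this layout a lookup keeps working after the registering 
library's own memory has been overwritten or unmapped, and calling 
err_unload_lib(0x14) removes the SSL entries while the entries of other 
libraries stay in place.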

        the workaround is simple. Either use LD_PRELOAD on libssl to 
keep it in memory for the whole life of the process:

$ LD_PRELOAD=/lib/libssl.so.0.9.8 ./a.out 
Opening libssl.so...
Initializing with SSL_load_error_strings...
Closing libssl.so...
Opening libssl.so...
Initializing with SSL_load_error_strings...
Closing libssl.so...

        or build the library with "-znodelete". The linker on Solaris 
supports this option. This ensures that libssl is never unloaded.
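
        a third possibility, which we have not tried with the attached 
program, is to pin the library from within the process. Where the 
platform's dlopen() supports the RTLD_NODELETE flag, an object opened 
with it is not unmapped by later dlclose() calls. The sketch below 
assumes that RTLD_NODELETE and RTLD_NOLOAD are both available; 
open_pinned() and is_resident() are hypothetical helper names:

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Open a library with RTLD_NODELETE and drop our reference again. The
 * object stays mapped for the rest of the process life. Returns 0 on
 * success and -1 on failure.
 */
int
open_pinned(const char *path)
{
        void *handle = dlopen(path, RTLD_LAZY | RTLD_NODELETE);

        if (handle == NULL) {
                fprintf(stderr, "dlopen failed: %s\n", dlerror());
                return (-1);
        }
        /* Our reference is released but the mapping remains. */
        dlclose(handle);
        return (0);
}

/*
 * Return 1 if the library is currently loaded, 0 otherwise. RTLD_NOLOAD
 * only returns a handle for an object that is already resident.
 */
int
is_resident(const char *path)
{
        void *handle = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);

        if (handle == NULL)
                return (0);
        dlclose(handle);
        return (1);
}
```

        calling open_pinned() on libssl once, early in the process, 
should have the same effect as the LD_PRELOAD workaround: later 
dlopen()/dlclose() pairs done by other modules no longer unmap the 
library, so the pointers in the error string table stay valid.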

        I think we should file a bug in the RT. Is there anything else 
we should provide?

        thanks, Jan.

-- 
Jan Pechanec
http://blogs.sun.com/janp
/*
 * Demo for the SSL memory corruption bug. The problem occurs when libssl is
 * dlopen()ed, the SSL error strings are loaded, and the library is then
 * dlclose()d. The string hash table built in memory still holds references to
 * the memory previously occupied by libssl since there is no clean-up. If we
 * map non-accessible memory to that address range, the next
 * SSL_load_error_strings() causes a SIGSEGV.
 *
 * Tested with 0.9.8e, 0.9.8o, and 1.0.0a. See a comment with mmap() below in
 * the code if you want to run this with 1.0.0.
 *
 * Compile:
 *
 * $ gcc -g -lcrypto ssl-dlopen-crash.c 
 *
 * Stack trace on FreeBSD 7.2:
 * #0  0x2813da5a in ERR_unload_strings () from /lib/libcrypto.so.5
 * #1  0x2818393e in lh_retrieve () from /lib/libcrypto.so.5
 * #2  0x2813e38e in ERR_get_implementation () from /lib/libcrypto.so.5
 * #3  0x2813dd77 in ERR_func_error_string () from /lib/libcrypto.so.5
 * #4  0x2844e1c0 in ERR_load_SSL_strings () from /usr/lib/libssl.so
 * #5  0x2844e18c in SSL_load_error_strings () from /usr/lib/libssl.so
 * #6  0x080488f4 in testssl () at ssl-dlopen-freebsd.c:53
 * #7  0x08048a11 in main () at ssl-dlopen-freebsd.c:113
 *
 * Stack trace on OpenSolaris:
 * libcrypto.so.0.9.8`err_cmp+6(fecb70b0, 804775c, 804774c, feeda26e)
 * libcrypto.so.0.9.8`getrn+0x7d(8061228, 804775c, 8047710, 110)
 * libcrypto.so.0.9.8`lh_retrieve+0x1e(8061228, 804775c, fef4b408, 12a)
 * libcrypto.so.0.9.8`int_err_get_item+0x51(804775c, fef69000, 804776c,
 *                                          feedd808)
 * libcrypto.so.0.9.8`ERR_func_error_string+0x47(14064000, fef69000, 804778c,
 *                                               fe82cff8)
 * libssl.so.0.9.8`ERR_load_SSL_strings+0x23(80477ac)
 * libssl.so.0.9.8`SSL_load_error_strings+0x1d(fe822330, feaa0a80, 80477b8,
 *                                             8050f03, 8047898, 80477e4)
 * testssl+0x76(8047898, 80477e4, 8050d3d, 1, 80477f0, 80477f8)
 * main+0x43(1, 80477f0, 80477f8, 80477ac)
 * _start+0x7d(1, 8047900, 0, 8047908, 8047921, 8047943)
 *
 *
 * We do not use Linux regularly. The only way we found to reproduce the
 * problem there was to use gdb, which switches off memory randomization, and
 * to run the program under strace to get the right memory address:
 *
 * $ gdb strace
 * (gdb) set args -o output ./a.out
 * (gdb) run
 *
 * Next, check in the output file at which memory address libssl is mmap()ed.
 * Use that exact address in the mmap() call below in the code, together with
 * the MAP_FIXED flag. You may need to use more than 250KB for 1.0.0 but be
 * careful: we needed MAP_FIXED to simulate the situation, but a larger size
 * can cause a crash in mmap() itself. So, for example:
 *
 *      mmap((void *)0xf7f93000, 250 * 1024, PROT_NONE,
 *          MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
 *
 * And run the program in gdb again. We can see this on our Linux distro:
 *
 * $ gdb ./a.out
 * (gdb) run
 * Starting program: /afs/ms.mff.cuni.cz/u/p/pechanec/a.out 
 * Opening libssl.so...
 * Initializing with SSL_load_error_strings...
 * Closing libssl.so...
 * Opening libssl.so...
 * Initializing with SSL_load_error_strings...
 * 
 * Program received signal SIGSEGV, Segmentation fault.
 * 0xf7ee99cf in ?? () from /usr/lib/libcrypto.so.0.9.8
 * (gdb) bt
 * #0  0xf7ee99cf in ?? () from /usr/lib/libcrypto.so.0.9.8
 * #1  0xf7ee6fec in ?? () from /usr/lib/libcrypto.so.0.9.8
 * #2  0xf7fd8e20 in ?? ()
 * #3  0xffffc90c in ?? ()
 * #4  0x29063004 in ?? ()
 * #5  0xf7e7cfbd in CRYPTO_lock () from /usr/lib/libcrypto.so.0.9.8
 * #6  0xffffc90c in ?? ()
 * #7  0x14064057 in ?? ()
 * #8  0xf7ee99cb in ?? () from /usr/lib/libcrypto.so.0.9.8
 * #9  0x0804b008 in ?? ()
 * #10 0x00000000 in ?? ()
 */
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <link.h>
#include <unistd.h>
#include <err.h>

#include <openssl/crypto.h>
#include <openssl/err.h>
#include <openssl/ssl.h>
#include <sys/mman.h>

void
testssl(void)
{
        void *handle;
        void (*loadstrings)();

        printf("Opening libssl.so...\n");
        if ((handle = dlopen("/usr/lib/libssl.so", RTLD_LAZY)) == NULL) {
                printf("dlopen failed: %s\n", dlerror());
                return;
        }

        /*
         * Call the function to initialize the SSL error string tables.  The
         * actual error tables are managed in libcrypto so this is where the
         * problem starts.  libcrypto memory ends up with pointers to static
         * memory from libssl.
         */
        loadstrings = (void (*)())dlsym(handle, "SSL_load_error_strings");
        if (loadstrings != NULL) {
                printf("Initializing with SSL_load_error_strings...\n");
                (*loadstrings)();
        } else {
                printf("dlsym failed to find "
                    "SSL_load_error_strings: %s\n", dlerror());
        }

        printf("Closing libssl.so...\n");
        dlclose(handle);
}

int
main(int argc, char *argv[])
{
        /*
         * Since we linked with libcrypto directly, call the libcrypto error
         * string initialization stuff.
         */
        ERR_load_crypto_strings();

        /*
         * Dynamically load libssl to simulate what happens with Apache mod_ssl
         * (and probably other libraries on the system as well). This function
         * also initializes the SSL error tables and then unloads libssl.
         */
        testssl();
        
        /*
         * Map some anonymous memory that cannot be accessed. Usually it will
         * be mapped at the address where libssl was before. With libssl 1.0.0
         * you might need to increase the segment to 350KB since libssl from
         * 1.0.0 is larger than from 0.9.8.
         */
        if (mmap(0, 280 * 1024, PROT_NONE, MAP_ANON | MAP_PRIVATE, -1, 0)
            == MAP_FAILED) {
                err(1, "mmap");
        }

        /*
         * Repeat the above testssl step. This normally causes SEGV because the
         * error tables managed by libcrypto still have pointers into the old
         * libssl mapping, which is now occupied by the mapped anonymous pages
         * that cannot be accessed. The libssl library does not clean up
         * properly before it is closed.
         */
        testssl();
        return(0);
}
