Addition: building the openssl application with -lmalloc results in the same coredump. Maybe dlopen(libdevinfo.so.1) and -lmalloc do not work together (at least on UltraSparcII).
Kees

> -----Original Message-----
> From: Kees Dekker
> Sent: Tuesday, 24 August, 2010 14:25
> To: 'r...@openssl.org'
> Cc: openssl-dev@openssl.org
> Subject: RE: [openssl.org #2321] bug report: core dump on
> OPENSSL_cpuid_setup() on Solaris 10 with a Sun Enterprise 450 system
>
> > -----Original Message-----
> > From: Andy Polyakov via RT [mailto:r...@openssl.org]
> > Sent: Monday, 23 August, 2010 17:23
> > To: Kees Dekker
> > Cc: openssl-dev@openssl.org
> > Subject: Re: [openssl.org #2321] bug report: core dump on
> > OPENSSL_cpuid_setup() on Solaris 10 with a Sun Enterprise 450 system
> >
> > Hi,
> >
> > >>> The 32-bit build of openSSL 1.0.0a (solaris-sparcv9-cc
> > >>> configuration) coredumps upon initialization. The stack trace is
> > >>> (of our product binary):
> > >> Does reference to your product binary mean that apps/openssl does
> > >> not crash? In other words does 'make test' pass? If so, then the
> > >> question is how come? Try to truss your application and compare it
> > >> to truss output for 'apps/openssl version'... Try to single-step
> > >> your application in debugger...
> > >
> > > The openSSL and cURL libs are built on a different system (e.g. Sun
> > > Fire V440 or Sun Fire T200). The (old) system being used, where the
> > > crash occurs, is a test system, and is not equipped with a compiler
> > > (similar to the customer situation). So (re)building on this test
> > > system, or running make test, is not possible.
> >
> > But you can still copy apps/openssl binary to this old system and
> > invoke it, say with 'version' command-line argument... If it doesn't
> > crash, then the question would be what is special about *your*
> > application and what can be done about it.
>
> I tried it, and it worked (a surprise for me):
> # OPENSSL_CONF=ssl/openssl.cnf ./openssl
> OpenSSL> version
> OpenSSL 1.0.0a 1 Jun 2010
> OpenSSL> quit
>
> > >>> #0 0xff360c90 in free_unlocked () from /usr/lib/libmalloc.so.1
> > >>> #1 0xff360b78 in free () from /usr/lib/libmalloc.so.1
> > >>> #2 0x007107a4 in OPENSSL_cpuid_setup ()
> > >>> #3 0x00791784 in ?? ()
> > >>> #4 0x00791784 in ?? ()
> > >> Note that OPENSSL_cpuid_setup does not call free() (see
> > >> crypto/sparcv9cap.c). It does call
> > >> dlopen("libdevinfo.so.1",RTLD_LAZY) and dlclose(h) though... As
> > >> well as some functions from libdevinfo... I mean chances are that
> > >> root of the problem lies outside OPENSSL_cpuid_setup... It's easy
> > >> to verify by setting OPENSSL_sparcv9cap environment variable (value
> > >> of 3 is appropriate for USII) prior starting your application.
> > >
> > > I saw that no free() was called (I checked the source code as
> > > well), and using OPENSSL_sparcv9cap=3 works well. Skipping the
> > > dlopen by setting the OPENSSL_sparcv9cap environment variable
> > > solved the problem... So I'm not 100% sure that the problem is
> > > outside OPENSSL_cpuid_setup(). But I also can't explain why this
> > > problem did not exist on our other/newer (V440/T200) systems. These
> > > two are not Ultra-sparcII, but UltraSparc-IIIi/UltraSparc-T1
> > > respectively.
> >
> > I meant "outside" as "in code beyond my control", such as function in
> > vendor-supplied libraries, e.g. libdevinfo.so or libdl.so. I'd guess,
> > and truss log suggests that, crash occurs in dlclose (note that it
> > closed /devices/pseudo/devi...@0:devinfo's file descriptor, meaning
> > that di_walk_node succeeded). Try to comment out dlclose and see if
> > it helps...
>
> Here is some more detailed stack info (gdb, using openSSL built with
> debug-solaris-sparcv9-cc instead of solaris-sparcv9-cc).
> The result is not a true debug build (it is still optimized), but the
> optimization level is far lower (-O instead of -xO5).
>
> *******gdb output***********
> 99          OPENSSL_sparcv9cap_P |= SPARCV9_PREFER_FPU;
> (gdb) p OPENSSL_sparcv9cap_P
> $1 = 1
> (gdb) n
> 102         if (sysinfo(SI_ISALIST,si,sizeof(si))>0)
> (gdb) p OPENSSL_sparcv9cap_P
> $2 = 3
> (gdb) n
> 104         if (strstr(si,"+vis"))
> (gdb) n
> 106         if (strstr(si,"+vis2"))
> (gdb) n
> 105         OPENSSL_sparcv9cap_P |= SPARCV9_VIS1;
> (gdb) n
> 106         if (strstr(si,"+vis2"))
> (gdb) p OPENSSL_sparcv9cap_P
> $3 = 7
> (gdb) n
> 109         OPENSSL_sparcv9cap_P &= ~SPARCV9_TICK_PRIVILEGED;
> (gdb) n
> 114         if ((h = dlopen("libdevinfo.so.1",RTLD_LAZY))) do
> (gdb) p OPENSSL_sparcv9cap_P
> $4 = 7
> (gdb) n
> 122         if (!DLLINK(h,di_init)) break;
> (gdb) p OPENSSL_sparcv9cap_P
> $5 = 7
> (gdb) n
> 114         if ((h = dlopen("libdevinfo.so.1",RTLD_LAZY))) do
> (gdb) n
> 122         if (!DLLINK(h,di_init)) break;
> (gdb) n
> 123         if (!DLLINK(h,di_fini)) break;
> (gdb) n
> 124         if (!DLLINK(h,di_walk_node)) break;
> (gdb) n
> 125         if (!DLLINK(h,di_node_name)) break;
> (gdb) n
> 127         if ((root_node = (*di_init)("/",DINFOSUBTREE))!=DI_NODE_NIL)
> (gdb) n
> 130         di_node_name,walk_nodename);
> (gdb) n
> 131         (*di_fini)(root_node);
> (gdb) n
>
> Program received signal SIGSEGV, Segmentation fault.
> 0xff380c90 in free_unlocked () from /usr/lib/libmalloc.so.1
>
> Program received signal SIGSEGV, Segmentation fault.
> 0xff380c90 in free_unlocked () from /usr/lib/libmalloc.so.1
> (gdb) bt
> #0 0xff380c90 in free_unlocked () from /usr/lib/libmalloc.so.1
> #1 0xff380b78 in free () from /usr/lib/libmalloc.so.1
> #2 0x00707804 in OPENSSL_cpuid_setup () at
>    /vobs/obj.dbg.SOL10/thirdparty/OpenSSL/32bit/openssl-1.0.0a/crypto/sparcv9cap.c:131
> #3 0x007845cc in ?? ()
> warning: (Internal error: pc 0x0 in read in psymtab, but not in
> symtab.)
>
> #4 0x007845cc in ?? ()
> warning: (Internal error: pc 0x0 in read in psymtab, but not in
> symtab.)
>
> Backtrace stopped: previous frame identical to this frame (corrupt
> stack?)
> *******end of gdb output***********
>
> The call to di_fini() causes free() to be called, which triggers the
> SIGSEGV.
>
> I can't really prove it, but one of the differences between the
> openssl application and ours is that ours is linked with -lmalloc.
> Maybe dlopen(libdevinfo.so) conflicts with -lmalloc, since -lc also
> contains free/malloc (but no mallinfo(), which we use).
>
> # ldd openssl
>         libsocket.so.1 => /lib/libsocket.so.1
>         libnsl.so.1 => /lib/libnsl.so.1
>         libdl.so.1 => /lib/libdl.so.1
>         libc.so.1 => /lib/libc.so.1
>         libmp.so.2 => /lib/libmp.so.2
>         libmd.so.1 => /lib/libmd.so.1
>         libscf.so.1 => /lib/libscf.so.1
>         libdoor.so.1 => /lib/libdoor.so.1
>         libuutil.so.1 => /lib/libuutil.so.1
>         libgen.so.1 => /lib/libgen.so.1
>         libm.so.2 => /lib/libm.so.2
>         /platform/SUNW,Ultra-4/lib/libc_psr.so.1
>         /platform/SUNW,Ultra-4/lib/libmd_psr.so.1
>
> # ldd kshell6.2.new
>         libmalloc.so.1 => /usr/lib/libmalloc.so.1
>         libm.so.2 => /lib/libm.so.2
>         libsocket.so.1 => /lib/libsocket.so.1
>         libnsl.so.1 => /lib/libnsl.so.1
>         libdl.so.1 => /lib/libdl.so.1
>         libw.so.1 => /lib/libw.so.1
>         librt.so.1 => /lib/librt.so.1
>         libthread.so.1 => /lib/libthread.so.1
>         libpam.so.1 => /lib/libpam.so.1
>         libCstd.so.1 => /usr/lib/libCstd.so.1
>         libCrun.so.1 => /usr/lib/libCrun.so.1
>         libc.so.1 => /lib/libc.so.1
>         libmp.so.2 => /lib/libmp.so.2
>         libmd.so.1 => /lib/libmd.so.1
>         libscf.so.1 => /lib/libscf.so.1
>         libaio.so.1 => /lib/libaio.so.1
>         libcmd.so.1 => /lib/libcmd.so.1
>         libdoor.so.1 => /lib/libdoor.so.1
>         libuutil.so.1 => /lib/libuutil.so.1
>         libgen.so.1 => /lib/libgen.so.1
>         /usr/lib/cpu/sparcv8plus/libCstd_isa.so.1
>         /platform/SUNW,Ultra-4/lib/libc_psr.so.1
>         /platform/SUNW,Ultra-4/lib/libmd_psr.so.1
>
> # ldd /usr/lib/libdevinfo.so
>         libnvpair.so.1 => /lib/libnvpair.so.1
>         libsec.so.1 => /lib/libsec.so.1
>         libc.so.1 => /lib/libc.so.1
>         libgen.so.1 => /lib/libgen.so.1
>         libnsl.so.1 => /lib/libnsl.so.1
>         libavl.so.1 => /lib/libavl.so.1
>         libmp.so.2 => /lib/libmp.so.2
>         libmd.so.1 => /lib/libmd.so.1
>         libscf.so.1 => /lib/libscf.so.1
>         libdoor.so.1 => /lib/libdoor.so.1
>         libuutil.so.1 => /lib/libuutil.so.1
>         libm.so.2 => /lib/libm.so.2
>         /platform/SUNW,Ultra-4/lib/libc_psr.so.1
>         /platform/SUNW,Ultra-4/lib/libmd_psr.so.1
>
> > Another possible workaround is to explicitly link your application
> > with -ldevinfo. In this case dlopen/dlclose would only
> > increment/decrement reference counter, but not actually do anything
> > upon dlclose.
> >
> > A.
>
> Explicitly linking with -ldevinfo did not help. Same core dump at the
> same place. I am still thinking about a mismatch between malloc()
> (from libc.so) and free() (from libmalloc.so). I don't know how to
> solve this. It may point to a Solaris issue on UltraSparcII (the
> problem did not exist on my newer UltraSparc systems).
>
> KD