> -----Original Message-----
> From: Andy Polyakov via RT [mailto:r...@openssl.org]
> Sent: Monday, 23 August, 2010 17:23
> To: Kees Dekker
> Cc: openssl-dev@openssl.org
> Subject: Re: [openssl.org #2321] bug report: core dump on
> OPENSSL_cpuid_setup() on Solaris 10 with a Sun Enterprise 450 system
>
> Hi,
>
> >>> The 32-bit of openSSL 1.0.0a (solaris-sparcv9-cc configuration)
> >>> coredumps upon initialization. The stack trace is (of our product
> >>> binary):
> >> Does reference to your product binary mean that apps/openssl does not
> >> crash? In other words does 'make test' pass? If so, then the question
> >> is how come? Try to truss your application and compare it to truss
> >> output for 'apps/openssl version'... Try to single-step your
> >> application in debugger...
> >
> > The openSSL and cURL libs are built on a different system (e.g. Sun
> > Fire V440 or Sun Fire T200). This (old) system being used, where the
> > crash occurs, is a test system, and not equipped with a compiler
> > (similar to customer situation). So (re)building on this test system,
> > or run make test is not possible.
>
> But you can still copy apps/openssl binary to this old system and
> invoke it, say with 'version' command-line argument... If it doesn't
> crash, then the question would be what is special about *your*
> application and what can be done about it.
I tried it, and it worked (to my surprise):
# OPENSSL_CONF=ssl/openssl.cnf ./openssl
OpenSSL> version
OpenSSL 1.0.0a 1 Jun 2010
OpenSSL> quit
>
> >>> #0 0xff360c90 in free_unlocked () from /usr/lib/libmalloc.so.1
> >>> #1 0xff360b78 in free () from /usr/lib/libmalloc.so.1
> >>> #2 0x007107a4 in OPENSSL_cpuid_setup ()
> >>> #3 0x00791784 in ?? ()
> >>> #4 0x00791784 in ?? ()
> >> Note that OPENSSL_cpuid_setup does not call free() (see
> >> crypto/sparcv9cap.c). It does call dlopen("libdevinfo.so.1",RTLD_LAZY)
> >> and dlclose(h) though... As well as some functions from libdevinfo... I
> >> mean chances are that root of the problem lies outside
> >> OPENSSL_cpuid_setup... It's easy to verify by setting OPENSSL_sparcv9cap
> >> environment variable (value of 3 is appropriate for USII) prior
> >> starting your application.
> >
> > I saw that no free was called (I did check the source code as well),
> > and using OPENSSL_sparcv9cap=3 works well: skipping the dlopen by
> > setting the OPENSSL_sparcv9cap environment variable solved the
> > problem... So I'm not 100% sure that the problem is outside
> > OPENSSL_cpuid_setup(). But I also can't explain why this problem did
> > not exist on our other/newer (V440/T200) systems. These two are not
> > Ultra-sparcII, but UltraSparc-IIIi/UltraSparc-T1 respectively.
>
> I meant "outside" as "in code beyond my control", such as function in
> vendor-supplied libraries, e.g. libdevinfo.so or libdl.so. I'd guess,
> and truss log suggests that, crash occurs in dlclose (note that it
> closed /devices/pseudo/devi...@0:devinfo's file descriptor, meaning
> that di_walk_node succeeded). Try to comment out dlclose and see if it
> helps...
Here is some more detailed stack info from gdb, with OpenSSL built using
debug-solaris-sparcv9-cc instead of solaris-sparcv9-cc.
The result is not a true debug build (it is still optimized), but the
optimization level is far lower (-O instead of -xO5).
*******gdb output***********
99 OPENSSL_sparcv9cap_P |= SPARCV9_PREFER_FPU;
(gdb) p OPENSSL_sparcv9cap_P
$1 = 1
(gdb) n
102 if (sysinfo(SI_ISALIST,si,sizeof(si))>0)
(gdb) p OPENSSL_sparcv9cap_P
$2 = 3
(gdb) n
104 if (strstr(si,"+vis"))
(gdb) n
106 if (strstr(si,"+vis2"))
(gdb) n
105 OPENSSL_sparcv9cap_P |= SPARCV9_VIS1;
(gdb) n
106 if (strstr(si,"+vis2"))
(gdb) p OPENSSL_sparcv9cap_P
$3 = 7
(gdb) n
109 OPENSSL_sparcv9cap_P &= ~SPARCV9_TICK_PRIVILEGED;
(gdb) n
114 if ((h = dlopen("libdevinfo.so.1",RTLD_LAZY))) do
(gdb) p OPENSSL_sparcv9cap_P
$4 = 7
(gdb) n
122 if (!DLLINK(h,di_init)) break;
(gdb) p OPENSSL_sparcv9cap_P
$5 = 7
(gdb) n
114 if ((h = dlopen("libdevinfo.so.1",RTLD_LAZY))) do
(gdb) n
122 if (!DLLINK(h,di_init)) break;
(gdb) n
123 if (!DLLINK(h,di_fini)) break;
(gdb) n
124 if (!DLLINK(h,di_walk_node)) break;
(gdb) n
125 if (!DLLINK(h,di_node_name)) break;
(gdb) n
127 if ((root_node = (*di_init)("/",DINFOSUBTREE))!=DI_NODE_NIL)
(gdb) n
130 di_node_name,walk_nodename);
(gdb) n
131 (*di_fini)(root_node);
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0xff380c90 in free_unlocked () from /usr/lib/libmalloc.so.1
(gdb) bt
#0 0xff380c90 in free_unlocked () from /usr/lib/libmalloc.so.1
#1 0xff380b78 in free () from /usr/lib/libmalloc.so.1
#2 0x00707804 in OPENSSL_cpuid_setup () at /vobs/obj.dbg.SOL10/thirdparty/OpenSSL/32bit/openssl-1.0.0a/crypto/sparcv9cap.c:131
#3 0x007845cc in ?? ()
warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
#4 0x007845cc in ?? ()
warning: (Internal error: pc 0x0 in read in psymtab, but not in symtab.)
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
*******end of gdb output***********
The call to di_fini() triggers a call to free(), which causes the SIGSEGV.
I can't really prove it, but one difference between the openssl application
and ours is that ours is linked with -lmalloc. Maybe dlopen(libdevinfo.so)
conflicts with -lmalloc, since -lc also contains free/malloc (but not
mallinfo(), which we use).
# ldd openssl
libsocket.so.1 => /lib/libsocket.so.1
libnsl.so.1 => /lib/libnsl.so.1
libdl.so.1 => /lib/libdl.so.1
libc.so.1 => /lib/libc.so.1
libmp.so.2 => /lib/libmp.so.2
libmd.so.1 => /lib/libmd.so.1
libscf.so.1 => /lib/libscf.so.1
libdoor.so.1 => /lib/libdoor.so.1
libuutil.so.1 => /lib/libuutil.so.1
libgen.so.1 => /lib/libgen.so.1
libm.so.2 => /lib/libm.so.2
/platform/SUNW,Ultra-4/lib/libc_psr.so.1
/platform/SUNW,Ultra-4/lib/libmd_psr.so.1
# ldd kshell6.2.new
libmalloc.so.1 => /usr/lib/libmalloc.so.1
libm.so.2 => /lib/libm.so.2
libsocket.so.1 => /lib/libsocket.so.1
libnsl.so.1 => /lib/libnsl.so.1
libdl.so.1 => /lib/libdl.so.1
libw.so.1 => /lib/libw.so.1
librt.so.1 => /lib/librt.so.1
libthread.so.1 => /lib/libthread.so.1
libpam.so.1 => /lib/libpam.so.1
libCstd.so.1 => /usr/lib/libCstd.so.1
libCrun.so.1 => /usr/lib/libCrun.so.1
libc.so.1 => /lib/libc.so.1
libmp.so.2 => /lib/libmp.so.2
libmd.so.1 => /lib/libmd.so.1
libscf.so.1 => /lib/libscf.so.1
libaio.so.1 => /lib/libaio.so.1
libcmd.so.1 => /lib/libcmd.so.1
libdoor.so.1 => /lib/libdoor.so.1
libuutil.so.1 => /lib/libuutil.so.1
libgen.so.1 => /lib/libgen.so.1
/usr/lib/cpu/sparcv8plus/libCstd_isa.so.1
/platform/SUNW,Ultra-4/lib/libc_psr.so.1
/platform/SUNW,Ultra-4/lib/libmd_psr.so.1
# ldd /usr/lib/libdevinfo.so
libnvpair.so.1 => /lib/libnvpair.so.1
libsec.so.1 => /lib/libsec.so.1
libc.so.1 => /lib/libc.so.1
libgen.so.1 => /lib/libgen.so.1
libnsl.so.1 => /lib/libnsl.so.1
libavl.so.1 => /lib/libavl.so.1
libmp.so.2 => /lib/libmp.so.2
libmd.so.1 => /lib/libmd.so.1
libscf.so.1 => /lib/libscf.so.1
libdoor.so.1 => /lib/libdoor.so.1
libuutil.so.1 => /lib/libuutil.so.1
libm.so.2 => /lib/libm.so.2
/platform/SUNW,Ultra-4/lib/libc_psr.so.1
/platform/SUNW,Ultra-4/lib/libmd_psr.so.1
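
The ldd listings only show which libraries are loaded, not which one
actually supplies malloc() and free() at run time. A minimal sketch of a
check, assuming dladdr() and RTLD_DEFAULT behave as documented in dlfcn.h;
the helper names object_defining() and allocators_share_object() are
hypothetical and not part of OpenSSL or Solaris:

```c
#define _GNU_SOURCE          /* exposes dladdr()/RTLD_DEFAULT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: path of the object that the run-time linker
 * resolves `symbol` to, or NULL if it cannot be determined. */
static const char *object_defining(const char *symbol)
{
    Dl_info info;
    void *addr;

    assert(symbol != NULL);
    addr = dlsym(RTLD_DEFAULT, symbol);   /* first definition in search order */
    if (addr == NULL)
        return NULL;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return NULL;
    return info.dli_fname;
}

/* Hypothetical helper: 1 if malloc() and free() resolve to the same
 * object, 0 if they resolve to different objects, -1 if unknown. */
static int allocators_share_object(void)
{
    const char *m = object_defining("malloc");
    const char *f = object_defining("free");

    if (m == NULL || f == NULL)
        return -1;
    printf("malloc: %s\nfree:   %s\n", m, f);
    return strcmp(m, f) == 0;
}
```

Calling allocators_share_object() from main() in a small program linked
the same way as the application (with -lmalloc, and -ldl where required)
would show whether both symbols resolve to libmalloc.so.1. It only reports
the process-wide resolution; it does not prove how libdevinfo binds its
own calls.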
>
> Another possible workaround is to explicitly link your application with
> -ldevinfo. In this case dlopen/dlclose would only increment/decrement
> reference counter, but not actually do anything upon dlclose. A.
>
Explicitly linking with -ldevinfo did not help: same core dump at the same place.
I still suspect a mismatch between malloc() (from libc.so) and free() (from
libmalloc.so). I don't know how to solve this. It may point to a Solaris issue
on UltraSparcII (the problem did not exist on my newer UltraSparc systems).
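
One way to test the malloc()/free() theory without OpenSSL in the picture
would be a stand-alone reproduction of the same call sequence that
sparcv9cap.c performs (dlopen, di_init, di_fini, dlclose), built once with
-lmalloc and once without. A minimal sketch; the function name
exercise_devinfo() is hypothetical, and the stand-in definitions in the
#else branch are assumptions that exist only so the file compiles on
systems without libdevinfo.h:

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

#if defined(__sun) && defined(__SVR4)
# include <libdevinfo.h>
#else
/* Assumed stand-ins for non-Solaris builds; dlopen() will fail there. */
typedef void *di_node_t;
# define DI_NODE_NIL  NULL
# define DINFOSUBTREE 0x01
#endif

typedef di_node_t (*di_init_fn)(const char *, unsigned int);
typedef void      (*di_fini_fn)(di_node_t);

/* Hypothetical helper: returns 0 if di_init()/di_fini() completed,
 * -1 if the library, its symbols, or the device tree were unavailable. */
static int exercise_devinfo(const char *libname)
{
    void *h;
    di_init_fn init;
    di_fini_fn fini;
    di_node_t root;
    int rc = -1;

    assert(libname != NULL);
    h = dlopen(libname, RTLD_LAZY);
    if (h == NULL)
        return -1;

    init = (di_init_fn)dlsym(h, "di_init");
    fini = (di_fini_fn)dlsym(h, "di_fini");
    if (init != NULL && fini != NULL) {
        root = init("/", DINFOSUBTREE);
        if (root != DI_NODE_NIL) {
            fini(root);      /* the call that ends in free() in the backtrace */
            rc = 0;
        }
    }
    dlclose(h);
    return rc;
}
```

If calling exercise_devinfo("libdevinfo.so.1") crashes only in the binary
linked with -lmalloc, that would support the allocator mismatch; if both
variants run cleanly, the cause is more likely elsewhere in the
application.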
KD
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager majord...@openssl.org