Hi,


Why Solaris 10 on SparcIII does not suffer from this problem is unclear to me. 
We always use -lmalloc for our application, and have only one Solaris 10 build 
(used on all Solaris systems, regardless of the processor type).



Unfortunately, our support for Solaris is limited to specific systems, and we do 
not have a contract on our UltraII box :-( so I can't ask Sun for a fix.

Can you please provide me with the sources for reproduction on Solaris x86? I 
will then try it on our system. If I can reproduce the problem on a Solaris system 
with a contract, I will file the defect with Sun/Oracle.



Kees



-----Original Message-----
From: Andy Polyakov via RT [mailto:r...@openssl.org]
Sent: Tuesday, 24 August, 2010 20:54
To: Kees Dekker
Cc: openssl-dev@openssl.org
Subject: Re: [openssl.org #2321] bug report: core dump on OPENSSL_cpuid_setup() on Solaris 10 with a Sun Enterprise 450 system



>>>>> The 32-bit build of OpenSSL 1.0.0a (solaris-sparcv9-cc configuration)
>>>>> coredumps upon initialization. The stack trace is (of our product
>>>>> binary):

>> But you can still copy apps/openssl binary to this old system and
>> invoke it, say with 'version' command-line argument... If it doesn't crash,
>> then the question would be what is special about *your* application and
>> what can be done about it.

>
> I tried it, and it worked (a surprise to me):
> # OPENSSL_CONF=ssl/openssl.cnf ./openssl
> OpenSSL> version
>
> Here is some more detailed stack info (gdb, with OpenSSL built using
> debug-solaris-sparcv9-cc instead of solaris-sparcv9-cc).
> The result is not a real debug build without optimization, but the
> optimization level is far lower (-O instead of -xO5).
>

> *******gdb output***********
> 131                             (*di_fini)(root_node);
> (gdb) n
>
> Program received signal SIGSEGV, Segmentation fault.
> 0xff380c90 in free_unlocked () from /usr/lib/libmalloc.so.1
> *******end of gdb output***********
>
> The call to di_fini() triggers free(), which causes a SIGSEGV.



I see. So it's not dlclose, but di_fini... Thanks.



> I can't really prove it, but one of the differences between the openssl
> application and ours is that -lmalloc was used. Maybe
> dlopen(libdevinfo.so) conflicts a little with -lmalloc, since -lc
> also contains free/malloc (but no mallinfo(), which we use).



I managed to reproduce the problem on Solaris 10 x86(!) by copying the
portion of code from dlopen to dlclose into a separate file and linking it
with -lmalloc. I.e. your assumption appears to be correct: libdevinfo.so
is effectively incompatible with libmalloc.so :-(



As an act of desperation I added a call to mallopt(M_KEEP,0) prior to dlopen
and the SIGSEGV is ... gone. Can you check this with your application? Note
that it wouldn't work to add the call to your main() function, because
OPENSSL_cpuid_setup is called earlier, from the .init segment. You have to
add it to OPENSSL_cpuid_setup in crypto/sparcv9cap.c...
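
For illustration, here is a minimal sketch of the sequence discussed above: an optional mallopt(M_KEEP,0), then dlopen, di_init, di_fini and dlclose. It is not the code from crypto/sparcv9cap.c. The names probe_devinfo, devinfo_node, di_init_fn and di_fini_fn are hypothetical, the function pointer types only approximate the declarations in Solaris <libdevinfo.h>, and the flags value is left to the caller because it has to come from that header.

```c
#include <assert.h>
#include <dlfcn.h>
#include <malloc.h>
#include <stddef.h>

/* Hypothetical stand-ins for the types declared in Solaris <libdevinfo.h>. */
typedef void *devinfo_node;
typedef devinfo_node (*di_init_fn)(const char *phys_path, unsigned int flags);
typedef void (*di_fini_fn)(devinfo_node root);

/*
 * Load `libname`, take a devinfo snapshot and release it again.
 * If `keep_workaround` is non-zero, mallopt(M_KEEP, 0) is called before
 * dlopen(), as suggested in this thread.
 * Returns 0 if di_init and di_fini were both found, -1 otherwise.
 */
int probe_devinfo(const char *libname, unsigned int flags, int keep_workaround)
{
    void *handle;
    di_init_fn init = NULL;
    di_fini_fn fini = NULL;
    devinfo_node root;

    assert(libname != NULL);

#ifdef M_KEEP
    if (keep_workaround)
        (void)mallopt(M_KEEP, 0); /* result ignored: conventions differ by platform */
#else
    (void)keep_workaround;        /* M_KEEP is not available on this platform */
#endif

    handle = dlopen(libname, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* Assign through void ** to avoid a data-to-function pointer cast. */
    *(void **)(&init) = dlsym(handle, "di_init");
    *(void **)(&fini) = dlsym(handle, "di_fini");
    if (init == NULL || fini == NULL) {
        dlclose(handle);
        return -1;
    }

    root = init("/", flags);
    if (root != NULL)   /* assumed to correspond to DI_NODE_NIL */
        fini(root);     /* the reported SIGSEGV was raised inside this call */

    dlclose(handle);
    return 0;
}
```

On Solaris one would presumably call this with the libdevinfo library name and a snapshot flag taken from <libdevinfo.h>, link the program with -lmalloc, and compare keep_workaround set to 0 and to 1. Since main() runs after the .init segment, such a standalone test only imitates the situation; it does not replace the change in OPENSSL_cpuid_setup.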



>> Another possible workaround is to explicitly link your application with
>> -ldevinfo. In this case dlopen/dlclose would only increment/decrement
>> the reference counter, but not actually do anything upon dlclose.
>
> Explicitly linking with -ldevinfo did not help.



Because my initial guess was wrong, so no surprises here:-) A.





______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
