[ 
https://issues.apache.org/jira/browse/THRIFT-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954145#comment-15954145
 ] 

James E. King, III commented on THRIFT-4164:
--------------------------------------------

In the pull request where the first four builds failed with core dumps, I was 
able to reproduce the failure locally as well.  The cause is interesting: we 
reset the openssl thread safety/locking callbacks to NULL as part of the 
destruction of the last TSSLSocketFactory instance.  The test holds on to its 
socket factory for most of the run, and then the factory gets destroyed:
{noformat}
(gdb) bt
#0  apache::thrift::transport::cleanupOpenSSL () at 
src/thrift/transport/TSSLSocket.cpp:137
#1  0x00007ffff7fa10a5 in 
apache::thrift::transport::TSSLSocketFactory::~TSSLSocketFactory 
(this=0x48d6c0, __in_chrg=<optimized out>) at 
src/thrift/transport/TSSLSocket.cpp:696
#2  0x00007ffff7fa1199 in 
apache::thrift::transport::TSSLSocketFactory::~TSSLSocketFactory 
(this=0x48d6c0, __in_chrg=<optimized out>) at 
src/thrift/transport/TSSLSocket.cpp:698
#3  0x0000000000417d6a in boost::detail::sp_counted_base::release 
(this=0x4a95c0) at 
/usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#4  0x0000000000414407 in release (this=0x4a95c0) at 
/usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:144
#5  ~shared_count (this=<synthetic pointer>, __in_chrg=<optimized out>) at 
/usr/include/boost/smart_ptr/detail/shared_count.hpp:371
#6  ~shared_ptr (this=<synthetic pointer>, __in_chrg=<optimized out>) at 
/usr/include/boost/smart_ptr/shared_ptr.hpp:328
#7  main (argc=<optimized out>, argv=<optimized out>) at src/TestClient.cpp:235
{noformat}
Then later on in the test, we close a TSSLSocket that the factory created:
{noformat}
(gdb) bt
#0  0x00007ffff6c56cc9 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6c5a0d8 in __GI_abort () at abort.c:89
#2  0x00007ffff6c4fb86 in __assert_fail_base (fmt=0x7ffff6da0830 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7ffff7fcf106 "px 
!= 0", file=file@entry=0x7ffff7fd1400 
"/usr/include/boost/smart_ptr/shared_array.hpp", line=line@entry=194, 
    function=function@entry=0x7ffff7fd5fa0 
<boost::shared_array<apache::thrift::concurrency::Mutex>::operator[](long) 
const::__PRETTY_FUNCTION__> "T& 
boost::shared_array<T>::operator[](std::ptrdiff_t) const [with T = 
apache::thrift::concurrency::Mutex; std::ptrdiff_t = long int]") at assert.c:92
#3  0x00007ffff6c4fc32 in __GI___assert_fail 
(assertion=assertion@entry=0x7ffff7fcf106 "px != 0", 
file=file@entry=0x7ffff7fd1400 "/usr/include/boost/smart_ptr/shared_array.hpp", 
line=line@entry=194, 
    function=function@entry=0x7ffff7fd5fa0 
<boost::shared_array<apache::thrift::concurrency::Mutex>::operator[](long) 
const::__PRETTY_FUNCTION__> "T& 
boost::shared_array<T>::operator[](std::ptrdiff_t) const [with T = 
apache::thrift::concurrency::Mutex; std::ptrdiff_t = long int]") at assert.c:101
#4  0x00007ffff7fa039d in operator[] (this=<optimized out>, i=<optimized out>) 
at /usr/include/boost/smart_ptr/shared_array.hpp:194
#5  apache::thrift::transport::callbackLocking (mode=<optimized out>, 
n=<optimized out>) at src/thrift/transport/TSSLSocket.cpp:73
#6  0x00007ffff6644877 in CRYPTO_add_lock (pointer=0x4a8894, amount=-1, 
type=12, file=0x7ffff6a08c4a "ssl_lib.c", line=1944) at cryptlib.c:632
#7  0x00007ffff69f840c in SSL_CTX_free (a=0x4a8800) at ssl_lib.c:1944
#8  0x00007ffff7fa0209 in apache::thrift::transport::SSLContext::~SSLContext 
(this=0x4a85c0, __in_chrg=<optimized out>) at 
src/thrift/transport/TSSLSocket.cpp:199
#9  0x00007ffff7fa5832 in release (this=0x4a95a0) at 
/usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#10 ~shared_count (this=0x4a9f20, __in_chrg=<optimized out>) at 
/usr/include/boost/smart_ptr/detail/shared_count.hpp:371
#11 ~shared_ptr (this=0x4a9f18, __in_chrg=<optimized out>) at 
/usr/include/boost/smart_ptr/shared_ptr.hpp:328
#12 apache::thrift::transport::TSSLSocket::~TSSLSocket (this=0x4a9e80, 
__in_chrg=<optimized out>) at src/thrift/transport/TSSLSocket.cpp:238
#13 0x00007ffff7fa58a9 in apache::thrift::transport::TSSLSocket::~TSSLSocket 
(this=0x4a9e80, __in_chrg=<optimized out>) at 
src/thrift/transport/TSSLSocket.cpp:240
#14 0x00007ffff7fad03a in release (this=0x4a9f40) at 
/usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#15 ~shared_count (this=0x4ad640, __in_chrg=<optimized out>) at 
/usr/include/boost/smart_ptr/detail/shared_count.hpp:371
#16 ~shared_ptr (this=0x4ad638, __in_chrg=<optimized out>) at 
/usr/include/boost/smart_ptr/shared_ptr.hpp:328
#17 ~TBufferedTransport (this=0x4ad610, __in_chrg=<optimized out>) at 
./src/thrift/transport/TBufferTransports.h:184
#18 apache::thrift::transport::TBufferedTransport::~TBufferedTransport 
(this=0x4ad610, __in_chrg=<optimized out>) at 
./src/thrift/transport/TBufferTransports.h:184
#19 0x0000000000417d6a in boost::detail::sp_counted_base::release 
(this=0x4ae910) at 
/usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#20 0x000000000041446f in release (this=<optimized out>) at 
/usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:144
#21 ~shared_count (this=0x7fffffffdc98, __in_chrg=<optimized out>) at 
/usr/include/boost/smart_ptr/detail/shared_count.hpp:371
#22 ~shared_ptr (this=0x7fffffffdc90, __in_chrg=<optimized out>) at 
/usr/include/boost/smart_ptr/shared_ptr.hpp:328
#23 main (argc=<optimized out>, argv=<optimized out>) at src/TestClient.cpp:231
{noformat}

When we destroy the TSSLSocket at this point, its destruction calls into 
SSL_CTX_free, which wants to take one of the locks to be threadsafe.  It cannot 
use that lock, because we already destroyed it when the TSSLSocketFactory was 
destroyed.  We need to guarantee that the TSSLSocketFactory outlives any 
TSSLSocket that it makes, since destruction of the last TSSLSocketFactory 
causes an openssl teardown.

> Core in TSSLSocket cleanupOpenSSL when destroying a mutex used by openssl
> -------------------------------------------------------------------------
>
>                 Key: THRIFT-4164
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4164
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.10.0
>         Environment: Ubuntu 14.04, openssl (version TBD, I believe it is a 
> 1.1.0 variant)
>            Reporter: James E. King, III
>            Assignee: James E. King, III
>
> In a project where thrift is used, I was investigating a core in an assertion 
> in apache::thrift::concurrency::~Mutex (pthread variety).  The mutex in 
> question was one of the locking mutexes that thrift gives to openssl.  The 
> core occurred in TSSLSocket::cleanupOpenSSL() where the mutexes are destroyed 
> (on the last line).
> I suspect that we might be changing the locking callbacks too early in the 
> cleanup process; perhaps one of the other cleanup calls that follows it would 
> have released a mutex in some situations?  In any case, this needs to be 
> investigated and I am assigning it to myself.



