[ 
https://issues.apache.org/jira/browse/THRIFT-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936661#comment-16936661
 ] 

Mario Emmenlauer commented on THRIFT-4946:
------------------------------------------

It seems we have tracked down the issue. It may well be related to OpenSSL 
thread safety. The problem appeared after an OpenSSL upgrade in which the 
build flags were also changed and `no-threads` was introduced. After switching 
back to `threads` support in OpenSSL, the issue no longer appeared.
It may still be worthwhile to keep this issue open, because it seems that the 
thrift tests require OpenSSL thread-safety support, but I could not see that 
this is explicitly checked in the tests. It may be good to add a corresponding 
check, e.g. something like the one outlined in 
https://www.openssl.org/docs/manmaster/man3/CRYPTO_THREAD_lock_free.html that 
checks whether OpenSSL was configured with thread support:
{code}
 #include <openssl/opensslconf.h>
 #if defined(OPENSSL_THREADS)
     /* thread support enabled */
 #else
     /* no thread support */
 #endif
{code}
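
As a minimal sketch of how such a check might be wired into a test, the macro test above could be wrapped in a small helper. The names `opensslBuiltWithThreads` and `requireThreadSupport` are hypothetical and are not part of Thrift or OpenSSL. The `__has_include` guard is an assumption added so that the sketch also compiles where the OpenSSL headers are missing; in that case it reports no thread support.

```cpp
#include <stdexcept>

// Include the OpenSSL configuration header only when it is available, so the
// sketch also compiles on machines without OpenSSL development headers.
#if defined(__has_include)
#  if __has_include(<openssl/opensslconf.h>)
#    include <openssl/opensslconf.h>
#  endif
#endif

// Hypothetical helper (not part of Thrift): true only if the OpenSSL headers
// seen at compile time define OPENSSL_THREADS.
constexpr bool opensslBuiltWithThreads() {
#if defined(OPENSSL_THREADS)
  return true;
#else
  return false;
#endif
}

// Hypothetical guard: throws if thread support is missing, so a test can fail
// with a clear message instead of corrupting memory later.
inline void requireThreadSupport(bool hasThreads) {
  if (!hasThreads) {
    throw std::runtime_error(
        "OpenSSL was configured with no-threads; "
        "multi-threaded TLS tests are unsafe");
  }
}
```

A test could call `requireThreadSupport(opensslBuiltWithThreads())` before starting the server and client threads. Note that this only inspects the headers used at compile time; it does not verify how the OpenSSL library that is actually linked at run time was configured.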

> Memory corruption in SecurityTest
> ---------------------------------
>
>                 Key: THRIFT-4946
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4946
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.12.0
>         Environment:  * thrift latest master
>  * Operating Systems and Compilers:
>     * VS2017 x64
>     * VS2019 x64
>     * macOS 10.13
>     * Ubuntu 18.04 x86_64
>  * OpenSSL 1.1.1c (current latest official)
>            Reporter: Mario Emmenlauer
>            Priority: Major
>
> We observe a memory corruption in SecurityTest. The issue is not fully 
> reproducible: it appears on average in 1 out of 10 executions. However, it 
> does not depend on the environment, because we can reproduce the problem on 
> Windows VS2017 x64, VS2019 x64, macOS 10.13, and Ubuntu 18.04 x86_64.
> On Linux the issue is often reported as:
> {code}
> [...]
> TEST: Server = TLSv1_2, Client = TLSv1_1
> CLI 7f1be2eaa700 Exception: SSL_connect: tlsv1 alert protocol version 
> (SSL_error_code = 1)
> Thrift: Mon Sep  2 07:51:32 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> SRV 7f1be38bd700 Exception: SSL_accept: error code: 0 (SSL_error_code = 5) 
> error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol version
> Thrift: Mon Sep  2 07:51:32 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> double free or corruption (out)
> unknown location(0): fatal error: in "SecurityTest/ssl_security_matrix": 
> signal: SIGABRT (application abort requested)
> /builds/thrift/lib/cpp/test/SecurityTest.cpp(173): last checkpoint
> {code}
> But other forms also appear, for example:
> {code}
> [...]
> Thrift: Mon Sep  2 07:50:53 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> TEST: Server = TLSv1_2, Client = TLSv1_2
> corrupted size vs. prev_size
> {code}
> We tried to isolate a call stack for the problem but have failed so far. The 
> boost message log does not always point to the same protocol combination. We 
> executed the test in `valgrind` but it never crashes there. With `gdb` we 
> can create a stack trace, but it does not mean much to me:
> {code}
> TEST: Server = TLSv1_2, Client = TLSv1_0
> [New Thread 0x7f940fd05700 (LWP 1903)]
> [New Thread 0x7f9410718700 (LWP 1904)]
> CLI 7f9410718700 Exception: SSL_connect: tlsv1 alert protocol version 
> (SSL_error_code = 1)
> Thrift: Mon Sep  2 08:36:14 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> SRV 7f940fd05700 Exception: SSL_accept: error code: 0 (SSL_error_code = 5) 
> error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol version
> Thrift: Mon Sep  2 08:36:14 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> double free or corruption (out)
> [Thread 0x7f9410718700 (LWP 1904) exited]
> Thread 28 "SecurityTest" received signal SIGABRT, Aborted.
> [Switching to Thread 0x7f940fd05700 (LWP 1903)]
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f9410b73801 in __GI_abort () at abort.c:79
> #2  0x00007f9410bbc897 in __libc_message (action=action@entry=do_abort, 
> fmt=fmt@entry=0x7f9410ce9b9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
> #3  0x00007f9410bc390a in malloc_printerr (str=str@entry=0x7f9410ceb870 
> "double free or corruption (out)") at malloc.c:5350
> #4  0x00007f9410cceeb9 in _int_free (have_lock=0, p=0x7f940800cd70, 
> av=0x7f9410f1ec40 <main_arena>) at malloc.c:4278
> #5  __GI___libc_free (mem=0x7f940800cd80) at malloc.c:3124
> #6  tcache_thread_shutdown () at malloc.c:2969
> #7  arena_thread_freeres () at arena.c:950
> #8  0x00007f9410ccf652 in __libc_thread_freeres () at thread-freeres.c:29
> #9  0x00007f94121bb700 in start_thread (arg=0x7f940fd05700) at 
> pthread_create.c:476
> #10 0x00007f9410c5488f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> {code}
> This could indicate a multi-threading issue with the creation of server 
> and/or client in the test?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
