[ 
https://issues.apache.org/jira/browse/THRIFT-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933756#comment-16933756
 ] 

Mario Emmenlauer commented on THRIFT-4946:
------------------------------------------

Thanks for the quick reply!

??Wild guess without even looking at the code: Could it be connected to a race 
condition when SSL is used from multiple threads???

That would make sense in the context of the test. However, I tried to fix all 
possible race conditions I could think of in the test, and to the best of my 
knowledge there should be none. There are mutexes in place that (according to 
my understanding) should make sure that the server is fully initialized before 
the client even gets a chance. Otherwise the test could never have worked 
reliably, because the connection attempt would have come before the server is 
ready.
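
For illustration, the kind of "server ready before client starts" handshake 
described above can be sketched roughly as follows. This is a minimal sketch 
and not the actual code in SecurityTest.cpp: the names ReadyGate and 
run_handshake are hypothetical, and a plain integer stands in for the real 
listening socket.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of Thrift): a one-shot gate that lets the
// client thread block until the server thread reports it is initialised.
struct ReadyGate {
  std::mutex mtx;
  std::condition_variable cv;
  bool ready = false;

  // Called by the server once its setup has completed.
  void signal() {
    {
      std::lock_guard<std::mutex> lock(mtx);
      ready = true;
    }
    cv.notify_all();
  }

  // Called by the client; returns only after signal() has run.
  void wait() {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [this] { return ready; });  // predicate guards against spurious wakeups
    assert(ready);
  }
};

// Returns true if the client observed the fully initialised server state.
bool run_handshake() {
  ReadyGate gate;
  int server_port = 0;           // assumption: stands in for the listening socket
  bool client_saw_port = false;

  std::thread server([&] {
    server_port = 9090;          // placeholder for "listen() has completed"
    gate.signal();               // publish readiness after initialisation
  });
  std::thread client([&] {
    gate.wait();                 // block until the server is ready
    client_saw_port = (server_port == 9090);
  });

  server.join();
  client.join();
  return client_saw_port;
}
```

With this pattern run_handshake() should return true, because the mutex 
orders the write to server_port before the client's read. Note that a gate 
like this only orders startup; it says nothing about what the two threads do 
concurrently during shutdown.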

Cutting a long story short, I'm at my wits' end. Any help would be 
appreciated. Maybe it's nothing, but it seems worrying that basic Thrift SSL 
tests fail in a majority of cases for us.

> Memory corruption in SecurityTest
> ---------------------------------
>
>                 Key: THRIFT-4946
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4946
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.12.0
>         Environment:  * thrift latest master
>  * Operating Systems and Compilers:
>     * VS2017 x64
>     * VS2019 x64
>     * macOS 10.13
>     * Ubuntu 18.04 x86_64
>  * OpenSSL 1.1.1c (current latest official)
>            Reporter: Mario Emmenlauer
>            Priority: Major
>
> We observe a memory corruption in SecurityTest. The issue is not fully 
> reproducible: it appears on average in 1 out of 10 executions. However, it is 
> not dependent on the environment because we can reproduce the problem on Windows 
> VS2017 x64, VS2019 x64, macOS 10.13, and Ubuntu 18.04 x86_64.
> On Linux the issue is often reported as:
> {code}
> [...]
> TEST: Server = TLSv1_2, Client = TLSv1_1
> CLI 7f1be2eaa700 Exception: SSL_connect: tlsv1 alert protocol version 
> (SSL_error_code = 1)
> Thrift: Mon Sep  2 07:51:32 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> SRV 7f1be38bd700 Exception: SSL_accept: error code: 0 (SSL_error_code = 5) 
> error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol version
> Thrift: Mon Sep  2 07:51:32 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> double free or corruption (out)
> unknown location(0): fatal error: in "SecurityTest/ssl_security_matrix": 
> signal: SIGABRT (application abort requested)
> /builds/thrift/lib/cpp/test/SecurityTest.cpp(173): last checkpoint
> {code}
> But other forms also appear, for example:
> {code}
> [...]
> Thrift: Mon Sep  2 07:50:53 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> TEST: Server = TLSv1_2, Client = TLSv1_2
> corrupted size vs. prev_size
> {code}
> We tried to isolate a call stack for the problem but have failed so far. The 
> boost message log does not always point to the same protocol combination. We 
> executed the test in `valgrind` but it never crashes there. With `gdb` we 
> can create a stack trace but it does not mean much to me:
> {code}
> TEST: Server = TLSv1_2, Client = TLSv1_0
> [New Thread 0x7f940fd05700 (LWP 1903)]
> [New Thread 0x7f9410718700 (LWP 1904)]
> CLI 7f9410718700 Exception: SSL_connect: tlsv1 alert protocol version 
> (SSL_error_code = 1)
> Thrift: Mon Sep  2 08:36:14 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> SRV 7f940fd05700 Exception: SSL_accept: error code: 0 (SSL_error_code = 5) 
> error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol version
> Thrift: Mon Sep  2 08:36:14 2019 SSL_shutdown: shutdown while in init 
> (SSL_error_code = 1)
> double free or corruption (out)
> [Thread 0x7f9410718700 (LWP 1904) exited]
> Thread 28 "SecurityTest" received signal SIGABRT, Aborted.
> [Switching to Thread 0x7f940fd05700 (LWP 1903)]
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f9410b73801 in __GI_abort () at abort.c:79
> #2  0x00007f9410bbc897 in __libc_message (action=action@entry=do_abort, 
> fmt=fmt@entry=0x7f9410ce9b9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
> #3  0x00007f9410bc390a in malloc_printerr (str=str@entry=0x7f9410ceb870 
> "double free or corruption (out)") at malloc.c:5350
> #4  0x00007f9410cceeb9 in _int_free (have_lock=0, p=0x7f940800cd70, 
> av=0x7f9410f1ec40 <main_arena>) at malloc.c:4278
> #5  __GI___libc_free (mem=0x7f940800cd80) at malloc.c:3124
> #6  tcache_thread_shutdown () at malloc.c:2969
> #7  arena_thread_freeres () at arena.c:950
> #8  0x00007f9410ccf652 in __libc_thread_freeres () at thread-freeres.c:29
> #9  0x00007f94121bb700 in start_thread (arg=0x7f940fd05700) at 
> pthread_create.c:476
> #10 0x00007f9410c5488f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> {code}
> This could indicate a multi-threading issue with the creation of server 
> and/or client in the test?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
