[ 
https://issues.apache.org/jira/browse/KUDU-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012707#comment-18012707
 ] 

ASF subversion and git services commented on KUDU-3635:
-------------------------------------------------------

Commit d181955a338729b9310fd587a2b6374be5624340 in kudu's branch 
refs/heads/branch-1.18.x from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=d181955a3 ]

KUDU-3635 call OPENSSL_cleanup() explicitly

With this changelist, the global state of the OpenSSL library is now
explicitly cleaned up when shutting down a process.  The rationale
for this is outlined in [1].

This applies to code linked against OpenSSL 1.1.1 and newer. IIUC,
that covers all the contemporary Linux distributions at the time of
writing.  It also worked for me on the EOL Ubuntu 18.04 LTS (in
particular, I tested it on Ubuntu 18.04.1 LTS).

As for testing: without the patch, I observed the issue in about 1 in
10 runs of the kudu CLI tool with RELEASE bits on RHEL8.8 and RHEL9.2
(both x86_64), and I verified that it is gone with the patch.  For
reproduction and verification, I ran the following 100-repetition test
multiple times (and a 1000-repetition run just for verification):

  ./bin/kudu-tool-test --gtest_repeat=100

Without the patch, I saw many core files left by the kudu CLI binary
during every 100rep run, where many of the core files had stack traces
similar to the one described in the JIRA item.  With this patch,
no such core files were observed when running Kudu RELEASE bits.
However, there were still crashes with core files having stack traces
attributable to KUDU-2439.  That's addressed in a follow-up patch.

[1] 
https://developers.redhat.com/articles/2022/10/31/best-practices-application-shutdown-openssl

Change-Id: Ib08310d66a7eabb1996bde901f39f36f54bff483
Reviewed-on: http://gerrit.cloudera.org:8080/23222
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
(cherry picked from commit bda99a38bb7da12959a99ffc9a79de02acdacf2a)
Reviewed-on: http://gerrit.cloudera.org:8080/23265
Reviewed-by: Alexey Serbin <[email protected]>


> kudu CLI tool sometimes crashes on exit with SIGSEGV in OPENSSL_cleanup
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3635
>                 URL: https://issues.apache.org/jira/browse/KUDU-3635
>             Project: Kudu
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 1.17.0, 1.18.0, 1.17.1
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> The kudu CLI tools sometimes crash on exit with SIGSEGV.
> I haven't had a chance to look at this closely, but it seems the problem is 
> related to the order of cleanup of different libraries and overall unexpected 
> state of the runtime when the implicitly installed cleanup handler for the 
> OpenSSL library is being called.
> Below is a snippet from the output of the 
> {{RebalanceIgnoredTserversTest.Basic}} test scenario.  That was generated by 
> Kudu bits built in RELEASE configuration on Ubuntu 18.04.6 LTS machine and 
> run via dist-test on Ubuntu 18.04.6 LTS as well.
> BTW, we have been suppressing TSAN warnings in the OpenSSL cleanup paths for 
> a long time due to a well-known issue in the OpenSSL library (see [this TSAN 
> suppression|https://github.com/apache/kudu/blob/2b9a2012f6d7b59931119dfad03e8d40e3031a0e/src/kudu/util/sanitizer_options.cc#L177-L184]),
>  so there might be some other issues around that we haven't paid attention 
> to for a long time.
> Probably, it's time to follow [best practices for at-exit cleanup of 
> applications using 
> OpenSSL|https://developers.redhat.com/articles/2022/10/31/best-practices-application-shutdown-openssl#].
>   In essence, that works at least with v1.1.1 and newer versions of the 
> OpenSSL library: use the {{OPENSSL_INIT_NO_ATEXIT}} option for 
> {{OPENSSL_init_ssl()}} at initialization and then explicitly call 
> {{OPENSSL_cleanup()}} upon exit/shutdown.
> {noformat}
> *** SIGSEGV (@0x10000562bd5) received by PID 1447 (TID 0x7fb1cda47480) from PID 5647317; stack trace: ***
>     @     0x7fb1d6307980 (unknown) at ??:0
>     @     0x7fb1d5a37873 tcmalloc::ThreadCache::ReleaseToCentralCache() at ??:0
>     @     0x7fb1d5a37be7 tcmalloc::ThreadCache::Scavenge() at ??:0
>     @     0x7fb1d3bce271 OPENSSL_LH_free at ??:0
>     @     0x7fb1d3bacbfd (unknown) at ??:0
>     @     0x7fb1d3bcbe10 OPENSSL_cleanup at ??:0
>     @     0x7fb1d434e161 (unknown) at ??:0
>     @     0x7fb1d434e25a exit at ??:0
>     @     0x7fb1d432cbfe __libc_start_main at ??:0
>     @     0x562bc9f8300a _start at ??:0
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
