Hi,

Дилян had suggested to add some debug outputs to imap/tls.c

--------

diff --git a/imap/tls.c b/imap/tls.c
--- a/imap/tls.c
+++ b/imap/tls.c
@@ -893,7 +893,9 @@ EXPORTED int     tls_init_serverengine(const char *ident,

 #if (OPENSSL_VERSION_NUMBER >= 0x0090800fL)
     /* Load DH params for DHE-* key exchanges */
+    syslog(LOG_CRIT, "dh_params will be set, current value=%p", dh_params);
     dh_params = load_dh_param(server_key_file, server_cert_file);
+    syslog(LOG_CRIT, "dh_params were set current_value=%p", dh_params);
     SSL_CTX_set_tmp_dh(s_ctx, dh_params);
 #endif

@@ -1308,7 +1310,11 @@ EXPORTED int tls_shutdown_serverengine(void)
         }

 #if (OPENSSL_VERSION_NUMBER >= 0x0090800fL)
-        if (dh_params) DH_free(dh_params);
+        if (dh_params) {
+               syslog(LOG_CRIT, "dh_params will be freed %p", dh_params);
+               DH_free(dh_params);
+               syslog(LOG_CRIT, "dh_params were freed %p", dh_params);
+       }
 #endif
     }
-----

I did run mupdate with this debug output


Jul 12 10:17:25 mx02 mu/mupdate[6537]: dh_params will be set, current value=(nil) Jul 12 10:17:25 mx02 mu/mupdate[6537]: inittls: Loading DH parameters from file Jul 12 10:17:25 mx02 mu/mupdate[6537]: dh_params were set current_value=0x7fc7541b9600 Jul 12 10:17:25 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication Jul 12 10:17:25 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost [10.23.21.78] cyrus PLAIN+TLS User logged in Jul 12 10:18:37 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication Jul 12 10:18:37 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost [10.23.21.78] cyrus PLAIN+TLS User logged in
Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params will be freed 0x7fc7541b9600
Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params were freed 0x7fc7541b9600
Jul 12 10:18:37 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication Jul 12 10:18:37 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost [10.23.21.78] cyrus PLAIN+TLS User logged in
Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params will be freed 0x7fc7541b9600
Jul 12 10:18:38 mx02 mu/master[6534]: process type:SERVICE name:mupdate path:/usr/local/libexec/mupdate age:89.131s pid:6537 signaled to death by signal 11 (Segmentation fault, core dumped) Jul 12 10:18:38 mx02 mu/master[6534]: service mupdate/ipv4 pid 6537 in READY state: terminated abnormally

so it seems to me that the dh_params were set once on startup but freed for each closed connection

Regards

    Michael Menge


Quoting Michael Menge <michael.me...@zdv.uni-tuebingen.de>:

Hi Дилян,


Quoting Дилян Палаузов <dilyan.palau...@aegee.org>:

Hello Michael,

this is likely either a memory mishandling issue (use after free(),
double free(), invalid read()/write()...), which gets evident if cyrus
is run under valgrind --tool=memcheck.  I run it with

valgrind --num-callers=30 --leak-check=full --track-origins=yes --read-
var-info=yes --show-leak-kinds=all --trace-children=yes --track-fds=yes
/usr/local/cyrus/bin/master -D &> log-file-memcheck


thanks for the valgind options. Valgrind did find some "Invalid read of size 4" and "size 8" in DH_free as well as "Invalid write of size 4" in CRYPTO_add_lock and "size 8" in OPENSSL_cleanse. As far as i can tell the memory was always free'd
before by CRYPTO_free (mem.c:434).

I have attached the full log


Another reason can be multi-threaded inconsistencies: mutexes locked in
inconsistent order by differnt threads (while this is not a cause for
crash, it leads to deadlock), mutexes locked in one thread and unlocked
in another or alike.  This can be detected by valgrind/helgrind

valgrind --tool=helgrind --num-callers=30 --leak-check=full --track-
origins=yes --read-var-info=yes --trace-children=yes --track-fds=yes
/usr/local/cyrus/bin/master -D &> log-file-helgrind


I had to remove some options here, as my valgrind didn't know them with
--tool=helgrind

I used /usr/bin/valgrind --tool=helgrind --num-callers=30 --read-var-info=yes --trace-children=yes --track-fds=yes /usr/local/libexec/master -C /etc/imapd_mu.conf -M /etc/cyrus_mu.conf -p /var/run/cyurs_mu.pid -D &> /tmp/cyrus-mupdate-log-file-helgrind

Valgrind did find some "Possible data race during read" and "This conflicts with a previous write"
I have attached the full log as well



Of course, it is useful to have debugging symbols.  Compiling with -O3
migh make things faster, while compiling with -O0 will make it run
considerably slower. If ld (=ld.bfd) is very, very recent (e.g. 2.31),
then link with  -Wl,-z,noseparate-code otherwise valgrind cannot read
the debug info (https://bugs.kde.org/show_bug.cgi?id=395682).

Then it shall be possible to see in one or the other log files what is
relevant for causing SIGSERV, in particular the first place where bad
operation happens, which might be some time before the crash.


I hope this helps untangle the web of multiple threads, race conditions
and use after free, to find the root of this problem.

If there is any more that i can do to help let me know.


On Mon, 2018-07-02 at 17:24 +0200, Michael Menge wrote:
Hi,

we are in the process of setting up our new production mailserver with
cyrus-imapd 3.0.7 on RHEL 7.5 Servers.

At the moment we encounter many crashes (SIGSEGV) of the mupdate process on
the mupdate master instance. As soon as we issue a command that updates
multiple mailboxes in short time we trigger a SIGSEGV.

sam user/zrstes* cyrus all

Setting ACL on user/zrstes1...OK.
Setting ACL on user/zrstes1/Mail...OK.
Setting ACL on user/zrstes1/Mail/drafts...OK.
Setting ACL on user/zrstes1/Mail/s-spam...OK.
Setting ACL on user/zrstes1/Mail/sent...OK.
Setting ACL on user/zrstes1/Mail/trash...OK.
Setting ACL on user/zrstes1/Mail/v-spam...OK.
Setting ACL on user/zrstes2...cyrus: lrswipkxtea: no connection to server

I suspect that at this time the connection from the backend to the
mupdate master
is lost as the mupdate process received the SIGSEGV

(gdb) bt
#0  0x00007fb49c000098 in ?? ()
#1  0x00007fb4a9a59a85 in DH_free (r=0x7fb49c1b9600) at dh_lib.c:194
#2  0x00007fb4aa84dbef in tls_shutdown_serverengine () at imap/tls.c:1311
#3 0x0000000000404075 in conn_free (C=0x7fb4840009f0) at imap/mupdate.c:379
#4  0x00000000004062c0 in thread_main (rock=0x0) at imap/mupdate.c:1330
#5  0x00007fb4a771add5 in start_thread (arg=0x7fb4a22d9700) at
pthread_create.c:308
#6  0x00007fb4a718fb3d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113


Further information, as DH_free is in the bt, we have NOT configured
dh_params in
our ssl certificate/key or in the imapd.conf


Kind regards

     Michael Menge



--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen



--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen

Reply via email to