Hi,
Дилян had suggested to add some debug outputs to imap/tls.c
--------
diff --git a/imap/tls.c b/imap/tls.c
--- a/imap/tls.c
+++ b/imap/tls.c
@@ -893,7 +893,9 @@ EXPORTED int tls_init_serverengine(const char *ident,
#if (OPENSSL_VERSION_NUMBER >= 0x0090800fL)
/* Load DH params for DHE-* key exchanges */
+ syslog(LOG_CRIT, "dh_params will be set, current value=%p", dh_params);
dh_params = load_dh_param(server_key_file, server_cert_file);
+ syslog(LOG_CRIT, "dh_params were set current_value=%p", dh_params);
SSL_CTX_set_tmp_dh(s_ctx, dh_params);
#endif
@@ -1308,7 +1310,11 @@ EXPORTED int tls_shutdown_serverengine(void)
}
#if (OPENSSL_VERSION_NUMBER >= 0x0090800fL)
- if (dh_params) DH_free(dh_params);
+ if (dh_params) {
+ syslog(LOG_CRIT, "dh_params will be freed %p", dh_params);
+ DH_free(dh_params);
+ syslog(LOG_CRIT, "dh_params were freed %p", dh_params);
+ }
#endif
}
-----
I did run mupdate with this debug output
Jul 12 10:17:25 mx02 mu/mupdate[6537]: dh_params will be set, current
value=(nil)
Jul 12 10:17:25 mx02 mu/mupdate[6537]: inittls: Loading DH parameters
from file
Jul 12 10:17:25 mx02 mu/mupdate[6537]: dh_params were set
current_value=0x7fc7541b9600
Jul 12 10:17:25 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication
Jul 12 10:17:25 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost
[10.23.21.78] cyrus PLAIN+TLS User logged in
Jul 12 10:18:37 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication
Jul 12 10:18:37 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost
[10.23.21.78] cyrus PLAIN+TLS User logged in
Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params will be freed 0x7fc7541b9600
Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params were freed 0x7fc7541b9600
Jul 12 10:18:37 mx02 mu/mupdate[6537]: starttls: TLSv1.2 with cipher
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits new) no authentication
Jul 12 10:18:37 mx02 mu/mupdate[6537]: login: msmx02.mail.localhost
[10.23.21.78] cyrus PLAIN+TLS User logged in
Jul 12 10:18:37 mx02 mu/mupdate[6537]: dh_params will be freed 0x7fc7541b9600
Jul 12 10:18:38 mx02 mu/master[6534]: process type:SERVICE
name:mupdate path:/usr/local/libexec/mupdate age:89.131s pid:6537
signaled to death by signal 11 (Segmentation fault, core dumped)
Jul 12 10:18:38 mx02 mu/master[6534]: service mupdate/ipv4 pid 6537 in
READY state: terminated abnormally
so it seems to me that the dh_params were set once on startup but
freed for each closed connection
Regards
Michael Menge
Quoting Michael Menge <michael.me...@zdv.uni-tuebingen.de>:
Hi Дилян,
Quoting Дилян Палаузов <dilyan.palau...@aegee.org>:
Hello Michael,
this is likely either a memory mishandling issue (use after free(),
double free(), invalid read()/write()...), which gets evident if cyrus
is run under valgrind --tool=memcheck. I run it with
valgrind --num-callers=30 --leak-check=full --track-origins=yes --read-
var-info=yes --show-leak-kinds=all --trace-children=yes --track-fds=yes
/usr/local/cyrus/bin/master -D &> log-file-memcheck
thanks for the valgind options. Valgrind did find some "Invalid read
of size 4"
and "size 8" in DH_free as well as "Invalid write of size 4" in
CRYPTO_add_lock
and "size 8" in OPENSSL_cleanse. As far as i can tell the memory was
always free'd
before by CRYPTO_free (mem.c:434).
I have attached the full log
Another reason can be multi-threaded inconsistencies: mutexes locked in
inconsistent order by differnt threads (while this is not a cause for
crash, it leads to deadlock), mutexes locked in one thread and unlocked
in another or alike. This can be detected by valgrind/helgrind
valgrind --tool=helgrind --num-callers=30 --leak-check=full --track-
origins=yes --read-var-info=yes --trace-children=yes --track-fds=yes
/usr/local/cyrus/bin/master -D &> log-file-helgrind
I had to remove some options here, as my valgrind didn't know them with
--tool=helgrind
I used /usr/bin/valgrind --tool=helgrind --num-callers=30
--read-var-info=yes --trace-children=yes --track-fds=yes
/usr/local/libexec/master -C /etc/imapd_mu.conf -M
/etc/cyrus_mu.conf -p /var/run/cyurs_mu.pid -D &>
/tmp/cyrus-mupdate-log-file-helgrind
Valgrind did find some "Possible data race during read" and "This
conflicts with a previous write"
I have attached the full log as well
Of course, it is useful to have debugging symbols. Compiling with -O3
migh make things faster, while compiling with -O0 will make it run
considerably slower. If ld (=ld.bfd) is very, very recent (e.g. 2.31),
then link with -Wl,-z,noseparate-code otherwise valgrind cannot read
the debug info (https://bugs.kde.org/show_bug.cgi?id=395682).
Then it shall be possible to see in one or the other log files what is
relevant for causing SIGSERV, in particular the first place where bad
operation happens, which might be some time before the crash.
I hope this helps untangle the web of multiple threads, race conditions
and use after free, to find the root of this problem.
If there is any more that i can do to help let me know.
On Mon, 2018-07-02 at 17:24 +0200, Michael Menge wrote:
Hi,
we are in the process of setting up our new production mailserver with
cyrus-imapd 3.0.7 on RHEL 7.5 Servers.
At the moment we encounter many crashes (SIGSEGV) of the mupdate process on
the mupdate master instance. As soon as we issue a command that updates
multiple mailboxes in short time we trigger a SIGSEGV.
sam user/zrstes* cyrus all
Setting ACL on user/zrstes1...OK.
Setting ACL on user/zrstes1/Mail...OK.
Setting ACL on user/zrstes1/Mail/drafts...OK.
Setting ACL on user/zrstes1/Mail/s-spam...OK.
Setting ACL on user/zrstes1/Mail/sent...OK.
Setting ACL on user/zrstes1/Mail/trash...OK.
Setting ACL on user/zrstes1/Mail/v-spam...OK.
Setting ACL on user/zrstes2...cyrus: lrswipkxtea: no connection to server
I suspect that at this time the connection from the backend to the
mupdate master
is lost as the mupdate process received the SIGSEGV
(gdb) bt
#0 0x00007fb49c000098 in ?? ()
#1 0x00007fb4a9a59a85 in DH_free (r=0x7fb49c1b9600) at dh_lib.c:194
#2 0x00007fb4aa84dbef in tls_shutdown_serverengine () at imap/tls.c:1311
#3 0x0000000000404075 in conn_free (C=0x7fb4840009f0) at
imap/mupdate.c:379
#4 0x00000000004062c0 in thread_main (rock=0x0) at imap/mupdate.c:1330
#5 0x00007fb4a771add5 in start_thread (arg=0x7fb4a22d9700) at
pthread_create.c:308
#6 0x00007fb4a718fb3d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Further information, as DH_free is in the bt, we have NOT configured
dh_params in
our ssl certificate/key or in the imapd.conf
Kind regards
Michael Menge
--------------------------------------------------------------------------------
M.Menge Tel.: (49) 7071/29-70316
Universität Tübingen Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung mail:
michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen
--------------------------------------------------------------------------------
M.Menge Tel.: (49) 7071/29-70316
Universität Tübingen Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung mail:
michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen