Re: index corruption weirdness
On 10/10/18 7:26 AM, Aki Tuomi wrote:
>> Are you saying that there is a bug in this version that affects RHEL 7.5
>> but not RHEL 6, or just use the newest version and maybe the problem goes
>> away?
>
> We have very limited interest in figuring out problems with (very) old
> dovecot versions. At minimum you need to show this problem with 2.2.36
> or 2.3.2.1.
>
> A thing you should make sure is that you are not accessing the user with
> two different servers concurrently.

The directors appear to be working fine, so no, users aren't hitting
multiple backend servers.

To be clear, we don't suspect Dovecot so much -- our deployment had been
stable for years -- but rather behavior changes between the RHEL 6 and
RHEL 7 environments, particularly with regard to NFSv3. But we have been
at a loss to find a smoking gun.

For various reasons, achieving stability (again) on the current version is
very important while we continue to plan Dovecot and storage backend
upgrades. Corruption leading to crashes is very infrequent percentage-wise,
but it's enough to degrade performance and affect users -- out of 5+
million sessions/day we're seeing ~5 instances, whereas on RHEL 6 it would
have been one every few months.

Has anyone else experienced NFS/locking issues transitioning from RHEL 6
to RHEL 7 with NetApp storage? Grasping at straws -- perhaps compiler
and/or system library issues interacting with Dovecot?

-K
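Grasping-at-straws aside, one mechanical first step when comparing the
RHEL 6 and RHEL 7 hosts is to diff the NFS options the kernel actually
negotiated, since client defaults shifted between releases. A small
sketch; the option string is copied from the mount line posted earlier in
this thread, and on a live host you would pipe `mount -t nfs | grep
mail/15` in instead of the echo:

```shell
# Isolate the locking/caching-relevant NFS mount options.
# local_lock=none means POSIX/fcntl locks are forwarded to the NFS
# server via NLM rather than being handled client-side (see nfs(5)).
opts='rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14'
echo "$opts" | tr ',' '\n' |
  grep -E '^(vers=|local_lock=|hard|soft|ac|noac|lookupcache=)'
```

Running the same filter on an EL6 box and diffing the two outputs shows at
a glance whether the NFS version, lock routing, or attribute-cache options
changed under you.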
Re: index corruption weirdness
On 10.10.2018 19:12, William Taylor wrote:
>>> OS Info:
>>> CentOS Linux release 7.5.1804 (Core)
>>> 3.10.0-862.14.4.el7.x86_64
>>>
>>> NFS:
>>> # mount -t nfs | grep mail/15
>>> 172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs
>>> (rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)
>>>
>>> Dovecot Info:
>>> dovecot -n
>>> # 2.1.17: /etc/dovecot/dovecot.conf
>>
>> Hi!
>>
>> Thank you for your report, however, 2.1.17 is a VERY old version of
>> dovecot and this problem is very likely fixed in a more recent version.
>>
>> Aki
>
> I realize it is an older release. Are you saying that there is a bug in
> this version that affects RHEL 7.5 but not RHEL 6, or just use the
> newest version and maybe the problem goes away?

I can see from my CentOS 7 installation that it comes with the
2.2.10-8.el7 package. Did you install 2.1.17 specifically somehow? I'm
using Dovecot 2.3.3 as packaged by the developers on CentOS 7 myself.

Good luck,
Reio
Re: index corruption weirdness
On 10 October 2018 at 19:12 William Taylor <william.tay...@sonic.com> wrote:
> On Wed, Oct 10, 2018 at 09:37:46AM +0300, Aki Tuomi wrote:
>> On 09.10.2018 22:16, William Taylor wrote:
>>> We have started seeing index corruption ever since we upgraded (we
>>> believe) our imap servers from SL6 to CentOS 7. Mail/indexes are stored
>>> on NetApps mounted via NFS. We have 2 lvs servers running surealived in
>>> dr/wlc, 2 directors and 6 backend imap/pop servers.
>>>
>>> Most of the core dumps I've looked at for different users are like
>>> "Backtrace 2" with some variations on folder path.
>>>
>>> This latest crash (Backtrace 1) is different from others I've seen.
>>> It is also leaving 0-byte files in the user's .Drafts/tmp folder.
>>>
>>> # ls -s /var/spool/mail/15/00/user1/.Drafts/tmp | awk '{print $1}' | sort | uniq -c
>>>    9692 0
>>>       1 218600
>>>
>>> I believe the number of cores here is different from the number of tmp
>>> files because this is when we moved the user to our debug server so we
>>> could get the core dumps.
>>> # ls -la /home/u/user1/core.* | wc -l
>>> 8437
>>>
>>> Any help/insight would be greatly appreciated.
>>>
>>> Thanks,
>>> William
>>>
>>> OS Info:
>>> CentOS Linux release 7.5.1804 (Core)
>>> 3.10.0-862.14.4.el7.x86_64
>>>
>>> NFS:
>>> # mount -t nfs | grep mail/15
>>> 172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs
>>> (rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)
>>>
>>> Dovecot Info:
>>> dovecot -n
>>> # 2.1.17: /etc/dovecot/dovecot.conf
>>
>> Hi!
>>
>> Thank you for your report, however, 2.1.17 is a VERY old version of
>> dovecot and this problem is very likely fixed in a more recent version.
>>
>> Aki
>
> I realize it is an older release. Are you saying that there is a bug in
> this version that affects RHEL 7.5 but not RHEL 6, or just use the
> newest version and maybe the problem goes away?

We have very limited interest in figuring out problems with (very) old
dovecot versions. At minimum you need to show this problem with 2.2.36
or 2.3.2.1.

A thing you should make sure is that you are not accessing the user with
two different servers concurrently.

---
Aki Tuomi
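With `login_log_format_elements` including `user=<%u>` (as in the config
posted in this thread), checking Aki's concurrent-access point against the
logs is a one-liner per backend. A sketch, assuming syslog-style Dovecot
logs at /var/log/dovecot.log (adjust the path to your setup); run it on
each backend, and a user with simultaneous nonzero counts on two servers
would point at a director problem:

```shell
# Tally login lines per user by extracting the user=<...> token.
awk 'match($0, /user=<[^>]*>/) {
       u = substr($0, RSTART + 6, RLENGTH - 7)  # strip "user=<" and ">"
       count[u]++
     }
     END { for (u in count) print count[u], u }' /var/log/dovecot.log |
  sort -rn | head
```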
Re: index corruption weirdness
On Wed, Oct 10, 2018 at 09:37:46AM +0300, Aki Tuomi wrote:
> On 09.10.2018 22:16, William Taylor wrote:
>> We have started seeing index corruption ever since we upgraded (we
>> believe) our imap servers from SL6 to CentOS 7. Mail/indexes are stored
>> on NetApps mounted via NFS. We have 2 lvs servers running surealived in
>> dr/wlc, 2 directors and 6 backend imap/pop servers.
>>
>> Most of the core dumps I've looked at for different users are like
>> "Backtrace 2" with some variations on folder path.
>>
>> This latest crash (Backtrace 1) is different from others I've seen.
>> It is also leaving 0-byte files in the user's .Drafts/tmp folder.
>>
>> # ls -s /var/spool/mail/15/00/user1/.Drafts/tmp | awk '{print $1}' | sort | uniq -c
>>    9692 0
>>       1 218600
>>
>> I believe the number of cores here is different from the number of tmp
>> files because this is when we moved the user to our debug server so we
>> could get the core dumps.
>> # ls -la /home/u/user1/core.* | wc -l
>> 8437
>>
>> Any help/insight would be greatly appreciated.
>>
>> Thanks,
>> William
>>
>> OS Info:
>> CentOS Linux release 7.5.1804 (Core)
>> 3.10.0-862.14.4.el7.x86_64
>>
>> NFS:
>> # mount -t nfs | grep mail/15
>> 172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs
>> (rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)
>>
>> Dovecot Info:
>> dovecot -n
>> # 2.1.17: /etc/dovecot/dovecot.conf
>
> Hi!
>
> Thank you for your report, however, 2.1.17 is a VERY old version of
> dovecot and this problem is very likely fixed in a more recent version.
>
> Aki

I realize it is an older release. Are you saying that there is a bug in
this version that affects RHEL 7.5 but not RHEL 6, or just use the newest
version and maybe the problem goes away?
Re: index corruption weirdness
On 09.10.2018 22:16, William Taylor wrote:
> We have started seeing index corruption ever since we upgraded (we
> believe) our imap servers from SL6 to CentOS 7. Mail/indexes are stored
> on NetApps mounted via NFS. We have 2 lvs servers running surealived in
> dr/wlc, 2 directors and 6 backend imap/pop servers.
>
> Most of the core dumps I've looked at for different users are like
> "Backtrace 2" with some variations on folder path.
>
> This latest crash (Backtrace 1) is different from others I've seen.
> It is also leaving 0-byte files in the user's .Drafts/tmp folder.
>
> # ls -s /var/spool/mail/15/00/user1/.Drafts/tmp | awk '{print $1}' | sort | uniq -c
>    9692 0
>       1 218600
>
> I believe the number of cores here is different from the number of tmp
> files because this is when we moved the user to our debug server so we
> could get the core dumps.
> # ls -la /home/u/user1/core.* | wc -l
> 8437
>
> Any help/insight would be greatly appreciated.
>
> Thanks,
> William
>
> OS Info:
> CentOS Linux release 7.5.1804 (Core)
> 3.10.0-862.14.4.el7.x86_64
>
> NFS:
> # mount -t nfs | grep mail/15
> 172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs
> (rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)
>
> Dovecot Info:
> dovecot -n
> # 2.1.17: /etc/dovecot/dovecot.conf

Hi!

Thank you for your report, however, 2.1.17 is a VERY old version of
dovecot and this problem is very likely fixed in a more recent version.

Aki
index corruption weirdness
We have started seeing index corruption ever since we upgraded (we
believe) our imap servers from SL6 to CentOS 7. Mail/indexes are stored
on NetApps mounted via NFS. We have 2 lvs servers running surealived in
dr/wlc, 2 directors and 6 backend imap/pop servers.

Most of the core dumps I've looked at for different users are like
"Backtrace 2" with some variations on folder path.

This latest crash (Backtrace 1) is different from others I've seen.
It is also leaving 0-byte files in the user's .Drafts/tmp folder.

# ls -s /var/spool/mail/15/00/user1/.Drafts/tmp | awk '{print $1}' | sort | uniq -c
   9692 0
      1 218600

I believe the number of cores here is different from the number of tmp
files because this is when we moved the user to our debug server so we
could get the core dumps.
# ls -la /home/u/user1/core.* | wc -l
8437

Any help/insight would be greatly appreciated.

Thanks,
William

OS Info:
CentOS Linux release 7.5.1804 (Core)
3.10.0-862.14.4.el7.x86_64

NFS:
# mount -t nfs | grep mail/15
172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs
(rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)

Dovecot Info:
dovecot -n
# 2.1.17: /etc/dovecot/dovecot.conf
# OS: Linux 3.10.0-862.14.4.el7.x86_64 x86_64 CentOS Linux release 7.5.1804 (Core)
auth_failure_delay = 0
auth_master_user_separator = *
auth_username_format = %Ln
auth_verbose = yes
auth_verbose_passwords = sha1
auth_worker_max_count = 64
login_log_format_elements = user=<%u> session=%{session} method=%m rip=%r lip=%l mpid=%e %c
login_trusted_networks = 172.16.0/24
mail_debug = yes
mail_fsync = always
mail_log_prefix = "%s(%u): session=%{session} "
mail_plugins = zlib
maildir_very_dirty_syncs = yes
mmap_disable = yes
passdb {
  args = /etc/dovecot/master-users
  driver = passwd-file
  master = yes
}
passdb {
  args = imap
  driver = pam
}
plugin {
  lazy_expunge = DELETED_MESSAGES.
  mail_log_events = delete expunge flag_change
  mail_log_fields = uid box msgid from flags size
  quota = fs:User quota
  stats_refresh = 30 secs
  stats_track_cmds = yes
}
protocols = imap pop3
service anvil {
  client_limit = 1
}
service auth {
  client_limit = 1
  vsz_limit = 1 G
}
service doveadm {
  inet_listener {
    port = 1842
  }
  unix_listener doveadm-server {
    mode = 0666
  }
}
service imap-login {
  inet_listener imap {
    port = 143
  }
  inet_listener imaps {
    port = 993
    ssl = yes
  }
  process_limit = 7000
  process_min_avail = 32
  vsz_limit = 256 M
}
service imap-postlogin {
  executable = script-login -d /etc/dovecot/bin/foo-imap-postlogin
  user = $default_internal_user
}
service imap {
  executable = imap imap-postlogin
  process_limit = 7000
  vsz_limit = 1492 M
}
service pop3-login {
  inet_listener pop3 {
    port = 110
  }
  inet_listener pop3s {
    port = 995
    ssl = yes
  }
  process_limit = 2000
  process_min_avail = 32
  vsz_limit = 256 M
}
service pop3-postlogin {
  executable = script-login -d /etc/dovecot/bin/foo-pop3-postlogin
  user = $default_internal_user
}
service pop3 {
  executable = pop3 pop3-postlogin
  process_limit = 2000
}
shutdown_clients = no
ssl = required
ssl_ca = [...]

[output mangled in the archive; "Backtrace 1" resumes mid-frame:]

[...], 31683224, 140725444114256, 31908720, 140457858143945, 31683224}}, sa_flags = -457466710, sa_restorer = 0x0}
        sigs = {__val = {32, 0 }}
#2  0x7fbee4bbdb65 in default_fatal_finish (type=<optimized out>, status=status@entry=0) at failures.c:191
        backtrace = 0x1e372d0 "/usr/lib64/dovecot/libdovecot.so.0(+0x46b55) [0x7fbee4bbdb55] -> /usr/lib64/dovecot/libdovecot.so.0(+0x46c1e) [0x7fbee4bbdc1e] -> /usr/lib64/dovecot/libdovecot.so.0(i_fatal+0) [0x7fbee4b90dda] -> /usr"...
#3  0x7fbee4bbdc1e in i_internal_fatal_handler (ctx=0x7ffd321b77a0, format=<optimized out>, args=<optimized out>) at failures.c:649
        status = 0
#4  0x7fbee4b90dda in i_panic (format=format@entry=0x7fbee4ee0588 "file %s: line %d (%s): assertion failed: (%s)") at failures.c:263
        ctx = {type = LOG_TYPE_PANIC, exit_status = 0, timestamp = 0x0}
        args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x7ffd321b7890, reg_save_area = 0x7ffd321b77d0}}
#5  0x7fbee4ea96ad in index_mail_parse_body_finish (mail=0x1e69570, field=field@entry=MAIL_CACHE_FLAGS) at index-mail.c:769
        parser_input = 0x1e6e370
        ret = 1
        __FUNCTION__ = "index_mail_parse_body_finish"
#6  0x7fbee4eaaa2b in index_mail_cache_parse_deinit (_mail=<optimized out>, received_date=<optimized out>, success=<optimized out>) at index-mail.c:1624
        mail = <optimized out>
#7  0x7fbee4e5dce3 in maildir_save_finish_real (_ctx=0x1e68560) at maildir-save.c:551
        ctx = 0x1e68560
        e = 0x1e574d0
        output_errno = <optimized out>
        path = 0x1e37218 "/var/spool/mail/15/00/user1/.Drafts/tmp/1539107164.M157986P9449.debug.imapd.foo.com"
        real_size = <optimized out>
        size = 1539107194
#8
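For what it's worth, the zero-byte census from the original report can be
redone with find(1) instead of parsing `ls -s` output. A sketch; the path
is the one from the report, so point it at whichever tmp directory is
affected on your system:

```shell
# Count empty vs. non-empty files left behind in a Maildir tmp directory.
dir=/var/spool/mail/15/00/user1/.Drafts/tmp
echo "zero-byte files: $(find "$dir" -maxdepth 1 -type f -size 0 | wc -l)"
echo "non-empty files: $(find "$dir" -maxdepth 1 -type f ! -size 0 | wc -l)"
```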