Re: index corruption weirdness

2018-10-10 Thread Kelsey Cummings
On 10/10/18 7:26 AM, Aki Tuomi wrote:
>> Are you saying that there is a bug in this version that affects RHEL 7.5
>> but not RHEL 6 or just use the newest version and maybe the problem goes
>> away?
> 
> We have very limited interest in figuring out problems with (very) old
> dovecot versions. At minimum you need to show this problem with 2.2.36
> or 2.3.2.1.
> 
> A thing you should make sure is that you are not accessing the user with
> two different servers concurrently.

The directors appear to be working fine, so no, users aren't hitting
multiple back-end servers.

To be clear, we don't suspect Dovecot so much - our deployment had been
stable for years - but rather behavior changes between the RHEL6 and
RHEL7 environments, particularly with regard to NFSv3.  But we have
been at a loss to find a smoking gun.
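For what it's worth, the "one user on two backends" theory can be checked
from the logs as well as from director health. This is only a sketch: the
sample log lines below are invented to illustrate the shape (hostname
first, user=<...> as the last field), and the awk field selection would
need adapting to a real syslog layout. On a director, `doveadm director
map` should also show the current user-to-backend assignments.

```shell
# Hypothetical sample log; real Dovecot login lines differ, so adjust
# the awk fields to match your actual syslog format.
cat > /tmp/imap-sample.log <<'EOF'
imap3 dovecot: imap-login: Login: user=<alice>
imap3 dovecot: imap-login: Login: user=<alice>
imap5 dovecot: imap-login: Login: user=<bob>
imap6 dovecot: imap-login: Login: user=<bob>
EOF

# Unique (backend, user) pairs, then flag users seen on >1 backend.
multi=$(awk '{print $1, $NF}' /tmp/imap-sample.log | sort -u |
    awk '{hosts[$2]++} END {for (u in hosts) if (hosts[u] > 1) print u}')
echo "users seen on multiple backends: $multi"
rm -f /tmp/imap-sample.log
```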

For various reasons, achieving stability (again) on the current version
is very important while we continue to plan Dovecot and storage backend
upgrades.  Corruption leading to crashes is very infrequent
percentage-wise, but it's enough to degrade performance and impact users
-- out of 5+ million sessions/day we're seeing ~5 instances, whereas on
RHEL6 it was one every few months.

Has anyone else experienced NFS/locking issues transitioning from
RHEL6 to 7 with NetApp storage?  Grasping at straws - perhaps compiler
and/or system library issues interacting with Dovecot?
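Since the question centers on NFSv3 behavior: for reference, the
NFS-related knobs Dovecot exposes look roughly like this. This is a
sketch based on the Dovecot NFS documentation, not this deployment's
config (which already sets mail_fsync and mmap_disable, per the
dovecot -n output later in the thread); verify the option names against
your Dovecot version before relying on them.

```
# Dovecot 2.x NFS-related settings (sketch; check your version's docs)
mmap_disable = yes
mail_fsync = always
mail_nfs_storage = yes   # flush NFS attribute/data caches around mail access (best effort)
mail_nfs_index = yes     # the same for index files
```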

-K


Re: index corruption weirdness

2018-10-10 Thread Reio Remma

On 10.10.2018 19:12, William Taylor wrote:

OS Info:

CentOS Linux release 7.5.1804 (Core)
3.10.0-862.14.4.el7.x86_64

NFS:
# mount -t nfs |grep mail/15
172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs
(rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)

Dovecot Info:
dovecot -n
# 2.1.17: /etc/dovecot/dovecot.conf


Hi!

Thank you for your report; however, 2.1.17 is a VERY old version of
Dovecot, and this problem is very likely fixed in a more recent version.

Aki


I realize it is an older release.

Are you saying that there is a bug in this version that affects RHEL 7.5
but not RHEL 6 or just use the newest version and maybe the problem goes
away?


I can see that my CentOS 7 installation comes with the 2.2.10-8.el7 
package. Did you install 2.1.17 specifically somehow?


I'm using Dovecot 2.3.3, as packaged by the developers, on CentOS 7 myself.

Good luck,
Reio


Re: index corruption weirdness

2018-10-10 Thread Aki Tuomi

On 10 October 2018 at 19:12 William Taylor <william.tay...@sonic.com> wrote:
> On Wed, Oct 10, 2018 at 09:37:46AM +0300, Aki Tuomi wrote:
> > Thank you for your report; however, 2.1.17 is a VERY old version of
> > Dovecot, and this problem is very likely fixed in a more recent version.
>
> I realize it is an older release.
>
> Are you saying that there is a bug in this version that affects RHEL 7.5
> but not RHEL 6 or just use the newest version and maybe the problem goes
> away?

We have very limited interest in figuring out problems with (very) old
dovecot versions. At minimum you need to show this problem with 2.2.36
or 2.3.2.1.

A thing you should make sure is that you are not accessing the user with
two different servers concurrently.

---
Aki Tuomi



Re: index corruption weirdness

2018-10-10 Thread William Taylor
On Wed, Oct 10, 2018 at 09:37:46AM +0300, Aki Tuomi wrote:
> 
> 
> On 09.10.2018 22:16, William Taylor wrote:
> > We have started seeing index corruption ever since we upgraded (we 
> > believe) our imap servers from SL6 to Centos 7. Mail/Indexes are stored 
> > on Netapps mounted via NFS. We have 2 lvs servers running surealived in 
> > dr/wlc, 2 directors and 6 backend imap/pop servers.
> >
> > Most of the core dumps I've looked at for different users are like 
> > "Backtrace 2" with some variations on folder path.
> >
> > This latest crash (Backtrace 1) is different from others I've seen.
> > It is also leaving 0byte files in the users .Drafts/tmp folder.
> >
> > # ls -s /var/spool/mail/15/00/user1/.Drafts/tmp | awk '{print $1}'  
> >   |sort | uniq -c
> >9692 0
> >   1 218600
> >
> > I believe the number of cores here is different from the number of tmp 
> > files because this is when we moved the user to our debug server so we
> > could get the core dumps.
> > # ls -la /home/u/user1/core.* |wc -l   
> >   8437
> >
> > Any help/insight would be greatly appreciated.
> >
> > Thanks,
> >   William
> >
> >
> > OS Info:
> > CentOS Linux release 7.5.1804 (Core)
> > 3.10.0-862.14.4.el7.x86_64
> >
> > NFS:
> > # mount -t nfs |grep mail/15
> > 172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs 
> > (rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)
> >
> > Dovecot Info:
> > dovecot -n
> > # 2.1.17: /etc/dovecot/dovecot.conf
> >
> 
> Hi!
> 
> Thank you for your report; however, 2.1.17 is a VERY old version of
> Dovecot, and this problem is very likely fixed in a more recent version.
> 
> Aki
> 

I realize it is an older release.

Are you saying that there is a bug in this version that affects RHEL 7.5 
but not RHEL 6 or just use the newest version and maybe the problem goes 
away?


Re: index corruption weirdness

2018-10-10 Thread Aki Tuomi



On 09.10.2018 22:16, William Taylor wrote:
> We have started seeing index corruption ever since we upgraded (we 
> believe) our imap servers from SL6 to Centos 7. Mail/Indexes are stored 
> on Netapps mounted via NFS. We have 2 lvs servers running surealived in 
> dr/wlc, 2 directors and 6 backend imap/pop servers.
>
> Most of the core dumps I've looked at for different users are like 
> "Backtrace 2" with some variations on folder path.
>
> This latest crash (Backtrace 1) is different from others I've seen.
> It is also leaving 0byte files in the users .Drafts/tmp folder.
>
> # ls -s /var/spool/mail/15/00/user1/.Drafts/tmp | awk '{print $1}'  
>   |sort | uniq -c
>9692 0
>   1 218600
>
> I believe the number of cores here is different from the number of tmp 
> files because this is when we moved the user to our debug server so we
> could get the core dumps.
> # ls -la /home/u/user1/core.* |wc -l   
>   8437
>
> Any help/insight would be greatly appreciated.
>
> Thanks,
>   William
>
>
> OS Info:
> CentOS Linux release 7.5.1804 (Core)
> 3.10.0-862.14.4.el7.x86_64
>
> NFS:
> # mount -t nfs |grep mail/15
> 172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs 
> (rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)
>
> Dovecot Info:
> dovecot -n
> # 2.1.17: /etc/dovecot/dovecot.conf
>

Hi!

Thank you for your report; however, 2.1.17 is a VERY old version of
Dovecot, and this problem is very likely fixed in a more recent version.

Aki


index corruption weirdness

2018-10-09 Thread William Taylor
We have started seeing index corruption ever since we upgraded (we 
believe) our imap servers from SL6 to CentOS 7. Mail/indexes are stored 
on NetApp filers mounted via NFS. We have 2 LVS servers running 
surealived in dr/wlc, 2 directors, and 6 backend imap/pop servers.

Most of the core dumps I've looked at for different users are like 
"Backtrace 2", with some variation in folder path.

This latest crash (Backtrace 1) is different from the others I've seen.
It is also leaving 0-byte files in the user's .Drafts/tmp folder.

# ls -s /var/spool/mail/15/00/user1/.Drafts/tmp | awk '{print $1}' | sort | uniq -c
   9692 0
      1 218600
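A slightly more direct way to count those zero-byte leftovers is find's
-size 0 test. Sketched here against a throwaway directory, since the real
.Drafts/tmp path is site-specific:

```shell
# Count 0-byte files with find; substitute the real Maildir tmp path
# for the scratch directory created here.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b"        # two empty files, like the tmp leftovers
printf 'x' > "$dir/c"          # one non-empty file
zero_count=$(find "$dir" -maxdepth 1 -type f -size 0 | wc -l | tr -d ' ')
echo "zero-byte files: $zero_count"
rm -r "$dir"
```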

I believe the number of cores here differs from the number of tmp 
files because this count is from after we moved the user to our debug
server so we could get the core dumps.
# ls -la /home/u/user1/core.* | wc -l
  8437

Any help/insight would be greatly appreciated.

Thanks,
  William


OS Info:
CentOS Linux release 7.5.1804 (Core)
3.10.0-862.14.4.el7.x86_64

NFS:
# mount -t nfs |grep mail/15
172.16.255.14:/vol/vol1/mail/15 on /var/spool/mail/15 type nfs 
(rw,nosuid,nodev,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nordirplus,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.255.14,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=172.16.255.14)
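One option in that mount line worth a second look is local_lock=none: the
client handles no locks locally, so POSIX and flock locks travel to the
filer via NLM. A small sketch for picking individual options out of such a
comma-separated string; the opts value is copied (abridged) from the mount
output above, and on a live system you would read it from /proc/mounts
instead:

```shell
# Extract single options from an NFS mount-option string (abridged copy
# of the one above; nothing here queries a live mount).
opts='rw,nosuid,nodev,relatime,vers=3,hard,nordirplus,proto=tcp,timeo=600,retrans=2,local_lock=none'
lock_mode=$(printf '%s\n' "$opts" | tr ',' '\n' | awk -F= '$1 == "local_lock" {print $2}')
nfs_vers=$(printf '%s\n' "$opts" | tr ',' '\n' | awk -F= '$1 == "vers" {print $2}')
echo "NFSv$nfs_vers, local_lock=$lock_mode"
```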

Dovecot Info:
dovecot -n
# 2.1.17: /etc/dovecot/dovecot.conf
# OS: Linux 3.10.0-862.14.4.el7.x86_64 x86_64 CentOS Linux release 7.5.1804 (Core)
auth_failure_delay = 0
auth_master_user_separator = *
auth_username_format = %Ln
auth_verbose = yes
auth_verbose_passwords = sha1
auth_worker_max_count = 64
login_log_format_elements = user=<%u> session=%{session} method=%m 
rip=%r lip=%l mpid=%e %c
login_trusted_networks = 172.16.0/24
mail_debug = yes
mail_fsync = always
mail_log_prefix = "%s(%u): session=%{session} "
mail_plugins = zlib
maildir_very_dirty_syncs = yes
mmap_disable = yes
passdb {
  args = /etc/dovecot/master-users
  driver = passwd-file
  master = yes
}
passdb {
  args = imap
  driver = pam
}
plugin {
  lazy_expunge = DELETED_MESSAGES.
  mail_log_events = delete expunge flag_change
  mail_log_fields = uid box msgid from flags size
  quota = fs:User quota
  stats_refresh = 30 secs
  stats_track_cmds = yes
}
protocols = imap pop3
service anvil {
  client_limit = 1
}
service auth {
  client_limit = 1
  vsz_limit = 1 G
}
service doveadm {
  inet_listener {
port = 1842
  }
  unix_listener doveadm-server {
mode = 0666
  }
}
service imap-login {
  inet_listener imap {
port = 143
  }
  inet_listener imaps {
port = 993
ssl = yes
  }
  process_limit = 7000
  process_min_avail = 32
  vsz_limit = 256 M
}
service imap-postlogin {
  executable = script-login -d /etc/dovecot/bin/foo-imap-postlogin
  user = $default_internal_user
}
service imap {
  executable = imap imap-postlogin
  process_limit = 7000
  vsz_limit = 1492 M
}
service pop3-login {
  inet_listener pop3 {
port = 110
  }
  inet_listener pop3s {
port = 995
ssl = yes
  }
  process_limit = 2000
  process_min_avail = 32
  vsz_limit = 256 M
}
service pop3-postlogin {
  executable = script-login -d /etc/dovecot/bin/foo-pop3-postlogin
  user = $default_internal_user
}
service pop3 {
  executable = pop3 pop3-postlogin
  process_limit = 2000
}
shutdown_clients = no
ssl = required
ssl_ca = 

(Backtrace 1 follows; the rest of the config and backtrace frames #0-#1
were garbled where the paste was cut.)
sigs = {__val = {32, 0 }}
#2  0x7fbee4bbdb65 in default_fatal_finish (type=, 
status=status@entry=0) at failures.c:191
backtrace = 0x1e372d0 
"/usr/lib64/dovecot/libdovecot.so.0(+0x46b55) [0x7fbee4bbdb55] -> 
/usr/lib64/dovecot/libdovecot.so.0(+0x46c1e) [0x7fbee4bbdc1e] -> 
/usr/lib64/dovecot/libdovecot.so.0(i_fatal+0) [0x7fbee4b90dda] -> 
/usr"...
#3  0x7fbee4bbdc1e in i_internal_fatal_handler (ctx=0x7ffd321b77a0, 
format=, args=) at failures.c:649
status = 0
#4  0x7fbee4b90dda in i_panic (format=format@entry=0x7fbee4ee0588 
"file %s: line %d (%s): assertion failed: (%s)") at failures.c:263
ctx = {type = LOG_TYPE_PANIC, exit_status = 0, timestamp = 0x0}
args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 
0x7ffd321b7890, reg_save_area = 0x7ffd321b77d0}}
#5  0x7fbee4ea96ad in index_mail_parse_body_finish (mail=0x1e69570, 
field=field@entry=MAIL_CACHE_FLAGS) at index-mail.c:769
parser_input = 0x1e6e370
ret = 1
__FUNCTION__ = "index_mail_parse_body_finish"
#6  0x7fbee4eaaa2b in index_mail_cache_parse_deinit 
(_mail=, received_date=, 
success=)
at index-mail.c:1624
mail = 
#7  0x7fbee4e5dce3 in maildir_save_finish_real (_ctx=0x1e68560) at 
maildir-save.c:551
ctx = 0x1e68560
e = 0x1e574d0
output_errno = 
path = 0x1e37218 
"/var/spool/mail/15/00/user1/.Drafts/tmp/1539107164.M157986P9449.debug.imapd.foo.com"
real_size = 
size = 1539107194
#8