Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-24 Thread Stan Hoeppner
On 12/23/2011 11:08 AM, hydra wrote:
 Selinux wasn't running.
 
 I can reproduce the problem with this setup:
 
 - restore the maildir with the broken index.cache
 - run `/usr/bin/doveadm -D expunge -A mailbox Trash savedbefore 1w` in an
 infinite loop
 - use horde webmail to access the mail with IMAP
 - randomly display messages
 
 I could reproduce the problem on:
 Linux-2.6.37-hardened-r7
 Linux-2.6.39-hardened-r8
 Linux-3.0.4-hardened-r5
 Linux-3.0.6-gentoo
 Linux-3.1.4-gentoo
 
 However I failed to reproduce the problem on:
 Linux-2.6.36-hardened-r9
 
 So it seems it was introduced in 2.6.37. I'm going to inspect it later on.
 By wishing you a God blessed merry Christmas I thank you for the support ;)

Aha, cool.  Glad you're making some progress, and that you now know it's
kernel related in some way, just as Timo suspected.  Keep in mind it may
not necessarily be a code change problem in 2.6.37 onward.  Could be a
compiler/linker issue.  It could also be the default EXT4 journal
behavior changed (I haven't checked), or something similar.

Merry Christmas to you as well hydra.  I don't really deserve much
credit for support here.  I've just nudged you in the direction you
already knew you should be heading. :)

Best regards,

-- 
Stan



Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-23 Thread hydra
Yes, it is reproducible.

About the file, I have tarred up the original maildir before I deleted the
dovecot.index.cache, so when I try to bring in the original file, a 'cat'
does no harm. Moreover, I've tried to do this will all files on the virtual
machine, same results - no harm.


On Thu, Dec 22, 2011 at 5:13 PM, Timo Sirainen t...@iki.fi wrote:

 On 22.12.2011, at 18.06, hydra wrote:

  After the virtual machine reboot, the CPU usage is normal again, but just
  until doveadm is launched again (it was run from cron). Sorry for the
  partition/drive terminology mess up.

 So it's reproducible? You could see if you can reproduce it by simply
 reading the file(s) that are causing problems. Like with zsh: cat **/* 
 /dev/null




Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-23 Thread Stan Hoeppner
On 12/23/2011 4:12 AM, hydra wrote:
 Yes, it is reproducible.
 
 About the file, I have tarred up the original maildir before I deleted the
 dovecot.index.cache, so when I try to bring in the original file, a 'cat'
 does no harm. Moreover, I've tried to do this will all files on the virtual
 machine, same results - no harm.

Is selinux running?  If so, turn it off and see if that helps.  And turn
off anything/everything else on this Linux VM that's not running on the
other Dovecot VMs you mentioned are working properly.  Let's do some
process of elimination here.

-- 
Stan



Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-23 Thread hydra
Selinux wasn't running.

I can reproduce the problem with this setup:

- restore the maildir with the broken index.cache
- run `/usr/bin/doveadm -D expunge -A mailbox Trash savedbefore 1w` in an
infinite loop
- use horde webmail to access the mail with IMAP
- randomly display messages

I could reproduce the problem on:
Linux-2.6.37-hardened-r7
Linux-2.6.39-hardened-r8
Linux-3.0.4-hardened-r5
Linux-3.0.6-gentoo
Linux-3.1.4-gentoo

However I failed to reproduce the problem on:
Linux-2.6.36-hardened-r9

So it seems it was introduced in 2.6.37. I'm going to inspect it later on.
By wishing you a God blessed merry Christmas I thank you for the support ;)

On Fri, Dec 23, 2011 at 11:36 AM, Stan Hoeppner s...@hardwarefreak.comwrote:

 On 12/23/2011 4:12 AM, hydra wrote:
  Yes, it is reproducible.
 
  About the file, I have tarred up the original maildir before I deleted
 the
  dovecot.index.cache, so when I try to bring in the original file, a 'cat'
  does no harm. Moreover, I've tried to do this will all files on the
 virtual
  machine, same results - no harm.

 Is selinux running?  If so, turn it off and see if that helps.  And turn
 off anything/everything else on this Linux VM that's not running on the
 other Dovecot VMs you mentioned are working properly.  Let's do some
 process of elimination here.

 --
 Stan




Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-22 Thread hydra
Hello Timo, thank you for the reply. I was suspecting the same. However:
- the machine runs under Vmware,
- I've tried 3 different kernel versions,
- I've tried 3 different SCSI controllers.

All same results.


On Wed, Dec 21, 2011 at 6:16 PM, Timo Sirainen t...@iki.fi wrote:

 On 21.12.2011, at 18.38, hydra wrote:

  I suspect, that dovecot v2.0.16 under Linux (tested on Gentoo 3.0.4
  Hardened, Gentoo 3.0.6, Gentoo 3.1.4) causes 100% cpu utilization when
  index.cache is broken and doveadm is lanched. Dovecot uses LDAP as the
  userdb and provides IMAP and LDA. Because each user has a quota, the
  doveadm's expunge functionality is used to get rid of old trashed mail.
 
  The mail log:
  Dec 12 07:38:59 www dovecot: imap(user): Error: Corrupted index cache
 file
  /var/data/mail/domain.tld/user/mail/dovecot.index.cache: invalid record
 size

 OK..

  Mail is stored on ext4 and flush will use all of the CPU capacity like
 this:
  load average: 4.00, 4.01, 4.05
 
  ps:
  root 26401 98.2  0.0  0 0 ?RDec20 1369:49
  [flush-253:1]

 That's a kernel process..

  I suspect, that this is something to do with Dovecot, because after
  deleting the dovecot.index.cache file, everything went back to normal.
 When
  this happens, I cannot unmount the drive nor a system reboot works.

 That's a kernel bug..

 I think you're thinking it the wrong way: Dovecot isn't causing your
 system to break. Your system is causing Dovecot to break. Faulty hardware
 or faulty kernel.


Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-22 Thread Stan Hoeppner
On 12/22/2011 8:08 AM, hydra wrote:
 Hello Timo, thank you for the reply. I was suspecting the same. However:
 - the machine runs under Vmware,
 - I've tried 3 different kernel versions,
 - I've tried 3 different SCSI controllers.
 
 All same results.

dmesg output?  Log errors?

Is your EXT4 filesystem on a VMFS volume or an RDM (SAN LUN)?


 On Wed, Dec 21, 2011 at 6:16 PM, Timo Sirainen t...@iki.fi wrote:
 
 On 21.12.2011, at 18.38, hydra wrote:

 That's a kernel process..

 I suspect, that this is something to do with Dovecot, because after
 deleting the dovecot.index.cache file, everything went back to normal.
 When
 this happens, I cannot unmount the drive nor a system reboot works.

System (host machine) reboot, or virtual machine reboot doesn't fix the
problem?  FYI, Linux doesn't unmount drives, it unmounts filesystems.

I'd say you may have a problem with your VMFS volume or RDM, or maybe
just your EXT4 filesystem.  Have you run an fsck on it?  What result?

Or, as Timo suggests, could be a kernel bug.  Or an interaction of these
low level layers causing a problem.  If you can't unmount a filesystem,
that has nothing to do with Dovecot, and points to a much larger, more
critical, problem.

Do you have this problem when booting an older kernel?  Say 2.6.32?  2.6.37?

 That's a kernel bug..

 I think you're thinking it the wrong way: Dovecot isn't causing your
 system to break. Your system is causing Dovecot to break. Faulty hardware
 or faulty kernel.

-- 
Stan



Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-22 Thread hydra
On Thu, Dec 22, 2011 at 3:49 PM, Stan Hoeppner s...@hardwarefreak.comwrote:

 On 12/22/2011 8:08 AM, hydra wrote:
  Hello Timo, thank you for the reply. I was suspecting the same. However:
  - the machine runs under Vmware,
  - I've tried 3 different kernel versions,
  - I've tried 3 different SCSI controllers.
 
  All same results.

 dmesg output?  Log errors?


Nothing there


 Is your EXT4 filesystem on a VMFS volume or an RDM (SAN LUN)?


VMFS




  On Wed, Dec 21, 2011 at 6:16 PM, Timo Sirainen t...@iki.fi wrote:
 
  On 21.12.2011, at 18.38, hydra wrote:

  That's a kernel process..
 
  I suspect, that this is something to do with Dovecot, because after
  deleting the dovecot.index.cache file, everything went back to normal.
  When
  this happens, I cannot unmount the drive nor a system reboot works.

 System (host machine) reboot, or virtual machine reboot doesn't fix the
 problem?  FYI, Linux doesn't unmount drives, it unmounts filesystems.


After the virtual machine reboot, the CPU usage is normal again, but just
until doveadm is launched again (it was run from cron). Sorry for the
partition/drive terminology mess up.



 I'd say you may have a problem with your VMFS volume or RDM, or maybe
 just your EXT4 filesystem.  Have you run an fsck on it?  What result?


A normal system reboot wasn't possible, because the ext4 fs wasn't
unmounted (and it wasn't possible to unmount the fs nor run sync - both
locked up) and thus I had to reboot from the vSphere Client. After the
reboot, fsck placed the fs to a consistent state, however the problem
occurred the next morning, when doveadm from the cron was run again. So a
fsck didn't help.


 Or, as Timo suggests, could be a kernel bug.  Or an interaction of these
 low level layers causing a problem.  If you can't unmount a filesystem,
 that has nothing to do with Dovecot, and points to a much larger, more
 critical, problem.

 Do you have this problem when booting an older kernel?  Say 2.6.32?
  2.6.37?


The oldest available kernel is 2.6.32 so I'm going to test it.

Thank you :)


  That's a kernel bug..
 
  I think you're thinking it the wrong way: Dovecot isn't causing your
  system to break. Your system is causing Dovecot to break. Faulty
 hardware
  or faulty kernel.

 --
 Stan




Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-22 Thread Timo Sirainen
On 22.12.2011, at 18.06, hydra wrote:

 After the virtual machine reboot, the CPU usage is normal again, but just
 until doveadm is launched again (it was run from cron). Sorry for the
 partition/drive terminology mess up.

So it's reproducible? You could see if you can reproduce it by simply reading 
the file(s) that are causing problems. Like with zsh: cat **/*  /dev/null



[Dovecot] Corrupted index cache file and high CPU usage

2011-12-21 Thread hydra
I suspect, that dovecot v2.0.16 under Linux (tested on Gentoo 3.0.4
Hardened, Gentoo 3.0.6, Gentoo 3.1.4) causes 100% cpu utilization when
index.cache is broken and doveadm is lanched. Dovecot uses LDAP as the
userdb and provides IMAP and LDA. Because each user has a quota, the
doveadm's expunge functionality is used to get rid of old trashed mail.

The mail log:
Dec 12 07:38:59 www dovecot: imap(user): Error: Corrupted index cache file
/var/data/mail/domain.tld/user/mail/dovecot.index.cache: invalid record size

Doveadm is launched as:
doveadm -D expunge -A mailbox Trash savedbefore 1w

Mail is stored on ext4 and flush will use all of the CPU capacity like this:
load average: 4.00, 4.01, 4.05

ps:
root 26401 98.2  0.0  0 0 ?RDec20 1369:49
[flush-253:1]

lsblk:
vg_data-mail (dm-1)   253:10   500M  0 /var/data/mail

dovecot -n:
# 2.0.16: /etc/dovecot/dovecot.conf
# OS: Linux 3.0.4-hardened-r5 x86_64 Gentoo Base System release 2.0.3
auth_debug = yes
auth_debug_passwords = yes
auth_verbose = yes
hostname = mail.domain.tld
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
listen = *
mail_debug = yes
mail_gid = vmail
mail_home = /var/data/mail/domain.tld/%n
mail_location = maildir:~/mail
mail_plugins =  mail_log notify zlib quota
mail_uid = vmail
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character
vacation subaddress comparator-i;ascii-numeric relational regex imap4flags
copy include variables body enotify environment mailbox date ihave
passdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
plugin {
  autocreate = Sent
  autocreate2 = Trash
  autocreate3 = Spam
  autosubscribe = Sent
  autosubscribe2 = Trash
  autosubscribe3 = Spam
  fts = squat
  fts_squat = partial=4 full=10
  quota = maildir:User quota
  quota_rule = *:storage=100M
  quota_rule2 = Trash:ignore
  quota_warning = storage=95%% quota-warning 95 %u
  quota_warning2 = storage=80%% quota-warning 80 %u
  sieve = ~/.dovecot.sieve
  sieve_dir = ~/sieve
  sieve_quota_max_scripts = 5
  sieve_quota_max_storage = 5M
}
postmaster_address = postmas...@domain.tld
protocols = imap sieve
service auth {
  unix_listener /var/spool/postfix/private/auth {
group = postfix
mode = 0660
user = postfix
  }
  unix_listener auth-userdb {
group = vmail
mode = 0600
user = vmail
  }
}
service quota-warning {
  executable = script /etc/dovecot/quota-warning.sh
  unix_listener quota-warning {
user = vmail
  }
  user = vmail
}
ssl_cert = /etc/postfix/mail.domain.tld-cert.pem
ssl_key = /etc/postfix/mail.domain.tld-key.pem

userdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
verbose_proctitle = yes
verbose_ssl = yes
protocol lda {
  info_log_path =
  log_path =
  mail_plugins =  mail_log notify zlib quota sieve
}
protocol imap {
  mail_plugins =  mail_log notify zlib quota autocreate fts fts_squat
imap_zlib imap_quota
}

dovecot-ldap.conf.ext:
hosts = localhost
dn = user
dnpass = pass
base = base
auth_bind = yes
user_attrs =
homeDirectory=home,uidNumber=uid,gidNumber=gid,mailboxQuota=quota_rule=*:bytes=%$
user_filter = ((objectClass=inetOrgPerson)(uid=%u))
pass_attrs = uid=user
pass_filter = ((objectClass=inetOrgPerson)(uid=%u))
iterate_attrs = uid=user
iterate_filter = (objectClass=inetOrgPerson)

I suspect, that this is something to do with Dovecot, because after
deleting the dovecot.index.cache file, everything went back to normal. When
this happens, I cannot unmount the drive nor a system reboot works.

Thank you !


Re: [Dovecot] Corrupted index cache file and high CPU usage

2011-12-21 Thread Timo Sirainen
On 21.12.2011, at 18.38, hydra wrote:

 I suspect, that dovecot v2.0.16 under Linux (tested on Gentoo 3.0.4
 Hardened, Gentoo 3.0.6, Gentoo 3.1.4) causes 100% cpu utilization when
 index.cache is broken and doveadm is lanched. Dovecot uses LDAP as the
 userdb and provides IMAP and LDA. Because each user has a quota, the
 doveadm's expunge functionality is used to get rid of old trashed mail.
 
 The mail log:
 Dec 12 07:38:59 www dovecot: imap(user): Error: Corrupted index cache file
 /var/data/mail/domain.tld/user/mail/dovecot.index.cache: invalid record size

OK..

 Mail is stored on ext4 and flush will use all of the CPU capacity like this:
 load average: 4.00, 4.01, 4.05
 
 ps:
 root 26401 98.2  0.0  0 0 ?RDec20 1369:49
 [flush-253:1]

That's a kernel process..

 I suspect, that this is something to do with Dovecot, because after
 deleting the dovecot.index.cache file, everything went back to normal. When
 this happens, I cannot unmount the drive nor a system reboot works.

That's a kernel bug..

I think you're thinking it the wrong way: Dovecot isn't causing your system to 
break. Your system is causing Dovecot to break. Faulty hardware or faulty 
kernel.