Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-17 Thread Timo Sirainen
On Sat, 2010-11-13 at 13:17 +0100, Ralf Hildebrandt wrote:
 * Timo Sirainen t...@iki.fi:
 
  There's a lot more of IPC going on now. Each process at startup connects
  to config process to read configuration (vs. reading it from environment
  variables). State tracking is done in anvil process (vs. master process
  internally). Logging is via pipes to log process instead of sockets to
  master process (this should improve performance). Maybe other things I
  can't think of now.
 
 Is dstat --ipc a suitable way to measure/see what's going on?

That looks like it's about sysv IPC, which Dovecot doesn't use. Maybe
some other options would show something useful, I don't know.

Anyway, getting the rusage stats for v1.2 and comparing them to v2.0
might show something useful. Could you patch your v1.2 with the attached
patch and again get one day's stats through logparse.pl? (Need to change
Debug -> Info in its regexp)
diff -r cda53154e222 src/imap/main.c
--- a/src/imap/main.c	Mon Nov 08 19:43:41 2010 +
+++ b/src/imap/main.c	Wed Nov 17 17:16:49 2010 +
@@ -22,6 +22,9 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <syslog.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include "time-util.h"
 
 #define IS_STANDALONE() \
 (getenv("IMAPLOGINTAG") == NULL)
@@ -44,6 +47,7 @@
 enum client_workarounds client_workarounds = 0;
 const char *logout_format;
 const char *imap_id_send, *imap_id_log;
+static struct timeval startup_timeval;
 
 static struct io *log_io = NULL;
 static struct module *modules = NULL;
@@ -311,6 +315,7 @@
 		return 1;
 	}
 
+	gettimeofday(&startup_timeval, NULL);
 	/* NOTE: we start rooted, so keep the code minimal until
 	   restrict_access_by_env() is called */
 	lib_init();
@@ -327,6 +332,23 @@
 		io_loop_run(ioloop);
 	main_deinit();
 
+	struct rusage ru;
+	if (getrusage(RUSAGE_SELF, &ru) < 0)
+		i_error("getrusage() failed: %m");
+	else {
+		int diff = timeval_diff_msecs(&ioloop_timeval, &startup_timeval);
+
+		i_info("rusage: real=%d.%d user=%lu.%lu sys=%lu.%lu reclaims=%lu "
+			"faults=%lu swaps=%lu bin=%lu bout=%lu signals=%lu "
+			"volcs=%lu involcs=%lu",
+			diff/1000, diff%1000,
+			(long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
+			(long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec,
+			ru.ru_minflt, ru.ru_majflt, ru.ru_nswap,
+			ru.ru_inblock, ru.ru_oublock, ru.ru_nsignals,
+			ru.ru_nvcsw, ru.ru_nivcsw);
+	}
+
 	io_loop_destroy(ioloop);
 	lib_deinit();
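
For anyone who wants to try the same measurement outside Dovecot, here is a minimal standalone sketch of the idea. It is not part of the patch: it uses only POSIX getrusage(), and the helper names format_timeval and report_rusage are made up for illustration. One deliberate difference: it zero-pads the microseconds, because a plain %lu.%lu prints 4000 usec as "0.4000" rather than "0.004000".

```c
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: render a timeval as seconds with zero-padded
   microseconds, e.g. {0, 4000} becomes "0.004000". */
static int format_timeval(char *buf, size_t size, const struct timeval *tv)
{
	return snprintf(buf, size, "%ld.%06ld",
			(long)tv->tv_sec, (long)tv->tv_usec);
}

/* Hypothetical helper: print this process's CPU times and context
   switches. Returns 0 on success, -1 if getrusage() fails. */
static int report_rusage(FILE *out)
{
	struct rusage ru;
	char user[32], sys[32];

	if (getrusage(RUSAGE_SELF, &ru) < 0)
		return -1;
	format_timeval(user, sizeof(user), &ru.ru_utime);
	format_timeval(sys, sizeof(sys), &ru.ru_stime);
	fprintf(out, "rusage: user=%s sys=%s volcs=%ld involcs=%ld\n",
		user, sys, ru.ru_nvcsw, ru.ru_nivcsw);
	return 0;
}
```

Calling report_rusage(stdout) just before a process exits gives one summary line per process, which is the same shape of data the patch logs through i_info().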
 


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-17 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:

  Is dstat --ipc a suitable way to measure/see what's going on?
 
 That looks like it's about sysv IPC, which Dovecot doesn't use. Maybe
 some other options would show something useful, I don't know.

Well...
 
 Anyway, getting the rusage stats for v1.2 and comparing them to v2.0
 might show something useful. Could you patch your v1.2 with the attached
 patch and again get one day's stats through logparse.pl? (Need to change
 Debug -> Info in its regexp)

Of course. I just recompiled the new 1.2.x version today :|


-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-17 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:

 might show something useful. Could you patch your v1.2 with the attached
 patch 

Done. It seems to work:
Nov 17 20:50:08 postamt dovecot: IMAP(stxxxke): rusage: real=38.583 user=0.4000 
sys=0.80005 reclaims=485 faults=0 swaps=0 bin=0 bout=0 signals=0 volcs=23 
involcs=10
Nov 17 20:50:08 postamt dovecot: IMAP(stxxxke): rusage: real=38.507 user=0.4000 
sys=0.72004 reclaims=483 faults=0 swaps=0 bin=0 bout=0 signals=0 volcs=18 
involcs=4

 and again get one day's stats through logparse.pl? (Need to change
 Debug -> Info in its regexp)


  next if (!/^.* ([\w-]+)(\([^\)]*\))?: (Debug: )?rusage: (.*)$/);

I see no Info in my log output, thus I changed 
Debug:  to
(Debug: )?

and 
  my ($type, $data) = ($1, $3);
to
  my ($type, $data) = ($1, $4);
since I added another pair of ()

The output looks plausible!
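
A note for anyone post-processing these lines: judging from the format string in the patch (%lu.%lu applied to tv_sec and tv_usec), the digits after the dot in user= and sys= appear to be unpadded microseconds, so user=0.4000 would mean 4000 usec (0.004 s), not 0.4 s. real= is printed the same way, with milliseconds after the dot. The sketch below converts such a field back to seconds. The function name and the units_per_sec argument are assumptions for illustration; nothing like this exists in logparse.pl.

```c
#include <stdio.h>

/* Hypothetical helper: convert a "SEC.FRAC" field whose FRAC part is an
   unpadded integer count of sub-second units back into seconds.
   units_per_sec is 1000000 for user=/sys= (microseconds) and 1000 for
   real= (milliseconds). Returns -1.0 if the field does not parse. */
static double rusage_field_to_seconds(const char *field, long units_per_sec)
{
	long sec, frac;

	if (sscanf(field, "%ld.%ld", &sec, &frac) != 2)
		return -1.0;
	if (sec < 0 || frac < 0 || frac >= units_per_sec)
		return -1.0;
	return (double)sec + (double)frac / (double)units_per_sec;
}
```

With this reading, the first sample line above would be about 38.583 s real, 0.004 s user and 0.080005 s sys.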




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-17 Thread Karsten Bräckelmann
On Wed, 2010-11-17 at 20:55 +0100, Ralf Hildebrandt wrote:
   my ($type, $data) = ($1, $3);
 to
   my ($type, $data) = ($1, $4);
 since I added another pair of ()

Just use non-capturing grouping instead. (?:foo)


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-13 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:

 There's a lot more of IPC going on now. Each process at startup connects
 to config process to read configuration (vs. reading it from environment
 variables). State tracking is done in anvil process (vs. master process
 internally). Logging is via pipes to log process instead of sockets to
 master process (this should improve performance). Maybe other things I
 can't think of now.

Is dstat --ipc a suitable way to measure/see what's going on?




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-09 Thread Stan Hoeppner
Ralf Hildebrandt put forth on 11/8/2010 12:44 PM:
 * Stan Hoeppner s...@hardwarefreak.com:
 
 Does this machine have more than 4GB of RAM?  You do realize that merely
 utilizing PAE will cause an increase in context switching, whether on
 bare metal or in a VM guest.  It will probably actually be much higher
 with a VM guest running a PAE kernel.  Also, please tell me the ESX
 kernel you're running is native 64 bit, not 32 bit.  If the VMWare
 kernel itself is doing PAE, as well as the guest Linux kernel, this may
 fully explain the performance disaster you have on your hands, if it is
 indeed due to context switching.
 
 It sure works with 1.2.x now, so that's not really the problem

I'm not so sure we can make that assumption.  I'm leaning toward
something other than context switches, as they are obviously very high
with VMWare, always.

 The bigger question is, why does this problem surface so readily while
 running Dovecot 2.0.x and not while running Dovecot 1.2.x?
 
 EXACTLY
 
 Is 1.2.x merely tickling the dragon's chin, whereas 2.0.x is sticking
 its head into the dragon's mouth?
 
 I'd say the difference between 1.2 and 2.0 is so dramatic that it's
 probably something else.

Given what we know, that the increase in CPU time is in guest kernel
space, or at least appears so, I'm guessing that Dovecot 2.x is making a
system or library call(s) which your kernel is racing with for extended
time yet still releasing.  Your best bet I'm thinking is to put a trace
on each Dovecot process and find which one(s) are waiting the longest
for system call returns.  Once you know which process is triggering the
problem you can start to narrow down the code segment, obviously with
Timo's help.  I'm starting to get out of my element at this point.

 This very well may be the case.  You need to also look at the CONFIG_HZ=
 value of the Linux kernel of the guest.  If it's a tickless kernel you
 should be fine.  If tickless, IIRC, you should see CONFIG_NO_HZ=y.
 
 # fgrep HZ config-2.6.32-23-generic-pae
 CONFIG_NO_HZ=y
 # CONFIG_HZ_100 is not set
 CONFIG_HZ_250=y
 # CONFIG_HZ_300 is not set
 # CONFIG_HZ_1000 is not set
 CONFIG_HZ=250
 CONFIG_MACHZ_WDT=m

I can't tell from that which is being used as both tickless and 250 are
configured.  If it's 250 that should still be fine.  That will generate
in the neighborhood of 2000 interrupts/sec with 8 vCPUs, which is the
same as a workstation kernel on two vCPUs, which would be configured
with CONFIG_HZ=1000.
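
The arithmetic behind that estimate, written out as a small sketch: with a periodic tick, each vCPU takes roughly CONFIG_HZ timer interrupts per second, so the total scales with the vCPU count. The function name is made up for illustration, and the figure ignores tickless idle (CONFIG_NO_HZ), which lowers the rate on idle CPUs.

```c
/* Hypothetical helper: approximate timer interrupts per second for a
   guest with a periodic tick, assuming one tick stream per vCPU. */
static unsigned long timer_interrupts_per_sec(unsigned long config_hz,
					      unsigned long vcpus)
{
	return config_hz * vcpus;
}
```

timer_interrupts_per_sec(250, 8) and timer_interrupts_per_sec(1000, 2) both give 2000, which is the comparison made above; 1000 HZ on 8 vCPUs would give 8000.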

-- 
Stan


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-08 Thread Udo Wolter
* Ralf Hildebrandt ralf.hildebra...@charite.de:
  And I'm guessing you're running a 32bit PAE kernel because VMWare ESX
  still doesn't officially support 64bit guests, correct?
 
 No, it's supported, but I don't want to change the whole system.

That's right, we cannot switch without having several hours downtime. This is
not acceptable. I'm thinking of a way for switching to 64 bit with exchanging
disks etc. But I don't know if this will work, I have to test it first.
 
  Is this the only guest on this host or do you have others?
 
 only guest

Yes, the VM-system has 8 CPUs and that's all the ESX has. Of course, there are
times, when the ESX doesn't have that much stress so the DRS moves 1 or 2 other
machines onto it. But since we got that high load, the rest of the machines all
had been moved off the ESX.

  If this is the only guest, you have 2 dual core dies in that Xeon CPU,
  4 cores total.  I assume you've assigned 4 virtual CPUs to this Debian
  VM?
 
 Yes, something like that

8.

  You may want to run top in the hypervisor console itself (or an SSH
  session into the hypervisor) and watch the %CPU of the hypervisor's
  kernel threads.  That might tell us something as well.
 
 Udo has to answer that, but from what he told me it was fully using
 all cpus with 2.0, and now it's idling with 1.2
 
 More details to follow (from him)

As I said in the other mail: as long as the load isn't high enough we cannot
see any problems in the ESX. Only, if we step over some kind of specific
barrier. I think, it's when even the ESX runs out of possibilities to handle so
many interrupts.

Bye,

Udo
-- 
Udo Wolter
  Geschäftsbereich IT | Abt. System
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570847 | Fax +49 30 450 7570600
  udo.wol...@charite.de | http://www.charite.de





Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-08 Thread Stan Hoeppner
Udo Wolter put forth on 11/8/2010 4:45 AM:
 * Ralf Hildebrandt ralf.hildebra...@charite.de:
 And I'm guessing you're running a 32bit PAE kernel because VMWare ESX
 still doesn't officially support 64bit guests, correct?

 No, it's supported, but I don't want to change the whole system.
 
 That's right, we cannot switch without having several hours downtime. This is
 not acceptable. I'm thinking of a way for switching to 64 bit with exchanging
 disks etc. But I don't know if this will work, I have to test it first.

Does this machine have more than 4GB of RAM?  You do realize that merely
utilizing PAE will cause an increase in context switching, whether on
bare metal or in a VM guest.  It will probably actually be much higher
with a VM guest running a PAE kernel.  Also, please tell me the ESX
kernel you're running is native 64 bit, not 32 bit.  If the VMWare
kernel itself is doing PAE, as well as the guest Linux kernel, this may
fully explain the performance disaster you have on your hands, if it is
indeed due to context switching.

The bigger question is, why does this problem surface so readily while
running Dovecot 2.0.x and not while running Dovecot 1.2.x?  Is 1.2.x
merely tickling the dragon's chin, whereas 2.0.x is sticking its head
into the dragon's mouth?

 Is this the only guest on this host or do you have others?

 only guest
 
 Yes, the VM-system has 8 CPUs and that's all the ESX has. Of course, there are
 times, when the ESX doesn't have that much stress so the DRS moves 1 or 2 other
 machines onto it. But since we got that high load, the rest of the machines all
 had been moved off the ESX.
 
 If this is the only guest, you have 2 dual core dies in that Xeon CPU,
 4 cores total.  I assume you've assigned 4 virtual CPUs to this Debian
 VM?

 Yes, something like that
 
 8.

Ralf gave me the model number of that server and said it was a single
CPU machine.  I looked up the specs, and if that is the case, there are
4 cores total in that Xeon.  And, IIRC, that Xeon does not have the
HyperThreading circuitry.  So, are there two physical CPUs in the
machine with 4 cores each, or 1 CPU with 4 cores and HT, appearing as 8
cores?  If it's one 4 core CPU with HT enabled, reboot the machine and
disable HT in the BIOS.  HT itself also contributes to high context
switching.  HT is more of a hindrance to ESX performance than a benefit.

www.vmware.com/pdf/vi_performance_tuning.pdf

 You may want to run top in the hypervisor console itself (or an SSH
 session into the hypervisor) and watch the %CPU of the hypervisor's
 kernel threads.  That might tell us something as well.

 Udo has to answer that, but from what he told me it was fully using
 all cpus with 2.0, and now it's idling with 1.2

 More details to follow (from him)
 
 As I said in the other mail: as long as the load isn't high enough we cannot
 see any problems in the ESX. Only, if we step over some kind of specific
 barrier. I think, it's when even the ESX runs out of possibilities to handle so
 many interrupts.

This very well may be the case.  You need to also look at the CONFIG_HZ=
value of the Linux kernel of the guest.  If it's a tickless kernel you
should be fine.  If tickless, IIRC, you should see CONFIG_NO_HZ=y.

However, if CONFIG_HZ=1000 you're generating WAY too many interrupts/sec
to the timer, ESPECIALLY on an 8 core machine.  This will exacerbate the
high context switching problem.  On an 8 vCPU (and physical CPU) machine
you should have CONFIG_HZ=100 or a tickless kernel.  You may get by
using 250, but anything higher than that is trouble.

-- 
Stan


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-08 Thread Ralf Hildebrandt
* Stan Hoeppner s...@hardwarefreak.com:

 Does this machine have more than 4GB of RAM?  You do realize that merely
 utilizing PAE will cause an increase in context switching, whether on
 bare metal or in a VM guest.  It will probably actually be much higher
 with a VM guest running a PAE kernel.  Also, please tell me the ESX
 kernel you're running is native 64 bit, not 32 bit.  If the VMWare
 kernel itself is doing PAE, as well as the guest Linux kernel, this may
 fully explain the performance disaster you have on your hands, if it is
 indeed due to context switching.

It sure works with 1.2.x now, so that's not really the problem

 The bigger question is, why does this problem surface so readily while
 running Dovecot 2.0.x and not while running Dovecot 1.2.x?

EXACTLY

 Is 1.2.x merely tickling the dragon's chin, whereas 2.0.x is sticking
 its head into the dragon's mouth?

I'd say the difference between 1.2 and 2.0 is so dramatic that it's
probably something else.

 This very well may be the case.  You need to also look at the CONFIG_HZ=
 value of the Linux kernel of the guest.  If it's a tickless kernel you
 should be fine.  If tickless, IIRC, you should see CONFIG_NO_HZ=y.

# fgrep HZ config-2.6.32-23-generic-pae
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-08 Thread Brandon Davidson
Stan,

On 11/8/10 10:39 AM, Stan Hoeppner s...@hardwarefreak.com wrote:
 
 However, if CONFIG_HZ=1000 you're generating WAY too many interrupts/sec
 to the timer, ESPECIALLY on an 8 core machine.  This will exacerbate the
 high context switching problem.  On an 8 vCPU (and physical CPU) machine
 you should have CONFIG_HZ=100 or a tickless kernel.  You may get by
 using 250, but anything higher than that is trouble.

On modern kernels you can boot with divider=10 to take the HZ from 1000
down to 100 at boot time - no rebuilding necessary.

-Brad



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-07 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:

 Attached a script to parse and summarize the logs. In a small imaptest
 run I didn't notice high system usage.

I'm trying to run the logparser, but it only emits:

postamt:~#  /var/admhome/hildeb/logparse.pl /var/log/pop3d-imapd.log
type
postamt:~#  /var/admhome/hildeb/logparse.pl  /var/log/pop3d-imapd.log
type





Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-07 Thread Timo Sirainen
On 7.11.2010, at 18.31, Ralf Hildebrandt wrote:

 I'm trying to run the logparser, but it only emits:
 
 postamt:~#  /var/admhome/hildeb/logparse.pl /var/log/pop3d-imapd.log
 type

Probably your timestamps are different. Show one log line?



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-07 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:

  postamt:~#  /var/admhome/hildeb/logparse.pl /var/log/pop3d-imapd.log
  type
 
 Probably your timestamps are different. Show one log line?

Nov  7 19:37:17 postamt dovecot: imap(ptm-aus): Debug: rusage: real=0.51 
user=0.16001 sys=0.52003 reclaims=665 faults=0 swaps=0 bin=0 bout=0 signals=0 
volcs=10 involcs=8
Nov  7 19:37:18 postamt dovecot: imap(fblanken): Debug: rusage: real=358.878 
user=0.28001 sys=0.324020 reclaims=809 faults=1 swaps=0 bin=384 bout=56 
signals=0 volcs=1814 involcs=29
Nov  7 19:37:19 postamt dovecot: pop3(haarbeck): Debug: rusage: real=0.27 
user=0.4000 sys=0.28001 reclaims=625 faults=0 swaps=0 bin=0 bout=8 signals=0 
volcs=6 involcs=11
Nov  7 19:37:19 postamt dovecot: imap(bstoelck): Debug: rusage: real=0.586 
user=0.0 sys=0.44002 reclaims=651 faults=0 swaps=0 bin=0 bout=0 signals=0 
volcs=10 involcs=25




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-07 Thread Timo Sirainen
On 7.11.2010, at 18.37, Ralf Hildebrandt wrote:

 * Timo Sirainen t...@iki.fi:
 
 postamt:~#  /var/admhome/hildeb/logparse.pl /var/log/pop3d-imapd.log
 type
 
 Probably your timestamps are different. Show one log line?
 
 Nov  7 19:37:17 postamt dovecot: imap(ptm-aus): Debug: rusage: real=0.51 
 user=0.16001 sys=0.52003 reclaims=665 faults=0 swaps=0 bin=0 bout=0 signals=0 
 volcs=10 involcs=8

Attached with a working regexp.
#!/usr/bin/env perl
use strict;

my @keys;
my %types;
my $setkeys = 1;
while (<>) {
  next if (!/^... + \d+ \d+:\d+:\d+ \w+ dovecot: (\w+)(\([^\)]*\))?: Debug: rusage: (.*)$/);
  my ($type, $data) = ($1, $3);

  # split the "key=value" pairs; remember the key names from the first line
  my @new;
  my @list = split(" ", $data);
  foreach my $arg (@list) {
    die "broken: $arg" if ($arg !~ /^(.*)=([\d+.]+)$/);
    if ($setkeys) {
      push @keys, $1;
    }
    push @new, $2;
  }
  $setkeys = 0;

  if (!defined($types{$type})) {
    $types{$type} = \@new;
  } else {
    # add to the stored totals through the reference; copying the array
    # into a new variable first would throw the sums away
    my $old = $types{$type};
    for (my $i = 0; $i < scalar @$old; $i++) {
      $old->[$i] += $new[$i];
    }
  }
}

# header row, then one row of totals per process type
print "type";
foreach my $key (@keys) {
  print "\t$key";
}
print "\n";
foreach my $type (keys %types) {
  print $type;
  my @values = @{$types{$type}};
  for (my $i = 0; $i < scalar @values; $i++) {
    print "\t";
    if ($i < 3) {
      printf "%.2f", $values[$i];
    } else {
      print $values[$i];
    }
  }
  print "\n";
}



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-07 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:

 Attached with a working regexp.

I switched a few minutes ago, back to 2.0.6
The load on the server is extremely light (it's sunday):

type         real   user  sys   reclaim  faults  swaps  bin   bout  signals  volcs  involcs
auth         38.44  0.86  1.78  22321    0       0      0     0     0        220    146
pop3         0.12   0.40  0.56  645      1       0      656   16    0        43     3
managesieve  0.75   0.80  0.20  569      0       0      184   0     0        11     18
imap         0.28   0.40  0.40  659      0       0      1144  16    0        22     6

I adjusted the columns for better readability.
Let's see what tomorrow brings.

What's reclaim? 




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-07 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:

  Nov  7 19:37:17 postamt dovecot: imap(ptm-aus): Debug: rusage: real=0.51 
  user=0.16001 sys=0.52003 reclaims=665 faults=0 swaps=0 bin=0 bout=0 
  signals=0 volcs=10 involcs=8
 
 Attached with a working regexp.

Hmm, consecutive calls of the program are resulting in identical
output! I'm not sure it's working like intended.

I'm trying to fix that.




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-07 Thread Ralf Hildebrandt
* Ralf Hildebrandt ralf.hildebra...@charite.de:
 * Timo Sirainen t...@iki.fi:
 
   Nov  7 19:37:17 postamt dovecot: imap(ptm-aus): Debug: rusage: real=0.51 
   user=0.16001 sys=0.52003 reclaims=665 faults=0 swaps=0 bin=0 bout=0 
   signals=0 volcs=10 involcs=8
  
  Attached with a working regexp.
 
 Hmm, consecutive calls of the program are resulting in identical
 output! I'm not sure it's working like intended.
 
 I'm trying to fix that.

Had to change:
\w+

into 
[\w-]+

since some program names contain a -: ssl-params and pop3-login




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-06 Thread Stan Hoeppner
Ralf Hildebrandt put forth on 11/5/2010 4:23 AM:
 Due to the ongoing performance issues with 2.0.x I switched back to
 1.2.15 yesterday evening, with no changes to the machine or my users.
 
 (I migrated from 1.2.15 to 2.0.x by converting the existing config)
 
 Today, we have MUCH LESS load, with the same number of logins/min.
 
 I cannot say what exactly causes this immense increase in load, but one
 observation is that the time spent in system() has now dropped (user and
 iowait have stayed constant) to a third of the values I was seeing with
 2.0.x.
 
 This evening I'll post some graphs showing two comparable 24 hour
 ranges.

Hi Ralf,

What hardware platform? (AMD/Intel/SPARC/PPC, generation/freq)
What OS platform?
What compiler/version?
What threading library?

If IPC is the culprit, it may very well be a platform/compiler/system
library issue and not a dovecot issue, given no one else seems to be
suffering this problem.

-- 
Stan


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-06 Thread Ralf Hildebrandt
* Daniel L. Miller dmil...@amfes.com:

 Dunno if you ever mentioned it - or if it makes any difference - but
 what configure/build options are you using for 1.2 vs 2.0?  Any
 difference in the compiler?  Is your 1.2 a distro pre-packaged binary?

No, both have been compiled from source using these options:

./configure --enable-maintainer-mode 
(dovecot2 uses ./configure --prefix=/usr/dovecot-2 --enable-maintainer-mode 
since I need to install it someplace else)
both using gcc version 4.4.5 (Debian 4.4.5-2) 




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-06 Thread Ralf Hildebrandt
* Stan Hoeppner s...@hardwarefreak.com:

 What hardware platform? (AMD/Intel/SPARC/PPC, generation/freq)
Intel(R) Xeon(R) CPU   L5335  @ 2.00GHz

 What OS platform?
Debian lenny 

 What compiler/version?
gcc version 4.4.5 (Debian 4.4.5-2) 

 What threading library?
? how do I find out?

 If IPC is the culprit, it may very well be a platform/compiler/system
 library issue and not a dovecot issue, given no one else seems to be
 suffering this problem.




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-06 Thread Stan Hoeppner
Ralf Hildebrandt put forth on 11/6/2010 9:15 AM:
 * Stan Hoeppner s...@hardwarefreak.com:
 
 What hardware platform? (AMD/Intel/SPARC/PPC, generation/freq)
 Intel(R) Xeon(R) CPU   L5335  @ 2.00GHz
 
 What OS platform?
 Debian lenny 
 
 What compiler/version?
 gcc version 4.4.5 (Debian 4.4.5-2) 

Hmm.  My Lenny systems have 4.3.2-2.  Are you maybe using Squeeze, not
Lenny?  I'm still using i686 systems, but I wouldn't think that would
change the version of GCC that gets installed.  I'm not sure if this may
be playing a role in this problem or not.  What kernel version are you
running, stock Debian or rolled from source?

 What threading library?
 ? how do I find out?

I was mainly asking that in case your platform was something other than
x86.  With Linux you should be using NPTL for threading.  This shouldn't
be a problem.
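
On the "how do I find out?" question: on glibc systems, running getconf GNU_LIBPTHREAD_VERSION in a shell reports the threading library, and the same value is available from C through confstr(). The sketch below assumes glibc; the helper name threading_library is made up for illustration, and on C libraries that do not report this value it simply returns -1.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <unistd.h>

/* Hypothetical helper: copy the threading library name and version into
   buf. Returns 0 on success, -1 if the C library does not report it or
   the buffer is too small. */
static int threading_library(char *buf, size_t size)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
	size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, size);

	if (n == 0 || n > size)
		return -1;
	return 0;
#else
	(void)buf;
	(void)size;
	return -1;
#endif
}
```

On a Linux system with NPTL the returned string typically starts with "NPTL" followed by the glibc version.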

I'm trying to help you identify what is different on your system from
other OPs that is causing 2.x to perform so badly vs 1.x.  If sys is
high but usr and iowait aren't, then I would think the problem is in a
system library, your kernel, dovecot, or more specifically, the
interaction among all three.

You're using maildir correct?  What filesystem are you using?

Are you doing anything in your Dovecot config, both 1.x and 2.x, that is
unique or non-standard maybe, compared to other OPs?

Is this a virtual machine guest or bare metal host?

What do memory and swap usage look like?

What do you see for %CPU when you watch your kernel threads in top?  Is
one of them eating lots of CPU time?  If so, which one?

-- 
Stan


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-06 Thread Ralf Hildebrandt
* Stan Hoeppner s...@hardwarefreak.com:

 Hmm.  My Lenny systems have 4.3.2-2.  Are you maybe using Squeeze, not
 Lenny?

Yes, squeeze, sorry

 I'm still using i686 systems, but I wouldn't think that would change
 the version of GCC that gets installed.  I'm not sure if this may be
 playing a role in this problem or not.  What kernel version are you
 running, stock Debian or rolled from source?

2.6.32-23-generic-pae, from Ubuntu

 I'm trying to help you identify what is different on your system from
 other OPs that is causing 2.x to perform so badly vs 1.x.  If sys is
 high but usr and iowait aren't, then I would think the problem is in a
 system library, your kernel, dovecot, or more specifically, the
 interaction among all three.
 
 You're using maildir correct?  What filesystem are you using?

Maildir on ext4

 Are you doing anything in your Dovecot config, both 1.x and 2.x, that is
 unique or non-standard maybe, compared to other OPs?

It's all users from /etc/passwd, but nothing special.

 Is this a virtual machine guest or bare metal host?

virtual machine guest
 
 What do memory and swap usage look like?

Memory usage is identical with 2.0 and 1.2:
total: 8GB
free: 5457MB
cached: 1054MB

The machine has no swap.

 What do you see for %CPU when you watch your kernel threads in top?  Is
 one of them eating lots of CPU time?  If so, which one?

Uhm, for that I'd have to switch back and look at kernel threads
explicitly.




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-06 Thread Ralf Hildebrandt
* Ralf Hildebrandt ralf.hildebra...@charite.de:

  I'm still using i686 systems, but I wouldn't think that would change
  the version of GCC that gets installed.  I'm not sure if this may be
  playing a role in this problem or not.  What kernel version are you
  running, stock Debian or rolled from source?
 
 2.6.32-23-generic-pae, from Ubuntu

I'm using this one because the bigmem kernels in Debian had some
problems (being: bigmem not working at all, it was not compiled in)




Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-06 Thread Stan Hoeppner
Ralf Hildebrandt put forth on 11/6/2010 10:33 AM:
 * Ralf Hildebrandt ralf.hildebra...@charite.de:
 
 I'm still using i686 systems, but I wouldn't think that would change
 the version of GCC that gets installed.  I'm not sure if this may be
 playing a role in this problem or not.  What kernel version are you
 running, stock Debian or rolled from source?

 2.6.32-23-generic-pae, from Ubuntu
 
 I'm using this one because the bigmem kernels in Debian had some
 problems (being: bigmem not working at all, it was not compiled in)

And I'm guessing you're running a 32bit PAE kernel because VMWare ESX
still doesn't officially support 64bit guests, correct?  Or are you
using another hypervisor that also has such a limitation?

Is this the only guest on this host or do you have others?  If this is
the only guest, you have 2 dual core dies in that Xeon CPU, 4 cores
total.  I assume you've assigned 4 virtual CPUs to this Debian VM?

You may want to run top in the hypervisor console itself (or an SSH
session into the hypervisor) and watch the %CPU of the hypervisor's
kernel threads.  That might tell us something as well.

-- 
Stan


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Ralf Hildebrandt ralf.hildebra...@charite.de:
 Due to the ongoing performance issues with 2.0.x I switched back to
 1.2.15 yesterday evening, with no changes to the machine or my users.
 
 (I migrated from 1.2.15 to 2.0.x by converting the existing config)
 
 Today, we have MUCH LESS load, with the same number of logins/min.
 
 I cannot say what exactly causes this immense increase in load, but one
 observation is that the time spent in system() has now dropped (user and
 iowait have stayed constant) to a third of the values I was seeing with
 2.0.x.
 
 This evening I'll post some graphs showing two comparable 24 hour
 ranges.

I uploaded a preliminary screenshot with comments:
http://www.arschkrebs.de/bugs/dovecot.png

During the night we're using clamdscan to scan mailboxes for viruses,
this results in the big block of system & user from 0:00 until about
08:00

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread zhong ming wu
On Fri, Nov 5, 2010 at 5:58 AM, Ralf Hildebrandt
ralf.hildebra...@charite.de wrote:
 I uploaded a preliminary screenshot with comments:
 http://www.arschkrebs.de/bugs/dovecot.png


Unclear from your graphs what is for 2.0 and what is for 1.2

Plotting the same variable for 2.0 and 1.2 data on the same graph will
be more convincing.


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread zhong ming wu
On Fri, Nov 5, 2010 at 6:23 AM, Ralf Hildebrandt
ralf.hildebra...@charite.de wrote:
 * zhong ming wu mr.z.m...@gmail.com:
 On Fri, Nov 5, 2010 at 5:58 AM, Ralf Hildebrandt
 ralf.hildebra...@charite.de wrote:
  I uploaded a preliminary screenshot with comments:
  http://www.arschkrebs.de/bugs/dovecot.png
 

 Unclear from your graphs what is for 2.0 and what is for 1.2

 Left of switching back to 1.2.x is 2.0
 Right of switching back to 1.2.x is 1.2.x

I thought "switching back to 1.2.x" was the title of that graph.
Since you know your server better, I assume you expect the data with
2.0 after 18:00 to be as high as before.
For someone who does not know your server's usage pattern, that graph
isn't useful without many more notes.


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread David Ford
why don't you run clamdscan on delivery?  that way you only scan each email 
once, not repeatedly every night until it's deleted.

-david

On 11/05/10 05:58, Ralf Hildebrandt wrote:
 During the night we're using clamdscan to scan mailboxes for viruses,
 this results in the big block of system & user from 0:00 until about
 08:00


-- 
Linux - freedom to build is good Please top-post and trim when replying to my 
messages. I most often read mail on a small device. PGP signature 91ED 44F8 
108B E981 DB67 49AC F450 EFD5 6A99 94A2 VERY NOT-IMPORTANT NOT-LEGAL NOTICES: 
Recalling a message does in no way delete it from my computer. Rather, it 
brings attention to your original email and recalling it causes me to search 
for a reason to find embarrassment. Please don't send message recall messages. 
It's silly and obnoxious and wastes even more bandwidth and patience. 
Regardless of what legal message you append to your email message, I am not 
obligated or constrained in any way shape or form and -every- court backs this 
up. If I feel like printing it out and taping it up at the local gym, or mass 
mailing it to 15,000 people, I will. I feel especially inclined to do so the 
longer your legal advisory is. Such notices are unenforceable and do not 
protect you or your company from things you say, or things others do with
the email. Millions of innocent men, women and children, since the 
introduction of Christianity, have been burnt, tortured, fined, imprisoned; yet 
we have not advanced one inch towards uniformity. What has been the effect of 
coercion? To make half the world fools, and the other half hypocrites. 
--Thomas Jefferson This message is confidential to the Internet at large, 
unless otherwise indicated or apparent from its nature. It may not be 
reproduced on Mars unless it has previously been printed on Uranus. This 
message is directed to the intended recipient only (usually everyone, but 
sometimes nobody and once in a blue moon, just somebody), who may be readily 
determined by the sender of this message and its contents. This email message 
(including any attachments) is not for the sole use of the intended 
recipient(s) and may or may not contain confidential, proprietary and 
privileged information. It may include sarcastic holier than tho content. If 
the reader of this message is
not the intended recipient, or an employee or agent responsible for delivering 
this message to the intended recipient: (a) any dissemination or copying of 
this message is strictly prohibited unless you feel otherwise; and (b) 
immediately notify the sender by return message (but only if the sun has gone 
black) and destroy any copies of this message in any form (electronic, paper or 
carved in stone) that you have. Please destroy by smashing your computer with a 
21lb sledge hammer approximately 17 times to ensure destruction of your system. 
Any unauthorized review, use, disclosure or distribution is most assuredly not 
prohibited and you will not IMMEDIATELY be PROSECUTED to the fullest ... or 
emptiest ... extent of the law. If you are not the intended recipient, please 
immediately notify some random person of how old you are, if you're male, 
female, TV, TG, alien, and if you live on planet earth or the primordial plane 
and your undying desire to fornicate with them by email and
destroy all copies of the original message if you sent it to an underage 
person. Oh, and definitely don't tell me about it. The delivery of this message 
and its information is neither intended to be nor constitutes a disclosure or 
waiver of any trade secrets, intellectual property, attorney work product, or 
attorney-client communications. If you happen to be a corporation that uses 
lawyer-think-speak-asinine-thoughts well then please sit your ass back down and 
we will promptly ignore the hell out of you and your disclaimers. Wait, no we 
won't. We have this urgent primal need to publicly make fun of you, and then 
we'll re-post your message in blazing full frontal nudity across the internet. 
The authority of the individual sending this message to legally bind any entity 
is neither apparent nor implied, and must be independently verified - uh ... 
duh? Isn't that obvious? Of course not. Only people with intelligence recognize 
such simple facts. Thank you for standing in the back
yard and whining your ass off holding up tiny little posters forbidding 
mosquitoes from biting you. Does a whole hell of a lot of good. Right? Yeah, 
you keep up with the delusions. Keeping up with the Jones is good after all. 
Holy hell Batman sleeps with Robin -- This disclaimer is short!



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:
 On 5.11.2010, at 9.58, Ralf Hildebrandt wrote:
 
  I uploaded a preliminary screenshot with comments:
  http://www.arschkrebs.de/bugs/dovecot.png
 
 Were you using v1.2's deliver here in left also? Or how much of a difference 
 did that make alone?

2.0 was indeed using v1.2's deliver, and that made SOME difference
(less load)

Now with 1.2 I'm of course using v1.2's deliver

 How many imap logins per minute (or something) do you get? 
imap: about 70 / minute
imaps: about 50 / minute
pop3: none
pop3s: about 10 / minute 

these are all peak values during noon!

 What about pop3? Are you using a webmail that opens lots of short
 connections (and are people using webmail much)?

webmail (Squirrelmail) is using imapproxy.
 
 I'm wondering if the problem has to do with the way processes now do
 IPC 

That could very well be. Lots of time is spent in the kernel

 or is there just some bug(s) in the mail handling code..


-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* David Ford da...@blue-labs.org:

 why don't you run clamdscan on delivery?

I do.

 that way you only scan each email once, not repeatedly every night
 until it's deleted.

I'm only scanning directories that haven't been scanned for a long time
(I cannot scan all the boxes in one night). Main purpose is to remove
freshly detected viruses/spam that wasn't in the patterns at delivery
time.

The benefit is somewhat limited; one might argue it doesn't make much
sense.

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Ralf Hildebrandt ralf.hildebra...@charite.de:
 * zhong ming wu mr.z.m...@gmail.com:
 
   Left of switching back to 1.2.x is 2.0
   Right of switching back to 1.2.x is 1.2.x
  
  i thought switching back to 1.2.x is title of that graph.
  Since you know your server better I assume that you expect data with
  2.0 after 18:00 to be high like before.
 
 No. I expect the load with 2.0 (before the switch back to 1.2) as low
 as the load is with 1.2 now :)

No. I expect the load with 2.0 (before the switch back to 1.2) to be as
low as the load is with 1.2 now :)

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread David Ford


On 11/05/10 08:56, Ralf Hildebrandt wrote:
 I'm only scanning directories that haven't been scanned for a long time
 (I cannot scan all the boxes in one night). Main purpose is to remove
 freshly detected viruses/spam that wasn't in the patterns at delivery
 time.

 The benefit is somewhat limited; one might argue it doesn't make much
 sense.

I'm curious what would show up new in mailboxes other than drafts and sent 
items.  on my networks, AV and anti-spam hooks are via sendmail/milter and get 
called for all smtp regardless of direction which means an infected desktop 
won't be able to transmit spam.

thus, running a nightly scan on mailboxes after delivery means the above - save 
the draft/sent mailboxes, the benefit is zero and it's only going to drive up 
the load.

-d


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* David Ford da...@blue-labs.org:

 on my networks, AV and anti-spam hooks are via sendmail/milter and get
 called for all smtp regardless of direction which means an infected
 desktop won't be able to transmit spam.

same here.
 
 thus, running a nightly scan on mailboxes after delivery means the
 above - save the draft/sent mailboxes, the benefit is zero and it's
 only going to drive up the load.

My scan usually finds spam that clamd is recognizing due to new
patterns. I just wanted to mention the reason for that plateau of
load.

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Robert Schetterer
Am 05.11.2010 10:58, schrieb Ralf Hildebrandt:
 * Ralf Hildebrandt ralf.hildebra...@charite.de:
 Due to the ongoing performance issues with 2.0.x I switched back to
 1.2.15 yesterday evening, with no changes to the machine or my users.

 (I migrated from 1.2.15 to 2.0.x by converting the existing config)

 Today, we have MUCH LESS load, with the same number of logins/min.

 I cannot say what exactly causes this immense increase in load, but one
 observation is that the time spent in system() has now dropped (user and
 iowait have stayed constant) to a third of the values I was seeing with
 2.0.x.

 This evening I'll post some graphs showing two comparable 24-hour
 ranges.
 
 I uploaded a preliminary screenshot with comments:
 http://www.arschkrebs.de/bugs/dovecot.png
 
 During the night we're using clamdscan to scan mailboxes for viruses,
 this results in the big block of system & user from 0:00 until about
 08:00
 

Hi Ralph, high cpu load is common with clamscan

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Charles Marcus
On 2010-11-05 8:56 AM, Ralf Hildebrandt wrote:
 * David Ford da...@blue-labs.org:
 why don't you run clamdscan on delivery?

 I do.

On 2010-11-05 9:33 AM, Robert Schetterer wrote:
 Hi Ralph, high cpu load is common with clamscan

Hmmm... maybe dovecot 2.0 is doing something different from 1.2 that
causes your *live* clamdscan at delivery time to produce the heavier load...

Have you tried temporarily disabling clamd while running 2.0 and see
what happens?

-- 

Best regards,

Charles


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Charles Marcus
On 2010-11-05 9:18 AM, David Ford wrote:

snip

 -d
 
 -- Linux - freedom to build is good Please top-post and trim when
 replying to my messages.

snip

David, once was funny, and even better when replying to a message from
someone who has a 'real' 'disclaimer' sig - but I sure hope you're not
planning on attaching that annoying as hell sig on every message.

;)

-- 

Best regards,

Charles


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Robert Schetterer rob...@schetterer.org:

 Hi Ralph, high cpu load is common with clamscan

We're not talking about the times where clamdscan is running.
It's ONLY running at night. That's why I labeled the graph accordingly.

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Charles Marcus cmar...@media-brokers.com:
 
 Hmmm... maybe dovecot 2.0 is doing something different from 1.2 that
 causes your *live* clamdscan at delivery time to produce the heavier load...

Clamdscan is not running at delivery time on that box, it's running on
another machine. 

On my graph I labeled the NIGHTLY scan (when nobody's doing anything,
they're sleeping).
 
 Have you tried temporarily disabling clamd while running 2.0 and see
 what happens?

See my graph, compare left side with the right side. Do you see the
difference?

Left side: high load (dovecot 2.0, same machine, same users)
Right side: low load (dovecot 1.2, same machine, same users)
middle: completely unrelated clamdscan in the middle of the night (when 
nobody's doing anything, they're sleeping - thus I'm running the scan)

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
  I'm wondering if the problem has to do with the way processes now do
  IPC 
 
 That could very well be. Lots of time is spent in the kernel

What exactly has changed - and what kind of data are the processes
exchanging via IPCs?

And which processes are talking to each other?
-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Charles Marcus
On 2010-11-05 10:05 AM, Ralf Hildebrandt wrote:
 * Charles Marcus cmar...@media-brokers.com:
  
 Hmmm... maybe dovecot 2.0 is doing something different from 1.2 that
 causes your *live* clamdscan at delivery time to produce the heavier load...
 
 Clamdscan is not running at delivery time on that box, it's running on
 another machine. 

??

On 2010-11-05 8:56 AM, Ralf Hildebrandt wrote:
 * David Ford da...@blue-labs.org:
 why don't you run clamdscan on delivery?
 I do.

You plainly state that you *do* run clamdscan on delivery...

-- 

Best regards,

Charles


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Charles Marcus cmar...@media-brokers.com:
 On 2010-11-05 10:05 AM, Ralf Hildebrandt wrote:
  * Charles Marcus cmar...@media-brokers.com:
   
  Hmmm... maybe dovecot 2.0 is doing something different from 1.2 that
  causes your *live* clamdscan at delivery time to produce the heavier 
  load...
  
  Clamdscan is not running at delivery time on that box, it's running on
  another machine. 
 
 ??

Yes, on the gateway which all mails go through
 
 On 2010-11-05 8:56 AM, Ralf Hildebrandt wrote:
  * David Ford da...@blue-labs.org:
  why don't you run clamdscan on delivery?
  I do.
 
 You plainly state that you *do* run clamdscan on delivery...

Not on this machine.

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Charles Marcus
On 2010-11-05 10:15 AM, Ralf Hildebrandt wrote:
 * Charles Marcus cmar...@media-brokers.com:
 You plainly state that you *do* run clamdscan on delivery...

 Not on this machine.

Gotcha...

-- 

Best regards,

Charles


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Robert Schetterer
Am 05.11.2010 15:15, schrieb Ralf Hildebrandt:
 * Charles Marcus cmar...@media-brokers.com:
 On 2010-11-05 10:05 AM, Ralf Hildebrandt wrote:
 * Charles Marcus cmar...@media-brokers.com:
  
 Hmmm... maybe dovecot 2.0 is doing something different from 1.2 that
 causes your *live* clamdscan at delivery time to produce the heavier 
 load...

 Clamdscan is not running at delivery time on that box, it's running on
 another machine. 

 ??
 
 Yes, on the gateway which all mails go through
  
 On 2010-11-05 8:56 AM, Ralf Hildebrandt wrote:
 * David Ford da...@blue-labs.org:
 why don't you run clamdscan on delivery?
 I do.

 You plainly state that you *do* run clamdscan on delivery...
 
 Not on this machine.
 

Hi Ralf, I'm still not clear about your problem.
I understand that you do something with clam and there is a difference
between Dovecot versions, am I right?

Is this clamscan run from amavis, from cron, etc.?
I use it via milter with the latest 2.0.6 and
haven't seen great performance problems yet, but I only have Dovecot 1.0 to
compare with, on a totally different machine.

Finally, what makes you wonder about your scans?

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Robert Schetterer rob...@schetterer.org:

 Hi Ralf, I'm still not clear about your problem.
 I understand that you do something with clam and there is a difference
 between Dovecot versions, am I right?

No. clamd is not involved.

dovecot-2.0.x : slow
dovecot-1.2.x : pretty fast

same machine, same users, same usage behaviour.

Turn on 2.0: slow
Switch back to 1.2: fast again

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Ralf Hildebrandt ralf.hildebra...@charite.de:

 I uploaded a preliminary screenshot with comments:
 http://www.arschkrebs.de/bugs/dovecot.png
 
 During the night we're using clamdscan to scan mailboxes for viruses,
 this results in the big block of system & user from 0:00 until about
 08:00

Yesterday (08:00-18:00)
http://www.arschkrebs.de/bugs/postamt-last-04.11.2010.png
system peaking at 90%, almost no user, little iowait
load peaking at 40 (15 minute average)
Dovecot 2.0.6

Today (08:00-16:00)
http://www.arschkrebs.de/bugs/postamt-last-05.11.2010.png 
system constantly low, even at noon (30%), almost no user, little iowait
load peaking at 5 (15 minute average)
Dovecot 1.2.15

same users, same machine, same network, same kernel, same ram
Different dovecot version...
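
For anyone reproducing this comparison without the graphs: a "system"
percentage like the ones above is normally derived from the kernel's CPU
tick counters. The following is only a minimal sketch, assuming the
counters are taken from the first "cpu" line of Linux's /proc/stat at two
points in time. struct cpu_ticks and system_share_percent() are made-up
names for this illustration, and irq/softirq/steal time is ignored for
brevity.

```c
#include <assert.h>

/* Subset of the counters on the "cpu" line of /proc/stat, in clock ticks.
   The struct and its field selection are illustrative only. */
struct cpu_ticks {
	unsigned long long user, nice, system, idle, iowait;
};

/* Percentage of all elapsed ticks between two samples that were spent
   in kernel ("system") mode. Returns 0.0 if no ticks elapsed. */
static double system_share_percent(const struct cpu_ticks *before,
				   const struct cpu_ticks *after)
{
	unsigned long long total;

	assert(before != 0 && after != 0);
	total = (after->user - before->user) +
		(after->nice - before->nice) +
		(after->system - before->system) +
		(after->idle - before->idle) +
		(after->iowait - before->iowait);
	if (total == 0)
		return 0.0;
	return 100.0 * (double)(after->system - before->system) /
		(double)total;
}
```

Sampling once at 08:00 and once at 18:00 on both days would give one
number per Dovecot version to compare directly.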

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ed W

Hi Ralf

Not sure how your setup is arranged, but do you perhaps have the 
opportunity to do a partial upgrade and switch say only POP or only 
IMAP users to 2.0? (Or only deliver?)  The thought is that you might 
narrow it down a little?


I'm thinking if you use a virtualisation solution you might be able to 
duplicate your environment, plus perhaps some iptables magic?


Good luck

Ed W


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Ed W li...@wildgooses.com:
 Hi Ralf
 
 Not sure how your setup is arranged, but do you perhaps have the
 opportunity to do a partial upgrade and switch say only POP or only
 IMAP users to 2.0? (Or only deliver?)

Well, why not. It's possible. It's all in place.

What I had was using pop3s & imap & imaps from 2.0, but deliver from 1.2
That was - load-wise - a bit better than pure 2.0

 The thought is that you might narrow it down a little?

-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Timo Sirainen
On Fri, 2010-11-05 at 15:08 +0100, Ralf Hildebrandt wrote:
   I'm wondering if the problem has to do with the way processes now do
   IPC 
  
  That could very well be. Lots of time is spent in the kernel
 
 What exactly has changed - and what kind of data are the processes
 exchanging via IPCs?
 And which processes are talking to each other?

There's a lot more of IPC going on now. Each process at startup connects
to config process to read configuration (vs. reading it from environment
variables). State tracking is done in anvil process (vs. master process
internally). Logging is via pipes to log process instead of sockets to
master process (this should improve performance). Maybe other things I
can't think of now.

Anyway, I'd think the used system time is owned by some process(es).
Would be interesting to know what kind of logs you get with the attached
patch (e.g. run dovecot for an hour..day, stop it, gather all logs,
count the used system times per process type and see which ones used the
most).
diff -r 5a10aaf6f510 src/lib-master/master-service.c
--- a/src/lib-master/master-service.c	Thu Nov 04 18:56:47 2010 +
+++ b/src/lib-master/master-service.c	Fri Nov 05 15:22:51 2010 +
@@ -5,6 +5,7 @@
 #include "ioloop.h"
 #include "array.h"
 #include "env-util.h"
+#include "time-util.h"
 #include "home-expand.h"
 #include "process-title.h"
 #include "restrict-access.h"
@@ -19,6 +20,8 @@
 #include <unistd.h>
 #include <sys/stat.h>
 #include <syslog.h>
+#include <sys/time.h>
+#include <sys/resource.h>
 
 #define DEFAULT_CONFIG_FILE_PATH SYSCONFDIR"/dovecot.conf"
 
@@ -38,6 +41,7 @@
 #define MASTER_SERVICE_DIE_TIMEOUT_MSECS (30*1000)
 
 struct master_service *master_service;
+static struct timeval startup_timeval;
 
 static void master_service_io_listeners_close(struct master_service *service);
 static void master_service_refresh_login_state(struct master_service *service);
@@ -179,6 +183,7 @@
 		i_set_failure_prefix(t_strdup_printf("%s: ", name));
 	}
 
+	gettimeofday(&startup_timeval, NULL);
 	master_service_verify_version_string(service);
 	return service;
 }
@@ -637,9 +642,26 @@
 void master_service_deinit(struct master_service **_service)
 {
 	struct master_service *service = *_service;
+	struct rusage ru;
 
 	*_service = NULL;
 
+	if (getrusage(RUSAGE_SELF, &ru) < 0)
+		i_error("getrusage() failed: %m");
+	else {
+		int diff = timeval_diff_msecs(&ioloop_timeval, &startup_timeval);
+
+		i_debug("rusage: real=%d.%d user=%lu.%lu sys=%lu.%lu reclaims=%lu "
+			"faults=%lu swaps=%lu bin=%lu bout=%lu signals=%lu "
+			"volcs=%lu involcs=%lu",
+			diff/1000, diff%1000,
+			(long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
+			(long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec,
+			ru.ru_minflt, ru.ru_majflt, ru.ru_nswap,
+			ru.ru_inblock, ru.ru_oublock, ru.ru_nsignals,
+			ru.ru_nvcsw, ru.ru_nivcsw);
+	}
+
 	master_service_io_listeners_remove(service);
 
 	master_service_close_config_fd(service);
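
The same measurement can be tried outside Dovecot. Below is a minimal
standalone sketch, assuming only POSIX gettimeofday() and getrusage().
elapsed_msecs() and format_rusage() are hypothetical helpers standing in
for Dovecot's timeval_diff_msecs() and i_debug(); they are not part of
Dovecot. Unlike the patch, the sub-second fields are zero-padded here so
the output reads as ordinary decimal seconds.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: whole milliseconds elapsed between two timevals. */
static long long elapsed_msecs(const struct timeval *end,
			       const struct timeval *start)
{
	long long usecs = (long long)(end->tv_sec - start->tv_sec) * 1000000LL +
		(long long)end->tv_usec - (long long)start->tv_usec;

	return usecs / 1000;
}

/* Write this process's wall-clock lifetime and CPU usage into buf.
   Returns 0 on success, -1 if getrusage() fails. */
static int format_rusage(char *buf, size_t size, const struct timeval *start)
{
	struct rusage ru;
	struct timeval now;
	long long diff;

	assert(buf != NULL && start != NULL);
	if (getrusage(RUSAGE_SELF, &ru) < 0)
		return -1;
	gettimeofday(&now, NULL);
	diff = elapsed_msecs(&now, start);

	/* user = CPU time in user mode, sys = CPU time in the kernel,
	   volcs/involcs = voluntary/involuntary context switches */
	snprintf(buf, size,
		 "rusage: real=%lld.%03lld user=%ld.%06ld sys=%ld.%06ld "
		 "volcs=%ld involcs=%ld",
		 diff / 1000, diff % 1000,
		 (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
		 (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec,
		 ru.ru_nvcsw, ru.ru_nivcsw);
	return 0;
}
```

A caller would record a struct timeval with gettimeofday() at startup and
pass it to format_rusage() just before exiting, which mirrors where the
patch places its two hooks.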


Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Ralf Hildebrandt
* Timo Sirainen t...@iki.fi:

 There's a lot more of IPC going on now. Each process at startup connects
 to config process to read configuration (vs. reading it from environment
 variables). 

OK

 State tracking is done in anvil process (vs. master process
 internally). 

anvil is completely new, I noticed that one.

 Logging is via pipes to log process instead of sockets to master
 process (this should improve performance). Maybe other things I can't
 think of now.
 
 Anyway, I'd think the used system time is owned by some process(es).
 Would be interesting to know what kind of logs you get with the attached
 patch (e.g. run dovecot for an hour..day, stop it, gather all logs,
 count the used system times per process type and see which ones used the
 most).

I will try that.


-- 
Ralf Hildebrandt



Re: [Dovecot] Ongoing performance issues with 2.0.x

2010-11-05 Thread Timo Sirainen
On Fri, 2010-11-05 at 15:25 +, Timo Sirainen wrote:

 Anyway, I'd think the used system time is owned by some process(es).
 Would be interesting to know what kind of logs you get with the attached
 patch (e.g. run dovecot for an hour..day, stop it, gather all logs,
 count the used system times per process type and see which ones used the
 most).

Attached a script to parse and summarize the logs. In a small imaptest
run I didn't notice high system usage.



logparse.pl
Description: Perl program
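
logparse.pl itself is not reproduced in this archive. As a rough
illustration of the kind of summing such a script has to do, here is a
sketch in C that parses the "rusage:" text emitted by the patch. It
assumes that text appears verbatim somewhere in each log line; the struct
and function names are invented for this example. One detail follows
from the patch's format string: the fractional parts are printed without
zero padding, so they are read back as integer milliseconds and
microseconds rather than as decimal fractions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed "rusage:" record; names are illustrative only. */
struct rusage_sample {
	long long real_msecs;	/* wall-clock lifetime of the process */
	long long user_usecs;	/* CPU time spent in user mode */
	long long sys_usecs;	/* CPU time spent in the kernel */
};

/* Running totals for one process type (imap, pop3, ...). */
struct rusage_total {
	long long user_usecs;
	long long sys_usecs;
	unsigned int count;
};

/* Parse a log line containing "rusage: real=S.MS user=S.US sys=S.US".
   Returns 1 and fills *out if the line matched, 0 otherwise. */
static int parse_rusage_line(const char *line, struct rusage_sample *out)
{
	long long rs, rms, us, uus, ss, sus;
	const char *p;

	assert(line != NULL && out != NULL);
	p = strstr(line, "rusage: real=");
	if (p == NULL)
		return 0;
	if (sscanf(p, "rusage: real=%lld.%lld user=%lld.%lld sys=%lld.%lld",
		   &rs, &rms, &us, &uus, &ss, &sus) != 6)
		return 0;

	/* "sys=0.4000" means 4000 microseconds, not 0.4 seconds */
	out->real_msecs = rs * 1000 + rms;
	out->user_usecs = us * 1000000 + uus;
	out->sys_usecs = ss * 1000000 + sus;
	return 1;
}

/* Add one sample to the totals for its process type. */
static void rusage_total_add(struct rusage_total *total,
			     const struct rusage_sample *sample)
{
	total->user_usecs += sample->user_usecs;
	total->sys_usecs += sample->sys_usecs;
	total->count++;
}
```

Keeping one struct rusage_total per process type and comparing their
sys_usecs at the end answers the "which ones used the most" question.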