Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-03-04 Thread Ian G Batten

On 28 Feb 08, at 2256, Kenneth Marshall wrote:

 It may be that the software RAID 5 is your problem. Without the
 use of NVRAM for a cache, all of the writes need all 3 disks.
 That will cause quite a bottle-neck.


In general, RAID5 writes require two reads and two writes,  
independent of the size of the RAID5 assemblage.  To write a given  
block, you read the previous contents of the block you are updating  
and the associated parity block.  You XOR the previous contents with  
the parity, thus stripping it out, and then XOR the new contents in.   
You then write the new contents to the data block and the updated  
parity to the parity block.

New Parity = Old Parity xor Old Contents xor New Contents

In the absence of NVRAM this requires precisely four disk operations,  
two reads followed by two writes.
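
The update rule can be checked in a few lines of Python. This is a minimal sketch, not any real RAID implementation: blocks are short byte strings, the 4-disk stripe is invented, and the helper names (xor_blocks, rmw_update) are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_update(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: new parity = old parity xor old contents xor new contents."""
    return xor_blocks(old_parity, old_data, new_data)


# Hypothetical 4-disk stripe: three data blocks plus one parity block.
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(*data)

# Update data block 1 by touching only that block and the parity block.
new_block = b"\xaa\xbb"
new_parity = rmw_update(data[1], parity, new_block)
data[1] = new_block

# The shortcut agrees with recomputing parity from the whole stripe.
assert new_parity == xor_blocks(*data)
```

The final assertion is the point: the two reads (old data, old parity) carry enough information to produce the same parity as reading every data block in the stripe.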

A naive implementation would, as you imply, use all the spindles.  It  
would read contents of the parity stripe from the spindles not  
directly involved in the update, compute the new parity block, and  
then write the data block and the new parity.  For an N disk RAID5  
assemblage that's N-2 reads followed by 2 writes, N operations.
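
The two operation counts can be put side by side. The sketch below only encodes the counts stated above (four operations for read-modify-write, N for the naive approach); the function names are invented for illustration and it models no real RAID driver.

```python
def rmw_ops() -> int:
    """Read-modify-write: read old data and old parity, write new data and new parity."""
    return 2 + 2


def naive_ops(n_disks: int) -> int:
    """Naive update: read the N-2 untouched blocks, then write data and parity."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (n_disks - 2) + 2


for n in (3, 4, 5, 8):
    print(f"{n} disks: read-modify-write={rmw_ops()} ops, naive={naive_ops(n)} ops")
```

With three disks the naive count is 3 against 4; the two are equal at four disks, and the naive approach costs more from five disks up.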

Now as it happens, for the pathological case of a 3-disk RAID5  
assemblage, the naive implementation is better than the more standard  
implementation.  I don't know if any real-world code is optimised for  
this corner case.  I would doubt it: software RAID5 is a performance  
disaster area at the best of times unless it can take advantage of  
intimate knowledge of the intent log in the filesystem (RAID-Z does  
this), and three-disk RAID5 assemblages are a performance disaster  
area irrespective of hardware in a failure scenario.  The rebuild  
will involve taking 50% of the IO bandwidth of the two remaining  
disks in order to saturate the new target; rebuild performance ---  
contrary to intuition --- improves with larger assemblages as you can  
saturate the replacement disk with less and less of the bandwidth of  
the surviving spindles.

For a terabyte, 3x500GB SATA drives in a RAID5 group will be blown  
out of the water by 4x500GB SATA drives in a RAID 0+1 configuration  
in terms of performance and (especially) latency, especially if it  
can do the Solaris trick of not faulting an entire RAID 0 sub-group  
if one spindle fails.  Rebuild still isn't pretty, mind you.

ian


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Nik Conwell

On Feb 28, 2008, at 4:38 PM, Jeff Fookson wrote:

 is about 200GB.  There are typically about 200 'imapd'
 processes at a given time and a hugely varying number of 'lmtpds'
 (from about 6 to many hundreds during times of greatest pathology).
 System load is correspondingly in the 2-15 range, but can spike
 to 50-70!

Typically when deadlocks free you get load spikes as work can now  
progress.  It implies one thing was holding the lock for a long time -  
that thing itself probably being impeded by something else.  If there  
was high activity of many things hitting the lock, you wouldn't expect  
to see spikes - the system might even look idle as everything is just  
waiting for the lock.

 waits of  upwards of 1-2 minutes to get a write lock as shown by the
 example below (this is from a trace of an 'lmtpd')

 [strace -f -p 9817 -T]
 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
 len=0}) = 0 <84.998159>
[...]
 Can anyone suggest what we might do next to debug the problem further?

Good job with the strace.  Now figure out what fd 10 is, either by  
lsof or earlier in the strace output (look for = 10 and that should  
show what opened it).

Then install lslk and figure out who is holding the lock on that file  
and for how long, etc.  Then look at that process to see what it's  
doing for so long (strace again).
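
Where lslk is not available, the same information can be read from /proc on Linux. The sketch below is an alternative to lslk, not what was suggested above, and it rests on assumptions: it relies on the /proc/locks line layout documented in proc(5), it matches on inode only (ignoring the device number), and all function names are invented.

```python
import os
from typing import NamedTuple, Optional


class LockEntry(NamedTuple):
    blocked: bool   # True for a waiter ("->" line), False for the holder
    kind: str       # POSIX, FLOCK, ...
    access: str     # READ or WRITE
    pid: int
    inode: int


def fd_target(pid: int, fd: int) -> str:
    """Resolve what a process's file descriptor points at (Linux /proc only)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def parse_lock_line(line: str) -> Optional[LockEntry]:
    """Parse one /proc/locks line, e.g.
    '1: POSIX  ADVISORY  WRITE 3568 fd:00:2531452 0 EOF'."""
    fields = line.split()
    if len(fields) < 8:
        return None
    fields = fields[1:]                 # drop the leading "N:" index
    blocked = fields[0] == "->"
    if blocked:
        fields = fields[1:]
    kind, _mode, access, pid, dev_inode = fields[:5]
    return LockEntry(blocked, kind, access, int(pid), int(dev_inode.split(":")[2]))


def locks_for_inode(locks_text: str, inode: int) -> list:
    """Return every lock entry (holders and waiters) on the given inode."""
    entries = (parse_lock_line(line) for line in locks_text.splitlines())
    return [e for e in entries if e is not None and e.inode == inode]
```

On the affected machine one would call fd_target(9817, 10), take os.stat(...).st_ino of the resulting path, and pass the contents of /proc/locks to locks_for_inode. Entries with blocked set to False are the holders; their pids are the processes to strace next.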

-nik




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Kenneth Marshall
On Fri, Feb 29, 2008 at 07:16:24AM +0100, Pascal Gienger wrote:
 Jeff Fookson [EMAIL PROTECTED] wrote:
 
  Databases are all skiplist.
 
 As a rule of thumb, do not use skiplist for the duplicate delivery 
 suppression database (deliver.db). Even if everybody hates it, use 
 BerkeleyDB, Version 4.4.52 or higher. Give it a quite fair amount of shared 
 memory. And run cyr_expunge often to prune that database so that no entry 
 is older than - say - 3 days.
 
 We have approx 10-15 messages/sec incoming on one node.

I would like to add that we use skiplist for the deliver.db here
with a hardware caching controller for a system with 7500 accounts
and have no performance problems. It is key to run cyr_expunge to
keep it pruned. Also, with your setup (software RAID + DRBD) you
would benefit from the in-memory nature of the BerkeleyDB format.
That one change may significantly improve your system.

Cheers,
Ken



Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Gabor Gombas
On Thu, Feb 28, 2008 at 04:56:18PM -0600, Kenneth Marshall wrote:

 It may be that the software RAID 5 is your problem. Without the
 use of NVRAM for a cache, all of the writes need all 3 disks.
 That will cause quite a bottle-neck.

It's much worse than that. Since metadata updates are almost certainly
smaller than the stripe size, every metadata update will look like this:

- read the full stripe (i.e. read from ALL disks)
- calculate the new parity
- write back the modification and the new parity

That sure as hell will kill your performance. Move at least the metadata
partition to a RAID1 or RAID10 array. With Linux, you can do RAID10 even
with just 3 disks, but you will of course lose 1/2 disk capacity
compared to RAID5.
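
The capacity trade-off is simple arithmetic. The sketch below assumes 500GB drives purely for illustration and assumes two copies per block for md RAID10; the function names are invented.

```python
def raid5_usable(n_disks: int, disk_gb: float) -> float:
    """RAID5 gives up one disk's worth of space to parity."""
    return (n_disks - 1) * disk_gb


def raid10_usable(n_disks: int, disk_gb: float, copies: int = 2) -> float:
    """Linux md RAID10 keeps `copies` copies of every block, even on an odd disk count."""
    return n_disks * disk_gb / copies


disk_gb = 500  # assumed drive size, for illustration only
r5 = raid5_usable(3, disk_gb)
r10 = raid10_usable(3, disk_gb)
print(f"RAID5:  {r5:.0f} GB usable")
print(f"RAID10: {r10:.0f} GB usable")
print(f"difference: {(r5 - r10) / disk_gb} disks")
```

For three disks this gives two disks' worth of space under RAID5 against one and a half under RAID10, which is the half-disk loss mentioned above.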

Gabor

-- 
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences



Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Allen Chen
I just got out of this kind of situation.
If your OS is Linux, can you post /etc/syslog.conf?

Allen

Jeff Fookson wrote:
 Folks-

 I am hoping to get some help and guidance as to why our installation of 
 cyrus-imapd 2.3.9
 is unusably slow. Here are the specifics:

 The software is running on a 1.6GHz Opteron with 2Gb memory supporting a 
 user base of about 400
 users. The average rate of arriving mail is on the order of 1-2 
 messages/sec. The active mailstore
 is about 200GB.  There are typically about 200  'imapd'
 processes at a given time and a hugely varying number of 'lmtpds' (from 
 about 6 to many hundreds during
 times of greatest pathology). System load is correspondingly in the 2-15 
 range, but can spike to 50-70!

 Our users complain that the system is extremely sluggish during the day 
 when the system is most busy.

 The most obvious thing we observe is that both the lmtpds and the imapds 
 are spending HUGE times waiting
 on locks. Even when the system load is only 1-2, an 'strace' attached to 
 an instance of lmtpd or imapd shows
 waits of  upwards of 1-2 minutes to get a write lock as shown by the 
 example below (this is from a trace of an 'lmtpd')

 [strace -f -p 9817 -T]
 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, 
 len=0}) = 0 <84.998159>

 We strongly suspect that these large times waiting on locks is what is 
 causing the slowness our users are reporting.

 We are under the impression that a single instance of cyrus-imapd scales 
 well up to about 1000 users (with about 1MB active
 memory per 'imapd' process),  and so we are baffled as to what might be 
 going on.

 A non-standard aspect of our installation which may have something to do 
 with the problem is that we are
 running cyrus on an lvm2 partition that itself is running on top of 
 drbd. Thinking that the remote writes
 to the drbd secondary might be causing delays, we put the primary in 
 stand-alone mode so that the drbd layer
 was not doing any network activity (the drbd link is running at gigabit 
 speed on its own crossover cable to
 the secondary box) and saw no significant change in behavior. Any issues 
 due to locking and the lvm2 layer
 would, of course, still be present even with drbd's activity reduced to 
 just local writes.

 Can anyone suggest what we might do next to debug the problem further? 
 Needless to say, our users get
 extremely unhappy when trivial operations in their mail clients take 
 over a minute to complete.

 Thank you for any thoughts or advice.

 Jeff Fookson

   




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Simon Matter
 Can you put a - just before /var/log/messages and
 /var/log/cyrus/imapd.log in your /etc/syslog.conf? (just like
 -/var/log/maillog)
 and restart syslog: service syslog restart.

Another culprit can be name resolution. At least localhost and the server's
own hostnames should be listed in the hosts file for fast lookups. Postfix
shows how important this is: if you are using Postfix on the IMAP server and
deliver via an LMTP socket, Postfix in its default configuration will still
resolve everything using DNS, even localhost. It simply doesn't care about
your resolver configuration. So, for the final-delivery Postfix instance, I
always set disable_dns_lookups = yes, which means: use the OS resolver to
look up hosts, not the built-in DNS resolver.
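
A quick way to see whether lookups through the OS resolver are fast is to time one. This is a rough sketch using Python's standard socket module; the helper name time_lookup is invented, and only localhost is tried here. The server's own hostname could be passed in the same way.

```python
import socket
import time


def time_lookup(hostname: str):
    """Resolve `hostname` through the OS resolver; return (seconds, addresses)."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(hostname, None)
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []          # lookup failed; the elapsed time is still informative
    return time.perf_counter() - start, addresses


elapsed, addresses = time_lookup("localhost")
print(f"localhost: {elapsed * 1000:.1f} ms -> {addresses or 'lookup failed'}")
```

A name served from the hosts file should resolve in well under a millisecond or so; a lookup that takes noticeably longer, or fails, suggests the name is going out to DNS.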

Another problem can be the handling of groups. If there is any special
configuration, I suggest looking at it as well.

Simon



 Allen


 Jeff Fookson wrote:
 Allen Chen wrote:

 I just got out of this kind of situation.
 If your OS is Linux, can you post /etc/syslog.conf?

 Allen


 Allan-

 Yes, the installation is running under CentOS4.4, kernel 2.6.18.8.
 I've attached our /etc/syslog.conf.
 I am really curious what you found and got out of that makes you
 suspect syslog involvement.
 Thanks.

 Jeff




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Kenneth Marshall
Delivery through the lmtpd process should not take long enough
to cause this type of backlog unless there is a performance
bottle-neck, such as the delivery DB format that has been suggested
previously, particularly in such a small system.

Cheers,
Ken
On Thu, Feb 28, 2008 at 04:09:58PM -0600, Paul M Fleming wrote:
 Limit the number of lmtpd daemons to around 10 -- that solved the issue 
 for me.. We let sendmail handle the queuing. It is more than likely a 
 locking issue..
 
 
 Michael Bacon wrote:
  What database format are you using for the mailboxes database?  What kind 
  of storage is the metapartition (usually /var/imap) on?  What kind of 
  storage are your mail partitions on?
  
  


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Henrique de Moraes Holschuh
On Thu, 28 Feb 2008, Michael Bacon wrote:
 I've never seen drbd used for Cyrus, but it looks like other folks have 
 done it.  The combination of drbd+lvm2+ext3 might put you somewhere 
 unpleasant, but I'll have to let the Linux-heads jump in on that one.

Don't try it with 4k stacks, IMO.  It could blow up badly.  Stacked devices
and filesystems have this nasty tendency to eat up way too much stack :(

And whatever you do, don't do mailspool IO patterns over Linux raid5 with
the raid bitmap updates enabled and ext3.  Performance goes to crap.  I
don't exactly know how to enable or disable these bitmaps, though.  Look at
mdadm's manpage.

  a linux software RAID 5 (3 SATA disks). On top of the md layer is the
  drbd device; on top of that is an lvm2 logical volume; on top of that is
  an ext3 filesystem, mounted
  as '/var/imap'. The mail is then in /var/imap/mail and the metadata in
  /var/imap/config (and we also have /var/imap/certs for the ssl stuff, and
  /var/imap/sieve for sieve scripts).

Do look into that md raid bitmap option. Remember that using lvm anywhere in
a chain kills any and all write-barrier support, which means a full
sync-cache command to the HD even if it is a nice SCSI one. Remember that
drbd is not a lightning bolt either (you do have a direct gigabit ethernet
link in use just for the drbd sync, don't you?). And remember to inform lvm
AND ext3 of the raid stripe size when making the filesystems and lvm
volumes.
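
The stripe-size figures for the filesystem come from simple division. The sketch below assumes a 64 KiB md chunk size and 4 KiB ext3 blocks, neither of which is stated in this thread; the real chunk size is shown by mdadm --detail. The function names are invented.

```python
def ext3_stride(chunk_kib: int, block_kib: int = 4) -> int:
    """Filesystem blocks per RAID chunk (the value mke2fs calls 'stride')."""
    if chunk_kib % block_kib:
        raise ValueError("chunk size should be a multiple of the block size")
    return chunk_kib // block_kib


def full_stripe_blocks(chunk_kib: int, n_disks: int, block_kib: int = 4) -> int:
    """Blocks in one full RAID5 stripe, counting only the N-1 data chunks."""
    return ext3_stride(chunk_kib, block_kib) * (n_disks - 1)


chunk_kib = 64  # assumed md chunk size; check the real array before using this
print("stride:", ext3_stride(chunk_kib))
print("full stripe:", full_stripe_blocks(chunk_kib, 3))
```

Under these assumptions a 3-disk RAID5 gives a stride of 16 blocks and a full stripe of 32 blocks. The stride is the number to hand to mke2fs's stride option when creating the filesystem.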

Also, the usual mount tricks like noatime should apply.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh



Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Simon Matter
 Michael Bacon wrote:

 What database format are you using for the mailboxes database?  What
 kind of storage is the metapartition (usually /var/imap) on?  What
 kind of storage are your mail partitions on?

 Databases are all skiplist. Our mail partition and the metapartition are

skiplist is good.

 both on the same filesystem, as we intended that both be part of the
 same drbd mirror. That partition is
 a linux software RAID 5 (3 SATA disks). On top of the md layer is the

Software RAID 5 seems fine for data, but I strongly suggest a separate RAID 1
for config.

 drbd device; on top of that is an lvm2 logical volume; on top of that is

I don't think LVM2 is the problem here, I'm using it almost everywhere.
The same with ext3.

I have never used drbd in production, but could it be what is causing
your problems? I've done some intensive benchmarks with different
solutions like AOE and gnbd and found that it performs quite badly for
certain types of usage.
Couldn't you test by simply mounting the LVM device without the drbd layer
(maybe with an offset where the real filesystem begins)?

What I know for sure is that your server should handle that number of
connections just fine.

Simon

 an ext3 filesystem, mounted
 as '/var/imap'. The mail is then in /var/imap/mail and the metadata in
 /var/imap/config (and we also have /var/imap/certs for the ssl stuff,
 and /var/imap/sieve for sieve scripts).

 Thanks.

 Jeff Fookson





Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Jeff Fookson
Folks-

I am hoping to get some help and guidance as to why our installation of
cyrus-imapd 2.3.9 is unusably slow. Here are the specifics:

The software is running on a 1.6GHz Opteron with 2GB memory supporting a
user base of about 400 users. The average rate of arriving mail is on the
order of 1-2 messages/sec. The active mailstore is about 200GB. There are
typically about 200 'imapd' processes at a given time and a hugely varying
number of 'lmtpds' (from about 6 to many hundreds during times of greatest
pathology). System load is correspondingly in the 2-15 range, but can spike
to 50-70!

Our users complain that the system is extremely sluggish during the day
when the system is most busy.

The most obvious thing we observe is that both the lmtpds and the imapds
are spending HUGE times waiting on locks. Even when the system load is only
1-2, an 'strace' attached to an instance of lmtpd or imapd shows waits of
upwards of 1-2 minutes to get a write lock, as shown by the example below
(this is from a trace of an 'lmtpd'):

[strace -f -p 9817 -T]
9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
len=0}) = 0 <84.998159>

We strongly suspect that these long waits on locks are what is
causing the slowness our users are reporting.
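
To find such waits across a whole trace rather than by eye, the per-call durations that strace -T appends can be filtered with a short script. This is a sketch under assumptions: each call is on a single line, the duration is accepted with or without the angle brackets strace normally prints, and the second sample line is invented for contrast rather than taken from the real trace.

```python
import re

# pid (optional), syscall name, arguments, return value, then the -T duration.
LINE = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?(?P<call>\w+)\(.*\)\s*=\s*(?P<ret>\S+)"
    r".*?<?(?P<secs>\d+\.\d+)>?\s*$"
)


def slow_calls(trace_lines, threshold=1.0):
    """Yield (pid, syscall, seconds) for calls that took at least `threshold` seconds."""
    for line in trace_lines:
        m = LINE.match(line.strip())
        if m and float(m["secs"]) >= threshold:
            yield m["pid"], m["call"], float(m["secs"])


sample = [
    "9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>",
    '9817  read(10, ""..., 4096) = 0 <0.000021>',  # made-up fast call, for contrast only
]
for pid, call, secs in slow_calls(sample):
    print(f"pid {pid}: {call} took {secs:.1f}s")
```

Run over a saved trace (strace -o writes one), this lists only the calls worth investigating, which makes it easier to see whether the long waits are all on the same file descriptor.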

We are under the impression that a single instance of cyrus-imapd scales
well up to about 1000 users (with about 1MB active memory per 'imapd'
process), and so we are baffled as to what might be going on.

A non-standard aspect of our installation which may have something to do
with the problem is that we are running cyrus on an lvm2 partition that
itself is running on top of drbd. Thinking that the remote writes to the
drbd secondary might be causing delays, we put the primary in stand-alone
mode so that the drbd layer was not doing any network activity (the drbd
link is running at gigabit speed on its own crossover cable to the
secondary box) and saw no significant change in behavior. Any issues due to
locking and the lvm2 layer would, of course, still be present even with
drbd's activity reduced to just local writes.

Can anyone suggest what we might do next to debug the problem further?
Needless to say, our users get extremely unhappy when trivial operations in
their mail clients take over a minute to complete.

Thank you for any thoughts or advice.

Jeff Fookson

-- 
Jeffrey E. Fookson, PhD Phone: (520) 621 3091
Support Systems Analyst, Principal  [EMAIL PROTECTED]
Steward Observatory
University of Arizona




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Michael Bacon
What database format are you using for the mailboxes database?  What kind 
of storage is the metapartition (usually /var/imap) on?  What kind of 
storage are your mail partitions on?




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Vincent Fox
Jeff Fookson wrote:
 is unusably slow. Here are the specifics:
   
You are mighty short on the SPECIFICS of your setup.
Expect a slew of questions to elicit this information.





Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Paul M Fleming
Limit the number of lmtpd daemons to around 10 -- that solved the issue
for me. We let sendmail handle the queuing. It is more than likely a
locking issue.


Michael Bacon wrote:
 What database format are you using for the mailboxes database?  What kind 
 of storage is the metapartition (usually /var/imap) on?  What kind of 
 storage are your mail partitions on?
 
 
 --On Thursday, February 28, 2008 2:38 PM -0700 Jeff Fookson 
 [EMAIL PROTECTED] wrote:
 
 Folks-

 I am hoping to get some help and guidance as to why our installation of
 cyrus-imapd 2.3.9 is unusably slow. Here are the specifics:

 The software is running on a 1.6GHz Opteron with 2GB memory supporting a
 user base of about 400 users. The average rate of arriving mail is on the
 order of 1-2 messages/sec. The active mailstore is about 200GB. There are
 typically about 200 'imapd' processes at a given time and a hugely varying
 number of 'lmtpds' (from about 6 to many hundreds during times of greatest
 pathology). System load is correspondingly in the 2-15 range, but can
 spike to 50-70!

 Our users complain that the system is extremely sluggish during the day
 when the system is most busy.

 The most obvious thing we observe is that both the lmtpds and the imapds
 are spending HUGE times waiting on locks. Even when the system load is
 only 1-2, an 'strace' attached to an instance of lmtpd or imapd shows
 waits of upwards of 1-2 minutes to get a write lock, as shown by the
 example below (this is from a trace of an 'lmtpd'):

 [strace -f -p 9817 -T]
 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
 len=0}) = 0 <84.998159>

 We strongly suspect that these long waits on locks are what is
 causing the slowness our users are reporting.

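To see how widespread the long lock waits are, the strace output can be filtered for slow blocking locks. The following is a minimal sketch, assuming the trace was captured with 'strace -f -T' so that each line starts with a pid and ends with the elapsed time in angle brackets; the function name, the threshold and the second sample line are made up for illustration.

```python
import re

# Matches lines such as:
#   9817  fcntl(10, F_SETLKW, {type=F_WRLCK, ...}) = 0 <84.998159>
LOCK_CALL = re.compile(
    r"^(?P<pid>\d+)\s+fcntl\((?P<fd>\d+),\s*F_SETLKW\b.*<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_lock_waits(lines, threshold_s=1.0):
    """Yield (pid, fd, seconds) for blocking fcntl locks that took >= threshold_s."""
    for line in lines:
        m = LOCK_CALL.match(line.strip())
        if m and float(m["secs"]) >= threshold_s:
            yield int(m["pid"]), int(m["fd"]), float(m["secs"])


# First line is the sample from the report (joined onto one line);
# the second is an invented fast call that should be filtered out.
sample = [
    "9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>",
    "9817  fcntl(10, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0 <0.000021>",
]
for pid, fd, secs in slow_lock_waits(sample):
    print(f"pid {pid} waited {secs:.1f}s for a lock on fd {fd}")
```

On Linux the file behind a descriptor can then be identified by looking at /proc/<pid>/fd/<fd>, which shows which database or mailbox file the processes are contending for.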
 We are under the impression that a single instance of cyrus-imapd scales
 well up to about 1000 users (with about 1MB active memory per 'imapd'
 process), and so we are baffled as to what might be going on.

 A non-standard aspect of our installation which may have something to do
 with the problem is that we are running cyrus on an lvm2 partition that
 itself is running on top of drbd. Thinking that the remote writes to the
 drbd secondary might be causing delays, we put the primary in stand-alone
 mode so that the drbd layer was not doing any network activity (the drbd
 link is running at gigabit speed on its own crossover cable to the
 secondary box) and saw no significant change in behavior. Any issues due
 to locking and the lvm2 layer would, of course, still be present even with
 drbd's activity reduced to just local writes.

 Can anyone suggest what we might do next to debug the problem further?
 Needless to say, our users get extremely unhappy when trivial operations
 in their mail clients take over a minute to complete.

 Thank you for any thoughts or advice.

 Jeff Fookson

 --
 Jeffrey E. Fookson, PhD  Phone: (520) 621 3091
 Support Systems Analyst, Principal   [EMAIL PROTECTED]
 Steward Observatory
 University of Arizona

 


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Kenneth Marshall
Jeff,

The format of the delivery database, among other databases, can cause
this type of problem. For any DB that is updated under contention, use
either BerkeleyDB or Skiplist format. We also had a similar issue
when we did not have the expunge process running and pruning the
delivery database: its size kept growing until it slowed down
the entire system.

Cheers,
Ken



Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Jeff Fookson
Michael Bacon wrote:

 What database format are you using for the mailboxes database?  What 
 kind of storage is the metapartition (usually /var/imap) on?  What 
 kind of storage are your mail partitions on?

Databases are all skiplist. Our mail partition and the metapartition are
both on the same filesystem, as we intended that both be part of the
same drbd mirror. That partition is a Linux software RAID 5 (3 SATA
disks). On top of the md layer is the drbd device; on top of that is an
lvm2 logical volume; on top of that is an ext3 filesystem, mounted as
'/var/imap'. The mail is then in /var/imap/mail and the metadata in
/var/imap/config (and we also have /var/imap/certs for the ssl stuff,
and /var/imap/sieve for sieve scripts).

Thanks.

Jeff Fookson




-- 
Jeffrey E. Fookson, PhD Phone: (520) 621 3091
Support Systems Analyst, Principal  [EMAIL PROTECTED]
Steward Observatory
University of Arizona




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Michael Bacon
Jeff,

Just as a rule of thumb, if you've got problems with Cyrus (or any mail 
system), 90% of the time they're related to I/O performance.

I've never seen drbd used for Cyrus, but it looks like other folks have 
done it.  The combination of drbd+lvm2+ext3 might put you somewhere 
unpleasant, but I'll have to let the Linux-heads jump in on that one.

Beyond that, I don't see anything obviously wrong, but maybe someone who's 
run it more on Linux can chime in.

-Michael



Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Kenneth Marshall
It may be that the software RAID 5 is your problem. Without the
use of NVRAM for a cache, all of the writes need all 3 disks.
That will cause quite a bottle-neck.

Ken

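A small worked example of the RAID 5 parity arithmetic behind this claim, following the explanation given elsewhere in this thread (new parity = old parity xor old contents xor new contents). It is only a sketch: the block contents and function names are made up, and the operation counts are the textbook ones for a single-block update without a cache, not measurements of Linux md.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


# One stripe of a 3-disk RAID 5: two data blocks plus one parity block.
d0, d1 = b"\x11\x22\x33\x44", b"\xa0\xb0\xc0\xd0"
parity = xor_blocks(d0, d1)

new_d0 = b"\xde\xad\xbe\xef"

# Read-modify-write: read old data and old parity, then write new data and parity.
parity_rmw = xor_blocks(parity, d0, new_d0)

# Reconstruct-write: read the other data block(s), then write new data and parity.
parity_rcw = xor_blocks(new_d0, d1)

assert parity_rmw == parity_rcw  # both strategies give the same parity


def disk_ops(n_disks: int) -> dict:
    """Disk operations for one single-block update under each strategy."""
    return {"read_modify_write": 2 + 2, "reconstruct_write": (n_disks - 2) + 2}


print(disk_ops(3))
print(disk_ops(8))
```

For three disks the reconstruct-write path touches every spindle, which matches the "all 3 disks" remark above; on larger arrays the read-modify-write path stays at four operations regardless of array size.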


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Scott Likens
Okay, I read over this and I felt it was worth commenting...

There's mention of using MD, DRBD, LVM2, etc... it sounds extremely
convoluted and way too complex for what you need.

When you are doing a read or a write, each layer takes its time
before the data gets committed to disk.

If you are doing DRBD, you may want to change a few settings, since
you're doing RAID 5 with 3 SATA disks using md and drbd... on top of
lvm, etc.

For example,

<quote>
protocol prot-id
On the TCP/IP link the specified protocol is used. Valid protocol
specifiers are A, B, and C.

Protocol A: write IO is reported as completed, if it has reached local
disk and local TCP send buffer.

Protocol B: write IO is reported as completed, if it has reached local
disk and remote buffer cache.

Protocol C: write IO is reported as completed, if it has reached both
local and remote disk.
</quote>

From personal experience, I have found that people usually use
Protocol C... it's great, however it can result in slower writes,
which depending on your hardware can be very painful.  The fact that
you have LVM2 sitting in there, as well as MD, means your average
write has to go through DRBD (and be written to both servers) as well
as LVM2 and MD before it's actually written (in a very vague sense).

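To make the difference between the three protocols concrete, here is a toy latency model. It is only a sketch under stated assumptions: the local write and the replication are taken to proceed in parallel, copying into the local TCP send buffer is treated as free, and all the millisecond figures are invented, not measured on any real DRBD setup.

```python
from enum import Enum


class Protocol(Enum):
    A = "A"  # complete at: local disk + local TCP send buffer
    B = "B"  # complete at: local disk + remote buffer cache
    C = "C"  # complete at: local disk + remote disk


def ack_latency_ms(protocol, local_disk_ms, one_way_net_ms, remote_disk_ms):
    """Toy estimate of when a write is reported complete (assumed model)."""
    if protocol is Protocol.A:
        remote_path = 0.0  # nothing remote is waited for
    elif protocol is Protocol.B:
        remote_path = 2 * one_way_net_ms  # send the data, wait for the ack
    else:
        remote_path = 2 * one_way_net_ms + remote_disk_ms
    return max(local_disk_ms, remote_path)


# Hypothetical numbers: 8 ms local write, 0.1 ms one-way on a crossover
# link, and a slower (20 ms) disk on the secondary.
for proto in Protocol:
    print(proto.name, ack_latency_ms(proto, 8.0, 0.1, 20.0))
```

In this model Protocol C only costs extra when the network round trip plus the remote disk is slower than the local disk, so the choice of protocol matters most when the secondary's storage is the slower side.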
Additionally you can use LVM2 striping. I really won't get into that,
but it may be more beneficial than a RAID-5 with 3 disks.

There are lots of hints on speed if you read over the archives. I can
just tell you that, from what I have read, there is nothing you can do
with your complex setup to make it better.


My best hint for you would be hardware RAID; for one, that's a big
step. If you really want RAID-5, it may be more beneficial to use 4
SATA disks... With RAID-5 and MD you can and will see slow writes, no
matter how good your hardware is.  I had a Zimbra mailserver
with RAID-5 and the best write I could get was 75Mbit, and that was
using 8 15k RPM SCSI disks... :(

Use hardware RAID, remove LVM unless you really need it... remove DRBD
unless you totally need it; there are other ways to create
redundancy that are better than DRBD... It's not that I hate DRBD... I
just hate seeing it implemented in places where it just does not
belong.

I don't know if this will make sense; if it doesn't, let me know and
I'll break it down further if you need it.

Lastly, could you show us some of your syslog, to see if there are
actually any warnings about '440 lockers in use' or such?

Scott


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Zachariah Mully
On Thu, 2008-02-28 at 16:56 -0600, Kenneth Marshall wrote:
 It may be that the software RAID 5 is your problem. Without the
 use of NVRAM for a cache, all of the writes need all 3 disks.
 That will cause quite a bottle-neck.
 
 Ken

And if you can, try to get the mailstore over onto a RAID1. RAID5 is
only good for long rebuilds and slow writes. Since you've already got
DRBD set up, can you gank something to add as a second DRBD replica,
on which you can test either a single-disk setup or a RAID1 setup?

Z




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Zachariah Mully
On Thu, 2008-02-28 at 18:36 -0500, Zachariah Mully wrote:
 And if you can, try to get the mailstore over onto a RAID1. RAID5 is
 only good for long rebuilds and slow writes. Since you've already got
 DRBD setup, can you gank something to add as a second DRBD replica,
 which you can either test a single disk setup, or a RAID1 setup?

Err... I thought you could set up drbd in a multiple-replica config, but
I was mistaken...

Z




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Vincent Fox
Gah, my first thought was: a 3-disk RAID5?

Is this 1998 or 2008?  Disk is cheap.  RAID-1 or RAID-10.




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-28 Thread Pascal Gienger
Jeff Fookson [EMAIL PROTECTED] wrote:

 Databases are all skiplist.

As a rule of thumb, do not use skiplist for the duplicate delivery
suppression database (deliver.db). Even if everybody hates it, use
BerkeleyDB, version 4.4.52 or higher. Give it a fair amount of shared
memory. And run cyr_expire often to prune that database so that no entry
is older than, say, 3 days.

We have approx 10-15 messages/sec incoming on one node.

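Why regular pruning matters can be illustrated with a toy model of a duplicate delivery table. This is a sketch only: the class and its methods are hypothetical and have nothing to do with Cyrus's actual on-disk format; the point is just that every delivery adds an entry, so without pruning the table grows without bound.

```python
import time

DAY = 86400  # seconds


class DuplicateTable:
    """Toy stand-in for a duplicate delivery table (not Cyrus's real format)."""

    def __init__(self):
        self._seen = {}  # (message_id, recipient) -> delivery time (epoch seconds)

    def check_and_mark(self, message_id, recipient, now=None):
        """Return True if this delivery is a duplicate; otherwise record it."""
        now = time.time() if now is None else now
        key = (message_id, recipient)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False

    def prune(self, max_age_days=3, now=None):
        """Drop entries older than max_age_days; return how many were removed."""
        now = time.time() if now is None else now
        cutoff = now - max_age_days * DAY
        stale = [key for key, stamp in self._seen.items() if stamp < cutoff]
        for key in stale:
            del self._seen[key]
        return len(stale)

    def __len__(self):
        return len(self._seen)


table = DuplicateTable()
table.check_and_mark("<a@example.org>", "jeff", now=0)
table.check_and_mark("<b@example.org>", "jeff", now=4 * DAY)
removed = table.prune(max_age_days=3, now=5 * DAY)
print(f"pruned {removed} entries, {len(table)} left")
```

At 1-2 messages/sec a table like this gains on the order of a hundred thousand entries per day, so a 3-day horizon keeps it bounded while still catching redeliveries that arrive shortly after the original.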