Re: choosing a file system
Yeah, except Postfix encodes the inode of the queue files in its queue IDs, so it gets very confused if you do this. Same with restoring queues from backups. You should be able to get away with this if, when moving the queue to another machine, you move the queued mail from the hold, incoming, active and deferred directories into the maildrop directory on the target instance. This (somewhat old, but still correct, I think) message from Wietse might shed more light on it:

  Date: Thu, 12 Sep 2002 20:33:08 -0400 (EDT)
  From: wie...@porcupine.org (Wietse Venema)
  Subject: Re: postfix migration

  I want to migrate postfix to another machine. What are the steps
  so that I won't lose mail in the process?

  This is the safe procedure:

  1) On the old machine, stop Postfix.
  2) On the old machine, run as super-user: postsuper -r ALL
     This moves all queue files to the maildrop queue.
  3) On the old machine, back up /var/spool/postfix/maildrop
  4) On the new machine, make sure Postfix works.
  5) On the new machine, stop Postfix.
  6) On the new machine, restore /var/spool/postfix/maildrop
  7) On the new machine, start Postfix.

  There are ways to skip the postsuper -r ALL step, and copy the
  incoming + active + deferred + bounce + defer + flush + hold
  directories to the new machine, but that would be safe only with an
  empty queue on the new machine.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
---
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
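The move-the-queued-mail-to-maildrop alternative mentioned above can be sketched in a few lines. This is a sketch only, with a hypothetical helper name: the directory layout mirrors /var/spool/postfix but should be verified against your instance, both Postfix instances must be stopped first, and Wietse's postsuper -r ALL procedure remains the supported route.

```python
import os
import shutil

def requeue_via_maildrop(spool_src, spool_dst):
    """Move queued mail from the source instance's hold/incoming/active/
    deferred queues into the target instance's maildrop queue, where
    pickup(8) will re-inject it and fresh (inode-based) queue IDs get
    assigned.  Paths and layout here are assumptions, not gospel.
    """
    maildrop = os.path.join(spool_dst, "maildrop")
    os.makedirs(maildrop, exist_ok=True)
    for queue in ("hold", "incoming", "active", "deferred"):
        qdir = os.path.join(spool_src, queue)
        if not os.path.isdir(qdir):
            continue
        # deferred (and sometimes others) are hashed into subdirectories,
        # so walk rather than listdir
        for root, _dirs, files in os.walk(qdir):
            for name in files:
                shutil.move(os.path.join(root, name),
                            os.path.join(maildrop, name))
```

Because everything lands in maildrop, the inode encoded in each old queue ID no longer matters: pickup re-queues each message from scratch.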
Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues
it takes long enough to rebuild an array with large drives that the chances of a second drive failing during the rebuild become noticeable. Worse, the act of rebuilding can prompt a second, marginal disk to fail. Presumably the mechanics are that the head runs through a patch of debris in an otherwise rarely accessed area of the disk, or the increased load causes thermal problems.
Re: LARGE single-system Cyrus installs?
Note that ext3 effectively does the same thing as ZFS on fsync() - because the journal layer is block based and does not know which block belongs to which file, the entire journal must be applied to the filesystem to achieve the expected fsync() semantics (at least, with data=ordered, it does).

Well, "does not know which block belongs to which file" sounds weird. :) With data=ordered, the journal holds only metadata. If you fsync() a file, ordered means that ext3 syncs the data blocks first (with no overhead, just like any other filesystem - of course it knows what blocks to write), then the journal. Now, yes, the journal possibly contains metadata updates for other files too, and the ordered semantics requires the data blocks of those files to be synced as well, before the journal sync. I'm not sure if an fsync() flushes the whole journal or just up to the point that's necessary (that is, up to the last update on the file you're fsync()ing).

The ext3 journalling layer only knows about blocks. When using data=ordered, only metadata *blocks* are tracked by the journalling layer. The journalling layer does not know which data blocks correspond to which metadata block, so everything is forced out. Most other journalling file systems operate within the filesystem abstraction, and journal atomic filesystem operations, which leaves them better able to implement sane fsync() semantics.

data=writeback is what some (most) other journalled filesystems do. Metadata updates are allowed to hit the disk _before_ data updates. So, on fsync(), the FS writes all data blocks (still required by fsync() semantics), then the journal (or part of it), but if updates to other files' metadata are included in the journal sync, there's no need to write the corresponding data blocks. They'll be written later, and they'll hit the disk _after_ the metadata changes. This is possible because those other journals operate at the filesystem, not block, level.
If power fails in between, you can have a file whose size/time is updated, but whose contents are not. That's the problem with data=writeback, but it should be noted that's pretty normal for other journalled filesystems, too. It applies only to files that were not fsync()'ed. And, in this case, you're no worse off than you would have been with a traditional filesystem such as UFS.

I think that if you're running into performance problems, and your system is doing a lot of fsync(), data=ordered is the worst option.

You're assuming fsync() behaviour changes with the other data= options - have you looked into it? I'm wary because the ext3 guys have a long history of simply not getting what fsync() is for, what it's supposed to do, and why it's important. I recently asked Andrew Morton whether fsync() behaviour changed with the data= options, but he couldn't remember, and I haven't had time to look into it myself.

data=journal is fsync()-friendly in one sense: it does write *everything* out, but in one nice sequential (thus extremely fast) shot. Later, data blocks will be written again to the right places. It doubles the I/O bandwidth requirements, but if you have a lot of bandwidth, it may be a win. We're talking sequential write bandwidth, which is hardly a problem. This is true, right up until the point you fill the journal... 8-)

data=writeback is fsync()-friendly in the sense that it writes only the data blocks of the fsync()'ed file plus (all) metadata. It's the lowest overhead option. If you have heavy sustained write traffic _and_ lots of fsync()'s, then data=writeback may be the only option. I think some people are scared by data=writeback, but they don't realize it's just what other journalled FSs do. I'm not familiar with ReiserFS; I think it's metadata-only as well. Certainly data journalling is the exception, rather than the rule.
Off the top of my head, I can't think of another mainstream filesystem that does it (aside from the various log-structured filesystems such as WAFL and Reiser4).

data=ordered is good for general purpose systems. For any application that uses fsync(), it's useless overhead.
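For what it's worth, the fsync() contract being argued over here - data and metadata on stable storage before the call returns - is exactly what an MTA or IMAP server relies on when committing a message. A minimal sketch of the pattern (hypothetical helper, illustrative path); how expensive the os.fsync() call is depends on the data= mode discussed above:

```python
import os

def durable_write(path, data):
    """Write data to path and force it to stable storage.

    What the fsync() costs depends on the filesystem: under ext3
    data=ordered, it can drag unrelated files' dirty data out with the
    journal; under data=writeback, only this file's data plus the
    (sequential) metadata journal is written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)   # block until data + metadata are on stable storage
    finally:
        os.close(fd)

durable_write("/tmp/fsync-demo", b"queued message\n")
```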
Re: LARGE single-system Cyrus installs?
Certainly data journalling is the exception, rather than the rule. Off the top of my head, I can't think of another mainstream filesystem that does it (aside from the various log-structured filesystems such as WAFL and Reiser4).

AFAIK you get it with UFS + gjournal - dunno if that counts as mainstream though :)

Gjournal sounds like it's block level, like ext3, so it would suffer the same sorts of shortcomings in this application.
Re: LARGE single-system Cyrus installs?
In production releases of ZFS, fsync() essentially triggers sync() (fixed in Solaris Next). [...] Skiplist requires two fsync calls per transaction (single untransactioned actions are also one transaction), and it also locks the entire file for the duration of said transaction, so you can't have two writes happening at once. I haven't built Cyrus on our Solaris box, so I don't know if it uses fcntl there; it certainly does on the Linux systems, but it can fall back to flock if fcntl isn't available.

Note that ext3 effectively does the same thing as ZFS on fsync() - because the journal layer is block based and does not know which block belongs to which file, the entire journal must be applied to the filesystem to achieve the expected fsync() semantics (at least, with data=ordered, it does).
Re: mail coming without MX; how ?
if you check the domain infoservices.in with dnsstuff.com you can see no MX for that domain. But still mail is coming to [EMAIL PROTECTED]. We are using it for our official purposes, and infoservices.in is our official site too. I wonder how mail is still coming without an MX? Could anyone kindly explain?

Most Mail Transport Agents will fall back to the A record if no MX records are found. This precedent was set by sendmail, and woe betide any implementation that ignores precedent, but it would be foolish to count on all MTAs behaving this way.
Re: mail coming without MX; how ?
Most Mail Transport Agents will fall back to the A record if no MX records are found. This precedent was set by sendmail, and woe betide any implementation that ignores precedent, but it would be foolish to count on all MTAs behaving this way.

With SMTP you _can_ count on this behaviour. Quoting RFC 2821:

  5. Address Resolution and Mail Handling

  [...] The lookup first attempts to locate an MX record associated with
  the name. [...] If no MX records are found, but an A RR is found, the
  A RR is treated as if it was associated with an implicit MX RR, with a
  preference of 0, pointing to that host.

RFC 2821 is relatively new (certainly newer than most MTAs), and while the popular ones have made some effort to comply with it, many others still struggle to comply with RFC 821.
Re: improving concurrency/performance (fwd)
This guy is having a problem with cyrus-imap and ext3 - when multiple processes are attempting to write to the one filesystem (but not the one file), performance drops to next to nothing when only five processes are writing. An strace shows most of the time is being spent in fdatasync and fsync.

Actually, the thread just got off topic quickly -- I'm running this on reiserfs, not ext3. ...And I've got it mounted with data=writeback, too. But thanks for the info, Andrew.

Sorry, my confusion. But it might be worth asking the reiserfs guys. My experience has been that if you are fsync'ing files, then even modern disks only manage around 10 fsync's per second (because not only does the file data get written out, but typically also the inode, the directory entry, the free block table and maybe even all the directory entries up to root). Journalling can help, because the committed data is written sequentially to the journal, rather than being scattered all over the disk, but the journalled operations still need to be applied to the filesystem sooner or later.
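That ~10 fsync's per second figure is easy to check on your own hardware. A rough micro-benchmark sketch (hypothetical helper name; results vary enormously with the disk's write cache, the filesystem and the journalling mode):

```python
import os
import time

def fsync_rate(path, count=50):
    """Append-and-fsync `count` times and return fsync()s per second.
    With the drive's write cache off, each fsync waits for the platter,
    so this lands near the disk's rotational/seek limit; on a battery-
    backed controller or tmpfs it will be far higher.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, b"x" * 512)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed

print("%.0f fsync()s/sec" % fsync_rate("/tmp/fsync-bench"))
```

Running it on the actual mail spool filesystem (not /tmp) gives the number that matters for Cyrus.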
Re: improving concurrency/performance (fwd)
I forwarded John's message to Andrew Morton, the Linux kernel maintainer, and this is his reply (it was cc'ed to the list, but, not being a subscriber, I presume it bounced):

--- Forwarded Message

Date: Tue, 08 Nov 2005 15:21:31 -0800
From: Andrew Morton [EMAIL PROTECTED]
To: Andrew McNamara [EMAIL PROTECTED]
cc: John Madden [EMAIL PROTECTED], info-cyrus@lists.andrew.cmu.edu
Subject: Re: improving concurrency/performance (fwd)

Andrew McNamara [EMAIL PROTECTED] wrote:

  This guy is having a problem with cyrus-imap and ext3 - when multiple
  processes are attempting to write to the one filesystem (but not the
  one file), performance drops to next to nothing when only five
  processes are writing. An strace shows most of the time is being spent
  in fdatasync and fsync.

...

Yes, on ext3, an fsync() syncs the entire filesystem. It has to, because all the metadata for each file is shared - it's just a string of journallable blocks. Similar story with the data, in ordered mode. So effectively, fsync()ing five files one time each is performing 25 fsync()s.

One fix (which makes the application specific to ext3 in ordered-data or journalled-data mode) is to perform a single fsync(), with the understanding that this has the side-effect of fsyncing all the other files. That's an ugly solution, and is rather hard to do if the workload consists of five separate processes!

So I'd recommend mounting the filesystem with the `-o data=writeback' mode. This way, each fsync(fd) will sync fd's data only. That's much better than the default data=ordered mode, wherein a single fsync() will sync all the other files' data too.

In data=writeback mode it is still the case that fsync(fd) will sync the other files' metadata, but that's a single linear write to the journal and the additional cost should be low.

Bottom line: please try data=writeback, let me know.
--- Forwarded Message

Date: Tue, 08 Nov 2005 09:25:54 -0500
From: John Madden [EMAIL PROTECTED]
To: Jure Pecar [EMAIL PROTECTED]
cc: info-cyrus@lists.andrew.cmu.edu
Subject: Re: improving concurrency/performance

As expected, these are from locking operations. 0x8 is the file descriptor, which, if I read lsof output correctly, points to config/socket/imap-0.lock (what would that be?) and 0x7 is F_SETLKW, which reads as set lock, or wait for it to be released, in the manual page.

Yup, that's exactly the sort of thing I was suspecting -- the performance I was seeing just didn't make sense. imap-0.lock is in /var/imap/socket for me. I believe it's one of the lock files created when cyrus is started, so it wouldn't make any sense for imapd to ever be spinning on it. The delays I was seeing occurred when multiple imapd's were writing to the spool at the same time. I do see a lot of this though:

  fcntl(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0

It looks like the lock to open a file in the target mailbox. But again, very low actual throughput and still little or no iowait. However, adding a -c to the strace, the top three syscalls are:

  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ---------
   52.68    0.514720        1243       414           fdatasync
   29.87    0.291830         846       345           fsync
    4.19    0.040898          27      1519           fcntl

Makes me wonder why the fsync's are taking so long since the disk is performing so well. Anyone know if that's actually typical?

-- 
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]

--- End of Forwarded Message

--- End of Forwarded Message
Re: Cyrus, NFS and mail spools
Ken Murchison wrote: As far as I'm concerned, NFS still is not an option for Cyrus, for all of the reasons that have been outlined in the past. Cyrus 2.3 *might* work with NFS, but I'm not making any guarantees.

For what it's worth, we've been running Cyrus 2.1 in production on NFS for about a year now. Approximately six Cyrus instances running under Solaris share a high-availability NetApp filer, shifting about 1TB of mail per week without problem.

We had to make a few small modifications to Cyrus. I think these have all been discussed on the list at some time - things like not holding files open across rmdir calls. I would suggest the specific combination of NFS client and NFS server was important - I doubt any other combination would have been as successful.

One important detail - we are using local locking (the undocumented NFS mount option llock). When network locking is enabled (the default), the Solaris NFS client disables all client-side caching of locked files, which results in excessive I/O rates. Using llock allows client-side caching of locked files, but makes it absolutely critical that only one Cyrus instance accesses a given volume at any time, and we go to great lengths to ensure this is the case.

I'm not sure we would make the same choice again, but when the project was initiated SANs were not mature enough, and we had extensive experience in running the Solaris/NetApp combination in demanding applications (among other things, a very busy multi-terabyte Oracle instance).
Re: v2.1.14 freeze from LIST %
Are you sure this is actually going across the wire? This is a new one.

Yes, just tested over a terminal connection:

  * OK IMAP4 Ready *** 0001dcc5
  0 LOGIN
  0 OK You are so in
  1 LIST %

This is not a cyrus imap server (we don't return 'You are so in' in response to a login command). Perhaps you should contact your IMAP server vendor.

It's the Perdition IMAP proxy, if I remember correctly. After the successful login response, he should be talking directly to whatever is behind Perdition (presumably cyrus). The original correspondent should try talking directly to his IMAP server, if that's possible, to rule Perdition out of the equation.
Re: Open file limits
Can anyone share experiences with running out of open files on Linux? I am using a 2.4.26 kernel, and the system-wide open file limit is rather large. Do I need to set anything other than this? The default limit of 1024 is in effect for both cyrus and root.

Off the top of my head, there are four main areas where you run into trouble (only three relevant to Linux):

1. total open fd's in the system
2. per-user fd limits (ulimit - typically 1024)
3. the select() call (typically 1024)
4. old stdio implementations (256)

The first two you probably know about, although you may not know about the third and fourth. The select() function usually has a limit of 1024 file descriptors - this is because it uses an implementation-defined bitmap to signal interest in, and the status of, each file descriptor. The FD_SETSIZE constant (defined in sys/types.h) tells you the size of the bitmap.

The fourth will bite you on what I'll rudely call legacy unix systems, eg Solaris. I haven't checked versions after Solaris 8, but the fd field in the stdio structure was traditionally an unsigned char value, and in the bad old days, apps would mess around inside this structure. Presumably because they have customers with grungy old apps, Sun has retained this historical anachronism.
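The second and third limits above can be inspected from a script. A quick sketch using Python's resource and selectors modules (Unix only; the numbers shown are typical defaults, not guarantees):

```python
import resource
import selectors

# 2. the per-process fd limit (what `ulimit -n` reports); the soft limit
# can be raised by an unprivileged process up to the hard limit
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft fd limit:", soft, "hard limit:", hard)

# 3. select() is capped by FD_SETSIZE (usually 1024): any fd whose
# *number* is >= FD_SETSIZE can't be represented in the bitmap at all.
# poll()/epoll()-based mechanisms have no such cap, which is why
# selectors.DefaultSelector prefers them where available.
sel = selectors.DefaultSelector()
print("default selector mechanism:", type(sel).__name__)
sel.close()
```

Raising the soft limit (resource.setrlimit) only helps a daemon that avoids select(), or that keeps its select()'ed fds below FD_SETSIZE.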
Re: Cyrus imapd dying on me
(gdb) bt
#0 0x805fc02 in index_fetchreply ()
Cannot access memory at address 0xbfbfdbd0.

The stack is corrupt - while not conclusive, I'd consider a hardware problem (power supply drooping, flakey RAM, overheating, etc). Have a look at: http://www.bitwizard.nl/sig11/
Migrating mailboxes between Cyrus instances?
I've been googling around without much luck - I'm looking for suggestions on migrating users between cyrus instances (for load balancing). I'm thinking it should really be done at the IMAP level, rather than dicking around inside the message store, but I'm worried there might be message attributes that can't be readily copied via the imap protocol. Anyone have any advice?
Re: flock vs fcntl
Thanks for the info... I switched over to using flock() and I can confirm that it is now being used instead of fcntl(). The problem is that I still see the same problem as before with regards to over 16500 instances of the following:

  stat(/var/imap/mailboxes.db, 0x00011FFF9C98) = 0
  flock(6, LOCK_UN) = 0
  flock(6, LOCK_SH) = 0
  fstat(6, 0x00011FFF9D38) = 0

At least it uses flock() now :-) It is interesting to see that this only occurs after all the mkdir/copy/unlink operations have been completed. I don't know what it is trying to do, but it is quite painful... It adds at least another minute to the operation of the IMAP RENAME command after everything has been renamed!

This sounds like the same problem I complained about on the list in the thread "Very slow deletion of user mailboxes?", posted 9th July. I haven't had a chance to investigate further.

http://asg.web.cmu.edu/archive/message.php?mailbox=archive.info-cyrus&msg=23597
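For anyone comparing the two locking flavours in the traces above, this is how they look from a script (illustrative path; Python's fcntl.lockf() wraps the fcntl()-style byte-range lock, where start=0, len=0 means the whole file, matching the F_SETLKW lines in the truss output, while fcntl.flock() is the BSD-style whole-file lock):

```python
import fcntl
import os

fd = os.open("/tmp/lock-demo", os.O_RDWR | os.O_CREAT, 0o600)

# BSD-style whole-file lock - flock()
fcntl.flock(fd, fcntl.LOCK_SH)        # shared (read) lock
fcntl.flock(fd, fcntl.LOCK_UN)        # release

# POSIX fcntl()-style byte-range lock - start=0, len=0 covers the
# whole file, as in the "F_SETLKW ... start=0 len=0" trace lines
fcntl.lockf(fd, fcntl.LOCK_EX, 0, 0)  # exclusive lock, whole file
fcntl.lockf(fd, fcntl.LOCK_UN, 0, 0)  # release

os.close(fd)
```

One practical difference worth remembering: fcntl locks are dropped when *any* descriptor for the file is closed by the process, while flock locks stick with the open file description - which is part of why the two behave differently over NFS and across fork().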
Very slow deletion of user mailboxes?
/test9772__foo_net_au) Err#17 EEXIST
18679: close(9) = 0
18679: munmap(0xFEFF, 203) = 0
18679: close(10) = 0
18679: munmap(0xFED7, 16384) = 0
18679: close(11) = 0
18679: munmap(0xFED6, 16384) = 0
18679: close(13) = 0
18679: fcntl(5, F_SETLKW, 0xFFBEEA48) = 0
18679: fstat(5, 0xFFBEEB70) = 0
18679: stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEAE8) = 0
18679: fcntl(5, F_SETLKW, 0xFFBEEB60) = 0
18679: unlink(/mail/vmb1/var/user/A/test9772__foo_net_au.seen) Err#2 ENOENT
18679: fstat(8, 0xFFBEEBE8) = 0
18679: time() = 1057743572
18679: getpid() = 18679 [7702]
18679: putmsg(8, 0xFFBEE2A0, 0xFFBEE294, 0) = 0
18679: open(/var/run/syslog_door, O_RDONLY) = 9
18679: door_info(9, 0xFFBEE1D8) = 0
18679: getpid() = 18679 [7702]
18679: door_call(9, 0xFFBEE1C0) = 0
18679: close(9) = 0
18679: unlink(/mail/vmb1/var/user/A/test9772__foo_net_au.sub) Err#2 ENOENT
18679: open64(/mail/vmb1/var/quota/A/, O_RDONLY|O_NDELAY) = 9
18679: fcntl(9, F_SETFD, 0x0001) = 0
18679: fstat64(9, 0xFFBEEBB8) = 0
18679: getdents64(9, 0x0014A288, 1048) = 960
18679: getdents64(9, 0x0014A288, 1048) = 960
18679: getdents64(9, 0x0014A288, 1048) = 960
18679: getdents64(9, 0x0014A288, 1048) = 960
18679: getdents64(9, 0x0014A288, 1048) = 928
18679: unlink(/mail/vmb1/var/quota/A/user.test9772__foo_net_au) = 0
18679: getdents64(9, 0x0014A288, 1048) = 1016
18679: getdents64(9, 0x0014A288, 1048) = 976
18679: getdents64(9, 0x0014A288, 1048) = 976
18679: getdents64(9, 0x0014A288, 1048) = 968
18679: getdents64(9, 0x0014A288, 1048) = 976
18679: getdents64(9, 0x0014A288, 1048) = 992
18679: getdents64(9, 0x0014A288, 1048) = 984
18679: getdents64(9, 0x0014A288, 1048) = 960
18679: getdents64(9, 0x0014A288, 1048) = 992
18679: getdents64(9, 0x0014A288, 1048) = 984
18679: getdents64(9, 0x0014A288, 1048) = 952
18679: getdents64(9, 0x0014A288, 1048) = 968
18679: getdents64(9, 0x0014A288, 1048) = 976
18679: getdents64(9, 0x0014A288, 1048) = 984
18679: getdents64(9, 0x0014A288, 1048) = 960
18679: getdents64(9, 0x0014A288, 1048) = 968
18679: getdents64(9, 0x0014A288, 1048) = 1008
18679: getdents64(9, 0x0014A288, 1048) = 984
18679: getdents64(9, 0x0014A288, 1048) = 952
18679: getdents64(9, 0x0014A288, 1048) = 960
18679: getdents64(9, 0x0014A288, 1048) = 992
18679: getdents64(9, 0x0014A288, 1048) = 984
18679: getdents64(9, 0x0014A288, 1048) = 1000
18679: getdents64(9, 0x0014A288, 1048) = 968
18679: getdents64(9, 0x0014A288, 1048) = 376
18679: getdents64(9, 0x0014A288, 1048) = 0
18679: close(9) = 0
18679: fcntl(5, F_SETLKW, 0xFFBEE9C8) = 0
18679: fstat(5, 0xFFBEEAF0) = 0
18679: stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEA68) = 0
18679: fcntl(5, F_SETLKW, 0xFFBEEAE0) = 0
18679: fcntl(5, F_SETLKW, 0xFFBEE9C8) = 0
18679: fstat(5, 0xFFBEEAF0) = 0
18679: stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEA68) = 0
[...last 4 lines repeated about 10k times, then...]
18679: fcntl(5, F_SETLKW, 0xFFBEEAE0) = 0
18679: fcntl(5, F_SETLKW, 0xFFBEE9C8) = 0
18679: fstat(5, 0xFFBEEAF0) = 0
18679: stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEA68) = 0
18679: fcntl(5, F_SETLKW, 0xFFBEEAE0) = 0
18679: open64(/mail/vmb1/var/sieve/t/test9772__foo_net_au, O_RDONLY|O_NDELAY) Err#2 ENOENT
18679: poll(0xFFB8, 1, 0) = 0
18679: write(1, 2 O K C o m p l e t.., 16) = 16
18679: time() = 1057743623
18679: poll(0xFFB8, 1, 180) (sleeping...)
Re: Very slow deletion of user mailboxes?
We're having a problem where deletion of a user by the admin is taking about 30 seconds. I've attached an abbreviated truss of the imapd while this is taking place. It shows a long string of:

A more detailed truss:

  fstat(5, 0xFFBEEAF0) = 0
  stat(/mail/vmb1/var/mailboxes.db, 0xFFBEEA68) = 0
  fcntl(5, F_SETLKW, 0xFFBEEAE0) = 0
      typ=F_UNLCK whence=SEEK_SET start=0 len=0 sys=0 pid=68157445
  fcntl(5, F_SETLKW, 0xFFBEE9C8) = 0
      typ=F_RDLCK whence=SEEK_SET start=0 len=0 sys=33152 pid=0

All the locks have start=0, len=0. mailboxes.db is a bit under 2MB, and contains around 10k entries (as counted by ctl_mboxlist -d | wc -l). I wonder if the fact that there are 10k entries in mailboxes.db, and 10k iterations through the above loop, is a coincidence? It would suggest something pathological has happened to our skiplist, and it's devolving to a linear search.
Re: ctl_cyrusdb -r performance over NFS
A SAN is just a different transport mechanism between the host and the drives -- the protocol is the same old SCSI that has been around for years. That said, there is less interoperability than one would like at this point. The basic problem with NFS is that either you violate standard Unix filesystem semantics or you have pitiful performance (by disabling client-side caching). Add to that the idea that locking was an after-thought (the design goal of a stateless filesystem doesn't exactly fit with maintaining locks) and it is really a mess. As long as you only have one machine writing to the data, you don't have to worry so much about broken filesystem semantics (which is why your Oracle instance works), but you still have lousy performance.

You assume I don't already know this... 8-) BTW, I don't think it's violated unix semantics OR pitiful performance - the protocol is flawed in ways that get you violated unix semantics AND pitiful performance. In particular, lost, out-of-order or replayed requests are not fully addressed by the stateless design.

For what it's worth, we go to extraordinary lengths to ensure only one host hits a given NFS volume at a time, we spend silly amounts of money to keep the latency down and the bandwidth up, and we use the best quality NFS implementations we can.

I'm not convinced that SAN (where storage is a euphemism for disk) is really the answer to anything. Network attached storage (where storage is a euphemism for file server) is a far more convenient model. We just need a better protocol. If only Plan9 had gained a critical mass... 8-)
Core from lmtp, postfix 2.0.3 and Cyrus-SASL 2.1.12, Solaris 8
I upgraded a previously working postfix from cyrus-sasl-2.1.7 to cyrus-sasl-2.1.12, and while SASL works fine for smtpd, the lmtp process dumps core within the SASL libraries. The resulting core isn't really useful because I'm not getting any symbols resolved from the sasl lib (dynamic linking?):

#0  0xff36ff8c in ?? () from /opt/sasl/lib/libsasl2.so.2
#1  0xff36ff88 in ?? () from /opt/sasl/lib/libsasl2.so.2
#2  0xfef60fe0 in ?? () from /opt/sasl/plugins/liblogin.so.2
#3  0xff366438 in ?? () from /opt/sasl/lib/libsasl2.so.2
#4  0x1996c in lmtp_sasl_authenticate (state=0x7f9c0, why=0x84ac8) at lmtp_sasl_glue.c:499
#5  0x19cb8 in lmtp_sasl_helo_login (state=0x7f9c0) at lmtp_sasl_proto.c:118
#6  0x1770c in lmtp_lhlo (state=0x7f9c0) at lmtp_proto.c:249
#7  0x16bf4 in deliver_message (request=0x82d08, unused_argv=0xffbefee0) at lmtp.c:381
#8  0x16d28 in lmtp_service (client_stream=0x81c50, unused_service=0xffbeff74 tvmb1, argv=0xffbefee0) at lmtp.c:453
#9  0x19e9c in single_server_wakeup (fd=531536) at single_server.c:250
#10 0x1a00c in single_server_accept_local (unused_event=1, context=0x9) at single_server.c:292
#11 0x26ef4 in event_loop (delay=299008) at events.c:586
#12 0x1a8d0 in single_server_main (argc=7, argv=0xffbefec4, service=0x16cf0 lmtp_service) at single_server.c:639
#13 0x16f14 in main (argc=7, argv=0xffbefec4) at lmtp.c:542

Any pointers gratefully received (particularly suggestions on how to get gdb to look at the libsasl2.so.2 symbols)?
Re: Core from lmtp, postfix 2.0.3 and Cyrus-SASL 2.1.12, Solaris 8
I upgraded a previously working postfix from cyrus-sasl-2.1.7 to cyrus-sasl-2.1.12, and while SASL works fine for smtpd, the lmtp process dumps core within the SASL libraries. The resulting core isn't really useful because I'm not getting any symbols resolved from the sasl lib..

Sigh - SASL was picking up the plugins from the 2.1.7 build. Once the install tree was cleaned and the 2.1.12 plugins put in the right place, it's now working like a bought one.

The missing symbols in the sasl lib were being caused by a prehistoric gdb build. A new build of gdb got three fatal internal errors while starting, but gave me enough information to suspect the API had changed.
Re: Core from lmtp, postfix 2.0.3 and Cyrus-SASL 2.1.12, Solaris 8
The missing symbols in the sasl lib were being caused by a prehistoric gdb build. A new build of gdb got three fatal internal errors while starting, but gave me enough information to suspect the API had changed.

There shouldn't have been an API change from 2.1.7 to 2.1.12 - what functions had the change?

Note that I'm talking about the internal API between the plugins and the sasl lib. I didn't look closely at the code - when I began to suspect it was my problem, I checked the date stamps on the plugins and went no further investigating the core. I've rebuilt various bits, so the core is no longer valid, but from memory, the last three frames were a call from libsasl2 into liblogin (the plugin) and a call back into libsasl2 by the plugin. GDB couldn't resolve the liblogin symbols for some reason (maybe it couldn't find the object file), but the parameters made sense on their way in, and not on their way out.
Re: Cyrus IMAPd 2.1.10 Released
I believe a set of plaintext documentation can be maintained with RCS, CVS or SCCS without problems by a distanced dev team, while XSLT will require proper usage by the author of manuals etc...

Yep. The reality is it's not us who chose the doc tools, but the people who actually update the doco. Volunteering your favourite documentation tools isn't nearly as valuable as volunteering your time to help keep the documentation up to date... 8-)
Re: Updating /seen from concurrent sessions
I don't use OE but I experienced the same (or similar) problem with mozilla: since it uses many concurrent connections to the server, seen messages came back as unseen various times, and it was very annoying. Switching to skiplist almost solves the problem (at least it did for me): since I switched to skiplist I have had seen messages come back as unseen only 2 or 3 times.

I suspect there is a bug in the flat-file seen implementation. Each process opens the seen file and holds this file descriptor open. Then one process wants to update the file. It does this by writing a new file, and renaming it into place. But all the other processes still have the now unlinked and out-of-date copy open.

With skiplist, this problem no longer occurs (the skiplist database makes changes made by other processes visible immediately). However, another problem remains: updates are deferred for performance reasons. So one session will update the seen list, and the other processes will not see the change (unless they and the updating process execute certain commands, such as NOOP).
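The stale-descriptor effect described above is easy to reproduce with plain files. A minimal sketch (illustrative path standing in for the seen file):

```python
import os

path = "/tmp/seen-demo"
with open(path, "w") as f:
    f.write("old seen state")

reader = open(path)              # session A holds the seen file open

# session B updates it by writing a new file and renaming it into place
with open(path + ".NEW", "w") as f:
    f.write("new seen state")
os.rename(path + ".NEW", path)   # atomic replacement of the dir entry

stale = reader.read()            # session A's fd still points at the
print(stale)                     # old (now-unlinked) inode
fresh = open(path).read()        # only a fresh open sees the update
print(fresh)
reader.close()
```

The rename only replaces the directory entry; any descriptor opened before the rename keeps referencing the old inode until it is closed, which is exactly the out-of-date copy the paragraph above describes.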
Re: Updating /seen from concurrent sessions
Try using skiplist for the seen.db. It doesn't really solve the problem but it masks it well enough.

From my understanding, changing to skiplist really shouldn't change the visible behaviour at all. But I've been wrong before. I'll try to test it here and let you know.

My reading of the code suggests it shouldn't change the specific problem I'm seeing. What's the general feeling on the skiplist implementation used in conjunction with Sun and NetApp's NFS (we're locked in to using this combination for various reasons)? Would you be more or less likely to trust it over db3?

Another question - it looks to me like I have to recompile to switch database types - is this true? The code looks flexible enough that a run-time config option to choose the method would need very little modification.

It would be possible to flush the seen state more often; it's just a question of how often, and when other imapds should look for it. If the imapd can already cope with asynchronous events, I would flush the state after a second or two of inactivity from the client. Failing that, I would probably flush the state before replying to the client (yes, this would hurt performance, although probably not much, particularly if we skip the fsync()). But this just fixes the OE problem - Cyrus would still have a problem (as far as I can see): all the other copies accessing that mailbox will still have their old seen files open (maybe using skiplist fixes this). The flat-file seen implementation needs to check whether the file has been renamed under it (and then do what?).

To be honest, the flat-file seen implementation is way more complicated than I would have thought worthwhile. My preference would be to not hold the file open, and simply re-write the whole file each time we updated it, renaming the replacement into place (to make the operation atomic - this is also the only synchronous operation).
My experience has been that Unix is quite happy doing naive things like this while the file remains small (say, less than 10k). I implemented a Postfix map that works this way - for lookups, it simply does a linear read/search of the file. For updates, it writes a new file and renames it into place. Generally this performed much better than more complex schemes such as the Sleepycat DBs - particularly when you consider memory footprint (this was on a machine with about 100k users, handling tens of messages per second).

I've never actually seen this problem happen whenever I've fooled around with OE, so I've never looked at the code to figure out what to do. I get the impression it's a specific OE usage pattern that triggers it.

I've had it described to me as "send a mail, click the send/check button", which sounds common enough to me. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
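The write-new-file-and-rename update described above can be sketched as follows. The helper name and file format are hypothetical, not the actual Postfix map code; the point is that rename() makes the replacement atomic, and the fsync() before it is the only synchronous step.

```python
import os

def rewrite_map(path, entries):
    """Atomically replace the map file at `path` with `entries`.

    Writes a temporary file in the same directory, forces it to disk,
    then rename()s it into place, so concurrent readers see either the
    complete old file or the complete new one - never a partial write.
    """
    tmp = "%s.tmp.%d" % (path, os.getpid())
    with open(tmp, "w") as f:
        for key, value in entries.items():
            f.write("%s %s\n" % (key, value))
        f.flush()
        os.fsync(f.fileno())   # force data to disk before the rename
    os.rename(tmp, path)       # atomic on POSIX filesystems
```

Note that the temporary file must live on the same filesystem as the target, since rename() cannot cross filesystem boundaries.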
Re: Cyrus IMAPd 2.1.10 Released
I feel that moving back to only plaintext is a step backwards. I don't know much about SGML myself, so I'm not sure I'd want to be stuck maintaining that, but it sounds interesting enough (and it would be nice to have general tools for keeping the documentation formatted, instead of worrying when htmlstrip would next break).

You could do worse than look at the Python documentation. The production doco is currently LaTeX with a bunch of custom macros. HTML, PDF, etc. are generated off the master LaTeX markup. There is a background project to move to SGML (I think), but it's not there yet. Our company (not me personally) looked at doco tools a while back and came to the conclusion that LaTeX was still the best choice out of a bad lot - SGML was the next closest, although the tools were still rather immature. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: Updating /seen from concurrent sessions
of stat() under Solaris - so rather than keeping a file open and stat()ing it periodically to see if it's changed under you, you can close and reopen the file (resulting in simpler code, but similar performance). However, updates can be an order of magnitude more frequent if we're going to write for every flag change.

Cyrus is written with the expectation that you will have thousands of simultaneous clients working on tens or hundreds of thousands of mailboxes.

And an excellent design goal that is... 8-) I'm guessing, but I suspect OE updates the \Seen flag each time it downloads a message, and presumably this occurs each time a user selects a message. So you may only see an update every couple of seconds from each client - obviously that adds up.

BTW, there may be paid consulting opportunities for people with demonstrable advanced Cyrus hacking skills in this project. If anyone is interested, let me know.

I implemented a Postfix map that works this way - for lookups, it simply does a linear read/search of the file. For updates, it writes a new file and renames it into place. Generally this performed much better than more complex schemes such as the Sleepycat DBs - particularly when you consider memory footprint (this was on a machine with about 100k users, handling tens of messages per second).

It doesn't scale when there are frequent updates. That's why we have the database abstraction, so we can choose the file format that does the job most effectively. cyrusdb_flat does exactly this, and it works ok when you don't need frequent updates. Seen state has frequent updates.

Actually, it scaled better than initially expected - this map type was used specifically for tables that changed very frequently (the pop-before-smtp pre-auth mechanism being a case in point). The only synchronous operation was the rename().
The lookup read()s would have been pulling the data from the buffer cache, and sequential searches beat more complex schemes every time when the dataset is small (less than 100kB was the figure we found when comparing to things like libdb). The saving in resident set size was critical too - the machine had 4GB of RAM, and no more could be fitted. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
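The lookup side of such a flat map is just a sequential scan. A sketch, assuming a hypothetical one-"key value"-pair-per-line format (the real Postfix map format may differ): for files under ~100kB the whole file sits in the buffer cache, so each scan is cheap.

```python
def map_lookup(path, key):
    """Linear search of a flat 'key value' map file.

    The file is reopened for every lookup rather than held open, so a
    concurrent rewrite-and-rename update is picked up immediately -
    there is no stale descriptor to go wrong.
    """
    with open(path) as f:
        for line in f:
            fields = line.split(None, 1)
            if len(fields) == 2 and fields[0] == key:
                return fields[1].strip()
    return None
```

The open()-per-lookup cost is the trade-off against holding the file open and mmap()ing it; for a small, hot file it is usually negligible.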
Re: Updating /seen from concurrent sessions
A lot of problems also result when people try to run the application on more than one computer hitting the same NFS server. But one thing that drives us application writers mad is the idea that rename() can return failure but have actually happened; and if you're trying to write a reliable application, you don't want to rely on the fact that the chance of this is minimized, since you know it's going to happen and you're going to be sorry. That's certainly the NFS flaw that comes to mind.

I happen to agree with you that it's not enough to simply minimise the chances of something untoward happening.

I would hope it would work with a single server with multiple processes. But I really haven't thought about all the possibilities with NFS. (The return-error-but-succeed problem is just one that springs to mind, and I've never audited the code thinking about that.)

Okay. Your comments are valued.

Great, now I need to do bookkeeping to do this. Plus, on most Unix filesystems, rename() is a more expensive operation than one fsync(), and probably even two fsync()s. And how am I supposed to programmatically determine whether or not a given version is valid?

Mmm. It was a half-baked idea that came from the observation that the flat-file \Seen code was doing rename()s anyway.

Linux ext2 has this metadata problem. ext3 and reiserfs are both supposed to force metadata to disk when fsync() is called, similar to softupdates on BSD, Veritas, or most other modern filesystems.

I'm willing to bet that I've wasted more time than you have worrying about the semantics of fsync() on various Unix filesystems.

Quite possibly. I've certainly wasted enough time on them over the years. It's hard to prove what a given O/S is doing is correct, even when you have inside knowledge.

You need to do the stat() regardless if you want the latest data. By keeping the file open, you potentially amortize the cost of an open(), another fstat() (to find out the state of your open'd fd) and an mmap().
All of these have various different costs depending on your platform and your Unix. Mmap is the killer - it often involves a lot of expensive setup within the kernel.

I'd tend to think that if you were using mmap() for read access to the file, it probably should be modified in place, rather than renamed. The flat-file \Seen implementation both mmap()s and rename()s, and this looks to me like the source of its pain. But then you need some sort of cheap synchronization scheme.

BTW, have you looked at Andrew Tridgell's Trivial Database? It uses mmap()ed files and spin-locks to achieve good write performance, although I don't think resilience in the face of crashes was a high priority. However, the architecture-dependent spin-lock code may be handy if you ever decide to follow this route.

You have one database and weren't fsync()ing the data. Cyrus has thousands of active databases and cares about the reliability of the data.

As it should. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Re: Updating /seen from concurrent sessions
BTW, have you looked at Andrew Tridgell's Trivial Database? It uses mmap()ed files and spin-locks to achieve good write performance, although I don't think resilience in the face of crashes was a high priority. However, the architecture-dependent spin-lock code may be handy if you ever decide to follow this route.

I intended to include this URL: http://sourceforge.net/projects/tdb/ -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/
Updating /seen from concurrent sessions
Outlook Express users are complaining that their message \Seen status is lost. Snooping traffic, I see that OE is opening a second connection. In duplicating OE's behaviour by hand, I'm finding that I'm more confused than ever about Cyrus's behaviour. We're using Cyrus 2.1.9 on NetApp-mounted disk.

First up, a description of what OE is doing (in all its stupidity) - when you read a message, it uses BODY.PEEK so as not to update the \Seen flag. It then apparently closes that connection, opens another, and sets the \Seen status with "xxx UID STORE 49 +FLAGS.SILENT (\Seen)". It then forgets that connection, and opens a new connection and starts using that for accessing the mailbox.

At the Cyrus end, this is useless - the Cyrus session that accepts the \Seen update holds off updating the .seen file (presumably as a performance optimisation). So the second session doesn't see it until OE eventually closes the first connection. Even sending a NOOP on the \Seen-updating connection would have been enough to trigger Cyrus into updating the .seen file. Cyrus probably needs to update the .seen file if no other activity occurs for a second or two.

But there appears to be a second problem: even if OE had sent a NOOP (or Cyrus decided to write the file), the second session doesn't see the update - it's still holding open an old .seen file, now unlinked (by the rename() that the \Seen-updating session did). Cyrus needs to periodically stat() the .seen file and compare its inode number to that of the file it holds open - if they differ, it needs to reopen the file.

The RFC isn't entirely helpful on who's at fault here - section 5.5 talks about multiple commands being allowed, provided ambiguity doesn't result. In this case, provided OE waits for the STORE command to complete, I guess it's within its rights to check the message status from another session. It's largely irrelevant anyway - OE is out there, and getting it fixed would be a hiding to nothing.
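The periodic check suggested above - stat() the .seen path and compare it against the descriptor we hold open - can be sketched like this (a hypothetical helper, not Cyrus code; comparing both st_dev and st_ino is the usual way to decide the file at the path is no longer the one we have open):

```python
import os

def reopen_if_replaced(fd, path):
    """If the file at `path` has been rename()d over since `fd` was
    opened, close `fd` and return a descriptor for the new file;
    otherwise return `fd` unchanged.

    A held descriptor keeps pointing at the old, unlinked file after a
    rename, so a (st_dev, st_ino) mismatch detects the replacement.
    """
    held = os.fstat(fd)
    current = os.stat(path)
    if (held.st_dev, held.st_ino) != (current.st_dev, current.st_ino):
        os.close(fd)
        fd = os.open(path, os.O_RDONLY)
    return fd
```

This costs one stat() and one fstat() per check; whether that beats simply closing and reopening the file each time depends on the platform, as discussed earlier in the thread.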
I realise this is an old known problem, but I've spent some time searching list archives, and other sources looking for an answer. Any help anyone can provide will be gratefully received. -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/