Re: *IMPORTANT* - bugfix sync_append_commit index breakage
On Sun, Sep 02, 2007 at 01:33:11PM +1000, Rob Mueller wrote: I believe so. If we also used it for appends, we wouldn't run into the problem that Fastmail is seeing. But, it may be too much of a performance hit. I don't think that any of the code does the .NEW for appends because that makes append O(N) where N is the mailbox size, and that would suck for huge mailboxes. Definitely. Rewriting the cyrus.index file for every message delivered to a mailbox would be horrible, especially when some users have 100,000+ messages in a mailbox. Speaking of which, is there any reason why cyrus.expunge isn't sorted by UID? We have to rewrite the entire cyrus.index each time we do an expunge anyway. I'm guessing it's because messages can be deleted + expunged in any order, so it's easy just to append the record for any message being expunged to the end of the cyrus.expunge file. If you had to keep it sorted, that would mean every time you expunged a message, you'd have to reread, resort, and rewrite the cyrus.expunge file. Generally the cyrus.expunge will be a lot smaller than the cyrus.index, but still annoying... Well, you are rereading (not sorting) the cyrus.index file every time you do an expunge anyway. A faster alternative would be to combine the two files again, have two different counts (exists, n_records) - where exists may be smaller. Also, you'd need to set aside a server flag for really, really expunged in each index record. There's the case of non-UID numbered fetch not working so well any more, but honestly, who sane uses that in this concurrent world? It's a way invasive patch though. Bron.
Re: *IMPORTANT* - bugfix sync_append_commit index breakage
On Sat, Sep 01, 2007 at 11:48:51PM -0400, Ken Murchison wrote: Bron Gondwana wrote: Sorted by UUID would help enormously for things like not blatting the \Seen state for expunged messages until they're expired, because you could then just do a double-iterator over the two index format files. Seen state gets goofed up on unexpunge because we don't keep the original UID. I wanted to, but re-injecting a message in the middle of a mailbox with a UID UIDNEXT was either a violation of RFC3501 and/or seriously screwed with clients. Yeah, true. Actually, that's not the issue though. The issue is that seen state is stored as a range list, and the range list is rebuild by walking the cyrus.index file. If there's no record there, then it gets subsumed into the outer range, so seen state is actually lost before we try to unexpunge. An alternative (probably even dodgier for clients) is to just bump UIDVALIDITY. Tends to make you unpopular with people who have 2Gb in their mailbox though. Bron.
Re: *IMPORTANT* - bugfix sync_append_commit index breakage
On Sat, 01 Sep 2007 13:08:00 -0400, Ken Murchison [EMAIL PROTECTED] said: David Carter wrote: On Sat, 1 Sep 2007, Ken Murchison wrote: I applied this patch to CVS, but I'm wondering why the original code writes directly to the existing cyrus.index instead of a cyrus.index.NEW, as is done in other places. I don't know if this was intentional by the previous authors, or an oversight. I'm going to investigate. Doesn't cyrus.index.NEW only kick in on expunge? I believe so. If we also used it for appends, we wouldn't run into the problem that Fastmail is seeing. But, it may be too much of a performance hit. Speaking of which, is there any reason why cyrus.expunge isn't sorted by UID? We have to rewrite the entire cyrus.index each time we do an expunge anyway. Sorted by UUID would help enormously for things like not blatting the \Seen state for expunged messages until they're expired, because you could then just do a double-iterator over the two index format files. I don't think that any of the code does the .NEW for appends because that makes append O(N) where N is the mailbox size, and that would suck for huge mailboxes. Bron. -- Bron Gondwana [EMAIL PROTECTED]
Re: *IMPORTANT* - bugfix sync_append_commit index breakage
I believe so. If we also used it for appends, we wouldn't run into the problem that Fastmail is seeing. But, it may be too much of a performance hit. I don't think that any of the code does the .NEW for appends because that makes append O(N) where N is the mailbox size, and that would suck for huge mailboxes. Definitely. Rewriting the cyrus.index file for every message delivered to a mailbox would be horrible, especially when some users have 100,000+ messages in a mailbox. Speaking of which, is there any reason why cyrus.expunge isn't sorted by UID? We have to rewrite the entire cyrus.index each time we do an expunge anyway. I'm guessing it's because messages can be deleted + expunged in any order, so it's easy just to append the record for any message being expunged to the end of the cyrus.expunge file. If you had to keep it sorted, that would mean every time you expunged a message, you'd have to reread, resort, and rewrite the cyrus.expunge file. Generally the cyrus.expunge will be a lot smaller than the cyrus.index, but still annoying... Rob
Re: *IMPORTANT* - bugfix sync_append_commit index breakage
I think your patch makes sense, but I'm not sure when The cyrus index file format has this clever ignore junk at the end until the exists count changes trick means. I know we leave junk in the cache file as a result of delayed expunge which gets cleaned up later, but I'm pretty sure the the index file is always tightly packed. The is extra space in the index header, but there shouldn't be any between the index records. Bron Gondwana wrote: We've had issues with broken indexes on the old master after a failover, and puzzled for ages over the cause. This is the reason behind the SIGQUIT patch which would mitigate this by reducing the unfinished business on a standard shutdown. But still, the symptoms were weird. I'm pretty sure I've figured out the cause, and the attached patch (compile tested only) fixes it... unfortunately it's a race condition so it's hard to test. Ken - it would be great if you can check this today and push it to CVS if you agree with me. It's a nasty one - especially since the symptoms are offset index records right until something that's not a sync_server append comes along and writes to the correct place in the index file again. On a more structural note - it would be great to re-factor the lower level APIs (mailbox.c, mboxlist.c, etc) to provide everything that sync_server needs manipulate their data structures and remove all the low level fiddling copy-and-pasted stuff from the sync layer. This comes back a bit to David's comments today about the difficulty of having patches flying around - I suspect he copied and pasted so he wasn't throwing tentacles deep through the main codebase - but the end result is more brittle software and poor abstraction. We need to avoid doing that if there's a bunch of us all hacking in our own little patch space! Bron. Bugfix sync index append After a failover, we sometimes get index files with duplicate records in them and other nastiness. The cyrus index file format has this clever ignore junk at the end until the exists count changes trick. sync_commit.c contains a copy and pasted index append logic, but it seeks to the end of the file rather than start_offset + exists * record_size. This is bogus and means that at some random future stage a real record will be overwritten and two of the old bogus record will still exist. Crap. This patch fixes that, obviously. Index: cyrus-imapd-2.3.9/imap/sync_commit.c === --- cyrus-imapd-2.3.9.orig/imap/sync_commit.c 2007-08-31 10:13:28.0 -0400 +++ cyrus-imapd-2.3.9/imap/sync_commit.c2007-08-31 10:26:03.0 -0400 @@ -481,6 +481,7 @@ unsigned long newflagged; char target[MAX_MAILBOX_PATH]; int n, r = 0; +long last_offset; struct txn *tid = NULL; modseq_t highestmodseq = 0; @@ -577,9 +578,11 @@ if (retry_writev(mailbox-cache_fd, cache_iovec, upload_list-count) 0) goto fail; -lseek(mailbox-index_fd, mailbox-index_size, SEEK_SET); + +last_offset = mailbox-start_offset + mailbox-exists * mailbox-record_size; +lseek(mailbox-index_fd, last_offset, SEEK_SET); if (retry_write(mailbox-index_fd, index_chunk, -upload_list-count * INDEX_RECORD_SIZE) 0) +upload_list-count * mailbox-record_size) 0) goto fail; -- Kenneth Murchison Systems Programmer Project Cyrus Developer/Maintainer Carnegie Mellon University
Re: *IMPORTANT* - bugfix sync_append_commit index breakage
On Fri, 31 Aug 2007, Ken Murchison wrote: I think your patch makes sense, but I'm not sure when The cyrus index file format has this clever ignore junk at the end until the exists count changes trick means. I know we leave junk in the cache file as a result of delayed expunge which gets cleaned up later, but I'm pretty sure the the index file is always tightly packed. The is extra space in the index header, but there shouldn't be any between the index records. I think that the danger is that if sync_server gets shut down uncleanly (which I know was happening to Fastmail a lot at one point) then then you can end up with a bogus entry at the end of a cyrus.index file which is not overwritten by the next sync_append_commit() on that mailbox. The race condition is that the exists count in the header can only be updated after the index record has been written. An explicit seek using mailbox-exists is definitely more robust, although it probably doesn't help if power fails halfway through the fsync() on the cyrus.index file after both updates have been made (data=journal maybe?) -- David Carter Email: [EMAIL PROTECTED] University Computing Service,Phone: (01223) 334502 New Museums Site, Pembroke Street, Fax: (01223) 334679 Cambridge UK. CB2 3QH.
Re: *IMPORTANT* - bugfix sync_append_commit index breakage
On Sat, 1 Sep 2007, Bron Gondwana wrote: On a more structural note - it would be great to re-factor the lower level APIs (mailbox.c, mboxlist.c, etc) to provide everything that sync_server needs manipulate their data structures and remove all the low level fiddling copy-and-pasted stuff from the sync layer. This comes back a bit to David's comments today about the difficulty of having patches flying around - I suspect he copied and pasted so he wasn't throwing tentacles deep through the main codebase - but the end result is more brittle software and poor abstraction. Pretty much. I wasn't expecting the replication code to ever get merged when I first wrote it. All of the really awful stuff should be in sync_commit.c. Those are the routines which create messages with predetermined UIDs and mailboxes with predetermined UniqueIDs. -- David Carter Email: [EMAIL PROTECTED] University Computing Service,Phone: (01223) 334502 New Museums Site, Pembroke Street, Fax: (01223) 334679 Cambridge UK. CB2 3QH.