We've had issues with broken indexes on the old master after a failover, and puzzled for ages over the cause. This is the reason behind the SIGQUIT patch which would mitigate this by reducing the "unfinished business" on a standard shutdown.
But still, the symptoms were weird. I'm pretty sure I've figured out the cause, and the attached patch (compile tested only) fixes it... unfortunately it's a race condition so it's hard to test. Ken - it would be great if you can check this today and push it to CVS if you agree with me. It's a nasty one - especially since the symptoms are offset index records right until something that's not a sync_server append comes along and writes to the correct place in the index file again. On a more structural note - it would be great to re-factor the lower level APIs (mailbox.c, mboxlist.c, etc) to provide everything that sync_server needs manipulate their data structures and remove all the low level fiddling copy-and-pasted stuff from the sync layer. This comes back a bit to David's comments today about the difficulty of having patches flying around - I suspect he copied and pasted so he wasn't throwing tentacles deep through the main codebase - but the end result is more brittle software and poor abstraction. We need to avoid doing that if there's a bunch of us all hacking in our own little patch space! Bron.
Bugfix sync index append After a failover, we sometimes get index files with duplicate records in them and other nastiness. The cyrus index file format has this clever "ignore" junk at the end until the exists count changes trick. sync_commit.c contains a copy and pasted "index append" logic, but it seeks to the end of the file rather than start_offset + exists * record_size. This is bogus and means that at some random future stage a real record will be overwritten and two of the old bogus record will still exist. Crap. This patch fixes that, obviously. Index: cyrus-imapd-2.3.9/imap/sync_commit.c =================================================================== --- cyrus-imapd-2.3.9.orig/imap/sync_commit.c 2007-08-31 10:13:28.000000000 -0400 +++ cyrus-imapd-2.3.9/imap/sync_commit.c 2007-08-31 10:26:03.000000000 -0400 @@ -481,6 +481,7 @@ unsigned long newflagged; char target[MAX_MAILBOX_PATH]; int n, r = 0; + long last_offset; struct txn *tid = NULL; modseq_t highestmodseq = 0; @@ -577,9 +578,11 @@ if (retry_writev(mailbox->cache_fd, cache_iovec, upload_list->count) < 0) goto fail; - lseek(mailbox->index_fd, mailbox->index_size, SEEK_SET); + + last_offset = mailbox->start_offset + mailbox->exists * mailbox->record_size; + lseek(mailbox->index_fd, last_offset, SEEK_SET); if (retry_write(mailbox->index_fd, index_chunk, - upload_list->count * INDEX_RECORD_SIZE) < 0) + upload_list->count * mailbox->record_size) < 0) goto fail;