We've had issues with broken indexes on the old master after a failover,
and puzzled for ages over the cause.  This is the reason behind the
SIGQUIT patch which would mitigate this by reducing the "unfinished
business" on a standard shutdown.

But still, the symptoms were weird.

I'm pretty sure I've figured out the cause, and the attached patch
(compile tested only) fixes it...  unfortunately it's a race condition
so it's hard to test.

Ken - it would be great if you can check this today and push it to CVS
if you agree with me.  It's a nasty one - especially since the symptoms
are offset index records right until something that's not a sync_server
append comes along and writes to the correct place in the index file
again.

On a more structural note - it would be great to re-factor the lower
level APIs (mailbox.c, mboxlist.c, etc) to provide everything that
sync_server needs manipulate their data structures and remove all the
low level fiddling copy-and-pasted stuff from the sync layer.  This
comes back a bit to David's comments today about the difficulty of
having patches flying around - I suspect he copied and pasted so he
wasn't throwing tentacles deep through the main codebase - but the
end result is more brittle software and poor abstraction.  We need
to avoid doing that if there's a bunch of us all hacking in our own
little patch space!

Bron.
Bugfix sync index append

After a failover, we sometimes get index files
with duplicate records in them and other nastiness.

The cyrus index file format has this clever "ignore"
junk at the end until the exists count changes trick.

sync_commit.c contains a copy and pasted "index append" 
logic, but it seeks to the end of the file rather than
start_offset + exists * record_size.  This is bogus and
means that at some random future stage a real record will
be overwritten and two of the old bogus record will still
exist.  Crap.

This patch fixes that, obviously.
Index: cyrus-imapd-2.3.9/imap/sync_commit.c
===================================================================
--- cyrus-imapd-2.3.9.orig/imap/sync_commit.c	2007-08-31 10:13:28.000000000 -0400
+++ cyrus-imapd-2.3.9/imap/sync_commit.c	2007-08-31 10:26:03.000000000 -0400
@@ -481,6 +481,7 @@
     unsigned long newflagged;
     char  target[MAX_MAILBOX_PATH];
     int   n, r = 0;
+    long   last_offset;
     struct txn *tid = NULL;
     modseq_t highestmodseq = 0;
 
@@ -577,9 +578,11 @@
     if (retry_writev(mailbox->cache_fd, cache_iovec, upload_list->count) < 0)
         goto fail;
 
-    lseek(mailbox->index_fd, mailbox->index_size, SEEK_SET);
+    
+    last_offset = mailbox->start_offset + mailbox->exists * mailbox->record_size;
+    lseek(mailbox->index_fd, last_offset, SEEK_SET);
     if (retry_write(mailbox->index_fd, index_chunk,
-                    upload_list->count * INDEX_RECORD_SIZE) < 0)
+                    upload_list->count * mailbox->record_size) < 0)
         goto fail;
 
 

Reply via email to