On Tue, Dec 18, 2007 at 01:00:41PM +0000, David Carter wrote:
> Take a mailbox with 2 messages.
>
> Expunge the second message and then the first. "reconstruct -k" will
> generate a mailbox where the second message appears in both cyrus.index and
> cyrus.expunge. Not good, especially when cyrus_expire -X cuts in.

Wow, talk about your synchronicity. They say that when you're having a
problem you should sleep on it and see if you have new ideas in the
morning. Someone in another timezone independently finding the issue
and providing a fix is a significant bonus on top of that!

Thanks for this. I was just about to start tracking down the issue.
We have known this was occurring for a while, but had not tracked it
down. I wrote a script (check_cyrus_indexes) which was rescuing the
deleted message files from the replica, or even pulling them from
backup, daily.

All good until I added an option to remove one of the index records
when there were duplicates in both index and expunge. Stupidly I chose
to strip the one in the .expunge file!

Meanwhile, Rob added an option to our replication checker to check for
failure to correctly replicate quota usage. We have a few users with
that problem (underlying cause as yet unproven, but I can see how it
could get out of sync, since quota usage is tracked as deltas and a
reconstruct plus quota -f at only one end could cause fun 'n' games).

This caused a bunch of reconstructs to be run, because that's one of
the fixes the replication script tries before telling us of the error.

Doh! - suddenly users had messages re-appearing. Unsurprisingly this
made them grumpy!

> A slightly more ambitious fix, using the fact that both uid[] and expuid[]
> have been sorted into order of ascending UID:
>
> http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/patches/2.3cvs/reconstruct2.patch
>
> I believe that the second patch matches the intent of the original code.

Absorbed into my patch cloud :)

I'm very happy to not have to redo all the work of figuring this out.
Go David!

Again, many thanks,

Bron.
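
P.S. For anyone following along, here is a rough illustration of why
having both uid[] and expuid[] sorted by ascending UID is handy: a
single linear pass can spot any UID that sits in both lists. This is
only a sketch and is not taken from either of David's patches; the
function name find_duplicate_uids, the dup[] output array and the
unsigned long element type are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Cyrus code): walk two ascending-sorted UID
 * arrays in step and copy every UID present in both into dup[].
 * dup[] must have room for min(n_uid, n_expuid) entries.
 * Returns the number of duplicates found.
 */
size_t find_duplicate_uids(const unsigned long *uid, size_t n_uid,
                           const unsigned long *expuid, size_t n_expuid,
                           unsigned long *dup)
{
    size_t i = 0, j = 0, n_dup = 0;

    assert(dup != NULL);

    while (i < n_uid && j < n_expuid) {
        if (uid[i] < expuid[j]) {
            i++;                    /* only in the index list */
        } else if (uid[i] > expuid[j]) {
            j++;                    /* only in the expunge list */
        } else {
            dup[n_dup++] = uid[i];  /* in both lists: a duplicate */
            i++;
            j++;
        }
    }
    return n_dup;
}
```

With index UIDs {1, 2} and expunge UIDs {2}, this reports UID 2 as
the record that appears in both places. Which copy to drop is a
separate decision, and as described above, getting that wrong is what
brought deleted messages back.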