> I don't have something to consume make_md5 data, yet, either.  My
> plan is to note the difference between the replica and the primary.
> On a subsequent run, if those differences aren't gone, then they
> would be included in a report.

Rather than make_md5, take a look at the MD5 UUIDs patch below. With it, we 
run a script that regularly checks both sides of a master/replica pair to 
confirm that each UUID is consistent with the computed MD5. This is what let 
us discover the rare "didn't unlink old files" bug reported about 3 months 
back.

---
http://cyrus.brong.fastmail.fm/

One problem we've had is the inability to easily check that the files on 
disk correspond to what was originally delivered. That is what we need in 
order to look for cyrus data corruption after a disk problem or some other 
bug has left us unsure of our data integrity.
I wanted to calculate a digest and store it somewhere in the index file, but 
messing with the file format and fixing sync to still work, etc... it all 
sounded too painful.

So - added is a new option "uuidmode" in imapd.conf. Set it to md5 and you 
will get UUIDs of the form: 02 followed by the first 11 bytes of the MD5 
value for the message. This takes up the same space as before, but allows 
pretty good integrity checking.
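
To make the layout concrete, here is a minimal sketch of how such a UUID could be derived outside the server. The function name md5_uuid is made up for illustration, and it assumes the digest is taken over the raw message file exactly as stored on disk; it is not the code from the patch.

```python
import hashlib


def md5_uuid(message: bytes) -> str:
    """Return a 24-character hex UUID: the byte 02, then 11 bytes of MD5.

    Assumption: `message` is the raw message file content as stored on
    disk. The function name is hypothetical, not part of Cyrus.
    """
    digest = hashlib.md5(message).digest()  # 16 raw bytes
    return "02" + digest[:11].hex()         # 1 + 11 bytes = 24 hex chars


if __name__ == "__main__":
    sample = b"From: a@example.com\r\nSubject: test\r\n\r\nhello\r\n"
    print(md5_uuid(sample))
```

The result is the same length as any other 12-byte UUID printed in hex, which is why the index format does not need to change.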

Is it safe? - we calculated that with one billion messages you have roughly a 
one in 1 billion chance of a birthday collision (two random messages with the same 
UUID). They then have to get in the same MAILBOXES collection to sync_client 
to affect each other anyway. The namespace available for generated UUIDs is 
much smaller than this, since they have no collision risk - but if you had 
that many delivering you would hit the limits and start getting blank UUIDs 
anyway.
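
For anyone who wants to redo the arithmetic, the sketch below uses the standard birthday approximation p = 1 - exp(-n(n-1)/(2N)). It assumes the 11 MD5 bytes (88 bits) behave like uniformly random values; the function name is only illustrative.

```python
import math


def birthday_collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that any two of n_items random values collide.

    Uses p = 1 - exp(-n(n-1) / (2N)) with N = 2**bits. Assumes the values
    are uniformly random, which is only an approximation for MD5 prefixes.
    """
    space = 2.0 ** bits
    exponent = -n_items * (n_items - 1) / (2.0 * space)
    return -math.expm1(exponent)  # expm1 keeps precision for tiny values


if __name__ == "__main__":
    p = birthday_collision_probability(10**9, 88)
    print(f"collision probability: {p:.3e} (about 1 in {1 / p:,.0f})")
```

With 88 bits and one billion messages this comes out at a little under 2 in a billion, so the same order of magnitude as the figure above.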

Mitigating even the above risk: you could alter sync_client to not use UUID 
for copying. It's not like it's been working anyway (see our other UUID 
related patch). As an integrity check it's much more useful.


The attached patch adds the md5 method, a "random" method which I've never 
tested and is almost certainly bogus, but is there for educational 
value[tm], and the following FETCH responses in imapd:

FETCH UUID => 24 character hex string (02 + first 11 bytes of MD5)
FETCH RFC822.MD5 => 32 character hex string (16 bytes of MD5)
FETCH RFC822.FILESIZE => size of actual file on disk (via stat or mmap)

Totally non-standard of course, but way useful for our replication checking 
scripts. Embrace and extend 'r' us.
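
As a rough idea of what a checking script can do with these three values, here is a small sketch. It is not our actual script: the Record type and function names are invented for illustration, and it assumes the FETCH results for both sides have already been collected into dictionaries keyed by UID, and that every message was delivered with uuidmode set to md5.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical holder for one message's FETCH results."""
    uuid: str      # FETCH UUID, 24 hex characters
    md5: str       # FETCH RFC822.MD5, 32 hex characters
    filesize: int  # FETCH RFC822.FILESIZE, bytes on disk


def uuid_matches_md5(rec: Record) -> bool:
    """True if the UUID is 02 followed by the first 11 bytes of the MD5."""
    return rec.uuid.lower() == "02" + rec.md5.lower()[:22]


def compare_sides(master: Dict[int, Record],
                  replica: Dict[int, Record]) -> List[str]:
    """Return human-readable problems for one mailbox, ordered by UID."""
    problems = []
    for uid in sorted(set(master) | set(replica)):
        m, r = master.get(uid), replica.get(uid)
        if m is None or r is None:
            side = "master" if m is None else "replica"
            problems.append(f"uid {uid}: missing on {side}")
            continue
        # Check each side internally: stored UUID against computed MD5.
        for side, rec in (("master", m), ("replica", r)):
            if not uuid_matches_md5(rec):
                problems.append(f"uid {uid}: UUID does not match MD5 on {side}")
        # Then check the two sides against each other.
        if m != r:
            problems.append(f"uid {uid}: master and replica differ")
    return problems
```

An empty list means the mailbox looks consistent on both sides; anything else is worth a closer look at the files on disk.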

Anyone feel like writing an RFC for fetching the digest of a message via 
IMAP? If the server calculated it on delivery and cached it then you'd have 
a great way to clean up after a UIDVALIDITY change or other destabilising 
event without having to fetch every message again.

---

Rob

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
