> I don't have something to consume make_md5 data, yet, either. My
> plan is to note the difference between the replica and the primary.
> On a subsequent run, if those differences aren't gone, then they
> would be included in a report.
Rather than make_md5, check out the MD5 UUIDs patch below. Using it, we have a script that regularly checks both sides of a master/replica pair to confirm that each message's UUID is consistent with its computed MD5. It was this that let us discover the rare "didn't unlink old files" bug reported about 3 months back.

---
http://cyrus.brong.fastmail.fm/

One problem we've had is the inability to easily check that the files on disk correspond to what was originally delivered, i.e. to check for Cyrus data corruption after a disk problem or some other bug has left us unsure of our data integrity.

I wanted to calculate a digest and store it somewhere in the index file, but messing with the file format and fixing sync to still work, etc... it all sounded too painful.

So I've added a new option, "uuidmode", in imapd.conf. Set it to md5 and you will get UUIDs of the form:

  02(first 11 bytes of the MD5 value for the message)

which take up the same space but allow pretty good integrity checking.

Is it safe? We calculated that with one billion messages you have roughly a one in 1 billion chance of a birthday collision (two random messages with the same UUID). Those two messages would then have to end up in the same MAILBOXES collection for sync_client to make them affect each other anyway. The namespace available for generated UUIDs is much smaller than this, since they have no collision risk, but if you had that many deliveries you would hit the limits and start getting blank UUIDs anyway.

Mitigating even the above risk: you could alter sync_client not to use UUIDs for copying. It's not like that has been working anyway (see our other UUID-related patch). As an integrity check it's much more useful.
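A minimal Python sketch of the md5 uuidmode arithmetic, assuming the MD5 is taken over the raw message bytes as stored on disk (the real patch is C inside Cyrus and may hash slightly differently). `md5_uuid`, `check_uuid` and `birthday_collision_probability` are hypothetical helper names, not part of Cyrus. The last function applies the standard birthday approximation p ≈ 1 − exp(−n(n−1)/2N) with N = 2^88 (11 bytes of digest); for n = 10^9 it gives about 1.6 × 10^-9, i.e. roughly one in 600 million, the same order of magnitude as the figure quoted above.

```python
import hashlib
import math
from typing import Optional


def md5_uuid(message: bytes) -> str:
    """24 hex chars: the '02' prefix byte plus the first 11 bytes of the MD5."""
    digest = hashlib.md5(message).digest()  # 16-byte digest
    return "02" + digest[:11].hex()


def check_uuid(uuid: str, message: bytes) -> Optional[bool]:
    """True/False for md5-mode UUIDs; None if the UUID is not md5-derived.

    Assumption: only UUIDs starting with '02' come from uuidmode md5;
    anything else (other modes, blank UUIDs) cannot be checked this way.
    """
    if not uuid.startswith("02"):
        return None
    return uuid.lower() == md5_uuid(message)


def birthday_collision_probability(n: int, bits: int = 88) -> float:
    """Approximate chance that any two of n random values in 2**bits collide."""
    pairs = n * (n - 1) / 2
    # -expm1(-x) == 1 - exp(-x), but stays accurate for tiny x
    return -math.expm1(-pairs / 2 ** bits)


if __name__ == "__main__":
    print(md5_uuid(b""))                          # UUID for an empty message
    print(birthday_collision_probability(10**9))  # roughly 1.6e-09
```

Keeping the check as a pure function of (UUID, bytes) means the same code can run independently against the primary and the replica, with no coordination between the two.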
The attached patch adds the md5 method, a "random" method (which I've never tested and is almost certainly bogus, but is there for educational value[tm]), and the following FETCH responses in imapd:

  FETCH UUID => 24 character hex string (02 + first 11 bytes of MD5)
  FETCH RFC822.MD5 => 32 character hex string (16 bytes of MD5)
  FETCH RFC822.FILESIZE => size of actual file on disk (via stat or mmap)

Totally non-standard of course, but way useful for our replication checking scripts. Embrace and extend 'r' us.

Anyone feel like writing an RFC for fetching the digest of a message via IMAP? If the server calculated it on delivery and cached it, then you'd have a great way to clean up after a UIDVALIDITY change or other destabilising event without having to fetch every message again.

---
Rob
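The three FETCH items above are enough for a cross-check that needs no message bodies: an md5-mode UUID should equal "02" followed by the first 22 hex digits of RFC822.MD5, and the primary and replica should agree on all three values. The sketch below assumes the values have already been fetched into dictionaries keyed by UID (the IMAP fetching itself is left out), and `MsgInfo`, `self_consistent`, `find_problems` and `to_report` are hypothetical names. `to_report` follows the plan in the quoted text: only differences still present on a subsequent run get reported.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class MsgInfo:
    """Hypothetical record of the three non-standard FETCH items for one message."""
    uuid: str      # FETCH UUID: 24 hex chars
    md5: str       # FETCH RFC822.MD5: 32 hex chars
    filesize: int  # FETCH RFC822.FILESIZE: bytes on disk


def self_consistent(info: MsgInfo) -> bool:
    """An md5-mode UUID is '02' + the first 11 bytes (22 hex digits) of the MD5."""
    if not info.uuid.startswith("02"):
        # Assumption: other prefixes come from non-md5 modes; nothing to check.
        return True
    return info.uuid.lower() == "02" + info.md5.lower()[:22]


def find_problems(primary: Dict[int, MsgInfo],
                  replica: Dict[int, MsgInfo]) -> Set[int]:
    """UIDs that are missing on one side, differ, or fail the UUID/MD5 check."""
    problems: Set[int] = set()
    for uid in primary.keys() | replica.keys():
        p: Optional[MsgInfo] = primary.get(uid)
        r: Optional[MsgInfo] = replica.get(uid)
        if p is None or r is None or p != r:
            problems.add(uid)
        elif not self_consistent(p):
            # p == r here, so checking one side covers both.
            problems.add(uid)
    return problems


def to_report(previous_run: Set[int], current_run: Set[int]) -> Set[int]:
    """Report only differences that persisted across two runs."""
    return previous_run & current_run
```

Intersecting two runs filters out messages that were merely mid-sync when the first snapshot was taken, so only persistent mismatches reach the report.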