On Tue, Sep 14, 2010 at 02:24:17PM -0500, Patrick Goetz wrote:
> On 09/14/2010 12:26 PM, Dave McMurtrie wrote:
> >
> >The Cyrus wiki's content has been mostly moved to
> >http://www.cyrusimap.org/ except for what I considered to be useless or
> >outdated content.
> >
>
> Hmmm, the take away message is the wiki is rather light on useful,
> timely content. <:)
>
> There's been some discussion on the Debian cyrus list about how to
> automate upgrades from cyrus 2.n.k to cyrus 2.m.j. Jeroen van
> Meeuwen (on both lists) suggested that the cyrus RPM package
> features a utility called cyrus-imapd.cvt_cyrusdb_all which might be
> useful for this.

Yeah - Jeroen and I were talking about this the other day on instant
messaging! I've written something a bit nicer.

Basically, I ripped out the "guts" of cvt_cyrusdb and stuck them in
lib/cyrusdb.c. Then I wrote a "detect" function that reads the file's
magic and figures out whether it is berkeley, berkeley-hash or
skiplist. Then for each file it checks whether the detected type
matches the configuration value, and converts the file if it doesn't.
This is run as part of ctl_cyrusdb -r during startup.

> I've been looking at this script, and it mostly appears to be using
> cvt_cyrusdb to convert particular db files to Cyrus skiplists and
> then back again to the original db backend format. I can't follow
> the script completely as it seems to rely on DB configuration
> details found in the imapd.conf file I don't have in my Debian
> 2.1.16 imap server, and it's also not clear how the script is run.
>
> This raises a number of questions, though:
>
> 1.
> Cyrus skiplists? I thought all the DB files were in Berkeley DB
> format. I tried to find some documentation on skiplists, but only
> found an old message to the developer list from Bron Gondwana
> discussing skiplist bugs
> (http://markmail.org/message/zbaq765brbg2acfj).

Yes, Cyrus Skiplists. It's a DB format written entirely inside Cyrus.
They're quite stable now. The only real downside is that the lock is
global per database - they don't have any concept of row locking, so
concurrency can suffer. This usually isn't a big problem. At FastMail
we've had ALL our databases in skiplist for a couple of years now.

> On the other hand, this guy talks about converting all Berkeley DB
> files to skiplists because of perceived libdb bugs:
> http://www.mail-archive.com/[email protected]/msg31953.html

I'm currently trying to find someone (either inside Opera or
elsewhere) to help me debug Cyrus' use of BDB and see if we can do it
better. I suspect the BDB problems are more with how we're using it
than with BDB itself.
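
For illustration only, the detect-and-convert check described above
might look roughly like the sketch below. This is not the code that
went into lib/cyrusdb.c: the names db_kind, detect_db_kind and
needs_conversion are made up for this example, and the magic values
(the skiplist header string, and the Berkeley DB magic numbers at byte
offset 12) are assumptions written from memory that should be checked
against the real sources before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the backends mentioned in this thread. */
enum db_kind { DB_UNKNOWN, DB_SKIPLIST, DB_BERKELEY, DB_BERKELEY_HASH };

/* ASSUMPTION: leading bytes of a skiplist file header. */
static const char SKIPLIST_MAGIC[] = "\241\002\213\015skiplist file";

/* ASSUMPTION: Berkeley DB keeps a 32-bit magic at offset 12 of page 0. */
#define BDB_MAGIC_OFFSET 12
#define BDB_BTREE_MAGIC  0x00053162u
#define BDB_HASH_MAGIC   0x00061561u

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Guess the on-disk format from the first bytes of a database file. */
enum db_kind detect_db_kind(const unsigned char *buf, size_t len)
{
    const size_t skip_len = sizeof(SKIPLIST_MAGIC) - 1; /* drop the NUL */

    assert(buf != NULL);

    if (len >= skip_len && memcmp(buf, SKIPLIST_MAGIC, skip_len) == 0)
        return DB_SKIPLIST;

    if (len >= BDB_MAGIC_OFFSET + 4) {
        /* The magic is written in the creating host's byte order,
         * so accept either endianness. */
        uint32_t le = read_u32_le(buf + BDB_MAGIC_OFFSET);
        uint32_t be = read_u32_be(buf + BDB_MAGIC_OFFSET);

        if (le == BDB_BTREE_MAGIC || be == BDB_BTREE_MAGIC)
            return DB_BERKELEY;
        if (le == BDB_HASH_MAGIC || be == BDB_HASH_MAGIC)
            return DB_BERKELEY_HASH;
    }
    return DB_UNKNOWN;
}

/* Returns 1 when a recognised file is not in the configured format. */
int needs_conversion(enum db_kind on_disk, enum db_kind configured)
{
    return on_disk != DB_UNKNOWN && on_disk != configured;
}
```

A caller would read the first few dozen bytes of each database file,
pass them to detect_db_kind(), and run the conversion only when
needs_conversion() returns 1. Leaving unrecognised files alone is a
choice made for this sketch, not something the thread specifies.
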

> Skiplists: what are they, when and why use them? Either I'm a bad
> googler or documentation seems to be lacking.

lib/cyrusdb_skiplist.c - knock yourself out :)

They're very good for sequential reads - "foreach" and friends. It's
a very lightweight format, which provides pretty good locality of
data - so it's fairly cache friendly.

> 2.
> The Redhat cvt_cyrusdb_all script seems to assume a specific set of
> database files. Is the set of cyrus imap DB files fixed, and if so
> what are they? Is there any documentation on what each database
> file contains? This would be very useful to people trying to convert
> older cyrus IMAP installations to new ones.

Pretty much, yes. There are a handful of files - plus the per user
seen, sub and quota files. Seen files are skiplist and sub files are
flat. Quota is its own special format.

Here's the listing of the main databases:

    dblist[] = {
        { FNAME_MBOXLIST,      &config_mboxlist_db,    1 },
        { FNAME_QUOTADB,       &config_quota_db,       1 },
        { FNAME_ANNOTATIONS,   &config_annotation_db,  1 },
        { FNAME_DELIVERDB,     &config_duplicate_db,   0 },
        { FNAME_TLSSESSIONS,   &config_tlscache_db,    0 },
        { FNAME_PTSDB,         &config_ptscache_db,    0 },
        { FNAME_STATUSCACHEDB, &config_statuscache_db, 0 },
        { NULL,                NULL,                   0 }
    };

The only three you really need to care about are mboxlist, quota and
annotations - and of those, quota probably doesn't exist if you've
got "quotalegacy". Despite the "legacy" in the name, we use it -
because it has less lock contention and is more reliable.

Anyway. Discard the ones with '0' in the archive value (the last
column), because they're just caches and the format has probably
changed anyway - but upgrade your mboxlist and annotation files.

Skiplist hasn't changed format in approximately forever. I have
considered upgrading it (mainly to add some more internal integrity
checks), but the benefits haven't outweighed the costs yet.
I did write a skiplist-2 file format at one point and started playing
with it, but that was years ago.

> 3.
> The discussion of DB backends leads one to wonder if this means
> Berkeley DB or skiplists, or if other backends are used, too? Is
> there any documentation on this?

There's flat - and Ken added some SQL support (sqlite, mysql and
postgresql) a little while back, though I haven't tested it yet.

No, there's not much documentation. I'm working on fixing that too.
I wrote up an outline of what I want to document on the old wiki -
not sure if it's been ported across, but I have a copy in my email as
well. I'll paste it below.

Bron.

====================================================

Here's an overview of what needs to be documented.

---++ On Disk Format

   * mailbox
   * cyrus.header
   * cyrus.index
   * cyrus.cache
   * cyrus.squat (stub for now)
   * message files (rfc822)
   * file naming
   * dir hashing algorithms
   * config variables (including partitions)
   * domain split
   * db subformats
   * quota
   * seen
   * sub
   * mboxlist
   * deliver
   * annotations
   * statuscache
   * sieve
   * sync log files
   * proc files
   * "special" - shutdown, etc.
   * db formats: skiplist, flat, berkeley, quotalegacy

---++ Locking

   * name locks
   * cyrus.index locks
   * deadlock prevention

---++ Index API

   * how it works
   * how the "client view" is kept in sync

---++ Replication

   * wire format (dlist)
   * full protocol overview
   * locking considerations
   * sync_crc - calculation and purpose
   * split brain recovery

---++ Reconstruct

   * how it works now
   * flags and purpose (also, man page)

---++ mbdump

   * still needs to be rewritten to use dlist!
   * incremental dumps

---++ Internal APIs

   * seqset_
   * buf_
   * charset_
   * prot_

There's lots of stuff that needs to be either documented or updated
to make Cyrus development viable for people who aren't Bron right
now. Lots has changed!
