On 3/27/2012 3:40 PM, Jeff Gustafson wrote:
> I looked around the 'Net to see if there might be a custom program for
> offline Maildir to mdbox conversion. So far I haven't turned up
> anything. The problem for us is that the dsync program simply takes a
> lot of time to convert mailboxes.
Is it slower than doing an IMAP APPEND over an authenticated dovecot
connection?
I've used a simple Perl script based on Mail::IMAPClient and Mail::Box
to import 180,000+ emails into dovecot's mdbox at fairly high speed,
and all it does is IMAP APPENDs. (I had to shard the mailboxes because
these Perl-based tools exhaust RAM when run with mailboxes larger than
about 600MB.)
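
For illustration only, here is a minimal Python sketch of the same
append-only approach. It is not the Perl script described above; the
function names, the host/user parameters and the default folder are
all assumptions. It reads raw messages out of an mbox file with the
standard mailbox module and issues one IMAP APPEND per message through
imaplib.

```python
import imaplib
import mailbox


def connect(host, user, password):
    """Open an authenticated IMAP connection (host and credentials are placeholders)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def append_mbox(conn, mbox_path, folder="INBOX"):
    """Append every message in an mbox file to `folder`, one APPEND per message.

    `conn` only needs an imaplib-style append(mailbox, flags, date_time, message).
    Returns the number of messages appended.
    """
    count = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # raw message bytes, no parsing
            # Empty flags and no date: the server assigns the internal date.
            typ, data = conn.append(folder, "", None, raw)
            if typ != "OK":
                raise RuntimeError(f"APPEND failed for message {key}: {data!r}")
            count += 1
    finally:
        box.close()
    return count
```

Only one message body is held at a time, although the mailbox module
still builds an index of the whole mbox file first. Whether this
avoids the RAM exhaustion mentioned above on very large mailboxes has
not been measured.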
On my development VM test box (32-bit Slack 13.37, 2G/2G split kernel,
no RAID, Q6600 with only two cores allocated to the VM, and 8GB of DDR2
RAM), the import gives these numbers:
Emails=180,044
real 237m28.485s (about 12.6 emails/second)
user 94m50.425s
sys 10m09.389s
21,984,824 /mail/home
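
As a quick check of the arithmetic, the throughput is the message
count divided by the wall-clock ("real") time. The short Python sketch
below recomputes it from the figures above; the helper name parse_time
is made up for this example. It comes out near 12.6 emails per second,
with user plus sys CPU time at a bit under half of the wall-clock
time.

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '237m28.485s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if m is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


emails = 180_044
real = parse_time("237m28.485s")
user = parse_time("94m50.425s")
sys_time = parse_time("10m09.389s")

rate = emails / real                  # messages per wall-clock second
cpu_share = (user + sys_time) / real  # CPU seconds per wall-clock second

print(f"{rate:.1f} emails/second, CPU time is {cpu_share:.0%} of wall time")
```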
I'm writing a Swiss-army (C-based, no bytecode crap languages) mailbox
"transcoding" tool, since none appear to exist. To keep it simple, I/O
to and from "remote" mailboxes (connections) is not pipelined. It won't
require more than MAXEMAILSIZE's worth of RAM (if one of the directions
involves a remote connection), and so far when processing MIX, Maildir,
and Mbox files, it's extremely fast.
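
The bounded-memory idea can be sketched in a few lines of Python. This
is not the C tool described above, only an illustration under stated
assumptions: the function name, the 64MB default for max_email_size
(standing in for MAXEMAILSIZE), and the simplified rule that any line
starting with "From " begins a new message are all mine.

```python
def iter_mbox_messages(path, max_email_size=64 * 1024 * 1024):
    """Yield raw messages from an mbox file one at a time.

    Only the message currently being read is buffered, so memory use is
    bounded by roughly max_email_size instead of the mailbox size.
    Simplification: every line starting with b"From " is treated as a
    separator, which is only safe if body lines are escaped (">From ").
    Each yielded message still includes its leading "From " line.
    """
    buf = []
    size = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"From ") and buf:
                yield b"".join(buf)  # hand over the finished message
                buf, size = [], 0
            size += len(line)
            if size > max_email_size:
                raise ValueError("message exceeds max_email_size")
            buf.append(line)
    if buf:
        yield b"".join(buf)  # last message in the file
```

A consumer would process each yielded message (for example, send it
and wait for the reply) before asking for the next one, which matches
the non-pipelined, one-message-at-a-time model.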
Adding support for [sm]dbox wouldn't appear to be problematic. At the
moment, it supports everything Panda's c-client supports plus
Maildir/Maildir++ (including Panda's "MIX").
Write support for Maildir is extremely UNDER-tested so far, as I've
mainly used the tool to import Maildir hives.
I've experimented with Maildir as a format, and while the
one-email-per-file model seems like a sensible idea, it simply
transfers stress from one part of the system to another, mainly the
filesystem, and not many filesystems are really up to handling that
many files in one directory efficiently.
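
To make the one-file-per-message point concrete, the Python sketch
below builds a throwaway Maildir with the standard mailbox module and
counts the files that result. The function names and the temporary
location are assumptions for this example, and it says nothing about
how any particular filesystem performs.

```python
import mailbox
import os
import tempfile


def count_maildir_files(maildir_path):
    """Count message files in the new/ and cur/ subdirectories."""
    total = 0
    for sub in ("new", "cur"):
        with os.scandir(os.path.join(maildir_path, sub)) as entries:
            total += sum(1 for entry in entries if entry.is_file())
    return total


def build_demo_maildir(n):
    """Create a temporary Maildir holding n tiny messages; return its path."""
    path = os.path.join(tempfile.mkdtemp(), "Maildir")
    md = mailbox.Maildir(path, create=True)
    for i in range(n):
        # Each add() writes exactly one new file under new/.
        md.add(f"Subject: test {i}\n\nbody {i}\n".encode())
    md.close()
    return path
```

Every stored message becomes one directory entry, so a mailbox with a
million emails means about a million files spread across just two
directories.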
None of my users have mailboxes with fewer than 100K emails in them;
some have more than a million.
=R=