Re: [Dovecot] Need fast Maildir to mdbox conversion

2012-03-28 Thread Jeff Gustafson
On Wed, 2012-03-28 at 09:24 +0200, Jan-Frode Myklebust wrote:

 Why is it a problem that dsync takes a long time, when it can be done
 without downtime for the users?
 
 I just started our maildir-mdbox conversion yesterday, using the
 attached script. I only converted a little over 1 easy accounts
 (accounts with simple folder names, as I expect to run into problems
 once we start hitting accounts with trailing dot or broken latin1/utf8
 characters in the folder names). I might agree it wasn't quick, but
 that really doesn't matter as the only downtime for the user is that
 he's potentially kicked out during the userdb update.

I looked over your script. I plan on doing some trial runs with it. I
think the trick where you re-run the sync and then boot the user off the
connection should work pretty well. I hadn't totally fleshed out the
scripting on the conversion since there is a lot more I need to do with
the database and configuration files first. It appears I can use your
script as a starting point for our configuration.
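
For reference, the per-user sequence discussed here (an initial sync, a second catch-up sync, then the userdb switch and a kick) might be sketched roughly as below. This is only an illustration, not the attached script: the `dsync -u USER mirror mdbox:~/mdbox` and `doveadm kick USER` invocations are assumed from the Dovecot 2.x documentation and should be checked against the installed version, and `update-userdb` is a hypothetical placeholder for whatever changes the user's mail location in the database. The function only builds the command list; nothing is executed.

```python
def conversion_plan(user, target="mdbox:~/mdbox"):
    """Return the ordered commands for converting one user (nothing is run)."""
    sync = ["dsync", "-u", user, "mirror", target]
    return [
        sync,                             # first pass: bulk copy while the user stays online
        list(sync),                       # second pass: pick up mail that arrived meanwhile
        ["update-userdb", user, target],  # HYPOTHETICAL: point userdb at the new location
        ["doveadm", "kick", user],        # drop open sessions so clients reconnect
    ]


if __name__ == "__main__":
    # Print the plan for a made-up account name.
    for cmd in conversion_plan("jeff@example.com"):
        print(" ".join(cmd))
```

Keeping the plan as plain data makes it easy to review or log the steps for a batch of accounts before running anything against real mailboxes.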

...Jeff

 
 
 




Re: [Dovecot] Need fast Maildir to mdbox conversion

2012-03-28 Thread Jeff Gustafson

On Tue, 2012-03-27 at 20:00 -0700, Robin wrote:
 I'm writing a swiss-army (C-based, no bytecode crap languages) mailbox 
 transcoding tool, since none appear to exist.  To keep it simple, I/O 
 to/from remote mailbox (connections) are not pipelined.  It won't 
 require more than MAXEMAILSIZE's worth of RAM (if one of the directions 
 involves a remote connection), and so far when processing MIX, Maildir, 
 and Mbox files, it's extremely fast.

This sounds interesting. If it could do [sm]dbox, it would be very,
very useful to large installations.

...Jeff




[Dovecot] Need fast Maildir to mdbox conversion

2012-03-27 Thread Jeff Gustafson
I looked around the 'Net to see if there might be a custom program for
offline Maildir to mdbox conversion. So far I haven't turned up
anything. The problem for us is that the dsync program simply takes a
lot of time to convert mailboxes. I wonder if time could be saved with a
program that is optimized to convert mailboxes without the fancy locking
that dsync needs to do. Does anyone have (or has anyone seen) a tool
that could do this?
We're hoping that converting away from Maildir will help us speed up
the backup processes by reducing the number of files to process.

...Jeff



Re: [Dovecot] Need fast Maildir to mdbox conversion

2012-03-27 Thread Robin

On 3/27/2012 3:40 PM, Jeff Gustafson wrote:

I looked around the 'Net to see if there might be a custom program for
offline Maildir to mdbox conversion. So far I haven't turned up
anything. The problem for us is that the dsync program simply takes a
lot of time to convert mailboxes.


Is it slower than doing an IMAP APPEND over an authenticated dovecot 
connection?


I've used a simple PERL script based on Mail::IMAPClient and Mail::Box 
to import 180,000+ mailboxes into dovecot's mdbox at fairly high speed, 
and all it does is IMAP APPENDs.  (I had to shard the mailboxes because 
these PERL based tools exhaust RAM when run with mailboxes larger than 
about 600MB).
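
The APPEND-only import described above can be sketched in a few lines. This is not the Perl script itself but a rough Python equivalent using the standard `imaplib` and `mailbox` modules; the function name `append_maildir` and the default folder are made up for illustration, and flags and internal dates are deliberately not carried over. It reads one message at a time, so memory use stays around the size of the largest single email.

```python
import imaplib
import mailbox


def append_maildir(maildir_path, conn, folder="INBOX"):
    """APPEND each message of a Maildir to `folder`; return the count sent.

    `conn` is anything with an imaplib-style
    append(mailbox, flags, date_time, message) method,
    for example a logged-in imaplib.IMAP4_SSL object.
    """
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    sent = 0
    for key in box.iterkeys():
        raw = box.get_bytes(key)  # only one message held in memory at a time
        # Flags and internal date are not preserved in this sketch.
        status, _data = conn.append(folder, "", None, raw)
        if status != "OK":
            raise RuntimeError(f"APPEND failed for {key}: {status}")
        sent += 1
    return sent
```

In real use, `conn` would be something like `imaplib.IMAP4_SSL(host)` after a successful `login()`, with the host and credentials supplied by the site.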


My development VM test box (32-bit Slack 13.37, 2G/2G split kernel, 
no RAID, Q6600 with only two cores allocated to the VM, 8GB of DDR2 
RAM) does:


Emails=180,044
real    237m28.485s  (12.5 emails/second)
user    94m50.425s
sys     10m09.389s
21,984,824  /mail/home

I'm writing a swiss-army (C-based, no bytecode crap languages) mailbox 
transcoding tool, since none appear to exist.  To keep it simple, I/O 
to/from remote mailbox (connections) are not pipelined.  It won't 
require more than MAXEMAILSIZE's worth of RAM (if one of the directions 
involves a remote connection), and so far when processing MIX, Maildir, 
and Mbox files, it's extremely fast.


Adding support for [sm]dbox wouldn't appear to be problematic.  At the 
moment, it supports everything Panda's c-client supports plus 
Maildir/Maildir++ (including Panda's MIX).


Write support for Maildir's extremely UNDER-tested so far, as I've 
mainly used it to import Maildir hives.


I've experimented with Maildir as a format, and while the one email to a 
file model seems like a sensible idea, it seems to simply transfer 
stress from one part of the system to another, mainly filesystems, and 
not many of those are really up for handling that many files in one 
directory very efficiently.
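

To see how much of that stress a given mail store puts on the filesystem, a quick count of files per directory can help. The sketch below is just an illustration using Python's standard `os.walk`; the function name `busiest_directories` and the `top` parameter are invented here.

```python
import os


def busiest_directories(root, top=5):
    """Return [(file_count, path), ...] for the `top` fullest directories."""
    counts = []
    for dirpath, _dirnames, filenames in os.walk(root):
        counts.append((len(filenames), dirpath))
    counts.sort(reverse=True)  # largest file count first
    return counts[:top]
```

Pointing it at a Maildir root should show which `cur` directories hold the most messages, which is also a rough guide to how many files a backup has to walk.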


None of my users have mailboxes with fewer than 100K emails in them, 
some have more than a million.


=R=