Re: IMAP sync tool (rsync for IMAP)

2006-12-24 Thread Simon Matter
 On Thu, Dec 21, 2006 at 03:47:26PM -0800, Florin Andrei wrote:
 Essentially, I need a tool that I can point at servers A and B and tell
 it get all the email from my account on server A to a specific folder
 on my account on server B, preserving the subfolders hierarchy.
 The tool needs to be smart enough to repeat the operation later on but
 then it must only transfer the new messages.
 The tool may run on one or the other IMAP servers, or even on a 3rd
 machine, since it should be network-based. Pretty much all systems are
 Linux 'round here, some Windows stragglers too.

 Sort of like rsync for IMAP, if that makes sense.

 So far, the only tool I've found is imapsync:

 http://freshmeat.net/projects/imapsync/

 Anyone tried it with Cyrus? Good/bad experiences?

 Are there any other tools that work better with Cyrus?

 Another thing you might want to consider is offlineimap:

 http://software.complete.org/offlineimap/

 It's a pile of multithreaded python, but don't let that put you
 off!  I've found it quite robust for IMAP <-> Maildir usage (Indeed,
 I'm replying to this message via Mutt using local Maildirs which
 are kept synchronised with my FastMail (Cyrus 2.3.7+patches) IMAP
 account).

 IMAP <-> IMAP usage I found less robust, in that it sometimes got
 confused when it had been killed in the middle of operations.  If
 you didn't keep killing it all the time (my usage patterns were
 pretty strange) it was better - and also I think if you had a server
 which supports UIDPLUS like Cyrus does then it would be safer.
 Give it a look though.

 apt-get install offlineimap works on pretty much any sane linux
 these days :)  (I believe you can even get it with apt4rpm, though
 that's not my particular kink)

For RedHat/Fedora (not blessed by apt/deb by default), I'm maintaining
offlineimap src rpms here http://www.invoca.ch/pub/packages/offlineimap/
which can be built using
rpmbuild --rebuild offlineimap-4.x.xx-x.src.rpm

The same is true for imapsync here
http://www.invoca.ch/pub/packages/imapsync/

Simon

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


More timings populating a mailbox

2006-12-24 Thread Ross Boylan
Here are more results, playing with various filesystems and options
delivering messages to Cyrus 2.2 via UW-IMAP mailutil.  This uses
IMAP, not LMTP, to insert the messages into the mailbox.
Linux 2.6.18 Kernel, SATA 7200 RPM drive.  This is not server-class
hardware, though it's not bad.

For the impatient, best performance came from parallelizing inserts
and using ext3 with noatime and a library that disabled fsync.  One
would not want to disable fsync outside of a migration scenario.
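
To make "parallelizing inserts" concrete, here is a minimal sketch in
Python.  It is not what I ran (the tests used UW-IMAP mailutil); it
only illustrates the shape: one connection per target mailbox, all
appending at once, with throughput reported as messages/second.  The
connect argument is a hypothetical factory that must return an object
with an append() method like imaplib.IMAP4.append; the function names
are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def append_all(connect, mailbox, messages):
    """Append every message to one mailbox over its own connection."""
    # connect is a hypothetical factory returning an imaplib.IMAP4-like object.
    conn = connect()
    for raw in messages:
        # Same argument order as imaplib.IMAP4.append:
        # (mailbox, flags, date_time, message); None means "not supplied".
        conn.append(mailbox, None, None, raw)
    return len(messages)


def parallel_rate(connect, mailboxes, messages):
    """Fill each mailbox from its own thread; return messages per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(mailboxes)) as pool:
        counts = list(
            pool.map(lambda box: append_all(connect, box, messages), mailboxes)
        )
    elapsed = time.perf_counter() - start
    return sum(counts) / elapsed
```

Against a real server, connect would be a function that opens an
imaplib.IMAP4 connection and logs in; passing a single mailbox gives
the single-insert case for comparison.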

BASIC DATA

This is the same setup as in my earlier message, but this time with
more varieties.  Performance measure is messages/second, delivering
540 messages (average of 3 trials).  I report results for a single
insert and 20 threads inserting simultaneously (the 20 threads read
from the same mbox file but inserted into different IMAP boxes, all
for the same user).  The single insert is in the column headed 1;
the parallel insert is in the column headed 20.  First the results
for the vanilla cases: 

   1   filesystem     20
24.7   ext3        185.1
18.3   reiser3.6   171.0  (though also 160 and 142 in other scenarios)
25.4   xfs         223.3
25.4   jfs         114.1
ext3 was the only one that clearly responded to mount options
25.2 ext3  267.3  noatime

the fakesync library that disabled fsync did this
26.5  ext3    393.2  noatime + fakesync
      ext3    300.0  noatime,data=journal + fakesync
      reiser  232.1  noatime + fakesync
noatime alone didn't help reiser, consistent with the mount man page's
indication that it might not.

I tried XFS noatime,osyncisdsync.  The first time I got the same
results as before.  I changed the options with mount -o remount.
Suspecting this hadn't caused the options to take effect, I manually
umount'd and then mount'd.  Immediately my tests began failing with
I/O errors; there was a single test mailbox created, and I could not
delete it through cyradm.  Nothing I did produced recovery.  Whether
this indicates an error on my part, danger in using osyncisdsync, or
some combination I don't know.

Changing ext3 directory indexing had little effect on performance
(the test created few directories, though c. 2,000 files in each of
those directories).

TEST COMMENTARY (i.e., stuff not in the tables)

The tests were only semi-controlled.  They were on a test system that
wasn't doing much else.  Aside from the fact that housekeeping jobs
could interfere and the fact that I did the tests manually, I didn't
always reset things to a clean slate.  For example, even when I
deleted the test mailboxes, other stuff may have been building up
(e.g., in cyrus's internal files, in the filesystem tree).

Several different filesystems seemed to exhibit deteriorating
performance as they filled, though this wasn't totally repeatable.

The advantages of fakesync also seemed to decline with use, or perhaps
with repeated writes in the same short period.

I recall seeing advice to use data journaling to improve performance
(I think the logic was that if all the writes went to one spot in the
disk--the journal--they would go faster).  It didn't help here.
Possibly fakesync delivers any benefit data journalling would produce;
possibly I've misunderstood how to apply it; possibly it doesn't help.

RESULTS COMMENTARY (i.e., what does it mean?)

The relatively slow performance of a single thread suggests that
delays in the TCP dialogues may be a significant factor.  I was not
ambitious enough to implement the Linux socket options that might tune
this, as suggested in earlier threads.  The disk options did not
significantly affect single-thread performance, suggesting the disk is
not the bottleneck in that case.  Note prior reports with FreeBSD that
tuning TCP parameters there produced big performance gains.
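
For anyone who wants to try the socket-option route, a minimal sketch
follows.  The message above does not say which options the earlier
threads suggested; TCP_NODELAY (disabling Nagle's algorithm) is an
assumption on my part, picked because it is a common candidate when
many small request/response exchanges dominate.  The function name is
made up for illustration.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: TCP_NODELAY is the option of interest; it sends small
    # writes immediately instead of waiting to coalesce them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Whether this helps would have to be measured; the option has to be set
on the sockets the IMAP client or server actually uses.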

Given my ignorance of tuning cyrus, filesystems, and TCP, you should
only take these results as a straw in the wind.

The best/worst performance ratio is pretty large: 393/19 = 20:1.
Parallelizing was the single biggest winner; while the speed increase
is not linear in the number of threads, it's quite possible that more
than 20 threads would produce more throughput.  Here are some timings from
Reiser (in a slightly different setting than for the numbers above):
Threads  Mess/sec
 1        18.6
 2        32.3
 3        51.2
 7        75.8
20       160.5
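
To put numbers on "not linear", the short Python sketch below works
out the speedup over one thread and the per-thread efficiency.  It
assumes the Reiser timings are read as 1, 2, 3, 7 and 20 threads at
18.6, 32.3, 51.2, 75.8 and 160.5 messages/second; the function name is
made up for illustration.

```python
# Thread count -> messages/second, from the Reiser timings
# (read as 1, 2, 3, 7 and 20 threads; that reading is an assumption).
timings = {1: 18.6, 2: 32.3, 3: 51.2, 7: 75.8, 20: 160.5}


def scaling(timings):
    """Return (threads, speedup, efficiency) rows relative to one thread."""
    base = timings[1]
    rows = []
    for threads, rate in sorted(timings.items()):
        speedup = rate / base           # throughput relative to 1 thread
        efficiency = speedup / threads  # 1.0 would be perfectly linear
        rows.append((threads, round(speedup, 2), round(efficiency, 2)))
    return rows


for threads, speedup, efficiency in scaling(timings):
    print(f"{threads:>2} threads: {speedup:5.2f}x speedup, "
          f"{efficiency:.2f} efficiency")
```

On these figures efficiency falls well below 1.0 as threads are added,
yet total throughput is still climbing at 20 threads.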

Merry Christmas, if you celebrate it!

More info on the fakesync library appears below.

Ross Boylan

On Mon, Sep 25, 2006 at 09:45:57AM -0700, Wil Cooley wrote:
 On Sun, 2006-09-24 at 23:17 -0700, Ross Boylan wrote:
 
  First, is this performance to be expected, or might there be something
  here I can improve?  I have quite a bit of mail I'd like to migrate,
  so if there's an easy way to speed this up I'd like to do so.
  
  Second, where should I look to diagnose or solve this problem?
 
 Depending on the number of messages, it could be the constant fsync()
 that slows it down.  Try my fakesync library with LD_PRELOAD and perform
 a test migration; I'd like to know if it makes