As of a few minutes ago, imap-2006b is in release status, and is the version obtained by
        ftp://ftp.cac.washington.edu/mail/imap.tar.Z

The performance problem with traditional UNIX mailbox format that Rex and Antonio reported is fixed in this version. The problem turned out to be heap thrashing.

I did some benchmarks. The following must be taken with a grain, maybe a rock, of salt; "there are lies, damned lies, and benchmarks."

For testing, I created a 5000+ message mailbox, populated with the archives of a mailing list. I tested the following:
 . ("select") select (open mailbox) only
 . ("fetch") select + fetch full (which includes header and MIME parsing)
    + fetch raw message text
 . ("thread") select + thread

All tests were repeated to control for buffer cache effects.
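
The post does not say how the repeated runs were combined. Here is a minimal sketch of one way to do it. It assumes a harness that times each run with a monotonic clock and treats the first run as the cold-cache run. `time_repeated` and `warm_median` are hypothetical names, and taking the median of the warm runs is an assumption, not the method behind the numbers in this post.

```python
import statistics
import time
from typing import Callable, List


def time_repeated(op: Callable[[], object], runs: int = 5) -> List[float]:
    """Run op() `runs` times and return each run's elapsed wall-clock seconds.

    The first run may include cold-cache effects (mailbox not yet in the OS
    buffer cache); later runs are expected to be served mostly from cache.
    """
    if runs < 2:
        raise ValueError("need at least 2 runs to separate the cold run")
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        timings.append(time.perf_counter() - start)
    return timings


def warm_median(timings: List[float]) -> float:
    """Median of every run after the first, used as a warm-cache estimate."""
    return statistics.median(timings[1:])


if __name__ == "__main__":
    # Stand-in workload; a real harness would open and SELECT a mailbox here.
    samples = time_repeated(lambda: sum(range(100_000)), runs=5)
    print(f"cold: {samples[0]:.6f}s  warm median: {warm_median(samples):.6f}s")
```

In practice `op` would be a closure that opens the test mailbox and issues the select, fetch, or thread operation being measured. That wiring is not shown here.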

Traditional UNIX format produces comparable results between imap-2004g and (the final) imap-2006b. The differences, such as they were, were not significant in my opinion (1/10 second or less); nor did they point to either imap-2004g or imap-2006b as being "better".

I also tested the mbx format in imap-2004g and imap-2006b, and mix in imap-2006b.

In all three tests, mbx performance was comparable between imap-2004g and imap-2006b; and once again the differences were not significant, nor did they point to either version as being "better".

In comparing the performance of traditional UNIX, mbx, and mix in imap-2006b, I determined the following:

With mix as a baseline:
 . select:
   . traditional UNIX is 40% slower
   . mbx is 14% slower
 . fetch:
   . traditional UNIX is 14% slower
   . mbx is comparable
 . thread:
   . traditional UNIX is 3 times slower
   . mbx is 2.9 times slower
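
The percentages above are all relative to mix. You can re-base the same figures to compare traditional UNIX directly against mbx. This requires three assumptions about how to read the wording: "40% slower" means 1.40 times mix's elapsed time, "3 times slower" means 3.0 times, and "comparable" means 1.00.

```python
# Reported results relative to mix (mix = 1.00). Assumed readings:
# "40% slower" -> 1.40, "3 times slower" -> 3.00, "comparable" -> 1.00.
RELATIVE_TO_MIX = {
    "select": {"unix": 1.40, "mbx": 1.14},
    "fetch": {"unix": 1.14, "mbx": 1.00},
    "thread": {"unix": 3.00, "mbx": 2.90},
}


def unix_vs_mbx(test: str) -> float:
    """How many times as long traditional UNIX takes as mbx for one test."""
    row = RELATIVE_TO_MIX[test]
    return row["unix"] / row["mbx"]


for name in RELATIVE_TO_MIX:
    print(f"{name}: {unix_vs_mbx(name):.2f}")
# select: 1.23
# fetch: 1.14
# thread: 1.03
```

Re-based this way, the gap between traditional UNIX and mbx is about 23% for select and about 3% for thread. This is only arithmetic on the rounded figures above, so it carries the same grain (or rock) of salt.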

None of these times surprise me, and here are my comments:

The ratio between select times in traditional UNIX and mbx format depends very much on the size of the messages. It is possible to create a pathological mailbox using tiny messages in which traditional UNIX will do better than mbx (I know how to fix this, and I may someday); but in general mbx will be faster than traditional UNIX for normal-sized messages. The ratio of select time between traditional UNIX and mbx format should be constant for any given average message size; the larger the average message size, the better mbx does.
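
A toy model shows why average message size matters for select. A traditional UNIX mailbox delimits messages with "From " lines, so finding message boundaries means examining every line. A format that stores each message's length up front can instead read a short per-message header and skip the body. The length-prefixed layout below (a decimal size on its own line, then that many bytes) is a made-up stand-in, not mbx's actual on-disk format. "Bytes examined" is only a rough proxy for real I/O and CPU cost.

```python
from typing import List, Tuple


def scan_unix_mbox(data: bytes) -> Tuple[List[int], int]:
    """Find message start offsets by checking every line for b"From ".

    Returns (start_offsets, bytes_examined); every byte is examined.
    """
    starts: List[int] = []
    pos = 0
    while pos < len(data):
        if data.startswith(b"From ", pos):
            starts.append(pos)
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    return starts, len(data)


def scan_length_prefixed(data: bytes) -> Tuple[List[int], int]:
    """Toy layout: b"<size>\\n" followed by <size> bytes of message.

    Reads only the size lines and skips each body by offset.
    Returns (start_offsets, bytes_examined).
    """
    starts: List[int] = []
    examined = 0
    pos = 0
    while pos < len(data):
        nl = data.index(b"\n", pos)
        size = int(data[pos:nl])
        examined += nl + 1 - pos
        starts.append(pos)
        pos = nl + 1 + size
    return starts, examined


def build_mbox(bodies: List[bytes]) -> bytes:
    sep = b"From sender@example.com Mon Jan  1 00:00:00 2007\n"
    return b"".join(sep + body + b"\n" for body in bodies)


def build_length_prefixed(bodies: List[bytes]) -> bytes:
    return b"".join(str(len(body)).encode() + b"\n" + body for body in bodies)


for size in (10, 1000, 100_000):
    bodies = [b"x" * size] * 3
    _, unix_bytes = scan_unix_mbox(build_mbox(bodies))
    _, prefixed_bytes = scan_length_prefixed(build_length_prefixed(bodies))
    print(size, unix_bytes, prefixed_bytes)
```

In this toy, the "From " scan grows with total bytes, while the length-prefixed scan grows only with message count. The gap therefore widens as average message size grows. The toy does not reproduce the tiny-message pathology mentioned above, whose cause is not described here.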

Mix, on the other hand, is indexed, and its select time does not grow appreciably as the number of messages increases. The time will increase, but at a much slower rate. Although the ratio with the other formats seems like it should be constant, it actually isn't, because the overhead in processing the index swamps the per-message cost.
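
A toy cost model makes the point about a non-constant ratio concrete. The model assumes the following, none of it measured: a scanning format's select costs a*n for n messages, while an indexed format pays a large fixed overhead c for processing its index plus a small cost k per index record, c + k*n. The ratio a*n / (c + k*n) then climbs with n toward a/k instead of staying constant. `select_ratio` is a hypothetical name.

```python
def select_ratio(n: int, a: float, c: float, k: float) -> float:
    """Toy ratio of scanning-format select time to indexed-format select time.

    Assumed model, not measured:
      scanning format: a * n        (a = cost per scanned message)
      indexed format:  c + k * n    (c = fixed index overhead, k = per record)
    """
    return (a * n) / (c + k * n)


# Arbitrary constants, chosen only to show the shape of the curve.
print([round(select_ratio(n, a=1.0, c=50.0, k=0.1), 2) for n in (100, 1_000, 100_000)])
```

For any c > 0, the ratio increases with n, since its derivative with respect to n is a*c / (c + k*n)**2, which is positive. That is all this model is meant to show. The constants say nothing about the real formats.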

Fetch is completely non-surprising. Traditional UNIX format has to do newline conversion (UNIX-style LF-only to Internet-style CRLF) whereas mbx and mix do not have to do so. The work in fetching is otherwise comparable between mbx and mix.
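
The newline conversion that traditional UNIX format needs on fetch amounts to inserting a CR before each bare LF. Here is a minimal sketch, written byte-at-a-time for clarity. `lf_to_crlf` is a hypothetical name, not a c-client routine.

```python
def lf_to_crlf(text: bytes) -> bytes:
    """Convert UNIX-style bare LF line endings to Internet-style CRLF.

    Existing CRLF pairs are left alone, so the conversion is idempotent.
    """
    out = bytearray()
    prev = None
    for byte in text:  # iterating bytes yields ints
        if byte == 0x0A and prev != 0x0D:  # bare LF: emit CR first
            out.append(0x0D)
        out.append(byte)
        prev = byte
    return bytes(out)


print(lf_to_crlf(b"Subject: hi\n\nbody\n"))
```

The converted text is one byte longer per line, so a message's CRLF size differs from its size on disk. That size has to be computed somewhere too. Formats that already store CRLF can hand the stored bytes back as-is.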

The thread test difference between traditional UNIX and mbx is due to the select time difference, since the threading work is otherwise comparable. Mix has a sortcache, which is the key to its greater speed in threading.
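
The contents of mix's sortcache are not described here. The sketch below assumes it keeps the per-message header fields that threading needs, so a later thread operation can skip re-parsing headers. `ToySortCache` and `thread_roots` are hypothetical. The threading is a simple In-Reply-To chain, much cruder than the REFERENCES threading algorithm (RFC 5256).

```python
from email.parser import BytesHeaderParser
from typing import Dict, Optional, Tuple

ThreadKeys = Tuple[str, Optional[str]]  # (Message-ID, In-Reply-To)


class ToySortCache:
    """Hypothetical per-UID cache of the header fields used for threading."""

    def __init__(self) -> None:
        self._keys: Dict[int, ThreadKeys] = {}
        self.parses = 0  # number of times headers were actually parsed

    def keys(self, uid: int, raw_header: bytes) -> ThreadKeys:
        if uid not in self._keys:
            self.parses += 1
            msg = BytesHeaderParser().parsebytes(raw_header)
            mid = str(msg.get("Message-ID", "")).strip()
            irt = msg.get("In-Reply-To")
            self._keys[uid] = (mid, str(irt).strip() if irt is not None else None)
        return self._keys[uid]


def thread_roots(cache: ToySortCache, mailbox: Dict[int, bytes]) -> Dict[int, int]:
    """Map each UID to the root UID of its In-Reply-To chain (toy threading)."""
    keys = {uid: cache.keys(uid, hdr) for uid, hdr in mailbox.items()}
    by_id = {mid: uid for uid, (mid, _) in keys.items() if mid}
    roots: Dict[int, int] = {}
    for uid in mailbox:
        cur, seen = uid, {uid}
        while True:
            parent_id = keys[cur][1]
            parent = by_id.get(parent_id) if parent_id else None
            if parent is None or parent in seen:  # no parent, or a cycle
                break
            seen.add(parent)
            cur = parent
        roots[uid] = cur
    return roots


if __name__ == "__main__":
    box = {
        1: b"Message-ID: <a@example.com>\n\n",
        2: b"Message-ID: <b@example.com>\nIn-Reply-To: <a@example.com>\n\n",
    }
    cache = ToySortCache()
    print(thread_roots(cache, box), cache.parses)  # {1: 1, 2: 1} 2
    thread_roots(cache, box)
    print(cache.parses)  # still 2: the second pass is served from the cache
```

The second call does no header parsing at all. That is the kind of saving a cache of precomputed threading fields could provide over a format that must re-read every message.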

For small mailboxes (2-digit message counts), there is really going to be no substantial difference between traditional UNIX, mbx, and mix. The differences start to show with 3-digit message counts, and become more pronounced with 4-digit message counts.

Mix really shines at 5-digit and greater message counts, where traditional UNIX and mbx become unusable.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
_______________________________________________
Imap-uw mailing list
Imap-uw@u.washington.edu
https://mailman1.u.washington.edu/mailman/listinfo/imap-uw
