As of a few minutes ago, imap-2006b is in release status, and is the
version obtained from
ftp://ftp.cac.washington.edu/mail/imap.tar.Z
The performance problem with traditional UNIX mailbox format that Rex and
Antonio reported is fixed in this version. The problem turned out to be
heap thrashing.
I did some benchmarks. The following must be taken with a grain, maybe a
rock, of salt; "there are lies, damned lies, and benchmarks."
For testing, I created a 5000+ message mailbox, populated with the
archives of a mailing list. I tested the following:
 . ("select") select (open mailbox) only
 . ("fetch") select + fetch full (which includes header and MIME parsing)
   + fetch raw message text
 . ("thread") select + thread
All tests were repeated to control for buffer cache.
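The three tests above could be reproduced along the following lines. This
is only a sketch, not the harness actually used for these numbers: it
assumes Python's standard imaplib as the client, the helper names
(best_of, run_select, run_fetch, run_thread) are made up for illustration,
and the REFERENCES threading algorithm is a guess, since the post does not
say which algorithm was used. The first run is discarded as a warm-up so
that the buffer cache is populated before anything is measured.

```python
import time


def best_of(fn, repeats=5):
    """Time fn() repeats+1 times, drop the first run, return the fastest rest.

    The first run is treated as a warm-up that fills the buffer cache.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for i in range(repeats + 1):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i > 0:  # skip the cold-cache warm-up run
            timings.append(elapsed)
    return min(timings)


def run_select(conn, mailbox):
    # "select": open the mailbox only (read-only, so no flags change)
    conn.select(mailbox, readonly=True)


def run_fetch(conn, mailbox):
    # "fetch": select + FETCH FULL + raw message text
    conn.select(mailbox, readonly=True)
    conn.fetch("1:*", "FULL")
    conn.fetch("1:*", "BODY.PEEK[TEXT]")


def run_thread(conn, mailbox):
    # "thread": select + THREAD; the REFERENCES algorithm is an assumption
    conn.select(mailbox, readonly=True)
    conn.thread("REFERENCES", "UTF-8", "ALL")
```

With a logged-in imaplib.IMAP4 connection in conn, a call such as
best_of(lambda: run_select(conn, "archive")) would give one select timing;
the mailbox name "archive" is a placeholder. Re-selecting on an existing
connection may not cost the same as a fresh open, so a stricter test would
reconnect for every run.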
Traditional UNIX format produces comparable results between imap-2004g and
(the final) imap-2006b. The differences, such as they were, were not
significant in my opinion (1/10 second or less); nor did they point to
either imap-2004g or imap-2006b as being "better".
I also tested the mbx format in imap-2004g and imap-2006b, and mix in
imap-2006b.
In all three tests, mbx performance was comparable between imap-2004g and
imap-2006b; once again the differences were not significant, nor did they
point to either version as being "better".
In comparing the performance of traditional UNIX, mbx, and mix in
imap-2006b, I determined the following:
With mix as a baseline:
 . select:
    . traditional UNIX is 40% slower
    . mbx is 14% slower
 . fetch:
    . traditional UNIX is 14% slower
    . mbx is comparable
 . thread:
    . traditional UNIX is 3 times slower
    . mbx is 2.9 times slower
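The figures above are all relative to mix, so the traditional UNIX versus
mbx ratio has to be derived from them. A small sketch of that arithmetic
follows. It rests on assumptions about the wording: "N% slower" is read as
time = baseline * (1 + N/100), "N times slower" as time = baseline * N, and
"comparable" as a ratio of 1.0. The names relative and unix_vs_mbx are
illustrative only.

```python
# Relative times taken from the list above, with mix = 1.0 as the baseline.
# Assumed readings: "40% slower" -> 1.40, "3 times slower" -> 3.0,
# "comparable" -> 1.00.
relative = {
    "select": {"unix": 1.40, "mbx": 1.14, "mix": 1.0},
    "fetch": {"unix": 1.14, "mbx": 1.00, "mix": 1.0},
    "thread": {"unix": 3.0, "mbx": 2.9, "mix": 1.0},
}


def unix_vs_mbx(test):
    """Return traditional UNIX time divided by mbx time for one test."""
    row = relative[test]
    return row["unix"] / row["mbx"]


for name in relative:
    print(f"{name}: traditional UNIX takes {unix_vs_mbx(name):.2f}x the mbx time")
```

Under those readings, traditional UNIX takes roughly 1.23 times as long as
mbx on select, 1.14 times on fetch, and about 1.03 times on thread.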
None of these times surprise me, and here are my comments:
The ratio between select times in traditional UNIX and mbx format depends
very much upon the size of the messages. It is possible to create a
pathological mailbox using tiny messages in which traditional UNIX will do
better than mbx (I know how to fix this, and I may someday); but in
general mbx will be faster than traditional UNIX for normal-sized
messages. The ratio of select time between traditional UNIX and mbx
format should be constant for any given average message size; the larger
the average message size, the better that mbx does.
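One way to picture why the ratio tracks message size: a traditional UNIX
mailbox separates messages with "From " lines, so finding the messages
means examining every line of every message, while a format that records
each message's size can jump from one message to the next. The sketch
below contrasts the two. It is an illustration only; the length-prefixed
layout is a hypothetical stand-in and is not the real mbx file layout, and
both function names are made up.

```python
def count_by_scanning(data: bytes) -> int:
    """Count messages by checking every line for a "From " separator.

    Work grows with the total number of bytes in the mailbox.
    """
    return sum(1 for line in data.split(b"\n") if line.startswith(b"From "))


def count_by_skipping(data: bytes) -> int:
    """Count messages in a hypothetical length-prefixed layout.

    Each message is preceded by one line holding its size in bytes, so the
    body is skipped rather than read. Work grows with the message count.
    """
    count = 0
    pos = 0
    while pos < len(data):
        eol = data.index(b"\n", pos)
        size = int(data[pos:eol].decode("ascii"))
        pos = eol + 1 + size  # jump over the message without scanning it
        count += 1
    return count
```

With tiny messages the two approaches touch a similar amount of data, which
fits the remark about pathological mailboxes; with large messages the
skipping approach passes over most of the file.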
Mix, on the other hand, is indexed and does not have an appreciably longer
select time as the number of messages increases. The time will increase,
but at a much slower rate. Although the ratio with the other formats
seems like it should be constant, it actually isn't because the overhead
in processing the index swamps the per-message cost.
Fetch is completely non-surprising. Traditional UNIX format has to do
newline conversion (UNIX-style LF-only to Internet-style CRLF) whereas mbx
and mix do not have to do so. The work in fetching is otherwise
comparable between mbx and mix.
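The newline conversion mentioned above amounts to a pass over every byte of
the message being fetched. A minimal sketch, assuming the message is held
in memory as bytes; the function name lf_to_crlf is illustrative and this
is not the c-client code:

```python
def lf_to_crlf(text: bytes) -> bytes:
    """Convert UNIX-style LF line endings to Internet-style CRLF.

    Existing CRLF pairs are normalized to LF first so they are not doubled.
    """
    return text.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


stored = b"Subject: test\n\nline one\nline two\n"
wire = lf_to_crlf(stored)
# The converted text is longer than the stored text by one byte per line.
print(len(stored), len(wire))
```

Formats that already store CRLF can hand the stored bytes to the client
unchanged, which is consistent with mbx and mix skipping this step.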
The thread test difference between traditional UNIX and mbx is due to the
select time difference, since otherwise the work is comparable. Mix has a
sortcache which is the key to its greater speed in threading.
For small mailboxes (2 digit message counts), there is really going to be
no substantial difference between traditional UNIX, mbx, and mix. The
differences start to show with 3 digit message counts, and become more
pronounced with 4 digit message counts.
Mix really shines at 5 digit and greater message counts, when traditional
UNIX and mbx become unusable.
-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
_______________________________________________
Imap-uw mailing list
Imap-uw@u.washington.edu
https://mailman1.u.washington.edu/mailman/listinfo/imap-uw