On 2015-04-15 13:41:20 -0600, Bob Proulx wrote:
> Vincent Lefevre wrote:
> > I also notice slowness with a large maildir directory:
> > 
> > drwx------ 2 vlefevre vlefevre 8409088 2015-03-24 14:04:33 Mail/oldarc/cur/
> > 
> > In this one, the files are real (145400 files), but I have a Perl
> > script that basically reads the headers and it takes a lot of time
> > (several dozens of minutes) after a reboot or dropping the caches
> > as you suggested above. With a second run of this script, it just
> > takes 8 seconds.
> 
> This is going to be at least two different points of slowness.  One is
> the directory that must be read.  Two is simply opening 145400 files
> and reading the mail header from each of them is going to take a
> while.  Opening many files has a quantifiable cost.  Try this
> experiment: cache the directory and the inodes without opening the
> files, then run your Perl script to read the mail headers.  That
> should isolate the per-file open overhead:
> 
>     # echo 3 > /proc/sys/vm/drop_caches
>     # ls -lR Mail/oldarc/cur >/dev/null
> 
> Then run your perl script:
> 
>   $ time yourperlscript Mail/oldarc/cur
>   $ time yourperlscript Mail/oldarc/cur

I can confirm that the first run takes a lot of time (probably about
as much as without the "ls -lR Mail/oldarc/cur", since that command
completes in a few seconds); I aborted it after 10 minutes.
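
For reference, a minimal sketch of the kind of header-reading pass
involved (the actual script is in Perl; this Python version, with a
hypothetical path, only illustrates that each of the 145400 files must
be opened and read up to the first blank line):

```python
import os

def read_headers(maildir_cur):
    """Open every message in a Maildir cur/ directory and return its
    header block (everything up to the first blank line)."""
    headers = {}
    for name in os.listdir(maildir_cur):
        path = os.path.join(maildir_cur, name)
        lines = []
        with open(path, "rb") as f:
            for line in f:
                if line in (b"\n", b"\r\n"):  # blank line ends the headers
                    break
                lines.append(line)
        headers[name] = b"".join(lines)
    return headers

# Hypothetical path from the discussion:
# hdrs = read_headers("Mail/oldarc/cur")
```

Each open()/read() pair costs at least one inode lookup and one data
block read when the cache is cold, which is where the first-run time
goes.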

[...]
> It would also be interesting to convert the Maildir with 145400 files
> to a compressed mbox format single file.  (That will convert "^From "
> lines if that is a concern for you.)  I expect that if you were to
> modify your perl script program to read the compressed mbox file and
> do the same task that it might be faster!  It would remove the
> overhead time needed to open each of those 145400 files.

Possibly, but individual modifications would take much more time than
with Maildir (such modifications, consisting of retagging, occur from
time to time).

> It all depends upon the distribution of data size of the body of the
> messages since then it would need to read and skip the message
> bodies.

With an uncompressed mbox file, using the Content-Length header, the
scan could be faster, but there is still the problem of individual
changes.
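
A sketch of that idea (assuming every message carries an accurate
Content-Length header, which real mail does not guarantee): read each
header block, then seek past the body instead of scanning it.

```python
import re

def scan_mbox_headers(path):
    """Yield the header block of each message in an mbox file,
    seeking past bodies via their Content-Length header."""
    with open(path, "rb") as f:
        line = f.readline()
        while line:
            assert line.startswith(b"From ")  # mbox message separator
            headers = []
            content_length = 0
            line = f.readline()
            while line and line not in (b"\n", b"\r\n"):
                headers.append(line)
                m = re.match(rb"Content-Length:\s*(\d+)", line, re.I)
                if m:
                    content_length = int(m.group(1))
                line = f.readline()
            yield b"".join(headers)
            f.seek(content_length, 1)  # skip the body without reading it
            line = f.readline()
            if line in (b"\n", b"\r\n"):  # separator blank line, if present
                line = f.readline()
```

The seek avoids transferring body bytes through user space, but the
kernel still has to read the underlying blocks unless the file is
sparse, so the win is mostly CPU, not I/O.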

> But let's say that all of the bodies were small, less than 50k; then
> I expect that converting them to a single mbox file would make
> reading them much faster than the individual files.

6.5 KB on average.

> Also compressing the file reduces the amount of I/O needed to pull
> the data into memory. With today's fast cpus decompression is faster
> than disk I/O and reading a compressed file and decompressing it is
> usually faster in my experience. Every case is individually
> different however. If you run that experiment I would be interested
> in knowing the result.

But recompressing after each modification would be very slow.
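
The read side of the decompress-while-reading idea is straightforward;
a minimal Python sketch (gzip chosen arbitrarily as the compressor):

```python
import gzip

def count_mbox_messages(path):
    """Stream a gzip-compressed mbox: decompression happens on the fly,
    so only the compressed bytes are read from disk."""
    count = 0
    with gzip.open(path, "rb") as f:
        for line in f:
            # One "From " separator line per message; a correct mbox
            # writer quotes "From " at line start inside bodies.
            if line.startswith(b"From "):
                count += 1
    return count
```

As the quoted message notes, whether this beats uncompressed reads
depends on the CPU/disk balance of the machine.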

I wonder whether there exists some specific filesystem that would make
Maildir access very fast, and whether using it on a disk image that
could be loop-mounted would be worthwhile.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20150420154421.ga17...@ypig.lip.ens-lyon.fr
