Vincent Lefevre wrote:
> Filesystem created: Mon Jan 4 16:26:16 2010
>
> My machine is old, but I've never changed anything concerning the
> file system.

2010 isn't that old. Just a baby! :-) After a quick look on my network I located these.

  desolation: Filesystem created: Tue Feb 26 13:46:27 2008
  despair: Filesystem created: Thu Oct 11 17:58:10 2007
  devastation: Filesystem created: Tue Feb 26 15:31:37 2008
  thrill: Filesystem created: Sun Jun 3 14:50:55 2007

I have been rolling over systems or I am sure I would have located older ones. If I turned on some archived systems I could almost certainly produce older ones.

> I also notice slowness with a large maildir directory:
>
> drwx------ 2 vlefevre vlefevre 8409088 2015-03-24 14:04:33 Mail/oldarc/cur/
>
> In this one, the files are real (145400 files), but I have a Perl
> script that basically reads the headers and it takes a lot of time
> (several dozens of minutes) after a reboot or dropping the caches
> as you suggested above. With a second run of this script, it just
> takes 8 seconds.

This is going to be at least two different points of slowness. One is the directory that must be read. Two is simply opening 145400 files and reading the mail header from each of them is going to take a while. Opening many files will have a quantifiable time.

Try this experiment. Cache the directory and the inodes without opening the file. Then run your perl script to read the mail headers. That should

  # echo 3 > /proc/sys/vm/drop_caches
  # ls -lR Mail/oldarc/cur >/dev/null

Then run your perl script:

  $ time yourperlscript Mail/oldarc/cur
  $ time yourperlscript Mail/oldarc/cur

Divide 145400 files by the time in seconds to run the first uncached run and you should be able to quantify the files per second performance to open and read the mail headers from those files uncached. Then repeat and determine the cached performance time.

That is a lot of files! I expect it would take a while. But the second run cached should be much better. As long as you have enough file system buffer cache to hold those blocks in memory.
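If you would rather get the files-per-second number directly than divide by hand, something like the following rough sketch could stand in for the timing runs. It is not your Perl script: the function names `read_headers` and `time_header_scan` are made up for illustration, and it assumes that "reading the headers" means reading each file up to the first blank line.

```python
import os
import sys
import time


def read_headers(path):
    """Return the raw header block of one message file.

    Assumption: the headers are everything before the first blank line.
    """
    lines = []
    with open(path, "rb") as f:
        for line in f:
            if line in (b"\n", b"\r\n"):
                break
            lines.append(line)
    return b"".join(lines)


def time_header_scan(directory):
    """Open every regular file in `directory`, read its headers, and
    return (file_count, elapsed_seconds, files_per_second)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                read_headers(entry.path)
                count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example invocation (hypothetical script name):
    #   python3 scan.py Mail/oldarc/cur
    count, elapsed, rate = time_header_scan(sys.argv[1])
    print(f"{count} files in {elapsed:.2f} s = {rate:.0f} files/s")
```

Run it once right after dropping the caches and once again immediately afterwards to get the uncached and cached rates. For comparison, the 8 second cached run quoted above works out to 145400 / 8 = 18175 files per second.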
It would also be interesting to convert the Maildir with 145400 files to a single compressed mbox file. (That will convert "^From " lines if that is a concern for you.) I expect that if you were to modify your Perl script to read the compressed mbox file and do the same task, it might be faster! It would remove the overhead needed to open each of those 145400 files.

It all depends upon the size distribution of the message bodies, since the script would then need to read and skip the bodies. But let's say that all of the bodies were small, less than 50k. Then I expect that converting them to a single mbox file would make reading them much faster than reading the individual files.

Also, compressing the file reduces the amount of I/O needed to pull the data into memory. With today's fast CPUs decompression is faster than disk I/O, and reading a compressed file and decompressing it is usually faster in my experience. Every case is different, however.

If you run that experiment I would be interested in knowing the result.

Bob
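P.S. Here is a rough sketch of the mbox idea, using Python's standard `mailbox` and `gzip` modules rather than Perl. The function names and file names are made up for illustration. It assumes the Maildir root is the directory that contains `cur/` (so `Mail/oldarc`, not `Mail/oldarc/cur`) and that there is enough disk space for an uncompressed temporary copy of the mbox. I have not timed it against a real 145400 message folder.

```python
import gzip
import mailbox
import shutil


def maildir_to_gzip_mbox(maildir_path, mbox_path):
    """Copy every message of a Maildir into one mbox file, then gzip it.

    Returns the path of the compressed file (mbox_path + ".gz").
    """
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path, create=True)
    try:
        for msg in src:
            # mailbox.mbox rewrites body lines starting with "From "
            # as ">From " so they cannot be mistaken for separators.
            dst.add(msg)
    finally:
        dst.close()  # flushes pending writes
        src.close()
    gz_path = mbox_path + ".gz"
    with open(mbox_path, "rb") as raw, gzip.open(gz_path, "wb") as gz:
        shutil.copyfileobj(raw, gz)
    return gz_path


def iter_headers(gz_path):
    """Yield the raw header block of each message in a gzipped mbox.

    The bodies still have to be decompressed and skipped, which is why
    the gain depends on how large the bodies are.
    """
    headers = None  # None while skipping a message body
    with gzip.open(gz_path, "rb") as f:
        for line in f:
            if line.startswith(b"From "):
                headers = []  # separator line: a new message starts
            elif headers is not None:
                if line.strip() == b"":
                    yield b"".join(headers)  # blank line ends the headers
                    headers = None
                else:
                    headers.append(line)
```

Usage would be along the lines of `gz = maildir_to_gzip_mbox("Mail/oldarc", "oldarc.mbox")` once, and then `for h in iter_headers(gz): ...` in place of the per-file loop.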