Quoting Vincent Lefevre (vinc...@vinc17.net):
> [...]
>
> One can see an obvious difference: grep and my script both read the
> files in the directory order (I know that this is the case with my
> script, and grep's behavior is identical), which can be regarded as
> random due to the use of a hash (see the other thread). Mutt uses a
> different order, and after a look at its mh.c source file, I can see
> that it sorts the files by inode number (see the
> maildir_delayed_parsing function). IMHO, this is a good choice
> because, especially in big directories, doing that may lead to
> contiguous files on the disk, and I think that it is the reason why
> Mutt is much faster.
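The inode-order idea in the quoted message can be sketched in Python. This is a minimal illustration under stated assumptions, not Mutt's code (Mutt is written in C); the helper names inode_order and directory_in_inode_order and the script name inode_order.py are hypothetical. It assumes a Unix-like filesystem, and whether inode order really tracks on-disk layout depends on the filesystem.

```python
import os
import sys


def inode_order(paths):
    """Hypothetical helper: return the given paths sorted by inode number.

    Each path is stat()ed once (sorted() computes the key once per item).
    The hope, as on ext3, is that inode order roughly follows on-disk
    placement, so reading in this order may cause fewer seeks than
    alphabetical or hashed-directory order.
    """
    return sorted(paths, key=lambda p: os.stat(p).st_ino)


def directory_in_inode_order(directory):
    """Hypothetical helper: regular files in `directory`, in inode order.

    os.scandir() takes the inode number from the directory entry itself,
    so sorting needs no extra stat() calls on Unix-like systems.
    """
    with os.scandir(directory) as it:
        entries = [e for e in it if e.is_file(follow_symlinks=False)]
    entries.sort(key=lambda e: e.inode())
    return [e.path for e in entries]


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 inode_order.py FILE...
    # Prints the files named on the command line, one per line, in inode order.
    for path in inode_order(sys.argv[1:]):
        print(path)
```

For example, python3 inode_order.py ~/Maildir/cur/* would print the glob's files in inode order instead of the shell's locale-sorted order, and that list could then be handed to grep. Filenames containing newlines would need a NUL-separated variant.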
Well, that's a relief. I was getting worried that some "magic" was
involved when you said you didn't use caching.

So, looking back at
https://lists.debian.org/debian-user/2015/04/msg01265.html :

  "In which case, if you want to know how come mutt is so fast, take a
  look at the source. Just to mention one optimisation I would
  consider: slurp the directory and sort the entries by inode. Open the
  files in inode order. And another: it's probably faster to slurp
  bigger chunks of each file (with an intelligent guess of the best
  buffer size) and use a fast search for \nMessage-ID rather than
  reading and checking line by line."

perhaps my second suggestion would also contribute to a speed-up. That
part really does come down to "black magic": I don't understand the
methods they use to search for strings and match regular expressions
so quickly. (Note: obviously these suggestions were not original.)

> Now I wonder whether the use of the hash by ext3 is a good idea...

I don't see why you'd doubt it. Directory hashing only slows down the
process of obtaining the inode numbers from the directory. With a
simple linear directory, you might get that list of inode numbers more
quickly, and it might even be closer to being sorted. But that list is
fairly localised on the disk, and sorting it is quick. The major
speed-up that you've demonstrated comes from reading the file contents
in sorted inode order, which correlates with the position of the files
on the disk.

So in the absence of sorting (i.e. with general-purpose tools like
grep), doing away with hashing would only speed up the special case of
reading all the files in
 (a) just one entire directory,
 (b) which hasn't had its entries jumbled by insertion, deletion or
     renaming of files, and
 (c) which is specified using the directory's name (as in
     grep -r <directory-name>).
And should you read the whole directory by specifying
<directory-name>/*, you lose the benefit and thrash the disk again,
because the shell expands the glob into sorted (alphabetical) order
rather than directory order.
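The second suggestion (bigger reads plus a fast substring search for \nMessage-ID instead of line-by-line checks) might look roughly like this in Python. It is a sketch, not what Mutt or grep actually do; the function name find_message_id, the 64 KiB chunk size, Unix line endings, an unfolded header line and a case-sensitive match on "Message-ID:" (real header names are case-insensitive) are all assumptions. bytes.find() stands in for the "fast search": it runs in C rather than in a per-line Python loop.

```python
# Hypothetical tuning values, not taken from Mutt or grep.
CHUNK_SIZE = 64 * 1024
NEEDLE = b"\nMessage-ID:"  # case-sensitive match; an assumption


def find_message_id(path, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: Message-ID value from a mail file's headers, or None.

    Assumes Unix line endings and an unfolded Message-ID header line.
    """
    # A leading newline lets a Message-ID on the very first line match NEEDLE.
    data = b"\n"
    scanned = 0  # length of data already searched for the blank line
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # one large read instead of many readline() calls
            data += chunk
            # Back up one byte so a "\n\n" split across two reads is still found.
            end = data.find(b"\n\n", max(scanned - 1, 0))
            scanned = len(data)
            if end >= 0 or not chunk:
                break
    # Keep only the header block (up to and including its last newline),
    # so a Message-ID quoted in the body is ignored.
    headers = data if end < 0 else data[: end + 1]
    start = headers.find(NEEDLE)  # C-level substring search
    if start < 0:
        return None
    line_end = headers.find(b"\n", start + 1)
    if line_end < 0:
        line_end = len(headers)
    return headers[start + len(NEEDLE) : line_end].strip().decode("ascii", "replace")
```

Stopping at the first blank line means the body is normally never searched, which both saves work on large messages and avoids matching a Message-ID line quoted inside a body.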
I have some useful little bash functions that return the alphabetically
first or last file, or the most recently modified one, among the
filenames supplied. Perhaps I'll write one that takes a list of
filenames and returns them all, sorted into inode order. (Maybe such a
thing already exists.)

Cheers,
David.

Archive: https://lists.debian.org/20150424213951.GA13410@alum