On Saturday, 7 April 2012 at 11:41:41 UTC, Rainer Schuetze wrote:
> Maybe it is the trim command being executed on the sectors previously occupied by the file.


No. Perhaps I didn't make it clear that the rmdir slowness is only an issue on hard drives. On the SSD I can unzip the 2 GB archive in about 17.5 seconds and delete it in under 17 seconds using the rmd multi-threaded delete example program. On a hard drive the same extraction takes around 60 seconds, but the delete takes 1.5 to 3 minutes.

H:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 17405 ms

H:\>rmd tz
removing: .\tz
finished! time:16671 ms


I've been doing some reading on the web and studying the procmon logs. I am convinced the slow hard drive delete comes down to seek times, since it does not show up on the SSD. It may be caused by fragmentation of the stored data or of the MFT itself, or NTFS may be doing some journaling book-keeping. You are right that it could also be sending delete notifications to any application watching disk activity. I've already turned off the virus checker and the indexing, but I'm going to try the tweaks in the second link and the mydefrag program in the third link to see if anything improves the hard drive delete times.


http://ixbtlabs.com/articles/ntfs/index3.html
http://www.gilsmethod.com/speed-up-vista-with-these-simple-ntfs-tweaks
http://www.mydefrag.com/index.html


mydefrag has some interesting ideas, one of which is a defrag algorithm that sorts folders on disk by full pathname. Using it together with zip and unzip algorithms that follow the same order might be a nice combination. In other words, if the zip algorithm processes files sorted by pathname, and the defrag algorithm has laid out the folders on disk in that same order, then you would expect optimally short seeks, because the files are processed in the order they are stored.
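
To make the pathname-ordering idea concrete, here is a rough sketch in Python. The function names are made up for illustration, and it assumes that a plain sort on the full path (case-folded on Windows via os.path.normcase) is close enough to the order the defrag tool uses, which I have not verified.

```python
import os
import zipfile

def files_sorted_by_pathname(root):
    """Return every file under root, sorted by full pathname."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    # normcase folds case on Windows, so the order is case-insensitive there
    # (assumption: this approximates the on-disk order after the defrag)
    return sorted(paths, key=os.path.normcase)

def zip_in_path_order(root, archive_path):
    """Add the files under root to a zip archive in sorted-pathname order."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files_sorted_by_pathname(root):
            # store paths relative to root inside the archive
            zf.write(path, os.path.relpath(path, root))
```

The archive should be written somewhere outside root so it does not end up reading itself. Whether this actually shortens seeks depends on the files really being laid out in that order on disk.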

The mydefrag program uses the NTFS defrag API. There is an article at the following link showing how to access it to get the Logical Cluster Numbers (LCNs) on disk for a file. I suppose you could sort your file operations by the start LCN of each file, for example during compression, and that might reduce the seek-related delays.
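
A minimal sketch of the sort-by-start-LCN idea, again in Python. The start_lcn argument is a hypothetical callable that returns a file's first LCN, or None when it has none to report (for example, a file with no allocated clusters). On Windows it would have to be backed by the defrag API, which is not shown here.

```python
def order_by_start_lcn(paths, start_lcn):
    """Return paths ordered by starting cluster number.

    start_lcn is a hypothetical lookup: path -> int, or None if unknown.
    """
    located = []    # (lcn, path) pairs for files with a known start cluster
    unlocated = []  # files with no reported cluster, kept in input order
    for path in paths:
        lcn = start_lcn(path)
        if lcn is None:
            unlocated.append(path)
        else:
            located.append((lcn, path))
    located.sort()  # ascending LCN, ties broken by path
    return [path for _lcn, path in located] + unlocated
```

For example, with a lookup table of {"a": 900, "b": 100, "c": None}, calling order_by_start_lcn(["a", "b", "c"], table.get) processes "b" first, then "a", then "c". This only orders by where each file starts, so a badly fragmented file would still cause seeks.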

http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx


