Re: unzip parallel, 3x faster than 7zip

Jay Norwood Sat, 07 Apr 2012 10:09:34 -0700

On Saturday, 7 April 2012 at 11:41:41 UTC, Rainer Schuetze wrote:
> Maybe it is the trim command being executed on the sectors previously occupied by the file.

No, perhaps I didn't make it clear that the rmdir slowness is only an issue on hard drives. I can unzip the 2GB archive in about 17.5 sec on the ssd drive, and delete it using the rmd multi-thread delete example program in less than 17 secs on the ssd drive. The same operations on a hard drive take around 60 seconds to extract, but 1.5 to 3 minutes to delete.


H:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 17405 ms

H:\>rmd tz
removing: .\tz
finished! time:16671 ms

I've been doing some reading on the web and studying the procmon logs. I am convinced the slow hard drive delete is an issue with seek times, since it is not an issue on the ssd. It may be caused by fragmentation of the stored data or the mft itself, or else it could be that ntfs is doing some book-keeping journaling. You are right that it could be doing delete notifications to any application watching the disk activity. I've already turned off the virus checker and the indexing, but I'm going to try the tweaks in the second link and also try the mydefrag program in the third link and see if anything improves the hd delete times.



http://ixbtlabs.com/articles/ntfs/index3.html
http://www.gilsmethod.com/speed-up-vista-with-these-simple-ntfs-tweaks
http://www.mydefrag.com/index.html

That mydefrag has some interesting ideas about sorting folders by full pathname on the disk as one of the defrag algorithms. Perhaps using it, and also using unzip and zip algorithms that match the defrag algorithm, would be a nice combination. In other words, if the zip algorithm processes the files in a sorted-by-pathname order, and if the defrag algorithm has created folders that are sorted on disk by the same order, then you would expect optimally short seeks while processing the files in the order they are stored.
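To make the sorted-by-pathname idea concrete, here is a rough sketch in Python (not the D code of uzp or rmd) that collects the files under a folder and returns them ordered by full pathname. The function name `files_in_pathname_order` is made up for illustration, and the case-insensitive comparison is only my assumption about what a matching on-disk order would look like; I have not checked the exact ordering mydefrag uses.

```python
import os
from typing import List


def files_in_pathname_order(root: str) -> List[str]:
    """Return every file under root, sorted by its full pathname."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    # Assumption: compare case-insensitively, with the exact path as a
    # tie-breaker so the result is deterministic.
    found.sort(key=lambda p: (p.lower(), p))
    return found
```

A zip or delete pass would then walk that list from front to back, so that files in the same folder are handled together instead of in whatever order the directory enumeration returns them.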

The mydefrag program uses the ntfs defrag api. There is an article at the following link showing how to access it to get the Logical Cluster Numbers on disk for a file. I suppose you could sort your file operations by the start LCN of each file, for example during compression, and that might reduce the seek-related delays.


http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx
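
As a sketch of the sort-by-start-LCN idea, the Python below orders a list of paths by the first cluster number of each file. The lookup itself is left as a caller-supplied function, `get_start_lcn`, which is hypothetical: on NTFS it would have to be built on the defrag api (the FSCTL_GET_RETRIEVAL_POINTERS control code), and I have not written that part here. Files for which the lookup returns None are placed at the end, which is just one possible choice.

```python
from typing import Callable, Iterable, List, Optional


def order_by_start_lcn(
    paths: Iterable[str],
    get_start_lcn: Callable[[str], Optional[int]],
) -> List[str]:
    """Return paths sorted by starting LCN; unknown LCNs go last."""
    keyed = []
    for path in paths:
        lcn = get_start_lcn(path)  # hypothetical lookup supplied by caller
        # (unknown?, lcn, path): known LCNs sort first, in ascending order.
        keyed.append((lcn is None, 0 if lcn is None else lcn, path))
    keyed.sort()
    return [path for _unknown, _lcn, path in keyed]
```

With a table of made-up cluster numbers, `order_by_start_lcn(["a", "b", "c"], {"a": 900, "b": 15, "c": 300}.get)` gives `["b", "c", "a"]`, which is the order in which the compressor or the delete pass would then visit the files.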
