I'm working on a parallel unzip. I started with Phobos std.zip,
but found it too monolithic. I needed to separate out the tasks
that get the directory entries, create the directory tree, get
the compressed data, expand the data, and create the
uncompressed files on disk. It currently unzips a 2GB directory
structure in about 18 seconds, while 7zip takes around 55
seconds. Only about 4 seconds of this is creating the directory
structure and expanding the data. The other 14 seconds is
writing the regular files.
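
As an illustration only, the stage split could be sketched in D
roughly as below. This is not the implementation described above:
it still uses std.zip, so the whole archive stays in memory, and
the name unzipStaged is made up for the example. It also assumes
that ZipArchive.expand only touches the member passed to it, so
that different members can be expanded from different threads.

```d
import std.file : mkdirRecurse, read, write;
import std.parallelism : parallel;
import std.path : buildPath, dirName;
import std.zip : ArchiveMember, ZipArchive;

/// Hypothetical helper: unzips zipPath under destDir in separate stages.
void unzipStaged(string zipPath, string destDir)
{
    // Stage 1: read the archive and get the directory entries.
    // Note: std.zip keeps the whole compressed file in memory here.
    auto zip = new ZipArchive(read(zipPath));

    // Stage 2: create the directory tree serially, before any file
    // is written, and collect the regular-file entries.
    ArchiveMember[] files;
    foreach (name, member; zip.directory)
    {
        if (name.length && name[$ - 1] == '/')
        {
            // Entry names ending in '/' are directories.
            mkdirRecurse(buildPath(destDir, name));
        }
        else
        {
            mkdirRecurse(dirName(buildPath(destDir, name)));
            files ~= member;
        }
    }

    // Stages 3 to 5: expand each member and write it out, with the
    // members spread across the default task pool.
    // Assumption: expand() is safe to call concurrently for
    // different members.
    foreach (member; parallel(files))
    {
        auto data = zip.expand(member);
        write(buildPath(destDir, member.name), data);
    }
}
```

Calling unzipStaged("test.zip", "out") would extract everything
under out/. Creating the directories up front means the parallel
loop never races on mkdir.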

The subtasks needed to be separated not only so that they can
run in parallel, but also because the current std.zip
implementation is a memory hog: it keeps the whole compressed
and expanded data sections in memory. I was running out of
memory in a 32-bit application just attempting to unzip the test
file with the std.zip operations. The parallel version peaks at
around 150MB of memory during the operation.

The parallel version is still missing the step that restores the
original file attributes, and I see no example in the
documentation of how this would normally be done. Am I missing
this somewhere? I'll have to dig around...
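
One possible approach, sketched under assumptions: Phobos
releases that provide ArchiveMember.fileAttributes and
std.file.setAttributes allow the stored attributes to be applied
after a file has been written. Whether both exist depends on the
Phobos version in use, and restoreAttributes is a made-up name
for the example.

```d
import std.file : setAttributes;
import std.zip : ArchiveMember;

/// Hypothetical helper: applies the attributes stored in the zip
/// directory entry to the file already written at outPath.
void restoreAttributes(ArchiveMember member, string outPath)
{
    // fileAttributes is documented to return 0 when the attributes
    // were recorded for an incompatible OS (Windows vs. Posix), so
    // the file is left alone in that case.
    immutable attrs = member.fileAttributes;
    if (attrs != 0)
        setAttributes(outPath, attrs);
}
```

This would be called once per regular file, after the write has
finished, since applying a read-only attribute first could make
the write itself fail.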
- parallel unzip in progress Jay Norwood