Great stuff!

> getBytesWritten vs getTotalBytesWritten - svn revision 1649322
>
> Maybe we should rename getBytesWritten to something like
> getBytesWrittenForLastEntry to make the difference more obvious?

I had a hard time keeping those "written" counters correct, as you
found out :) I renamed it to getBytesWrittenForLastEntry in r1649374.

Functionally speaking, all the Maven test cases now pass with the
parallel zip algorithm.

I have been studying the performance of the gather phase for the last
few days and I have a few interesting findings. Currently it gathers
at just under 200 megabytes/s on my SSD MBP. With various small
tweaks I seem to be able to at least double that.

What surprised me most is that the overhead of lots of small calls to
RandomAccessFile.write seems to be a lot costlier than I thought it
would be. Consolidating into a larger byte array before calling write
seems to be a *lot* faster. So in places where the upper memory bound
is known (like writing the central directory), it seems to make a lot
of sense to do it in a single write or a few writes.
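
To illustrate the difference between the two write patterns, here is a
minimal sketch. It is not the commons-compress code: the class name,
the method names and the 46-byte record size are all made up for the
example, and the timings it prints will vary by machine and file
system. It only shows that both approaches produce the same bytes while
the second one issues a single RandomAccessFile.write call.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical sketch, not part of commons-compress.
class ConsolidatedWriteSketch {

    // Assumed record size, chosen only for illustration.
    static final int RECORD_SIZE = 46;

    /** Many small writes: one RandomAccessFile.write call per record. */
    static void writeOneByOne(File target, byte[][] records) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(target, "rw")) {
            raf.setLength(0); // start from an empty file
            for (byte[] record : records) {
                raf.write(record);
            }
        }
    }

    /** Consolidated: copy all records into one array, then write once. */
    static void writeConsolidated(File target, byte[][] records) throws IOException {
        int total = 0;
        for (byte[] record : records) {
            total += record.length; // the upper bound is known before writing
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] record : records) {
            buffer.put(record);
        }
        try (RandomAccessFile raf = new RandomAccessFile(target, "rw")) {
            raf.setLength(0);
            raf.write(buffer.array(), 0, total);
        }
    }

    /** Builds dummy records so the example has something to write. */
    static byte[][] sampleRecords(int count) {
        byte[][] records = new byte[count][RECORD_SIZE];
        for (int i = 0; i < count; i++) {
            Arrays.fill(records[i], (byte) i);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[][] records = sampleRecords(10_000);
        File small = File.createTempFile("small-writes", ".bin");
        File single = File.createTempFile("single-write", ".bin");
        try {
            long t0 = System.nanoTime();
            writeOneByOne(small, records);
            long t1 = System.nanoTime();
            writeConsolidated(single, records);
            long t2 = System.nanoTime();
            System.out.println("one write per record: " + (t1 - t0) / 1000 + " us");
            System.out.println("single write:         " + (t2 - t1) / 1000 + " us");
            System.out.println("identical output: " + Arrays.equals(
                    Files.readAllBytes(small.toPath()),
                    Files.readAllBytes(single.toPath())));
        } finally {
            small.delete();
            single.delete();
        }
    }
}
```

The consolidated variant trades memory for fewer calls, so it only
fits where the total size is bounded, as with the central directory.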

I'm also looking at modifications to write the full file in a single
pass (without seek operations for sizes); it's reasonably expensive to
do all that seeking to establish information we already have.
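
The idea of avoiding the seeks can be sketched on a toy format. The
record layout below (a 4-byte big-endian length followed by the
payload) is an assumption made for the example and is not the ZIP
local file header; the class and method names are hypothetical as
well. The first method writes a placeholder and seeks back to patch
it, the second writes the size up front because it is already known.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Files;

// Hypothetical sketch on a toy length-prefixed format, not the ZIP format.
class SinglePassSketch {

    /** Seek-back style: placeholder size, payload, then seek to patch the size. */
    static void writeWithSeek(RandomAccessFile raf, byte[] payload) throws IOException {
        long sizePos = raf.getFilePointer();
        raf.writeInt(0);              // placeholder for the size
        raf.write(payload);
        long end = raf.getFilePointer();
        raf.seek(sizePos);            // go back to the placeholder
        raf.writeInt(payload.length); // patch in the real size
        raf.seek(end);                // return to the end for the next record
    }

    /** Single-pass style: the size is known, so header and payload go out together. */
    static void writeSinglePass(RandomAccessFile raf, byte[] payload) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload); // big-endian, same as writeInt
        raf.write(buf.array());
    }

    /** Writes all payloads to a temp file with the chosen style and returns the bytes. */
    static byte[] writeAll(boolean singlePass, byte[][] payloads) throws IOException {
        File tmp = File.createTempFile("single-pass-sketch", ".bin");
        try {
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                for (byte[] payload : payloads) {
                    if (singlePass) {
                        writeSinglePass(raf, payload);
                    } else {
                        writeWithSeek(raf, payload);
                    }
                }
            }
            return Files.readAllBytes(tmp.toPath());
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[][] payloads = { {1, 2, 3}, {4, 5}, {} };
        byte[] seeking = writeAll(false, payloads);
        byte[] single = writeAll(true, payloads);
        System.out.println("same bytes: " + java.util.Arrays.equals(seeking, single));
    }
}
```

Both variants produce the same file; the single-pass one just never
moves the file pointer backwards.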

I'm hoping to finish all this in a few days.

Kristian
