[compress] 2.0: Reading and Writing Archives

Stefan Bodewig Wed, 08 Jan 2014 06:54:21 -0800

Hi,

putting the exact representation of an archive entry aside I've put down
an idea of the API for reading and writing archives together with a POC
port of the AR classes for this API.  All is inside
http://svn.apache.org/repos/asf/commons/proper/compress/branches/compress-2.0/


The port doesn't look pretty but I wanted to get there quickly and
change as little as possible, partly to see how much effort porting the
existing code base would be.  In particular I copied IOUtils into the AR
package so I don't have to think about a proper package right now.  I
also didn't care about Java < 7 so far.

Please have a look (more on the interfaces than the actual
implementation) and show me how wrong I am :-)

Some points I'd like to highlight and discuss:

* ArchiveInput and ArchiveOutput are not Streams (or Channels)
  themselves

  This is unlike Archive*Stream in 1.x

  Emmanuel brought this up in a chat between the two of us and I agreed
  with him.  You don't really use them as a stream but rather as a
  stream per entry.

  For Compressor* I'd still wrap streams/channels, different issue.

* Using Channels rather than Streams

  I'm a bit torn about this.  I did so because I'd prefer to base
  ZipFile and friends on SeekableByteChannel rather than RandomAccessFile
  - so it would make the API look more symmetric.

  Drawbacks I've already found

  - no skip in ReadableByteChannel so you are forced to read data even
    if something more efficient could be done.  This smells like another
    IOUtils method.

  - worse, no mark/reset or pushback, this is going to make format
    detection uglier as we have to rewind the channel in a different way
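To make the first drawback concrete, here is a minimal sketch of what such an IOUtils-style skip could look like. The class and method names are made up for illustration and are not taken from the branch. It repositions the channel when it happens to be a SeekableByteChannel and otherwise reads into a scratch buffer and discards the data. It assumes a blocking channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical helper, not part of the compress-2.0 branch. */
public final class ChannelSkip {
    private ChannelSkip() {}

    /**
     * Skips up to n bytes and returns the number of bytes actually skipped.
     * Assumes a blocking channel; a non-blocking one could return 0 forever.
     */
    public static long skip(ReadableByteChannel channel, long n) throws IOException {
        if (n <= 0) {
            return 0;
        }
        if (channel instanceof SeekableByteChannel) {
            // Cheap path: move the position instead of reading the data.
            SeekableByteChannel seekable = (SeekableByteChannel) channel;
            long pos = seekable.position();
            long remaining = Math.max(0, seekable.size() - pos);
            long step = Math.min(n, remaining);
            seekable.position(pos + step);
            return step;
        }
        // Fallback: read into a scratch buffer and throw the bytes away.
        ByteBuffer scratch = ByteBuffer.allocate((int) Math.min(n, 8192));
        long skipped = 0;
        while (skipped < n) {
            scratch.clear();
            scratch.limit((int) Math.min(scratch.capacity(), n - skipped));
            int read = channel.read(scratch);
            if (read < 0) {
                break; // end of channel reached before n bytes were skipped
            }
            skipped += read;
        }
        return skipped;
    }
}
```

The fallback path is exactly the "forced to read data" case; only channels that expose a position can avoid it.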

  Another concern might be that Compress 2.0 gets delayed because the
  porting effort is bigger - I've deliberately taken the Channels.new*
  route to wrap the existing stream based API in ArArchiveInput and it
  seems to work (although likely is suboptimal).  Going all-in on
  Channels in ArArchiveOutput didn't look much more difficult either,
  but the I/O part of output is simpler anyway.
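As a rough illustration of the two approaches, the sketch below reads through a stream obtained from Channels.newInputStream, so existing stream-based code can stay as it is, and writes straight to a WritableByteChannel. The class and method names are hypothetical and this is not the code in the branch. The only format detail used is the eight-byte AR signature "!<arch>\n".

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch, not the ArArchiveInput/ArArchiveOutput from the branch. */
public final class ChannelBridge {
    private ChannelBridge() {}

    /** Input side: wrap the channel so stream-based reading code can be reused. */
    public static byte[] readFully(ReadableByteChannel channel, int length) throws IOException {
        InputStream in = Channels.newInputStream(channel);
        byte[] data = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(data, off, length - off);
            if (n < 0) {
                throw new EOFException("expected " + length + " bytes, got " + off);
            }
            off += n;
        }
        return data;
    }

    /** Output side: write the AR signature directly to the channel. */
    public static void writeMagic(WritableByteChannel channel) throws IOException {
        ByteBuffer magic = ByteBuffer.wrap("!<arch>\n".getBytes(StandardCharsets.US_ASCII));
        while (magic.hasRemaining()) {
            channel.write(magic); // a single write may be partial, so loop
        }
    }
}
```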

* Checked vs Unchecked exceptions

  I would love to make ArchiveInput be an Iterator over the entries but
  can't do so as the things we'd need to do in next() might throw an
  IOException.  One option may be to introduce an unchecked
  ArchiveException and wrap all checked exceptions (and do so throughout
  the API).
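A minimal sketch of that option, under the assumption that ArchiveException would simply extend RuntimeException and carry the IOException as its cause. EntrySource is a hypothetical stand-in for whatever reads the next entry; neither type is the one in the branch.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of iterating over entries with an unchecked exception. */
public final class UncheckedIterationSketch {
    private UncheckedIterationSketch() {}

    /** Assumed shape of an unchecked ArchiveException. */
    public static class ArchiveException extends RuntimeException {
        public ArchiveException(IOException cause) {
            super(cause);
        }
    }

    /** Stand-in for the checked "read next entry" step; returns null at the end. */
    public interface EntrySource<E> {
        E nextEntry() throws IOException;
    }

    /** Adapts the checked source to an Iterator, wrapping any IOException. */
    public static <E> Iterator<E> iterate(final EntrySource<E> source) {
        return new Iterator<E>() {
            private E lookahead;
            private boolean done;

            @Override
            public boolean hasNext() {
                if (lookahead == null && !done) {
                    try {
                        lookahead = source.nextEntry();
                    } catch (IOException e) {
                        throw new ArchiveException(e);
                    }
                    done = lookahead == null;
                }
                return lookahead != null;
            }

            @Override
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                E result = lookahead;
                lookahead = null;
                return result;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

Note that the I/O happens in hasNext() as well as next(), so both can throw the unchecked exception; callers who want the original IOException have to unwrap getCause().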

* RandomAccessArchiveInput as a generalization of ZipFile

  This extends ArchiveInput so if you ask for an ArchiveInput to a file
  and the format doesn't support a stream-like interface (like 7z) you
  can still obtain one.  This is helped a lot by the fact that
  ArchiveInput is not a stream itself.

* I'm not sure about ArchiveInput#getChannel

  Should next return a Pair of ArchiveEntry and Channel instead?

* tiny change to the contract of ArchiveOutput finish

  finish used to throw an exception if you didn't call closeEntry for
  the last entry while putEntry closes the previous entry.  This looked
  inconsistent and finish now silently closes the entry as well.
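The changed contract can be modelled with a toy class like the one below. It is not the ArchiveOutput interface from the branch, only an illustration with invented names that records calls in a list: putEntry closes a still-open entry, and finish now does the same instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the finish contract; names and structure are assumptions. */
public final class FinishContractSketch {
    private final List<String> log = new ArrayList<String>();
    private String openEntry;

    /** Starts a new entry, closing the previous one if it is still open. */
    public void putEntry(String name) {
        closeEntry();
        openEntry = name;
        log.add("open " + name);
    }

    /** Closes the current entry; does nothing if no entry is open. */
    public void closeEntry() {
        if (openEntry != null) {
            log.add("close " + openEntry);
            openEntry = null;
        }
    }

    /** New behaviour: silently closes the last entry rather than throwing. */
    public void finish() {
        closeEntry();
        log.add("finish");
    }

    public List<String> log() {
        return log;
    }
}
```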

Stefan
