Hi Lasse!

I'd personally like it if you could file an issue in Jira and submit your XZ implementation as a patch that fits naturally into the org.apache.commons.compress package, and then keep contributing by maintaining it. Depending on an external package might be more difficult, since Commons components are generally self-contained and don't depend on any third-party library, unless that library is itself a Commons component.
Please take this strictly as a personal suggestion. I'm not involved in [compress] development, so I'll let the maintainers make the decisions.

Have a nice day, all the best!
Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/

On Wed, Aug 3, 2011 at 9:22 PM, Lasse Collin <lasse.col...@tukaani.org> wrote:
> Hi!
>
> I have been working on XZ data compression implementation in Java
> <http://tukaani.org/xz/java.html>. I was told that it could be nice
> to get XZ support into Commons Compress.
>
> I looked at the APIs and code in Commons Compress to see how XZ
> support could be added. I was especially looking for details where
> one would need to be careful to make different compressors behave
> consistently compared to each other. I found a few possible problems
> in the existing code:
>
> (1) CompressorOutputStream should have finish(). Now
>     BZip2CompressorOutputStream has finish() but
>     GzipCompressorOutputStream doesn't. This should be easy to
>     fix because java.util.zip.GZIPOutputStream supports finish().
>
> (2) BZip2CompressorOutputStream.flush() calls out.flush() but it
>     doesn't flush data buffered by BZip2CompressorOutputStream.
>     Thus not all data written to the Bzip2 stream will be available
>     in the underlying output stream after flushing. This kind of
>     flush() implementation doesn't seem very useful.
>
>     GzipCompressorOutputStream.flush() is the default version
>     from OutputStream and thus does nothing. Adding flush()
>     into GzipCompressorOutputStream is hard because
>     java.util.zip.GZIPOutputStream and java.util.zip.Deflater don't
>     support sync flushing before Java 7. To get Gzip flushing in
>     older Java versions one might need a complete reimplementation
>     of the Deflate algorithm which isn't necessarily practical.
>
> (3) BZip2CompressorOutputStream has finalize() that finishes a stream
>     that hasn't been explicitly finished or closed. This doesn't seem
>     useful. GzipCompressorOutputStream doesn't have an equivalent
>     finalize().
>
> (4) The decompressor streams don't support concatenated .gz and .bz2
>     files. This can be OK when compressed data is used inside another
>     file format or protocol, but with regular (standalone) .gz and
>     .bz2 files it is bad to stop after the first compressed stream
>     and silently ignore the remaining compressed data.
>
>     Fixing this in BZip2CompressorInputStream should be relatively
>     easy because it stops right after the last byte of the compressed
>     stream. Fixing GzipCompressorInputStream is harder because the
>     problem is inherited from java.util.zip.GZIPInputStream
>     which reads input past the end of the first stream. One
>     might need to reimplement .gz container support on top of
>     java.util.zip.InflaterInputStream or java.util.zip.Inflater.
>
> The XZ compressor supports finish() and flush(). The XZ decompressor
> supports concatenated .xz files, but there is also a single-stream
> version that behaves similarly to the current version of
> BZip2CompressorInputStream.
>
> Assuming that there will be some interest in adding XZ support into
> Commons Compress, is it OK to make Commons Compress depend on the XZ
> package org.tukaani.xz, or should the XZ code be modified so that
> it could be included as an internal part in Commons Compress? I
> would prefer depending on org.tukaani.xz because then there is
> just one code base to keep up to date.
>
> --
> Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
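
For anyone following along, points (1) and (2) in the quoted message can be tried out with plain java.util.zip. What follows is only a rough sketch, assuming Java 7 or later (for the syncFlush constructor of GZIPOutputStream). The class and method names (FinishAndFlushSketch, bytesVisibleAfterFlush, finishThenAppend, gunzip) are made up for illustration and are not part of Commons Compress or org.tukaani.xz.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; uses only the JDK, not Commons Compress.
class FinishAndFlushSketch {

    // Point (2): how many bytes have reached the underlying stream
    // after write() + flush(), with and without sync flushing.
    static int bytesVisibleAfterFlush(boolean syncFlush) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // The two-argument constructor exists since Java 7.
            GZIPOutputStream gz = new GZIPOutputStream(sink, syncFlush);
            gz.write("hello world".getBytes(StandardCharsets.UTF_8));
            gz.flush();
            int visible = sink.size(); // measured before the stream is finished
            gz.close();
            return visible;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Point (1): finish() ends the gzip stream but leaves the underlying
    // stream open, so a caller can keep writing to it afterwards.
    static byte[] finishThenAppend(byte[] payload, byte[] trailer) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(sink);
            gz.write(payload);
            gz.finish();                            // compressed data and gzip trailer are written
            sink.write(trailer, 0, trailer.length); // underlying stream is still usable
            gz.close();                             // already finished, so no more compressed data
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: decompress a buffer holding a single gzip stream.
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("visible after flush, syncFlush=false: "
                + bytesVisibleAfterFlush(false));
        System.out.println("visible after flush, syncFlush=true:  "
                + bytesVisibleAfterFlush(true));

        byte[] payload = "some payload".getBytes(StandardCharsets.UTF_8);
        byte[] trailer = {1, 2, 3};
        byte[] all = finishThenAppend(payload, trailer);
        byte[] gzipOnly = Arrays.copyOf(all, all.length - trailer.length);
        System.out.println("round trip ok: "
                + Arrays.equals(payload, gunzip(gzipOnly)));
    }
}
```

Without sync flushing, the deflater may still be holding the written data when flush() returns, which is the situation described in (2). A wrapper that wants a meaningful flush() on older Java versions cannot get it from java.util.zip alone.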