On Sat, Jul 10, 2010 at 7:32 AM, Robert Muir <[email protected]> wrote:
> I am curious why we have utility methods to zip compress byte arrays and > strings in the lucene core at all?!... Historical reasons -- compressed binary fields used to be a (ill conceived) feature (pre-exclusive opaque byte array fields). > i think we should get rid of compressionutils completely. > Agreed going forward - they are no longer used in trunk (except in TestBinaryDocument), and the code serves as a bad example for others :). As for LUCENE-772, I guess it can be closed. Hopefully this isn't coming up too often for folks using older releases for whatever reason (time/cost of upgrade). - Luke > > On Sat, Jul 10, 2010 at 1:39 AM, Luke Nezda <[email protected]> wrote: > >> Hello - >> >> Me and a co-worker actually got burned by this code twice in 1 week - once >> via Lucene 2.2 ( >> http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_2_0/src/java/org/apache/lucene/index/FieldsReader.java) >> and again via a copy of the same source (moved to >> https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/document/CompressionTools.java) >> we were using elsewhere. Due to an edge case misuse of the >> java.util.zip.Inflater API, an empty byte array sends {...@link >> org.apache.lucene.document.CompressionTools#decompress} into a tight, >> infinite loop. >> I realize this is degenerate case, but it happened to us 2 totally >> different ways -- a more robust behavior like returning the input empty >> array, or throwing an exception would be preferable. The following source >> demonstrates the issue and a couple trivial solution alternatives. >> >> I was scanning bugs -- this should solve >> https://issues.apache.org/jira/browse/LUCENE-772 >> >> Kind regards, >> - Luke >> >> import java.util.zip.Inflater; >> import java.util.zip.Deflater; >> import java.util.zip.DataFormatException; >> import java.io.ByteArrayOutputStream; >> >> /** Demonstrates bad empty input edge case behavior of {...@link >> org.apache.lucene.document.CompressionTools#decompress} */ >> class Infiniflater { >> /** Compresses the specified byte range using the >> * specified compressionLevel (constants are defined in >> * java.util.zip.Deflater). */ >> public static byte[] compress(byte[] value, int offset, int length, int >> compressionLevel) { >> /* Create an expandable byte array to hold the compressed data. >> * You cannot use an array that's the same size as the orginal because >> * there is no guarantee that the compressed data will be smaller than >> * the uncompressed data. */ >> ByteArrayOutputStream bos = new ByteArrayOutputStream(length); >> >> Deflater compressor = new Deflater(); >> >> try { >> compressor.setLevel(compressionLevel); >> compressor.setInput(value, offset, length); >> compressor.finish(); >> >> // Compress the data >> final byte[] buf = new byte[1024]; >> while (!compressor.finished()) { >> int count = compressor.deflate(buf); >> bos.write(buf, 0, count); >> } >> } finally { >> compressor.end(); >> } >> >> return bos.toByteArray(); >> } >> >> /** Decompress the byte array previously returned by >> * compress */ >> public static byte[] decompress(byte[] value) throws DataFormatException >> { >> // Create an expandable byte array to hold the decompressed data >> ByteArrayOutputStream bos = new ByteArrayOutputStream(value.length); >> >> Inflater decompressor = new Inflater(); >> >> try { >> decompressor.setInput(value); >> >> // Decompress the data >> final byte[] buf = new byte[1024]; >> //*** original form with potential for inifinite loop below >> //while (!decompressor.finished()) { >> // bos.write(buf, 0, count); >> //} >> //*** simple refined guard condition >> //while (!decompressor.finished() && ! decompressor.needsInput()) { >> // bos.write(buf, 0, count); >> //} >> //*** slightly more efficient (less synchronization (within >> decompressor.needsInput()) for instance) >> while (!decompressor.finished()) { >> int count = decompressor.inflate(buf); >> if (count == 0 && decompressor.needsInput()) { >> break; >> } >> bos.write(buf, 0, count); >> } >> } finally { >> decompressor.end(); >> } >> >> return bos.toByteArray(); >> } >> >> public static void main(String[] args) throws Exception { >> // simplest demonstration of case which will cause infinite loop on >> empty byte array input >> final Inflater decompressor = new Inflater(); >> final byte[] noBytes = new byte[0]; >> decompressor.setInput(noBytes); >> final boolean notFinished = false == decompressor.finished(); >> System.err.println("notFinished?: "+notFinished); >> assert notFinished; >> >> // exercise methods to roundtrip empty array >> // since this "compresses" to 8 bytes, maybe we should just toss >> DataFormatException or IOException >> // if arg to decompress has length < 8 - would be better than current >> infinite loop >> final byte[] compressed = compress(noBytes, 0, noBytes.length, >> Deflater.BEST_COMPRESSION); >> assert compressed.length == 8 : "compressed.length: >> "+compressed.length; >> final byte[] decompressed = decompress(compressed); >> assert decompressed.length == 0 : "decompressed.length: >> "+decompressed.length; >> >> // spin condition >> final byte[] output = decompress(noBytes); >> assert output.length == 0; >> } >> } >> >> > > > -- > Robert Muir > [email protected] >
