[jira] Commented: (HADOOP-6349) Implement FastLZCodec for fastlz/lzo algorithm

Greg Roelofs (JIRA) Fri, 06 Aug 2010 22:47:52 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896222#action_12896222
 ]


Greg Roelofs commented on HADOOP-6349:
--------------------------------------

OK, with respect to the 20100730 patch, the biggest issue by far is the 
buffering, both compression and decompression, as we've discussed in person:

 * since FastLZ is block-based with an uncompressed block size of 64 KB, it 
should never require a buffer larger than 64 KB for either input or output, and 
certainly never the entirety of the file
   ** it sounds like a (relatively) quick fix is possible, but otherwise it 
might be worth looking at SplittableCompressionCodec and/or 
BlockCompressorStream, per comments below

The other items are mostly style-related:

 * move imported code to src/contrib, per offline discussion with Owen O'Malley
 * boilerplate should probably mention original MIT license
 * no src/java/core-default.xml change? (add to io.compression.codecs)
 * [cosmetic: nuke commented-out System.currentTimeMillis(), debug printfs, 
etc.]
 * [cosmetic: strip all trailing blanks on new, non-3rdparty/imported (i.e., 
non-src/contrib) files]
 * prefer to drop "J" prefix on fastlz/*.java (consistent with C, other codecs)
 * add package.html
   ** 64 KB block size
   ** mainly level 2 with fallback to level 1 for partial blocks
   ** weird triple-self-inclusion of fastlz.c
   ** possible future speed/etc. improvements:
     *** use existing Adler-32 in java.util.zip or zlib or wherever?
     *** parallel Adler-32 (and CRC-32) capability in zlib, pigz
 * FastLZCodec.java
   ** originally was going to suggest that it should extend DefaultCodec (like 
GzipCodec) rather than copy it, but as a block-based codec, Splittable*Codec is 
better: we'll file separate JIRA for splittable support later 
(also/alternatively, per Arun Murthy, BlockCompressorStream?)
 * JFastLZ.java
   ** DEFAULT_ARCHIVE_EXTENSION not used
   ** readU32() BigInteger version should be nuked in favor of (fixed) 
readU32Fast() (not used much in any case, but...ewww)
   ** why isn't all of "Moved." block (line 686) commented out/deleted?
   ** invert "Never runs b/c" block so true path is first
     *** does !FASTLZ_STRICT_ALIGN option even work? if not, nuke altogether
   ** no need for double parens on while condition (line 768)
 * JFastLZCompressor.java
   ** lose empty else in static block
   ** "v. low" -> "very low", "b/c" -> "because", etc. (in general, avoid 
abbreviations in comments; English isn't everyone's primary language)
   ** had hoped b/iBuff arraycopy ("so CompressorStream doesn't muck w/ our 
data") was unnecessary, but doesn't look like it: b[] comes straight from user 
via write() call
   ** "XXX" ?
   ** growOutputBuffer() used for apparently tiny deltas: should simply double 
size (unless strong statistical or theoretical arg why small increments are 
rare) [this is subsumed by whole fix-the-buffering issue above]
   ** use "L" for long-constant appendage: lowercase looks too much like "1"
   ** [cosmetic: need spaces on outside of for-loop and if-conditional parens]
   ** why isn't "if (totalCompressed == 26)" block (line 356) deleted?
   ** "enought"
   ** looks like native checksum call is wasted if incompressible data: should 
be moved down to other half of "if (!nativeLoaded)" conditional (line 416) 
(more symmetric that way, too)
   ** [cosmetic: prefer variables at top of class (SAFE_PACK)]
 * JFastLZDecompressor.java
   ** [cosmetic: wrap long lines (native methods)]
   ** lose empty else in static block
   ** "v." -> "very", etc. (as above)
   ** DEFAULT_BLOCK_SIZE hardcoded at 64K ... configured value completely 
ignored (OK, probably related to "native" FastLZ block size...never mind)

Btw, this one is clearly much, much simpler than the LZMA patch, so if it comes 
down to one or the other, please concentrate on LZMA. If necessary, I (or 
someone) can pick this up and finish it without much trouble, I think.

> Implement FastLZCodec for fastlz/lzo algorithm
> ----------------------------------------------
>
>                 Key: HADOOP-6349
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6349
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: William Kinney
>         Attachments: hadoop-6349-1.patch, hadoop-6349-2.patch, 
> hadoop-6349-3.patch, HADOOP-6349-TestFastLZCodec.patch, HADOOP-6349.patch, 
> TestCodecPerformance.java, TestCodecPerformance.java, testCodecPerfResults.tsv
>
>
> Per  [HADOOP-4874|http://issues.apache.org/jira/browse/HADOOP-4874], FastLZ 
> is a good (speed, license) alternative to LZO. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6349) Implement FastLZCodec for fastlz/lzo algorithm

Reply via email to