[jira] Commented: (HADOOP-6837) Support for LZMA compression

Greg Roelofs (JIRA) Fri, 06 Aug 2010 20:25:47 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896213#action_12896213
 ]


Greg Roelofs commented on HADOOP-6837:
--------------------------------------

20100806 version looks pretty good.  The first couple of issues are the main 
ones:

 * FakeOutputStream LinkedList of 1 KB buffers (BLOCKSIZE = 1024):  supposed to 
be 128 KB buffers (package.html)
 * directory structure = unholy mix of 7zip, Hadoop  (expected directory 
structure similar to LZMA SDK tarball contents, e.g.,  
lzma912/Java/SevenZip/Compression/LZMA/Decoder.java and lzma912/C/LzFind.c)
   ** OK to trim some of tarball's levels out (and to be different for Java, 
C), and top level need not be "lzma912" (though it makes connection to download 
more obvious)--I know I suggested "SevenZip", but I thought that appeared in 
both the Java and the C paths:  suggest lzma912 or lzma-9.12 instead
 * kDefaultDictionaryLogSize no longer changed to 16:  OK?
 * apparently bogus files:
   ** src/contrib/SevenZip/ivy/libraries.properties
   ** src/contrib/ec2/bin/hadoop-ec2-env.sh
 * "LZMA SDK" -> "LZMA SDK 9.12" in all boilerplate
 * CRC:  prefer to reuse existing (e.g., PureJavaCrc32); should be compatible
 * LzmaNativeInputStream.java:
   ** "circularwould"
   ** read():  fast busy-loop not thread-friendly...and not necessary:  
read(b[]) (InputStream) blocks until at least 1 byte available--zero returned 
only if b[] has length zero, which is not true of oneByte[]
   ** read():  t unnecessary; just do:  return (int)oneByte[0] & 0xFF;
     or even:  return (ret == -1)? -1 : (int)oneByte[0] & 0xFF;
   ** 1<<13
 * LzmaNativeOutputStream.java:
   ** buffered, sendData -> compressedDirectBuf, uncompressedDirectBuf as above
   ** 1<<16
 * LzmaOutputStream.java:
   ** 1<<17
 * Makefile.am
   ** "All these are setup" -> "All these are set up"
   ** "are also done" -> "is also done"

(Per previous feedback, please change all "1<<N" to the Java equivalent of 
static const int kSomeBufferSize = M * 1024:  easier to read, easier to change 
later.)

Btw, it's always best to follow the existing style consistently than to use 
your own for your changes (with the possible exception of the boilerplate 
comment).  Perhaps Emacs hides the issue, but with 8-space tabs, your changes 
to the contrib LZMA files are a complete mismatch to their style.

Be sure to run "ant javadoc" (and fix any new issues) before the next patch, 
and give "ant test" a shot, too (over the weekend if you happen to see this--it 
takes several hours to run).  I'll work with you to get "ant test-patch" going.


> Support for LZMA compression
> ----------------------------
>
>                 Key: HADOOP-6837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6837
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Nicholas Carlini
>            Assignee: Nicholas Carlini
>         Attachments: HADOOP-6837-lzma-1-20100722.non-trivial.pseudo-patch, 
> HADOOP-6837-lzma-1-20100722.patch, HADOOP-6837-lzma-2-20100806.patch, 
> HADOOP-6837-lzma-c-20100719.patch, HADOOP-6837-lzma-java-20100623.patch
>
>
> Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which 
> generally achieves higher compression ratios than both gzip and bzip2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6837) Support for LZMA compression

Reply via email to