[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896213#action_12896213 ]
Greg Roelofs commented on HADOOP-6837: -------------------------------------- 20100806 version looks pretty good. The first couple of issues are the main ones: * FakeOutputStream LinkedList of 1 KB buffers (BLOCKSIZE = 1024): supposed to be 128 KB buffers (package.html) * directory structure = unholy mix of 7zip, Hadoop (expected directory structure similar to LZMA SDK tarball contents, e.g., lzma912/Java/SevenZip/Compression/LZMA/Decoder.java and lzma912/C/LzFind.c) ** OK to trim some of tarball's levels out (and to be different for Java, C), and top level need not be "lzma912" (though it makes connection to download more obvious)--I know I suggested "SevenZip", but I thought that appeared in both the Java and the C paths: suggest lzma912 or lzma-9.12 instead * kDefaultDictionaryLogSize no longer changed to 16: OK? * apparently bogus files: ** src/contrib/SevenZip/ivy/libraries.properties ** src/contrib/ec2/bin/hadoop-ec2-env.sh * "LZMA SDK" -> "LZMA SDK 9.12" in all boilerplate * CRC: prefer to reuse existing (e.g., PureJavaCrc32); should be compatible * LzmaNativeInputStream.java: ** "circularwould" ** read(): fast busy-loop not thread-friendly...and not necessary: read(b[]) (InputStream) blocks until at least 1 byte available--zero returned only if b[] has length zero, which is not true of oneByte[] ** read(): t unnecessary; just do: return (int)oneByte[0] & 0xFF; or even: return (ret == -1)? -1 : (int)oneByte[0] & 0xFF; ** 1<<13 * LzmaNativeOutputStream.java: ** buffered, sendData -> compressedDirectBuf, uncompressedDirectBuf as above ** 1<<16 * LzmaOutputStream.java: ** 1<<17 * Makefile.am ** "All these are setup" -> "All these are set up" ** "are also done" -> "is also done" (Per previous feedback, please change all "1<<N" to the Java equivalent of static const int kSomeBufferSize = M * 1024: easier to read, easier to change later.) Btw, it's always best to follow the existing style consistently than to use your own for your changes (with the possible exception of the boilerplate comment). Perhaps Emacs hides the issue, but with 8-space tabs, your changes to the contrib LZMA files are a complete mismatch to their style. Be sure to run "ant javadoc" (and fix any new issues) before the next patch, and give "ant test" a shot, too (over the weekend if you happen to see this--it takes several hours to run). I'll work with you to get "ant test-patch" going. > Support for LZMA compression > ---------------------------- > > Key: HADOOP-6837 > URL: https://issues.apache.org/jira/browse/HADOOP-6837 > Project: Hadoop Common > Issue Type: Improvement > Components: io > Reporter: Nicholas Carlini > Assignee: Nicholas Carlini > Attachments: HADOOP-6837-lzma-1-20100722.non-trivial.pseudo-patch, > HADOOP-6837-lzma-1-20100722.patch, HADOOP-6837-lzma-2-20100806.patch, > HADOOP-6837-lzma-c-20100719.patch, HADOOP-6837-lzma-java-20100623.patch > > > Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which > generally achieves higher compression ratios than both gzip and bzip2. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.