[jira] [Commented] (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687660#comment-13687660 ]

Joydeep Sen Sarma commented on HADOOP-6837:

yes - the fb-hadoop tree has a working implementation. most of the original code came from Baidu.

we tried to convert many petabytes to lzma (switching from gzip compressed rcfile to lzma compressed). aside from speed issues (writes are very slow in spite of trying our best to fiddle around with different lzma settings directly in code) - the problem is we got rare corruptions every once in a while. these didn't seem to have anything to do with hadoop code - but with the lzma codec itself. certain blocks would be unreadable. we had to abandon the conversion project at that point.

my gut is that for small scale uses - the lzma stuff as implemented in fb-hadoop-20 works. across petabytes of data - where every rcfile block (1MB) has multiple compressed streams (1 per column) - and we are literally opening and closing billions of compressed streams - there are latent bugs in lzma (that were well beyond our capability to debug - let alone reproduce accurately). we never had the same issues with gzip, obviously (so the problem cannot be hadoop components like HDFS).

Support for LZMA compression
Key: HADOOP-6837
URL: https://issues.apache.org/jira/browse/HADOOP-6837
Project: Hadoop Common
Issue Type: Improvement
Components: io
Reporter: Nicholas Carlini
Assignee: Nicholas Carlini
Attachments: HADOOP-6837-lzma-1-20100722.non-trivial.pseudo-patch, HADOOP-6837-lzma-1-20100722.patch, HADOOP-6837-lzma-2-20100806.patch, HADOOP-6837-lzma-3-20100809.patch, HADOOP-6837-lzma-4-20100811.patch, HADOOP-6837-lzma-c-20100719.patch, HADOOP-6837-lzma-java-20100623.patch

Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which generally achieves higher compression ratios than both gzip and bzip2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687627#comment-13687627 ]

Harsh J commented on HADOOP-6837:

FB's hadoop-0.20 seems to have a working implementation of this, although I do not yet know how stable it is: https://github.com/facebook/hadoop-20/blob/master/src/core/org/apache/hadoop/io/compress/LzmaCodec.java (and others).
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929540#action_12929540 ]

Erik Forsberg commented on HADOOP-6837:

Any progress on getting the new patch based on liblzma ready?
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915938#action_12915938 ]

Joydeep Sen Sarma commented on HADOOP-6837:

thanks to everyone on getting lzma into hadoop. it seems to be very promising.

i have tried applying the latest patch to both hadoop-0.20 (yahoo/facebook branch) and common-trunk. in both cases - when i try running TestCodec after compiling the native codec - i get a sigsegv:

[junit] Running org.apache.hadoop.io.compress.TestCodec
[junit] #
[junit] # An unexpected error has been detected by Java Runtime Environment:
[junit] #
[junit] # SIGSEGV (0xb) at pc=0x2aaad5215659, pid=16028, tid=1076017472
[junit] #
[junit] # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b23 mixed mode linux-amd64)
[junit] # Problematic frame:
[junit] # C [libhadoop.so.1.0.0+0x5659] thisRead+0x49
[junit] #

separate from this - i had a question about tuning the compression level. in my testing on internal data using the lzma utility built from the SDK - i found a bunch of interesting options that provided a more suitable compromise between compression ratio and cpu (-a0 -mfhc4 -d24 -fbxxx) than the default. eyeing the 'level' based normalization - it seems i won't be able to quite achieve the settings i want by specifying a level. so it seems that being able to configure these options separately would be very useful.
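The request to configure the encoder options separately, rather than through a single level, could look roughly like the sketch below. Everything in it is hypothetical: the class name, the configuration keys and the defaults are invented for illustration and are not taken from any patch attached to this issue. The accepted ranges are assumptions based on the lzma utility's -a, -mf, -d and -fb switches.

```java
import java.util.Map;

/** Hypothetical holder for per-option LZMA tuning; not part of any attached patch. */
final class LzmaTuning {
  final int algorithm;      // -a: assumed 0 = fast, 1 = normal
  final String matchFinder; // -mf: e.g. "hc4" or "bt4"
  final int dictLogSize;    // -d: dictionary size is 2^N bytes
  final int fastBytes;      // -fb: number of fast bytes

  LzmaTuning(int algorithm, String matchFinder, int dictLogSize, int fastBytes) {
    // The ranges below are assumptions, not values confirmed by this thread.
    if (algorithm < 0 || algorithm > 1) {
      throw new IllegalArgumentException("algorithm must be 0 or 1");
    }
    if (dictLogSize < 0 || dictLogSize > 30) {
      throw new IllegalArgumentException("dictLogSize must be in [0, 30]");
    }
    if (fastBytes < 5 || fastBytes > 273) {
      throw new IllegalArgumentException("fastBytes must be in [5, 273]");
    }
    this.algorithm = algorithm;
    this.matchFinder = matchFinder;
    this.dictLogSize = dictLogSize;
    this.fastBytes = fastBytes;
  }

  /** Reads each option from its own (hypothetical) key, falling back to placeholder defaults. */
  static LzmaTuning fromConf(Map<String, String> conf) {
    return new LzmaTuning(
        Integer.parseInt(conf.getOrDefault("lzma.algorithm", "1")),
        conf.getOrDefault("lzma.matchfinder", "bt4"),
        Integer.parseInt(conf.getOrDefault("lzma.dict.log.size", "23")),
        Integer.parseInt(conf.getOrDefault("lzma.fast.bytes", "128")));
  }

  /** Dictionary size in bytes implied by the log size. */
  long dictionaryBytes() {
    return 1L << dictLogSize;
  }
}
```

With such a holder, -a0 -mfhc4 -d24 would map to three independent keys, and the fast-bytes value (left unspecified in the comment above) would simply stay at whatever default the codec picks.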
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897277#action_12897277 ]

Pelle Nilsson commented on HADOOP-6837:

Do I read these comments correctly that LZMA2/xz is not included in the current patch, and might not be included as part of this issue since the LZMA Java lib does not support it?
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897391#action_12897391 ]

Hong Tang commented on HADOOP-6837:

Nicholas has done some interesting work. But unfortunately I will -1 marking it patch available. The current patch carries a modified version of LZMA SDK. This is a huge maintenance overhead going forward where a much simpler solution clearly exists. We should explore the liblzma route first as I mentioned in http://bit.ly/cDz2Pk.
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897400#action_12897400 ]

Jakob Homan commented on HADOOP-6837:

It may be good to address the code style issue now, since this patch diverges significantly from our standard: http://wiki.apache.org/hadoop/CodeReviewChecklist

Eclipse can re-format everything into Hadoop's style pretty well, if that will save time.
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897512#action_12897512 ]

Greg Roelofs commented on HADOOP-6837:

bq. The current patch carries a modified version of LZMA SDK. This is a huge maintenance overhead going forward where a much simpler solution clearly exists.

If the modifications are rolled into the SDK, this issue goes away. Nicholas, can you create a current diff of the src/contrib part relative to lzma912's original path structure (i.e., so it applies cleanly to a stock lzma912 codebase)? Then we can send it off to the 7Zip folks and see if they're willing to incorporate it.

(And only partly exists. There's no Java in liblzma. On the other hand, consensus around here seems to be that built-in Java support isn't necessary.)

bq. It may be good to address the code style issue now, since this patch diverges significantly from our standard

Only the src/contrib portion does, and that was intentional. bzip2 is no longer actively developed, so an in-tree, heavily modified port is no big deal. LZMA, however, is still a very active project, and if we ever wanted to upgrade to a newer release (e.g., for performance or correctness reasons), we do _not_ want a lot of whitespace noise hiding the real diffs.

But this issue also largely disappears if the substantive modifications are accepted upstream; then the formatting is fairly irrelevant, though still a pain for diffs and patches. Either way, I don't think style rules are or should necessarily be applicable to contrib code (in the outside-the-core-codebase sense of contrib).
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896739#action_12896739 ]

Greg Roelofs commented on HADOOP-6837:

bq. FakeOutputStream isn't the one I'm talking about in package.html. That's for the OutputStream/ FakeInputStream. FakeOutputStream is just the one where I couldn't justify the maximum acting correctly (writing a max of 273 bytes extra) so I added the linked list in case anything goes wrong.

Argh, yes, of course...we discussed that at least twice already. Sorry for spacing.

Did you ever instrument it to emit a warning if it did go above 273 extra bytes?
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896751#action_12896751 ]

Nicholas Carlini commented on HADOOP-6837:

I did. The average number of overflow bytes is 24. I never saw it go above 120. A quick sed/dc script tells me the standard deviation is 18. So I'm fairly sure that I am correct in that it will never go above 273.

Trying with setting the number of fast bytes to 273 gives average of 37 and standard deviation of 26.
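The instrumentation discussed here (mean, standard deviation and maximum of the overflow byte counts, plus a warning above 273) could be gathered in-process along the lines of the sketch below. It is not the sed/dc script mentioned above and not code from the patch; the class name and the warning text are made up, and the running variance uses Welford's method as a stand-in technique.

```java
/** Hypothetical running statistics for overflow sizes; uses Welford's online algorithm. */
final class OverflowStats {
  /** Bound discussed in the thread: at most 273 extra bytes are expected. */
  static final int MAX_EXPECTED_OVERFLOW = 273;

  private long n;        // number of samples recorded
  private double mean;   // running mean
  private double m2;     // running sum of squared deviations from the mean
  private int max;       // largest overflow seen so far
  private long exceeded; // samples above MAX_EXPECTED_OVERFLOW

  void record(int overflowBytes) {
    n++;
    double delta = overflowBytes - mean;
    mean += delta / n;
    m2 += delta * (overflowBytes - mean);
    if (overflowBytes > max) {
      max = overflowBytes;
    }
    if (overflowBytes > MAX_EXPECTED_OVERFLOW) {
      exceeded++;
      System.err.println("WARN: overflow of " + overflowBytes
          + " bytes exceeds expected maximum of " + MAX_EXPECTED_OVERFLOW);
    }
  }

  long count() { return n; }
  double mean() { return mean; }
  int max() { return max; }
  long exceededCount() { return exceeded; }

  /** Population standard deviation; 0 when nothing has been recorded. */
  double stdDev() {
    return n == 0 ? 0.0 : Math.sqrt(m2 / n);
  }
}
```

Calling record() once per decoded block keeps the cost at a few arithmetic operations and avoids storing the individual samples.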
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896213#action_12896213 ]

Greg Roelofs commented on HADOOP-6837:

20100806 version looks pretty good. The first couple of issues are the main ones:

* FakeOutputStream LinkedList of 1 KB buffers (BLOCKSIZE = 1024): supposed to be 128 KB buffers (package.html)
* directory structure = unholy mix of 7zip, Hadoop (expected directory structure similar to LZMA SDK tarball contents, e.g., lzma912/Java/SevenZip/Compression/LZMA/Decoder.java and lzma912/C/LzFind.c)
** OK to trim some of tarball's levels out (and to be different for Java, C), and top level need not be lzma912 (though it makes connection to download more obvious)--I know I suggested SevenZip, but I thought that appeared in both the Java and the C paths: suggest lzma912 or lzma-9.12 instead
* kDefaultDictionaryLogSize no longer changed to 16: OK?
* apparently bogus files:
** src/contrib/SevenZip/ivy/libraries.properties
** src/contrib/ec2/bin/hadoop-ec2-env.sh
* LZMA SDK -> LZMA SDK 9.12 in all boilerplate
* CRC: prefer to reuse existing (e.g., PureJavaCrc32); should be compatible
* LzmaNativeInputStream.java:
** circularwould
** read(): fast busy-loop not thread-friendly...and not necessary: read(b[]) (InputStream) blocks until at least 1 byte available--zero returned only if b[] has length zero, which is not true of oneByte[]
** read(): t unnecessary; just do: return (int)oneByte[0] & 0xFF; or even: return (ret == -1)? -1 : (int)oneByte[0] & 0xFF;
** 113
* LzmaNativeOutputStream.java:
** buffered, sendData -> compressedDirectBuf, uncompressedDirectBuf as above
** 116
* LzmaOutputStream.java:
** 117
* Makefile.am
** All these are setup -> All these are set up
** are also done -> is also done

(Per previous feedback, please change all 1 << N to the Java equivalent of static const int kSomeBufferSize = M * 1024: easier to read, easier to change later.)

Btw, it's always better to follow the existing style consistently than to use your own for your changes (with the possible exception of the boilerplate comment). Perhaps Emacs hides the issue, but with 8-space tabs, your changes to the contrib LZMA files are a complete mismatch to their style.

Be sure to run ant javadoc (and fix any new issues) before the next patch, and give ant test a shot, too (over the weekend if you happen to see this--it takes several hours to run). I'll work with you to get ant test-patch going.
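The read() suggestion in the review above can be sketched as follows. This is an illustration only, not LzmaNativeInputStream itself: the wrapper class and its delegate are hypothetical, and the bulk read simply forwards to another stream so the example stays self-contained.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream showing a single-byte read() built on the bulk read(). */
final class SingleByteReadStream extends InputStream {
  private final InputStream source;
  private final byte[] oneByte = new byte[1];

  SingleByteReadStream(InputStream source) {
    this.source = source;
  }

  /** Bulk read; here it just delegates, where a codec would decompress instead. */
  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    return source.read(b, off, len);
  }

  /**
   * No busy-loop is needed: a bulk read with len == 1 blocks until one byte
   * is available or end of stream is reached, so the result is 1 or -1.
   */
  @Override
  public int read() throws IOException {
    int ret = read(oneByte, 0, 1);
    // Mask so that bytes >= 0x80 come back as 128..255 rather than negative.
    return (ret == -1) ? -1 : (oneByte[0] & 0xFF);
  }
}
```

The mask matters because a raw byte of 0xFF would otherwise be returned as -1 and be mistaken for end of stream.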
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896214#action_12896214 ]

Greg Roelofs commented on HADOOP-6837:

On second thought, _if_ you put the version into the src/contrib path as I suggested, there's no need to add it to the boilerplate text, too. That will make future forward-ports simpler (i.e., they can use the same boilerplate text).
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894773#action_12894773 ]

Nicholas Carlini commented on HADOOP-6837:

Responding to the major comments -- will upload a patch that fixes these and the smaller comments soon.

FakeInputStream LinkedList:
This LinkedList can get fairly long, depending on how write is called. Worst case it can have upwards of 12 million elements, which is far beyond acceptable. This is the case if write(single_byte) is called over and over. Each call will add a new link. Looking back at this, a linked list probably wasn't the best way to go.

There are two (obvious) ways that write() could have worked. One is using linked lists as I did. The other way would be to create a byte array that can hold forceWriteLen bytes and just copy into it; however, this can be as large as 12MB. There are then two other ways to make this work. The first is just allocating the 12MB right up front. The other way is to start it with maybe just 64k, and make it grow (by powers of two) until it reaches 12MB; however, this would end up arraycopying a little under 12MB in total more than the other solution. I will implement one of these for the patch.

FakeOutputStream LinkedList:
This linked list has a more reasonable use. Its purpose is to hold extra bytes just in case the input stream gives too many. I am fairly confident that at most 272 bytes (maximum number of fast bytes - 1) can be written to it. The reason I used a linked list, however, is that I couldn't formally prove this after going through the code. I wanted to be safe, so that just in case their code doesn't behave as it should, everything will still work on the OutputStream end.

Code(..., len):
I think I remember figuring out that Code(...) will return at least (but possibly more than) len bytes, with the one exception that when the end of the stream is reached it will only read up to the end of the stream. I will modify the decompressor to no longer assume this and use the actual number of bytes read instead.

Fixed the inStream.read() bug (the fix will be in the patch I upload). Added a while loop to read until EOF is reached so the assumptions are true.

Tail call recursive methods -> while loop. Java should add tail-call optimizations when methods only call themselves recursively (which would require no changes to the bytecode).

Fixed memory leaks.
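The second buffering option described above (start small, double until a cap is reached) might be sketched like this. It is not the code that went into any patch; the class name is hypothetical, and the 64 KB start and 12 MB cap from the comment are left as constructor arguments.

```java
import java.util.Arrays;

/** Hypothetical byte buffer that grows by powers of two up to a fixed cap. */
final class GrowableBuffer {
  private final int maxCapacity;
  private byte[] buf;
  private int count;

  GrowableBuffer(int initialCapacity, int maxCapacity) {
    if (initialCapacity < 1 || initialCapacity > maxCapacity) {
      throw new IllegalArgumentException("need 1 <= initialCapacity <= maxCapacity");
    }
    this.buf = new byte[initialCapacity];
    this.maxCapacity = maxCapacity;
  }

  /** Appends len bytes from src, growing the backing array if necessary. */
  void write(byte[] src, int off, int len) {
    ensureCapacity((long) count + len);
    System.arraycopy(src, off, buf, count, len);
    count += len;
  }

  private void ensureCapacity(long needed) {
    if (needed > maxCapacity) {
      throw new IllegalStateException("buffer would exceed " + maxCapacity + " bytes");
    }
    if (needed <= buf.length) {
      return;
    }
    long newCapacity = buf.length;
    while (newCapacity < needed) {
      newCapacity *= 2; // grow by powers of two
    }
    // Clamp the last step so the array never exceeds the cap.
    buf = Arrays.copyOf(buf, (int) Math.min(newCapacity, maxCapacity));
  }

  int size() { return count; }
  int capacity() { return buf.length; }
  byte[] toByteArray() { return Arrays.copyOf(buf, count); }
}
```

For the sizes mentioned in the comment this would be constructed as new GrowableBuffer(64 * 1024, 12 * 1024 * 1024). Unlike the linked list, a run of single-byte writes costs one array slot per byte rather than one list node per byte.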
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894246#action_12894246 ]

Greg Roelofs commented on HADOOP-6837:

First, apologies for having steered Nicholas wrong on the liblzma issue. As Hong noted, it provides a much saner (that is, zlib-like) API for this sort of thing, but I mistakenly thought it shared the GPL license of (parts of) xz, so we ignored it and he worked on the LZMA SDK code instead. (The latter did include a Java port, however; liblzma does not.)

Overall, the 20100722 patch looks pretty decent (given the starting material), but it does include some less-than-beautiful workarounds to cope with the impedance mismatch between push- and pull-style I/O models. In light of the fact that liblzma is, in fact, public-domain software (every file under xz-4.999.9beta-143-g3e49/src/liblzma is either explicitly in the public domain or has been automatically generated by such a file), I'm going to ask that Nicholas redo the native-code version to use liblzma rather than the SDK. (Unfortunately, it looks like the transformation from C SDK to liblzma was a significant amount of work, so it doesn't appear that a trivial liblzma-ification of the Java SDK code is likely. If Nicholas concurs with that assessment, we can instead file a separate JIRA to port the liblzma C code to Java.) Note that liblzma includes an LZMA2 codec, so Scott Carey's splittable-codec suggestion is within reach, too.

OK, enough preamble. There were a number of relatively minor style issues, which I'll simply list below, but the five main concerns were:

- FakeInputStream.java, FakeOutputStream.java: the linked lists of byte arrays are tough to swallow, even given the push/pull problem, even given our previous discussions on the matter. It would be good to know what the stats are on these things in typical cases--how frequently does overflow occur in LzmaInputStream, for example, and how many buffers are used?
- Is the Code(..., len) call in LzmaInputStream guaranteed to produce len bytes if it returns true? The calling read() function assumes this, but it's not at all obvious to me; the parameter is outSize in Code(), and I don't see that it's decremented or even really used at all (other than being stored in oldOutSize), unless it's buried inside macros defined elsewhere.

The next two (or perhaps three) are no longer directly relevant, but they're general things to watch out for:

- The return value from inStream.read() in LzmaNativeInputStream.java is ignored even though there's no guarantee the call will return the requested number of bytes. A comment (never have to call ... again) reiterates this error.
- There's no need for recursion in LzmaNativeOutputStream.java's write() method; iterative solutions tend to be far cleaner, I think. (Probably ditto for LzmaNativeInputStream.java's read() method.)
- LzmaCompressor.c has a pair of memleaks (state->outStream, state->inStream).

Here are the minor readability/maintainability/cosmetic/style issues:

* add LZMA SDK version (apparently 9.12) and perhaps its release date to the boilerplate
* tabs/formatting of LZMA SDK code (CRC.java, BinTree, etc.): I _think_ tabs are frowned upon in Hadoop, though I might be wrong; at any rate, they seem to be rarely used
** for easy Hadoop-style formatting, indent -i2 -br -l80 is a start (though it's sometimes confused by Java/C++ constructs)
* reuse existing CRC implementation(s)? (JDK has one, Hadoop has another)
* prefer lowercase lzma for subdirs
* use uppercase LZMA when referring to codec/algorithm (e.g., comments)
* add README mentioning anything special/weird/etc. (e.g., weird hashtable issue); summary of changes made for Hadoop; potential Java/C diffs; binary compatibility between various output formats (other than trivial headers/footers); LZMA2 (splittable, not yet implemented); liblzma (much cleaner, more zlib-like implementation, still PD); etc.
* ant javadoc run yet? (apparently not recently)
* line lengths, particularly comments (LzmaInputStream.java, etc.): should be no more than 80 columns in general (Hadoop guidelines)
* avoid generic variable names for globals and class members; use existing conventions where possible (e.g., look at gzip/zlib and bzip2 code)
* LzmaCodec.java:
** uppercase LZMA when referring to codec/algorithm in general
** funcionality x 4
** throws ... { continuation line: don't further indent
* LZ/InWindow.java
** leftover debug code at end
* RangeCoder/Encoder.java
** spurious blank line (else just boilerplate)
* FakeOutputStream.java:
** stuffeed
** ammount
** isOverflow() -> didOverflow()
* LzmaInputStream.java:
** [uses FakeOutputStream]
** bufferd
** we 've
** index -> overflowIndex (or similar): too generic
**
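Two of the concerns in this review, the ignored return value of inStream.read() and the recursive write(), both come down to plain loops. The sketch below shows the general shape under stated assumptions: the helper class and method names are hypothetical and none of this is taken from LzmaNativeInputStream.java or LzmaNativeOutputStream.java.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helpers illustrating loop-based reads and writes. */
final class IoLoops {
  private IoLoops() {}

  /**
   * Reads until len bytes have arrived or the stream ends. A single
   * InputStream.read call may return fewer bytes than requested, so its
   * return value has to be checked on every iteration.
   *
   * @return the number of bytes actually read (less than len only at EOF)
   */
  static int readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(buf, off + total, len - total);
      if (n < 0) {
        break; // end of stream
      }
      total += n;
    }
    return total;
  }

  /** Iterative replacement for a write() that recursed on the unwritten remainder. */
  static void writeInChunks(OutputStream out, byte[] data, int off, int len, int chunkSize)
      throws IOException {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    while (len > 0) {
      int n = Math.min(len, chunkSize);
      out.write(data, off, n);
      off += n;
      len -= n;
    }
  }
}
```

The caller of readFully still has to handle a short count, which is the same point raised about Code(..., len): use the number of bytes actually produced rather than the number requested.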
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892425#action_12892425 ]

Nicholas Carlini commented on HADOOP-6837:

... that was supposed to go on HADOOP-6349, not here. Ignore that.
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890043#action_12890043 ]

Hong Tang commented on HADOOP-6837:

@Nicholas, per our offline conversation last week, have you looked into whether the licensing of liblzma is suitable for inclusion in Hadoop? Liblzma seems better in the sense that its API closely resembles the APIs of other compression libraries like bzip or zlib and should shrink the amount of coding work needed to support C (and Java over JNI).
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890051#action_12890051 ]

Nicholas Carlini commented on HADOOP-6837:

I spoke with Greg about it just now and he said it would probably be better for me to work on FastLZ first, and come back to doing that later.
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883630#action_12883630 ] Scott Carey commented on HADOOP-6837: - What happens if you compress a tarball of those files instead? Here are my results, on a directory with 4.1GB of ~64MB files. The content is mixed binary/text (key/value data, binary keys, mixed binary/text values). This is on CentOS 5.5, with the 'xz' and 'bzip2' packages installed via yum. Compression / decompression speed. Disk is capable of 200MB/sec read/write, 16GB RAM, Nehalem based processor (Xeon E5620, 2.4Ghz). Tests confirmed to be CPU bound with no iowait. measurements are in MB/sec for the uncompressed data. Source tarball, 4130 MB (100%) || type | compressed size | compressed size as percent | time to compress | compression rate | time to decompress | decompression rate | |gzip -1| 1430MB | (34.6%)| 105 s| (39.3 MB/sec)| 42 s | 98.3 MB/sec | |gzip -6| 1240MB | (30.0%)| 251 s| (16.5 MB/sec)| 41.5s | 99.5 MB/sec | |bzip2 -2| 1003MB | (24.3%)| 656 s| (6.3 MB/sec)| 168 s | 24.6 MB/sec | |bzip2 -6| 942MB | (22.8%)| 725 s| (5.7 MB/sec)| 176 s | 23.5 MB/sec | |bzip2 -9| 926MB | (22.4%)| 763 s| (5.4 MB/sec)| 181 s | 22.8 MB/sec | |xz -2| 993MB | (24.0%)| 429 s| (9.63 MB/sec)| 95s | 43.5 MB/sec | |xz -6| 794MB | (19.2%)| 2861 s| (1.44 MB/sec)| 83s | 49.7 MB/sec | Note that on today's newest processors, gzip decompresses at gigabit ethernet speeds. xz is half that, and bzip2 about half that again. Gzip ane zx decompress faster at higher compression ratios, bzip2 decompresses slower at higher ratios. All compress slower the higher the ratio, but bzip2 only slows down by ~20% or so from the fast to slow settings, while gzip and xz slow down by a factor of 10+ (I did not do -9 tests here for those, they are very slow). 
IMO, since xz -2 is almost 2x as fast at compression and decompression as bzip2, and similar in compression ratio, it leaves little room for bzip2's use. At higher compression levels, xz is very slow to compress, but achieves compression ratios significantly better than anything else and still decompresses very fast, so it's great for archival storage. For faster compression, gzip -1 or lzo and other compression types without an entropy coder are the only options.

The link I provided above has several cases where xz is 3 or more times faster than bzip2 at decompression, but my data doesn't behave that way.

Raw Data:

$ time cat packed.tar | gzip -c1 > packed.gz1
real 1m44.938s  user 1m42.200s  sys 0m5.300s
$ time cat packed.tar | gzip -c6 > packed.gz6
real 4m11.051s  user 4m8.438s  sys 0m5.317s
$ time cat packed.tar | bzip2 -2 > packed.bz2-2
real 10m55.795s  user 10m52.989s  sys 0m5.030s
$ time cat packed.tar | bzip2 -6 > packed.bz2-6
real 12m4.847s  user 12m2.049s  sys 0m5.345s
$ time cat packed.tar | bzip2 -9 > packed.bz2-9
real 12m43.063s  user 12m40.353s  sys 0m4.797s
$ time cat packed.tar | xz -zv -2 - > packed.xz
100.0 % 991.1 MiB / 4,125.0 MiB = 0.240 9.6 MiB/s 7:09
real 7m9.369s  user 7m6.985s  sys 0m7.140s
$ time cat packed.tar | xz -zv -6 - > packed.xz6
100.0 % 792.6 MiB / 4,125.0 MiB = 0.192 1.4 MiB/s 47:41
real 47m41.033s  user 47m37.794s  sys 0m8.371s

Tests of decompression:

$ time cat packed.gz1 | gunzip > /dev/null
real 0m42.081s  user 0m41.814s  sys 0m1.361s
$ time cat packed.gz6 | gunzip > /dev/null
real 0m41.512s  user 0m41.021s  sys 0m1.086s
$ time cat packed.bz2-2 | bunzip2 > /dev/null
real 2m48.528s  user 2m48.014s  sys 0m1.455s
$ time cat packed.bz2-6 | bunzip2 > /dev/null
real 2m56.511s  user 2m55.999s  sys 0m1.302s
$ time cat packed.bz2-9 | bunzip2 > /dev/null
real 3m1.064s  user 3m0.559s  sys 0m1.409s
$ time cat packed.xz | xz -dc > /dev/null
real 1m35.239s  user 1m34.873s  sys 0m1.301s
$ time cat packed.xz6 | xz -dc > /dev/null
real 1m23.219s  user 1m22.771s  sys 0m1.126s
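For anyone who wants to repeat this kind of measurement on their own data, the sketch below times the Python standard-library bindings for the same three codec families and reports rates in MB/sec of uncompressed data, as in the table above. It is only an approximation of the command-line runs: the codec table `CODECS`, the helper names, and the synthetic `sample_data` records are assumptions made for illustration, and the in-memory input is far smaller than the 4130 MB tarball, so the absolute numbers will not match.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical codec table: (label, compress function, decompress function).
CODECS = [
    ("gzip -1", lambda d: gzip.compress(d, compresslevel=1), gzip.decompress),
    ("gzip -6", lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    ("bzip2 -9", lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    ("xz -2", lambda d: lzma.compress(d, preset=2), lzma.decompress),
    ("xz -6", lambda d: lzma.compress(d, preset=6), lzma.decompress),
]


def sample_data(records=20000):
    """Synthetic key/value records; a stand-in, not the corpus measured above."""
    return b"".join(
        b"key%08d\tvalue-%d\n" % (i, (i * 7919) % 1000) for i in range(records)
    )


def benchmark(label, compress, decompress, data):
    """Time one codec; rates are in MB/sec of *uncompressed* data."""
    megabytes = len(data) / 1e6

    start = time.perf_counter()
    packed = compress(data)
    compress_s = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    unpacked = decompress(packed)
    decompress_s = max(time.perf_counter() - start, 1e-9)

    if unpacked != data:
        raise ValueError("round trip failed for " + label)
    return {
        "type": label,
        "size_pct": 100.0 * len(packed) / len(data),
        "compress_mb_s": megabytes / compress_s,
        "decompress_mb_s": megabytes / decompress_s,
    }


def run_all(data):
    """Benchmark every codec in CODECS against the same input."""
    return [benchmark(label, c, d, data) for label, c, d in CODECS]


if __name__ == "__main__":
    for row in run_all(sample_data()):
        print("| %-8s | %5.1f%% | %8.1f MB/sec | %8.1f MB/sec |" % (
            row["type"], row["size_pct"],
            row["compress_mb_s"], row["decompress_mb_s"]))
```

The rate columns follow the same arithmetic as the table: uncompressed size divided by wall-clock time, for example 4130 MB / 105 s ≈ 39.3 MB/sec for gzip -1.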
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883265#action_12883265 ]
Greg Roelofs commented on HADOOP-6837:

Scott Carey wrote:
bq. lzma always decompresses 2 to 7 times as fast as bzip2 (only ~ half the decompression speed of gzip).

I didn't see that in my tests. My measurements (last column) are in terms of compressed MB/sec, i.e., scaled by the compression ratio, but the ratios are close enough that that isn't a big factor:

{noformat}
bzip2-1: text = 78.9% (1.1), 1.464 (0.028) ucMB/sec, 1.189 (0.037) cMB/sec
         bin  = 50.1% (3.4), 1.395 (0.021) ucMB/sec, 2.170 (0.036) cMB/sec
bzip2-9: text = 80.5% (1.0), 1.415 (0.028) ucMB/sec, 1.135 (0.037) cMB/sec
         bin  = 51.6% (3.6), 1.340 (0.020) ucMB/sec, 1.878 (0.032) cMB/sec
xz-1:    text = 79.6% (1.0), 2.705 (0.097) ucMB/sec, 1.457 (0.049) cMB/sec
         bin  = 53.3% (3.5), 1.820 (0.031) ucMB/sec, 2.93 (0.20) cMB/sec
xz-9:    text = 82.4% (0.8), 0.240 (0.011) ucMB/sec, 1.433 (0.051) cMB/sec
         bin  = 57.2% (3.6), 0.351 (0.010) ucMB/sec, 2.73 (0.17) cMB/sec
{noformat}

So xz/LZMA is definitely faster to decompress, but not immensely so. (This was all C code. The text and bin measurements are averages across roughly 350 files of each type, various sizes. Not a perfect corpus, but it should be varied enough to draw some reasonable conclusions. On the other hand, the file sizes are definitely much smaller than is typical in Hadoop jobs.)

Btw, I didn't see Nicholas mention it, but all of the LZMA variants he tested appear to be stream-compatible--that is, any of the tools can decompress any of the others' streams, possibly modulo some header-parsing.
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882214#action_12882214 ]
Scott Carey commented on HADOOP-6837:

Isn't there a new variant of LZMA (file extension xz) that uses LZMA2 and is block based (and therefore splittable)? We should definitely make sure that is the variant we want to support.

LZMA is slower than gzip, but compresses better than both bzip2 and gzip. It is also optimized for fast decompression -- it decompresses significantly faster than bzip2 (but not as fast as gzip). This link is useful for understanding the performance / compression ratio differences across the various compression levels provided for each: http://tukaani.org/lzma/benchmarks.html

LZO, FastLZ, LZF, and the like are all faster than the above three but compress at a lower ratio.

With LZMA support (hopefully .xz files, not the older 7zip) there is little reason to use bzip2 anymore -- lzma level 2 compresses as fast as bzip2 level 1, but has a compression ratio as high as bzip2 level 9. lzma always decompresses 2 to 7 times as fast as bzip2 (only ~ half the decompression speed of gzip). It is the ideal archival storage format.
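To make the "block based (and therefore splittable)" point concrete, here is a small sketch of why independently compressed pieces matter: each piece can be decoded without reading anything before it. It uses Python's standard `lzma` module and writes every block as its own complete .xz stream, which is a simplification chosen for illustration; it is not the block layout inside a single .xz file, and it is not how a Hadoop input format would locate split points. The helper names and the 64 KiB block size are hypothetical.

```python
import lzma


def compress_in_blocks(data, block_size):
    """Compress each fixed-size block as an independent, complete .xz stream."""
    return [
        lzma.compress(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def read_block(blocks, index):
    """Decode one block without touching any of the others."""
    return lzma.decompress(blocks[index])


# Hypothetical input: 50,000 fixed-width records of 14 bytes each.
data = b"".join(b"record %06d\n" % i for i in range(50000))
blocks = compress_in_blocks(data, 64 * 1024)

# A reader assigned only block 3 can decode it on its own ...
piece = read_block(blocks, 3)

# ... and the concatenated streams still decode as one file, because
# lzma.decompress accepts several streams back to back.
whole = lzma.decompress(b"".join(blocks))
```

The trade-off is that each block starts with an empty dictionary, so smaller blocks give more parallelism at some cost in compression ratio.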
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882250#action_12882250 ]
Nicholas Carlini commented on HADOOP-6837:

The Java code from the SDK hasn't been updated since version 4.61 (dated 23 November 2008), so support for LZMA2 would need to rely on C code, or be ported to Java. The compression ratios of LZMA and LZMA2 are nearly identical (+/- .01% from the tests I did). It does look like LZMA2 is block based and is splittable, so that would be a major plus for it.

On the differences between LZMA and LZMA2:

    LZMA2 is an extension on top of the original LZMA. LZMA2 uses
    LZMA internally, but adds support for flushing the encoder,
    uncompressed chunks, eases stateful decoder implementations,
    and improves support for multithreading.

http://tukaani.org/xz/xz-file-format.txt

I did have to add support for flushing the encoder to the Java code (flushing the encoder still produces valid lzma-compressed output).
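The "nearly identical" ratio claim is easy to spot-check. The sketch below compresses the same buffer once in the legacy .lzma container (LZMA) and once in the .xz container (LZMA2 by default) and reports both sizes. It relies on Python's standard `lzma` module, which wraps liblzma from XZ Utils rather than the 7-Zip SDK Java code discussed here, so it says nothing about the Java implementation; the helper name `compare_formats` and the sample input are hypothetical. It also does not show mid-stream encoder flushing, which that module does not expose.

```python
import lzma


def compare_formats(data, preset=6):
    """Compress with LZMA (.lzma) and LZMA2 (.xz) and report both sizes."""
    legacy = lzma.compress(data, format=lzma.FORMAT_ALONE, preset=preset)
    xz = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)

    # FORMAT_AUTO (the default) recognises both containers on decode.
    if lzma.decompress(legacy) != data or lzma.decompress(xz) != data:
        raise ValueError("round trip failed")

    return {
        "lzma_bytes": len(legacy),
        "xz_bytes": len(xz),
        # Size difference expressed as a percentage of the uncompressed input.
        "difference_pct": 100.0 * (len(xz) - len(legacy)) / len(data),
    }


# Hypothetical input: repetitive key/value text, about 1 MB.
sample = b"".join(
    b"user%05d\tscore=%d\n" % (i % 5000, (i * 31) % 977) for i in range(60000)
)
result = compare_formats(sample)
```

Any difference seen this way mostly reflects container overhead (headers, index, integrity check) and LZMA2 chunk headers, not a change in the underlying LZMA coding.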
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881915#action_12881915 ]
Allen Wittenauer commented on HADOOP-6837:

The 7z SDK license is Public Domain and 7z LZMA is LGPL. Is that compatible with the APL?
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881928#action_12881928 ]
Greg Roelofs commented on HADOOP-6837:

7-Zip is LGPL; the LZMA SDK is not:

    License
    LZMA SDK is placed in the public domain.

Given that both packages are hosted at the same site, with links to each other on the left bar, I think we can safely assume they know the difference between the two and have made a conscious decision to release them accordingly.
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881929#action_12881929 ]
Eli Collins commented on HADOOP-6837:

Hey Nicholas,

Cool stuff. Unfortunately LGPL is incompatible with APL so we couldn't check this in. See more at http://www.apache.org/legal/resolved.html

bq. The LGPL is ineligible primarily due to the restrictions it places on larger works, violating the third license criterion. Therefore, LGPL-licensed works must not be included in Apache products

Do you need to use this particular codec or are you just looking for something better than gzip/bzip2? If the latter, HADOOP-6349 (support for FastLZ) would be a great place to direct your efforts; it's got a compatible license and like LZMA is significantly faster than gzip/bzip (and faster than the open source version of lzo).

Thanks,
Eli
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881931#action_12881931 ]
Eli Collins commented on HADOOP-6837:

If LZMA is public domain then it should be safe to include. Would be good to have clarification from the author.
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881934#action_12881934 ]
Greg Roelofs commented on HADOOP-6837:

LZMA is not faster than gzip/bzip2; it compresses better. FastLZ (next item on Nicholas's plate) is faster than LZO but compresses more poorly than everything else (except maybe LZW). They're both useful, but they address different parts of the problem domain.
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881932#action_12881932 ]
Greg Roelofs commented on HADOOP-6837:

See the last line of their FAQ. ;-)
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881935#action_12881935 ]
Nicholas Carlini commented on HADOOP-6837:

Per the FAQ:

    You can also read about the LZMA SDK, which is available under a more liberal license.

http://www.7-zip.org/faq.html