[jira] [Reopened] (HADOOP-7206) Integrate Snappy compression

Alejandro Abdelnur (JIRA) Wed, 22 Jun 2011 18:03:12 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alejandro Abdelnur reopened HADOOP-7206:
----------------------------------------


After mulling over this issue a bit more, rereading Todd's comment a few 
times, and asking around among folks who deal with native libraries, I'm 
having second thoughts about the committed patch based on snappy-java.

The snappy-java approach is tempting because it 'just works' (without having 
to install the snappy SO on your system). However, it has a serious drawback: 
the native code is not built on the target OS, only for the same 
architecture. Because of this, the build is not easily reproducible, as there 
is no knowledge of the OS used to build it. In addition, the prebuilt native 
code can depend on libraries that are not available on the running OS.
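
To illustrate why matching only the architecture is not enough, here is a 
minimal sketch of how a prebuilt, bundled native library might be selected at 
runtime. The class name, the resource layout (/native/<os>/<arch>/...) and the 
file name libexample.so are assumptions made up for this example; this is not 
snappy-java's actual code. The lookup key is built from the standard os.name 
and os.arch system properties, which carry no information about the 
distribution or libc version the binary was built against.

```java
import java.util.Locale;

final class BundledNativeLookup {
    /**
     * Builds a hypothetical classpath resource path for a prebuilt native
     * library. Only the OS family and CPU architecture take part in the key.
     */
    static String resourcePath(String osName, String osArch, String libFile) {
        String os = osName.toLowerCase(Locale.ROOT).replace(' ', '_');
        return "/native/" + os + "/" + osArch + "/" + libFile;
    }

    public static void main(String[] args) {
        // Two different Linux distributions on the same CPU produce the same
        // path, so they would receive the same prebuilt binary.
        String path = resourcePath(System.getProperty("os.name"),
                                   System.getProperty("os.arch"),
                                   "libexample.so"); // hypothetical file name
        System.out.println("Would load bundled library from: " + path);
    }
}
```

Under these assumptions, nothing in the key records which OS release 
produced the binary, which is the reproducibility concern described above.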

The hadoop-snappy approach has the drawback that it requires an additional 
step (installing the snappy SO on the platform), but it avoids the drawbacks 
of the snappy-java approach: the native code is built on the target OS, which 
results in easily reproducible builds. Furthermore, the drawback is 
transient; it lasts only until snappy is available on the different OSes, 
either by default or through OS-driven updates.

A secondary issue is that the snappy-java native library statically links 
snappy. As the snappy SO makes it into standard Linux distributions, 
snappy-java will keep using its private copy instead of the one installed in 
the OS. The hadoop-snappy SO, on the other hand, dynamically links the snappy 
SO, so when the snappy SO is available in the OS it can be consumed directly 
from there. (snappy-java could take care of this if it changed to dynamically 
link the snappy SO.)
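
The dynamic-linking case can be sketched from the Java side as follows. This 
is only an illustration under stated assumptions: the class name and the 
library name "hadoopsnappy" are hypothetical, and the real hadoop-snappy 
loading code may differ. System.loadLibrary and UnsatisfiedLinkError are 
standard JDK APIs. When the JNI library is dynamically linked against the 
snappy SO, the load fails if the OS loader cannot resolve it, so the caller 
can detect the missing dependency and react.

```java
final class SnappyLinkProbe {
    /**
     * Tries to load a native library by name from java.library.path.
     * Returns false if the library, or a shared object it is dynamically
     * linked against, cannot be resolved.
     */
    static boolean nativeLibraryAvailable(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "hadoopsnappy" is a hypothetical name for the JNI glue library.
        if (nativeLibraryAvailable("hadoopsnappy")) {
            System.out.println("Native snappy support loaded.");
        } else {
            System.out.println("Native snappy support not available; "
                + "is the snappy SO installed on this OS?");
        }
    }
}
```

With a statically linked library, the same call would succeed whether or not 
the OS ships its own snappy SO, because the private copy travels inside the 
JNI library.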

Because of this, I'd like to revert the snappy-java based patch and go with 
Issay's hadoop-snappy patch.

> Integrate Snappy compression
> ----------------------------
>
>                 Key: HADOOP-7206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7206
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Eli Collins
>            Assignee: Alejandro Abdelnur
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7206-002.patch, HADOOP-7206.patch, 
> v2-HADOOP-7206-snappy-codec-using-snappy-java.txt, 
> v3-HADOOP-7206-snappy-codec-using-snappy-java.txt, 
> v4-HADOOP-7206-snappy-codec-using-snappy-java.txt, 
> v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
>
>
> Google released Zippy as an open source (APLv2) project called Snappy 
> (http://code.google.com/p/snappy). This tracks integrating it into Hadoop.
> {quote}
> Snappy is a compression/decompression library. It does not aim for maximum 
> compression, or compatibility with any other compression library; instead, it 
> aims for very high speeds and reasonable compression. For instance, compared 
> to the fastest mode of zlib, Snappy is an order of magnitude faster for most 
> inputs, but the resulting compressed files are anywhere from 20% to 100% 
> bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy 
> compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec 
> or more.
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
