Tony,

Snappy is also available:
http://code.google.com/p/hadoop-snappy/

best,
 Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 9, 2012, at 8:49 AM, Harsh J wrote:

> Tony,
> 
> * Yeah, SequenceFiles aren't human-readable, but "fs -text" can read them out 
> (instead of a plain "fs -cat"). But if you are going to export your files into 
> a system you do not have much control over, it is probably best not to have 
> the resultant files in SequenceFile/Avro-DataFile format.
> * Intermediate (M-to-R) files use a custom IFile format these days, which is 
> built purely for that purpose.
> * Hive can use SequenceFiles very well. This is also documented in Hive's 
> wiki pages (check the DDL pages, IIRC).
> 
> On 09-Jan-2012, at 9:44 PM, Tony Burton wrote:
> 
>> Thanks for the quick reply and the clarification about the documentation.
>> 
>> Regarding sequence files: am I right in thinking that they're a good choice 
>> for intermediate steps in chained MR jobs, or for file transfer between the 
>> Map and the Reduce phases of a job; but they shouldn't be used for 
>> human-readable files at the end of one or more MapReduce jobs? How about if 
>> the only use of a job's output is analysis via Hive - can Hive create tables 
>> from sequence files? 
>> 
>> Tony
>> 
>> 
>> 
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com] 
>> Sent: 09 January 2012 15:34
>> To: common-user@hadoop.apache.org
>> Subject: Re: has bzip2 compression been deprecated?
>> 
>> Bzip2 is pretty slow. You probably do not want to use it, even though it 
>> supports file splits (a feature not available in the stable 0.20.x/1.x line, 
>> but available in 0.22+).
>> 
>> To answer your question though, bzip2 was removed from that document because 
>> it isn't a native library (it's pure Java). I think bzip2 was added earlier 
>> due to an oversight, as even 0.20 did not have a native bzip2 library. This 
>> change in docs does not mean that BZip2 is deprecated -- it is still fully 
>> supported and available in the trunk as well. See 
>> https://issues.apache.org/jira/browse/HADOOP-6292 for the doc update changes 
>> that led to this.
>> 
>> The best way would be to use either:
>> 
>> (a) Hadoop sequence files with any compression codec of choice (best would 
>> be lzo, gz, maybe even snappy). This file format is built for HDFS and MR 
>> and is splittable. Another choice would be Avro DataFiles from the Apache 
>> Avro project.
>> (b) LZO codecs for Hadoop, via https://github.com/toddlipcon/hadoop-lzo (and 
>> hadoop-lzo-packager for packages). This requires you to run indexing 
>> operations before the .lzo can be made splittable, but works great with this 
>> extra step added.
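
To illustrate why a block-compressed container such as the SequenceFile in (a) stays splittable whatever codec is used: each block is compressed on its own, so a reader can start at any block boundary without decompressing what came before. The following is only a toy sketch in plain Java using java.util.zip, not the Hadoop API. The class and method names (BlockCompressionSketch, writeBlocks, compressBlock, decompressBlock) are made up for illustration, and this is not the SequenceFile on-disk format, which also writes sync markers so that a reader can find block boundaries from an arbitrary byte offset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy illustration only: NOT the SequenceFile format or the Hadoop API. */
class BlockCompressionSketch {

    /** Compresses one block of records on its own; no state is shared between blocks. */
    static byte[] compressBlock(List<String> records) {
        // Assumption for this sketch: records are non-empty and contain no newlines.
        byte[] raw = String.join("\n", records).getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses a single block without needing any other block. */
    static List<String> decompressBlock(byte[] block) {
        Inflater inflater = new Inflater();
        inflater.setInput(block);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("invalid block", e);
        } finally {
            inflater.end();
        }
        String text = new String(out.toByteArray(), StandardCharsets.UTF_8);
        if (text.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(text.split("\n")));
    }

    /** Groups records into blocks of recordsPerBlock and compresses each block separately. */
    static List<byte[]> writeBlocks(List<String> records, int recordsPerBlock) {
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += recordsPerBlock) {
            int end = Math.min(i + recordsPerBlock, records.size());
            blocks.add(compressBlock(records.subList(i, end)));
        }
        return blocks;
    }
}
```

With ten records and four records per block, writeBlocks yields three blocks, and decompressBlock on the second block alone returns records 4 to 7 without touching the first block. A single gzip stream over the whole file offers no such entry point, which is why a plain .gz file cannot be split across mappers.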
>> 
>> On 09-Jan-2012, at 7:17 PM, Tony Burton wrote:
>> 
>>> Hi,
>>> 
>>> I'm trying to work out which compression algorithm I should be using in my 
>>> MapReduce jobs.  It seems to me that the best solution is a compromise 
>>> between speed, efficiency and splittability. The only compression algorithm 
>>> to handle file splits (according to Hadoop: The Definitive Guide 2nd 
>>> edition p78 etc) is bzip2, at the expense of compression speed.
>>> 
>>> However, I see from the documentation at 
>>> http://hadoop.apache.org/common/docs/current/native_libraries.html that the 
>>> bzip2 library is no longer mentioned, and hasn't been since version 0.20.0, 
>>> see http://hadoop.apache.org/common/docs/r0.20.0/native_libraries.html - 
>>> however the bzip2 Codec is still in the API at 
>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/compress/BZip2Codec.html.
>>> 
>>> Has bzip2 support been removed from Hadoop, or will it be removed soon?
>>> 
>>> Thanks,
>>> 
>>> Tony
>>> 
>>> 
>>> 
>>> **********************************************************************
>>> This email and any attachments are confidential, protected by copyright and 
>>> may be legally privileged.  If you are not the intended recipient, then the 
>>> dissemination or copying of this email is prohibited. If you have received 
>>> this in error, please notify the sender by replying by email and then 
>>> delete the email completely from your system.  Neither Sporting Index nor 
>>> the sender accepts responsibility for any virus, or any other defect which 
>>> might affect any computer or IT system into which the email is received 
>>> and/or opened.  It is the responsibility of the recipient to scan the email 
>>> and no responsibility is accepted for any loss or damage arising in any way 
>>> from receipt or use of this email.  Sporting Index Ltd is a company 
>>> registered in England and Wales with company number 2636842, whose 
>>> registered office is at Brookfield House, Green Lane, Ivinghoe, Leighton 
>>> Buzzard, LU7 9ES.  Sporting Index Ltd is authorised and regulated by the UK 
>>> Financial Services Authority (reg. no. 150404). Any financial promotion 
>>> contained herein has been issued 
>>> and approved by Sporting Index Ltd.
>>> 
>>> Outbound email has been scanned for viruses and SPAM