Or is there any way we can compress the files while they are being copied to Hadoop (either through hadoop fs -copyFromLocal or through Hive's LOAD DATA LOCAL INPATH '.....')?
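Neither command compresses data in flight: hadoop fs -copyFromLocal copies the bytes as-is, and Hive's LOAD DATA only moves the file into the table's directory. One workaround is to compress on the client and stream the result into HDFS in a single step. A minimal sketch, assuming a local file data.txt and a hypothetical warehouse path (hadoop fs -put reads from stdin when the source is '-'):

    # Compress locally and write straight into HDFS without a temporary file;
    # 'hadoop fs -put -' takes its input from stdin.
    gzip -c data.txt | hadoop fs -put - /user/hive/warehouse/mytable/data.txt.gz

Hive can then query the .gz file in place, but since gzip is not splittable, each such file will be read by a single mapper.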
________________________________
From: Tim Broberg <tim.brob...@exar.com>
To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>; Shantian Purkad <shantian_pur...@yahoo.com>
Sent: Thursday, October 27, 2011 4:32 PM
Subject: RE: Loading and reading Snappy compressed files on Hadoop

Shantian, reading a single compressed file through multiple hosts requires a "splittable" compression format such as bzip2 or LZO, or else encapsulation of the compressed stream(s) in a sequence or Avro file. The issue is that, in general, compressed data is a continuous stream that refers back to preceding data, so it cannot be arbitrarily split apart and decoded out of context. Snappy can be encapsulated in Avro, etc., but is not in itself splittable.

"Hadoop: The Definitive Guide" has a section on this with more details and definitive information.

    - Tim.

________________________________
From: Shantian Purkad [shantian_pur...@yahoo.com]
Sent: Thursday, October 27, 2011 4:07 PM
To: hdfs-user@hadoop.apache.org
Subject: Loading and reading Snappy compressed files on Hadoop

Hi,

What is the best way to load Snappy-compressed files onto Hadoop and then read them through M/R?

I see that Hadoop can uncompress .gz files (using a single map), but I am looking for a better way to utilize multiple mappers to read precompressed files.

Thanks and Regards,
Shantian
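To follow Tim's suggestion of encapsulating the Snappy stream in a sequence file, one option is a map-only identity job that rewrites the input as a block-compressed SequenceFile. A sketch using Hadoop Streaming; the paths are hypothetical, and the property names assume a 0.20-era configuration (the streaming jar location varies by distribution):

    # Rewrite text (or .gz) input as a SequenceFile whose blocks are
    # Snappy-compressed; block compression is what makes the output splittable.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.output.compress=true \
        -D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
        -D mapred.output.compression.type=BLOCK \
        -input /user/shantian/raw \
        -output /user/shantian/seq \
        -mapper cat \
        -numReduceTasks 0 \
        -outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat

Later jobs reading /user/shantian/seq can then split the output at SequenceFile sync points and use multiple mappers per file.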