Or is there any way we can compress the files while they are being copied to Hadoop (either through hadoop fs -copyFromLocal or through Hive's LOAD DATA LOCAL INPATH '.....')?
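Neither command compresses data in flight: hadoop fs -copyFromLocal copies the bytes as-is, and Hive's LOAD DATA only moves the file into the table's directory. One workaround is to compress on the client and stream the result into HDFS in a single step. A minimal sketch, assuming a local file data.txt and a hypothetical warehouse path (hadoop fs -put reads from stdin when the source is '-'):

    # Compress locally and write straight into HDFS without a temporary file;
    # 'hadoop fs -put -' takes its input from stdin.
    gzip -c data.txt | hadoop fs -put - /user/hive/warehouse/mytable/data.txt.gz

Hive can then query the .gz file in place, but since gzip is not splittable, each such file will be read by a single mapper.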
________________________________
From: Tim Broberg <tim.brob...@exar.com>
To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>; Shantian Purkad <shantian_pur...@yahoo.com>
Sent: Thursday, October 27, 2011 4:32 PM
Subject: RE: Loading and reading Snappy compressed files on Hadoop

Shantian, reading a single compressed file through multiple hosts requires a "splittable" compression format such as bzip2 or LZO, or else encapsulation of the compressed stream(s) in a sequence or Avro file. The issue is that, in general, compressed data is a continuous stream that refers back to preceding data, so it cannot be arbitrarily split apart and decoded out of context. Snappy can be encapsulated in Avro, etc., but is not in itself splittable.

"Hadoop: The Definitive Guide" has a section on this with more details and definitive information.

    - Tim.

________________________________
From: Shantian Purkad [shantian_pur...@yahoo.com]
Sent: Thursday, October 27, 2011 4:07 PM
To: hdfs-user@hadoop.apache.org
Subject: Loading and reading Snappy compressed files on Hadoop

Hi,

What is the best way to load Snappy-compressed files onto Hadoop and then read them through M/R?

I see that Hadoop can uncompress .gz files (using a single map), but I am looking for a better way to utilize multiple mappers to read precompressed files.

Thanks and Regards,
Shantian
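To follow Tim's suggestion of encapsulating the Snappy stream in a sequence file, one option is a map-only identity job that rewrites the input as a block-compressed SequenceFile. A sketch using Hadoop Streaming; the paths are hypothetical, and the property names assume a 0.20-era configuration (the streaming jar location varies by distribution):

    # Rewrite text (or .gz) input as a SequenceFile whose blocks are
    # Snappy-compressed; block compression is what makes the output splittable.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.output.compress=true \
        -D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
        -D mapred.output.compression.type=BLOCK \
        -input /user/shantian/raw \
        -output /user/shantian/seq \
        -mapper cat \
        -numReduceTasks 0 \
        -outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat

Later jobs reading /user/shantian/seq can then split the output at SequenceFile sync points and use multiple mappers per file.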