Re: tar or hadoop archive

Joey Echeverria Mon, 27 Jun 2011 16:47:08 -0700

Yes, you can see a picture describing HAR files in this old blog post:

http://www.cloudera.com/blog/2009/02/the-small-files-problem/
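
If you want to see the index for yourself, here is a rough sketch using the foo.har example from earlier in the thread. The input directory /user/joey/in is made up for illustration, and the exact options of "hadoop archive" may differ between Hadoop versions, so treat this as a starting point rather than a recipe. Listing the archive as a plain HDFS directory should show the index files (_index and _masterindex) next to the part-* files that hold the data, while listing it through the har:// scheme shows the original file names.

```shell
#!/bin/sh
# Paths follow the foo.har example from earlier in the thread.
# INPUT_DIR is a hypothetical location holding a.txt, b.txt and c.txt.
INPUT_DIR=/user/joey/in
OUTPUT_DIR=/user/joey/out
ARCHIVE_NAME=foo.har
HAR_URI="har://${OUTPUT_DIR}/${ARCHIVE_NAME}"

if command -v hadoop >/dev/null 2>&1; then
    # Build the archive (this launches a MapReduce job).
    hadoop archive -archiveName "$ARCHIVE_NAME" -p "$INPUT_DIR" a.txt b.txt c.txt "$OUTPUT_DIR"

    # Raw HDFS view: expect _index and _masterindex next to part-* data files.
    hadoop fs -ls "${OUTPUT_DIR}/${ARCHIVE_NAME}"

    # Logical view through the har:// scheme: the original file names.
    hadoop fs -ls "$HAR_URI"
else
    # Without a Hadoop client there is nothing to run.
    echo "hadoop is not on PATH; nothing to run"
fi
```

A tar file has no equivalent of that second listing: HDFS sees it as one opaque file, so you would have to pull it out and untar it to get at a single member.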


-Joey

On Mon, Jun 27, 2011 at 4:36 PM, Rita <rmorgan...@gmail.com> wrote:
> So, it builds an index of the files?
>
>
>
> On Mon, Jun 27, 2011 at 10:10 AM, Joey Echeverria <j...@cloudera.com> wrote:
>
>> The advantage of a Hadoop archive (HAR) file is that it lets you access the
>> files stored in it directly. For example, if you archived three files
>> (a.txt, b.txt, c.txt) in an archive called foo.har, you could cat one
>> of the three files using the hadoop command line:
>>
>> hadoop fs -cat har:///user/joey/out/foo.har/a.txt
>>
>> You can also copy files out of the archive or use files in the archive
>> as input to MapReduce jobs.
>>
>> -Joey
>>
>> On Mon, Jun 27, 2011 at 3:06 AM, Rita <rmorgan...@gmail.com> wrote:
>> > We use hadoop/hdfs to archive data. I archive a lot of files by creating one
>> > large tar file and then placing it in hdfs. Is it better to use hadoop archive
>> > for this or is it essentially the same thing?
>> >
>> > --
>> > --- Get your facts first, then you can distort them as you please.--
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
