Miles, thanks.

"If your inputs to maps are compressed, then you don't get any automatic
assignment of mappers to your data:  each gzipped file gets assigned a
mapper." <--- this is the case I am talking about.

Haijun


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Miles
Osborne
Sent: Wednesday, June 04, 2008 3:07 PM
To: core-user@hadoop.apache.org
Subject: Re: compressed/encrypted file

You can compress / decompress at many points:

--prior to mapping

--after mapping

--after reducing

(I've been experimenting with all these options; we have been crawling
blogs every day since Feb and we store compressed sets of posts on DFS)
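For reference, the last two of those points are driven by job configuration; a minimal sketch using the era's `mapred.*` property names (check the defaults and codec list for your Hadoop version — the same settings can also be made programmatically on the JobConf):

```xml
<!-- hadoop-site.xml: compress intermediate map output ("after mapping") -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<!-- compress the final job output ("after reducing") -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

Decompression "prior to mapping" needs no property at all: the standard input formats recognize compressed inputs by file extension (e.g. `.gz`) and decompress them transparently.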

If your inputs to maps are compressed, then you don't get any automatic
assignment of mappers to your data:  each gzipped file gets assigned a
mapper.

But otherwise, it is all pretty transparent.
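The one-mapper-per-gzip-file behaviour follows from the format itself: a gzip stream has a single header and no mid-file sync points, so no reader can start decompressing from an arbitrary offset. A small stand-alone sketch of that (plain Python stdlib, nothing Hadoop-specific assumed):

```python
import gzip
import zlib

# A gzipped payload in memory, standing in for a gzipped file on DFS.
data = b"some log line\n" * 10000
blob = gzip.compress(data)

# Reading from the very beginning works fine:
assert gzip.decompress(blob) == data

# But a decompressor pointed at the middle of the file fails, because the
# gzip header only exists at offset 0.  This is exactly why a framework
# cannot split one gzipped file across several mappers.
splittable = True
try:
    # wbits=31 selects the gzip wrapper in zlib.
    zlib.decompressobj(wbits=31).decompress(blob[len(blob) // 2:])
except zlib.error:
    splittable = False

print("gzip readable from mid-stream?", splittable)
```

A common workaround, then and now, is to store many moderately sized compressed files (or use a block-oriented container) so parallelism comes from the file count rather than from splitting any single file.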

Miles

2008/6/4 Haijun Cao <[EMAIL PROTECTED]>:

>
>
> If a file is compressed and encrypted, then is it still possible to
> split it and run mappers in parallel?
>
> Do people compress their files stored in hadoop? If yes, how do you go
> about processing them in parallel?
>
> Thanks
> Haijun
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
