Yes, you can read the file in the configure() (old api) and setup()
(new api) methods. The data can be saved in a variable that will be
accessible to every call to map().

-Joey

On Mon, Oct 31, 2011 at 7:45 PM, Arko Provo Mukherjee
<arkoprovomukher...@gmail.com> wrote:
> Hello,
> I have a situation where I am reading a big file from HDFS and then
> comparing all the data in that file with each input to the mapper.
> Now since my mapper is trying to read the entire HDFS file for each of its
> input, the amount of data it is having to read and keep in memory is
> becoming large (file size * no of inputs to the mapper)
> Can we someone avoid this by loading the file once for each mapper such that
> the mapper can reuse the loaded file for each of the inputs that it
> receives.
> If this can be done, then for each mapper, I can just load the file once and
> then the mapper can use it for the entire slice of data that it receives.
> Thanks a lot in advance!
>
> Warm regards
> Arko



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Reply via email to