Hello, I have a situation where I am reading a big file from HDFS and then comparing all the data in that file with each input record that the mapper receives.
Since my mapper currently re-reads the entire HDFS file for each of its inputs, the amount of data it has to read and keep in memory becomes very large (file size * number of inputs to the mapper). Can someone tell me how to avoid this by loading the file once per mapper, so that the mapper can reuse the loaded file for every input it receives? If this can be done, then each mapper would load the file just once and use it for the entire slice of data it processes (a rough sketch of what I mean is in the P.S. below).

Thanks a lot in advance!

Warm regards,
Arko
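P.S. To make the question concrete, here is a rough sketch of what I am hoping to do, assuming the new org.apache.hadoop.mapreduce API. The class name, the "lookup.file.path" job property, the key/value types, and the matching logic are just placeholders standing in for my actual job.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Loaded once per mapper task, then reused for every input record.
    private final Set<String> lookup = new HashSet<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // "lookup.file.path" is a placeholder job property holding the HDFS path of the big file.
        Configuration conf = context.getConfiguration();
        Path lookupPath = new Path(conf.get("lookup.file.path"));
        FileSystem fs = lookupPath.getFileSystem(conf);
        BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(lookupPath)));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lookup.add(line);
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The file is already in memory, so each record only needs an in-memory comparison.
        if (lookup.contains(value.toString())) {
            context.write(new Text(value), new Text("matched"));
        }
    }
}

Is overriding setup() like this the right way to get the "load once, use for the whole split" behaviour I am after, or is there a better mechanism?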