Yes, you can read the file in the configure() (old API) or setup() (new API) method. Each is called once per map task, before the first call to map(). Store the data in a member variable and it will be accessible to every call to map().
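Here is a minimal sketch of that load-once pattern in plain Java, so it runs without Hadoop on the classpath. The class name CachedLookupMapper, the simplified setup()/map() signatures, and the "is this record in the file?" comparison are all made up for illustration. In a real job the class would extend org.apache.hadoop.mapreduce.Mapper, and setup(Context) would open the HDFS file through FileSystem.get(context.getConfiguration()).

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a Hadoop mapper; it only mimics the
// setup-once / map-many lifecycle and does not use the Hadoop API.
class CachedLookupMapper {
    // Filled once in setup(), then reused by every map() call.
    private final Set<String> reference = new HashSet<>();

    // Plays the role of setup(Context): read the whole side file once.
    // In a real mapper the reader would wrap the stream from fs.open(path).
    void setup(BufferedReader sideFile) {
        sideFile.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .forEach(reference::add);
    }

    // Plays the role of map(): compare one input record against the
    // in-memory copy instead of re-reading the file.
    boolean map(String record) {
        return reference.contains(record.trim());
    }

    public static void main(String[] args) {
        CachedLookupMapper mapper = new CachedLookupMapper();
        // Assumed sample data standing in for the big HDFS file.
        mapper.setup(new BufferedReader(new StringReader("alpha\nbeta\n")));
        for (String record : new String[] {"alpha", "gamma", "beta"}) {
            System.out.println(record + " -> " + mapper.map(record));
        }
    }
}
```

With this layout the file is read once per map task, so memory use is roughly the file size per task rather than file size * number of inputs.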
-Joey

On Mon, Oct 31, 2011 at 7:45 PM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote:
> Hello,
> I have a situation where I am reading a big file from HDFS and then
> comparing all the data in that file with each input to the mapper.
> Now since my mapper is trying to read the entire HDFS file for each of its
> input, the amount of data it is having to read and keep in memory is
> becoming large (file size * no of inputs to the mapper)
> Can we somehow avoid this by loading the file once for each mapper such that
> the mapper can reuse the loaded file for each of the inputs that it
> receives.
> If this can be done, then for each mapper, I can just load the file once and
> then the mapper can use it for the entire slice of data that it receives.
> Thanks a lot in advance!
>
> Warm regards
> Arko

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434