Hello, I have a situation where I am reading a big file from HDFS and then comparing all the data in that file with each input record that the mapper receives.
Since my mapper currently re-reads the entire HDFS file for each of its inputs, the amount of data it has to read and keep in memory becomes very large (file size * number of inputs to the mapper). Can someone tell me how to avoid this by loading the file once per mapper, so that the mapper can reuse the loaded file for every input it receives? If this can be done, then each mapper would load the file just once and use it for the entire slice of data it processes (a rough sketch of what I mean is in the P.S. below).

Thanks a lot in advance!

Warm regards,
Arko
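P.S. To make the question concrete, here is a rough sketch of what I am hoping to do, assuming the new org.apache.hadoop.mapreduce API. The class name, the "lookup.file.path" job property, the key/value types, and the matching logic are just placeholders standing in for my actual job.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Loaded once per mapper task, then reused for every input record.
    private final Set<String> lookup = new HashSet<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // "lookup.file.path" is a placeholder job property holding the HDFS path of the big file.
        Configuration conf = context.getConfiguration();
        Path lookupPath = new Path(conf.get("lookup.file.path"));
        FileSystem fs = lookupPath.getFileSystem(conf);
        BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(lookupPath)));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lookup.add(line);
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The file is already in memory, so each record only needs an in-memory comparison.
        if (lookup.contains(value.toString())) {
            context.write(new Text(value), new Text("matched"));
        }
    }
}

Is overriding setup() like this the right way to get the "load once, use for the whole split" behaviour I am after, or is there a better mechanism?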