Hi, I have a situation where I have to read a large file into every mapper.
Since it is a large HDFS file that is needed to process each input to the mapper, reading the data into memory from HDFS takes a long time, and the framework keeps killing all my mappers with the following message:

11/10/26 22:54:52 INFO mapred.JobClient: Task Id : attempt_201106271322_12504_m_000000_0, Status : FAILED
Task attempt_201106271322_12504_m_000000_0 failed to report status for 601 seconds. Killing!

The cluster is not entirely owned by me, so I cannot change mapred.task.timeout to give the tasks enough time to read the entire file. Any suggestions?

Also, is there a way for a Mapper instance to read the file once for all the inputs that it receives? Currently, since the file reading code is in the map method, I guess it is reading the entire file for each and every input, leading to a lot of overhead (a simplified sketch of my current map method is at the end of this mail).

Please help! Many thanks in advance!

Warm regards,
Arko
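P.S. To make the second question concrete, here is a simplified sketch of the structure of my current mapper (assuming the new org.apache.hadoop.mapreduce API; the class name, the reference file path, and the output are placeholders, and the real reference file is of course much larger):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // The large reference file is opened and read here, inside map(),
        // so it is re-read from HDFS for every single input record.
        List<String> referenceData = new ArrayList<String>();
        Configuration conf = context.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        Path refPath = new Path("/path/to/large/reference/file"); // placeholder path
        BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(refPath)));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                referenceData.add(line);
            }
        } finally {
            reader.close();
        }

        // ... use referenceData to process this one input record ...
        context.write(value, new Text("placeholder-result"));
    }
}

What I am after is a way to do this load only once per Mapper instance, and without the task being killed for not reporting status while the load is still in progress.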