Hi,

Your map method is probably taking too long to process the data. You could
call context.progress() or context.setStatus("status") in your map method
from time to time (at least once every 600 seconds) so you don't hit the
timeout.

Regards,
Lucian

On Thu, Oct 27, 2011 at 11:22 AM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:

> Hi,
>
> I have a situation where I have to read a large file into every mapper.
>
> Since it is a large HDFS file that is needed to work on each input to the
> mapper, it takes a lot of time to read the data into memory from HDFS.
>
> Thus the system is killing all my Mappers with the following message:
>
> 11/10/26 22:54:52 INFO mapred.JobClient: Task Id :
> attempt_201106271322_12504_m_000000_0, Status : FAILED
> Task attempt_201106271322_12504_m_000000_0 failed to report status for 601
> seconds. Killing!
>
> The cluster is not entirely owned by me and hence I cannot change the *
> mapred.task.timeout* so as to be able to read the entire file.
>
> Any suggestions?
>
> Also, is there a way for a Mapper instance to read the file only once for
> all the inputs that it receives?
> Currently, since the file-reading code is in the map method, I guess it is
> reading the entire file for each and every input, leading to a lot of
> overhead.
>
> Please help!
>
> Many thanks in advance!!
>
> Warm regards
> Arko
>



-- 
All the best,
Lucian
