Hi,
Quick questions...
Are you creating too many small files?
Are there any task side files being created?
Does the NameNode (NN) heap have enough space to hold the metadata? Any details on its
general health will probably be helpful to people on the list.

Amogh



On 11/2/09 2:02 PM, "Zhang Bingjun (Eddy)" <eddym...@gmail.com> wrote:

Dear hadoop fellows,

We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In this
case, we only have mappers, which crawl the data and save it into HDFS in a
distributed way. No reducers are specified in the job conf.
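For concreteness, here is a minimal sketch of how a map-only job like ours
can be configured with the 0.20 mapred API (the CrawlJob/CrawlMapper names,
the crawl logic, and the paths are placeholders, not our actual code). The
key line is setNumReduceTasks(0):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class CrawlJob {

  // Placeholder mapper: fetch each input URL and emit the result.
  public static class CrawlMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text url,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // ... crawl logic would go here ...
      out.collect(url, new Text("fetched"));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CrawlJob.class);
    conf.setJobName("crawl");
    conf.setMapperClass(CrawlMapper.class);
    conf.setNumReduceTasks(0); // map-only: mapper output goes straight to HDFS
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}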

The problem is that in every job about one third of the mappers get stuck at
100% progress but never complete. If we look at the tasktracker log of those
mappers, the last entry is the key-input INFO line, and no other log lines
are output after that.

From the stdout log of a specific attempt of one of those mappers, we can
see that the mapper's map function has finished completely, so control of
the execution should be somewhere in the MapReduce framework itself.
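One thing we could try to narrow this down (a sketch building on the
placeholder mapper above, with an illustrative log message): override
close() and log there. If the message shows up in the attempt's stdout but
the task still never completes, the hang is after mapper cleanup, inside the
framework's commit / HDFS close path:

  public static class CrawlMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text url,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // ... crawl logic ...
    }

    @Override
    public void close() throws IOException {
      // Appears in the attempt's stdout; if we see it but the task still
      // never completes, the hang is after mapper cleanup, in the
      // framework's commit / HDFS close path.
      System.out.println("CrawlMapper.close() reached");
    }
  }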

Does anyone have any clue about this problem? Is it because we didn't use
any reducers? Since two thirds of the mappers complete successfully and
commit their output data into HDFS, I suspect the stuck mappers have
something to do with the MapReduce framework code.

Any input will be appreciated. Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddym...@gmail.com, bing...@nus.edu.sg, bing...@comp.nus.edu.sg
Tel No: +65-96188110 (M)
