HADOOP_HEAPSIZE specifies the memory to be used by the Hadoop daemons and
does NOT affect the memory used for the map/reduce tasks. Maybe you should
invest a bit of time reading about Hadoop first?
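
To change the heap of the map/reduce tasks themselves you would set the child
JVM options instead, e.g. something along these lines (property names assume a
Hadoop 0.19/0.20-style setup, check them against your version):

  # conf/hadoop-env.sh -> heap for the Hadoop daemons only (in MB)
  export HADOOP_HEAPSIZE=2000

  <!-- conf/mapred-site.xml -> heap for each map/reduce child task -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>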

As for your memory problem, it could be due to the parsing and not the
fetching. If you don't do so already, I suggest that you separate the
fetching from the parsing. First, that will tell you which part fails, and if
it does fail in the parsing then you would not need to refetch the content.
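
Roughly, assuming a Nutch 1.0-style setup (property and command names may
differ in your version, and the segment path below is just a placeholder):

  <!-- nutch-site.xml: have the fetcher store content without parsing it -->
  <property>
    <name>fetcher.parse</name>
    <value>false</value>
  </property>

  # fetch first, then parse the same segment as a separate job
  bin/nutch fetch crawl/segments/20091205123456
  bin/nutch parse crawl/segments/20091205123456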

J.

2009/12/5 MilleBii <[email protected]>

> My fetch cycle failed on the following initial error :
>
> java.io.IOException: Task process exit with nonzero status of 65.
>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
>
> Then it makes a second attempt and after 3 hours I run into that error
> (although I had doubled HADOOP_HEAPSIZE):
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>
> Any idea what the initial error is or could be ?
> For the second one, I'm going to reduce the number of threads... but I'm
> wondering if there could be a memory leak? And I don't know how to trace that.
>
> --
> -MilleBii-
>



-- 
DigitalPebble Ltd
http://www.digitalpebble.com
