Thanks again Julien,

Yes, I'm going to buy myself the Hadoop book. I thought I could do without
it, but I realize that I need to make good use of Hadoop.

I didn't know you could split fetching and parsing, so I suppose you just
issue nutch fetch <segment> -noParsing, followed by nutch parse <segment>.
I will try it on my next run.
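
Just to note down what I have in mind (a rough sketch, assuming the Nutch
1.0 command line and a Hadoop 0.20-era setup; the segment path and thread
count below are only examples):

  # fetch without parsing, so only the raw content is stored
  bin/nutch fetch crawl/segments/20091205123456 -threads 10 -noParsing

  # parse the already-fetched segment in a separate step; if this part
  # blows up I should not need to refetch anything
  bin/nutch parse crawl/segments/20091205123456

And if I understand Julien correctly, the heap for the map/reduce tasks
themselves is tuned via mapred.child.java.opts (the -Xmx passed to each
child JVM) rather than HADOOP_HEAPSIZE, so that is what I'll adjust if
the parse step still runs out of memory.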



2009/12/5 Julien Nioche <[email protected]>

> HADOOP_HEAPSIZE specifies the memory to be used by the Hadoop daemons and
> does NOT affect the memory used for the map/reduce jobs. Maybe you should
> invest a bit of time reading about Hadoop first?
>
> As for your memory problem, it could be due to the parsing and not the
> fetching. If you don't do so already, I suggest that you separate the
> fetching from the parsing. First, that will tell you which part fails;
> and if it does fail in the parsing, then you would not need to refetch
> the content.
>
> J.
>
> 2009/12/5 MilleBii <[email protected]>
>
> > My fetch cycle failed on the following initial error:
> >
> > java.io.IOException: Task process exit with nonzero status of 65.
> >        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
> >
> > Then it made a second attempt, and after 3 hours I hit this error
> > (although I had doubled HADOOP_HEAPSIZE):
> >
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> >
> >
> > Any idea what the initial error is or could be?
> > For the second one, I'm going to reduce the number of threads... but I'm
> > wondering if there could be a memory leak? And I don't know how to trace
> > that.
> >
> > --
> > -MilleBii-
> >
>
>
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>



-- 
-MilleBii-
