On Fri, Jul 17, 2009 at 00:30, MilleBii<[email protected]> wrote:
> I just tried indexing a smaller segment (300k URLs), and memory usage just
> keeps going up and up, but it does NOT hit the physical memory limit. Sounds
> like a "memory leak"?
> How come? I thought Java did garbage collection automatically.
>

Can you try the patch at

https://issues.apache.org/jira/browse/NUTCH-356

(try cache_classes.patch)
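
On the question of why the error appears before physical memory is used up: "Java heap space" refers to the JVM's own heap cap (the -Xmx setting), which is independent of the 4GB of RAM in the machine. The following is only a minimal sketch for checking that cap; the class name HeapCheck and its helper methods are made up for illustration and are not part of Nutch or Hadoop. It has to be run with the same JVM options as the indexing job to be meaningful, so check how your launcher script builds its -Xmx option.

```java
// Hypothetical helper (not part of Nutch or Hadoop): prints the heap
// limits of the JVM it runs in, using only java.lang.Runtime.
class HeapCheck {
    private static final long MB = 1024L * 1024L;

    // Upper bound the heap may grow to (governed by -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently reserved by the JVM from the operating system.
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    // Part of the reserved heap that is currently occupied by objects
    // (including garbage that has not been collected yet).
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):       " + maxHeapMb());
        System.out.println("committed heap (MB): " + committedHeapMb());
        System.out.println("used heap (MB):      " + usedHeapMb());
    }
}
```

If the reported maximum is far below the physical memory, the OutOfMemoryError can occur while the machine still has plenty of free RAM, and garbage collection cannot help once live objects fill that cap.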

>
> 2009/7/16 MilleBii <[email protected]>
>
>> I have more details on my error now.
>> What can I do about it? I have 4GB of memory, but it is not fully used (I
>> think).
>> I use Cygwin on Windows with the local filesystem.
>>
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:498)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>>
>> ---------- Forwarded message ----------
>> From: MilleBii <[email protected]>
>> Date: 2009/7/15
>> Subject: Error when using language-identifier plugin?
>> To: [email protected]
>>
>>
>> I decided to add the language-identifier plugin, but I get the following
>> error when I start indexing my crawldb. The message is not very explicit.
>> If I remove the plugin, indexing works just fine. I also tried it on a
>> smaller crawl database that I use for testing, and it works fine there too.
>> Any idea where to look?
>>
>>
>> 2009-07-15 16:19:54,875 WARN  mapred.LocalJobRunner - job_local_0001
>> 2009-07-15 16:19:54,891 FATAL indexer.Indexer - Indexer:
>> java.io.IOException: Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>>     at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
>>     at org.apache.nutch.indexer.Indexer.run(Indexer.java:92)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.indexer.Indexer.main(Indexer.java:101)
>>
>>
>>
>> --
>> -MilleBii-
>
>
>
> --
> -MilleBii-
>



-- 
Doğacan Güney
