Just tried indexing a smaller segment of 300k URLs... and the memory just keeps going up and up, but it does NOT hit the physical memory limit. Sounds like a "memory leak"? I thought Java did the garbage collection automatically?
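Answering my own question: the collector only frees unreachable objects, and the JVM never grows past its -Xmx cap anyway, so the process plateaus well below physical RAM and then throws OutOfMemoryError once live data fills the configured heap. What looks like a leak is usually just the heap filling up to its ceiling. A minimal sketch to print what the JVM is actually allowed to use (plain Java, nothing Nutch-specific assumed):

    // HeapCheck.java -- prints the JVM heap limits; run with e.g.
    //   java -Xmx1000m HeapCheck
    public class HeapCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024 * 1024;
            // maxMemory() is the -Xmx ceiling the GC can never push past
            System.out.println("max heap (-Xmx): " + rt.maxMemory() / mb + " MB");
            System.out.println("committed heap : " + rt.totalMemory() / mb + " MB");
            System.out.println("used heap      : "
                    + (rt.totalMemory() - rt.freeMemory()) / mb + " MB");
        }
    }

If memory stops climbing right around that "max heap" number, you are hitting the cap, not a leak.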
2009/7/16 MilleBii <[email protected]>

> I have more details on my error now. What can I do about it? I have 4 GB
> of memory, but it is not fully used (I think).
> I use cygwin/windows/local filesystem.
>
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:498)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>
> ---------- Forwarded message ----------
> From: MilleBii <[email protected]>
> Date: 2009/7/15
> Subject: Error when using language-identifier plugin?
> To: [email protected]
>
> I decided to add the language-identifier plugin, but I get the following
> error when I start indexing my crawldb. It is not very explicit. If I
> remove the plugin, indexing works just fine. I also tried on a smaller
> crawl database that I use for testing, and it works fine too.
> Any idea where to look?
>
> 2009-07-15 16:19:54,875 WARN  mapred.LocalJobRunner - job_local_0001
> 2009-07-15 16:19:54,891 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
>         at org.apache.nutch.indexer.Indexer.run(Indexer.java:92)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.indexer.Indexer.main(Indexer.java:101)
>
> --
> -MilleBii-
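The allocation fails in MapTask$MapOutputBuffer.<init>, which grabs the whole map-side sort buffer (io.sort.mb, 100 MB by default) up front; an OOM right there means the heap given to the local job runner is smaller than that buffer plus overhead, not that the 4 GB of RAM is exhausted. Two things to try; both are sketches assuming a stock bin/nutch script and the classic Hadoop property names:

    # bin/nutch reads NUTCH_HEAPSIZE (in MB) to build the -Xmx flag for the
    # whole local job; the script's default is only 1000m. Re-run the index
    # command in the same shell afterwards.
    export NUTCH_HEAPSIZE=2000

or shrink the sort buffer in conf/hadoop-site.xml:

    <!-- map-side sort buffer, in MB; the Hadoop default is 100 -->
    <property>
      <name>io.sort.mb</name>
      <value>50</value>
    </property>

--
-MilleBii-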
