On Jun 2, 2009, at 12:41 PM, Ken Krugler wrote:
Hello,
I am new to Nutch and have set up Nutch 0.9 on Easy Eclipse for
Mac OS X. When I try to start crawling, I get the following exception:
Dedup: starting
Dedup: adding indexes in: crawl/indexes
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
Does anyone know how to solve this problem?
You can get an IOException reported by Hadoop when the root cause is
that you've run out of memory. Normally the hadoop.log file would
have the OOM exception.
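If that's what's happening, you'd typically find a line like the
following somewhere in hadoop.log (the exact message varies with
which resource ran out):

    java.lang.OutOfMemoryError: Java heap space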
If you're running from inside of Eclipse, see http://wiki.apache.org/nutch/RunNutchInEclipse0.9
for more details.
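The usual fix when running inside Eclipse is to raise the heap in the
launch configuration's VM arguments, e.g. something along these lines
(the values are only a starting point; tune them to your crawl size,
per the wiki page above):

    -Xms256m -Xmx512m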
-- Ken
--
Ken Krugler
+1 530-210-6378
Thank you for the pointers, Ken. I changed the VM memory parameters as
shown at http://wiki.apache.org/nutch/RunNutchInEclipse0.9. However, I
still get the exception, and hadoop.log shows the following:
2009-06-02 13:08:18,790 INFO indexer.DeleteDuplicates - Dedup: starting
2009-06-02 13:08:18,817 INFO indexer.DeleteDuplicates - Dedup: adding indexes in: crawl/indexes
2009-06-02 13:08:19,064 WARN mapred.LocalJobRunner - job_7izmuc
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
I am running Lucene 2.1.0. Any idea why I am getting the
ArrayIndexOutOfBoundsException?
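In case it is relevant, here is a quick sanity check against the part
indexes that dedup reads (a minimal sketch assuming Lucene 2.1's
IndexReader API; the crawl/indexes path is from my setup above):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;

    // Open each part index under crawl/indexes and print its document
    // counts; a reader that fails to open, or reports maxDoc() == 0,
    // might explain the -1 document id in MultiReader.isDeleted().
    public class IndexSanityCheck {
        public static void main(String[] args) throws Exception {
            File[] parts = new File("crawl/indexes").listFiles();
            if (parts == null) {
                System.err.println("crawl/indexes not found");
                return;
            }
            for (File part : parts) {
                if (!part.isDirectory()) continue;
                IndexReader reader = IndexReader.open(part);
                System.out.println(part + ": maxDoc=" + reader.maxDoc()
                    + ", numDocs=" + reader.numDocs());
                reader.close();
            }
        }
    }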
Nic