On Jun 2, 2009, at 12:41 PM, Ken Krugler wrote:

Hello,

I am new to Nutch and have set up Nutch 0.9 on EasyEclipse for Mac OS X. When I try to start a crawl I get the following exception:

Dedup: starting
Dedup: adding indexes in: crawl/indexes
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)


Does anyone know how to solve this problem?

You can get an IOException reported by Hadoop when the root cause is that you've run out of memory. Normally the hadoop.log file would have the OOM exception.

If you're running from inside of Eclipse, see http://wiki.apache.org/nutch/RunNutchInEclipse0.9 for more details.
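For reference, the change that wiki page describes amounts to raising the heap on the Crawl launch configuration (in Eclipse: Run > Run Configurations > Arguments tab > VM arguments). A minimal sketch, where the exact sizes are just a guess for a small local crawl:

    -Xms128m -Xmx512m

If the job still dies with the heap raised, check hadoop.log for the underlying exception rather than the IOException that runJob() rethrows.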

-- Ken
--
Ken Krugler
+1 530-210-6378

Thank you for the pointers, Ken. I changed the VM memory parameters as shown at http://wiki.apache.org/nutch/RunNutchInEclipse0.9. However, I still get the exception, and in the Hadoop log I see the following:

2009-06-02 13:08:18,790 INFO  indexer.DeleteDuplicates - Dedup: starting
2009-06-02 13:08:18,817 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: crawl/indexes
2009-06-02 13:08:19,064 WARN  mapred.LocalJobRunner - job_7izmuc
java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)

I am running Lucene 2.1.0. Any idea why I am getting the ArrayIndexOutOfBoundsException?
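In case it helps, here is the small check I was going to try next, to see whether each index under crawl/indexes opens cleanly on its own before dedup merges them into a MultiReader. This is just a sketch against the Lucene 2.1 IndexReader API; the path is whatever your crawl directory happens to be:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;

    // Sanity check: open each index directory under crawl/indexes separately
    // and print its document counts, to spot an empty or unreadable index.
    public class CheckIndexes {
      public static void main(String[] args) throws Exception {
        File indexesDir = new File(args.length > 0 ? args[0] : "crawl/indexes");
        File[] parts = indexesDir.listFiles();
        if (parts == null) {
          System.out.println("No index directories found under " + indexesDir);
          return;
        }
        for (int i = 0; i < parts.length; i++) {
          if (!parts[i].isDirectory()) continue;
          IndexReader reader = IndexReader.open(parts[i].getPath());
          System.out.println(parts[i].getName() + ": maxDoc=" + reader.maxDoc()
              + " numDocs=" + reader.numDocs());
          reader.close();
        }
      }
    }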

Nic


