I got this exception a lot, too. I haven't tested the patch by Andrzej
yet; instead I just wrapped the doc.add() lines in the indexer reduce
function in a try-catch block. That way the indexing finishes even when
a null value turns up, and I can see in the log file which documents
haven't been indexed.
Wouldn't it be a good idea to catch every exception that affects only
one document in loops like this? At least I don't like it when an
indexing process dies after a few hours because a single document
triggers such an exception.
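A minimal sketch of that per-document guard, in plain Java rather than
the actual Nutch/Lucene API (indexDocument and the field-map shape are
hypothetical stand-ins for the real doc.add() calls): one bad document
is logged and skipped instead of killing the whole reduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexGuard {

    // Hypothetical per-document indexing step. Throws on a null field
    // value, mimicking the check in Lucene's Field constructor that
    // produces "value cannot be null".
    static void indexDocument(Map<String, String> doc) {
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (e.getValue() == null) {
                throw new NullPointerException(
                        "value cannot be null: " + e.getKey());
            }
        }
        // ... real code would build Lucene Fields and add the document here
    }

    // Wrap each document individually so one failure does not abort
    // the loop; collect messages for the log and carry on.
    static List<String> indexAll(List<Map<String, String>> docs) {
        List<String> skipped = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            try {
                indexDocument(docs.get(i));
            } catch (RuntimeException ex) {
                skipped.add("doc " + i + ": " + ex.getMessage());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();

        Map<String, String> good = new HashMap<>();
        good.put("url", "http://example.com/a");
        good.put("title", "A");
        docs.add(good);

        Map<String, String> bad = new HashMap<>();
        bad.put("url", "http://example.com/b");
        bad.put("title", null); // this one would have crashed the job
        docs.add(bad);

        List<String> skipped = indexAll(docs);
        System.out.println("skipped " + skipped.size() + " document(s)");
        for (String s : skipped) {
            System.out.println(s);
        }
    }
}
```

The trade-off is that silently skipping documents can hide real bugs,
so the catch should be narrow (here RuntimeException, not Throwable)
and every skip should be logged.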
best regards,
Dominik
Byron Miller wrote:
060111 103432 reduce > reduce
060111 103432 Optimizing index.
060111 103433 closing > reduce
060111 103434 closing > reduce
060111 103435 closing > reduce
java.lang.NullPointerException: value cannot be null
    at org.apache.lucene.document.Field.<init>(Field.java:469)
    at org.apache.lucene.document.Field.<init>(Field.java:412)
    at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
    at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
    at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
    at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
[EMAIL PROTECTED]:/data/nutch/trunk$
Pulled today's build and got the above error. No problems with
running out of disk space or anything like that. This is a single
instance on local file systems.
Any way to recover the crawl / finish the reduce job from where it
failed?