I got this exception a lot, too. I haven't tested Andrzej's patch yet; instead I just put the doc.add() lines in the indexer's reduce function in a try-catch block. This way the indexing finishes even with a null value, and I can see in the log file which documents haven't been indexed.
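Roughly what I changed (a simplified sketch, not the exact Indexer.reduce() code; buildDocument() just stands in for the existing doc.add(...) lines and LOG for whatever logger the class already uses):

    try {
      // existing code that fills the Lucene Document, e.g.
      // doc.add(Field.UnIndexed("digest", metadata.get("digest")));
      // a null value passed to Field is what throws the NPE below
      Document doc = buildDocument(key, values);
      output.collect(key, doc);
    } catch (NullPointerException e) {
      // one bad document no longer kills the whole reduce task;
      // log the key so the skipped document can be found later
      LOG.warning("skipping document " + key + ": " + e);
    }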

Wouldn't it be a good idea to catch every exception that only affects one document in loops like this? At least I don't like it when an indexing process dies after a few hours because a single document triggers such an exception.

best regards,
Dominik

Byron Miller wrote:
060111 103432 reduce > reduce
060111 103432 Optimizing index.
060111 103433 closing > reduce
060111 103434 closing > reduce
060111 103435 closing > reduce
java.lang.NullPointerException: value cannot be null
        at org.apache.lucene.document.Field.<init>(Field.java:469)
        at org.apache.lucene.document.Field.<init>(Field.java:412)
        at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
        at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
        at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
        at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
[EMAIL PROTECTED]:/data/nutch/trunk$


Pulled today's build and got the above error. No problems with running out of disk space or anything like that. This is a single instance on local file systems.

Any way to recover the crawl / finish the reduce job from where it failed?



