Hi, I am facing this error as well. I have now located one particular document that is causing it (an MS Word document that can't be properly parsed by the parser). I have sent it to Andrzej in a separate email. Let's see if that helps... Lukas
On 1/11/06, Dominik Friedrich <[EMAIL PROTECTED]> wrote:
> I got this exception a lot, too. I haven't tested the patch by Andrzej
> yet, but instead I just put the doc.add() lines in the indexer reduce
> function in a try-catch block. This way the indexing finishes even with
> a null value, and I can see in the log file which documents haven't
> been indexed.
>
> Wouldn't it be a good idea to catch every exception that only affects
> one document in loops like this? At least I don't like it when an
> indexing process dies after a few hours because one document triggers
> such an exception.
>
> best regards,
> Dominik
>
> Byron Miller wrote:
> > 060111 103432 reduce > reduce
> > 060111 103432 Optimizing index.
> > 060111 103433 closing > reduce
> > 060111 103434 closing > reduce
> > 060111 103435 closing > reduce
> > java.lang.NullPointerException: value cannot be null
> >         at org.apache.lucene.document.Field.<init>(Field.java:469)
> >         at org.apache.lucene.document.Field.<init>(Field.java:412)
> >         at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
> >         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
> >         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
> >         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
> >         at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
> > [EMAIL PROTECTED]:/data/nutch/trunk$
> >
> > Pulled today's build and got the above error. No problems with
> > running out of disk space or anything like that. This is a single
> > instance, on local file systems.
> >
> > Any way to recover the crawl / finish the reduce job from where it
> > failed?
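For what it's worth, Dominik's workaround (wrap the per-document work in a try-catch so one bad document can't kill the whole reduce task) can be sketched roughly like this. This is a minimal standalone sketch, not the actual Nutch Indexer code: SafeIndexer, addDoc, and the string list are hypothetical stand-ins for the real reduce loop and Lucene Document construction.

```java
import java.util.Arrays;
import java.util.List;

public class SafeIndexer {
    // Hypothetical stand-in for building/adding one Lucene document.
    // Mimics the reported failure: Field's constructor throws a
    // NullPointerException ("value cannot be null") on a null field value.
    static void addDoc(String value) {
        if (value == null) {
            throw new NullPointerException("value cannot be null");
        }
        // real code would build a Document and call writer.addDocument(doc)
    }

    // Index every document; skip and log the ones that fail instead of
    // letting a single unparsable document abort the whole job.
    static int indexAll(List<String> values) {
        int indexed = 0;
        for (String value : values) {
            try {
                addDoc(value);
                indexed++;
            } catch (RuntimeException e) {
                // one bad document: log it and move on
                System.err.println("Skipping document: " + e.getMessage());
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("good doc", null, "another doc");
        // the null document is skipped, the other two are indexed
        System.out.println(indexAll(docs));
    }
}
```

The trade-off is that failures become log entries instead of a hard stop, so you do need to watch the log to notice which documents were dropped.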