I am seeing this error as well. I have now tracked it down to one
particular document that causes it (an MS Word document that the parser
cannot parse properly). I have sent it to Andrzej in a separate email.
Let's see if that helps...
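
For anyone who wants to try the workaround Dominik describes in the
quoted message, here is a minimal sketch of the idea: wrap the
per-document work in a try-catch so that one bad document is logged and
skipped instead of killing the whole job. This is not the Nutch or
Lucene code. SafeIndexingSketch, addField and indexAll are hypothetical
names, and addField only imitates the "value cannot be null" check seen
in the trace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the indexer's reduce loop; not Nutch or Lucene code.
class SafeIndexingSketch {

    // Imitates the failure in the trace: a field with a null value is rejected.
    static void addField(Map<String, String> doc, String name, String value) {
        if (value == null) {
            throw new NullPointerException("value cannot be null");
        }
        doc.put(name, value);
    }

    // Builds one document per input entry and appends it to the index.
    // A failure affects only the current entry; its URL is logged and returned.
    static List<String> indexAll(Map<String, String> titlesByUrl,
                                 List<Map<String, String>> index) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, String> e : titlesByUrl.entrySet()) {
            try {
                Map<String, String> doc = new LinkedHashMap<>();
                addField(doc, "url", e.getKey());
                addField(doc, "title", e.getValue());
                index.add(doc); // only reached if every field was valid
            } catch (RuntimeException ex) {
                // Log and continue so the remaining documents still get indexed.
                System.err.println("Skipping " + e.getKey() + ": " + ex);
                skipped.add(e.getKey());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, String> input = new LinkedHashMap<>();
        input.put("http://a.example/", "Page A");
        input.put("http://b.example/", null); // the "broken" document
        input.put("http://c.example/", "Page C");

        List<Map<String, String>> index = new ArrayList<>();
        List<String> skipped = indexAll(input, index);
        System.out.println("indexed=" + index.size() + " skipped=" + skipped);
    }
}
```

The document is built completely before it is added, so a failure
partway through never leaves a half-filled document in the index. The
skipped URLs end up in the log, which is how the offending documents
can be found afterwards.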

On 1/11/06, Dominik Friedrich <[EMAIL PROTECTED]> wrote:
> I got this exception a lot, too. I haven't tested the patch by Andrzej
> yet; instead I just put the doc.add() lines in the indexer's reduce
> function in a try-catch block. This way the indexing finishes even with
> a null value, and I can see in the log file which documents haven't
> been indexed.
> Wouldn't it be a good idea to catch every exception that only affects
> one document in loops like this? At least I don't like it when an
> indexing process dies after a few hours because one document triggers
> such an exception.
> best regards,
> Dominik
> Byron Miller wrote:
> > 060111 103432 reduce > reduce
> > 060111 103432 Optimizing index.
> > 060111 103433 closing > reduce
> > 060111 103434 closing > reduce
> > 060111 103435 closing > reduce
> > java.lang.NullPointerException: value cannot be null
> >         at org.apache.lucene.document.Field.<init>(Field.java:469)
> >         at org.apache.lucene.document.Field.<init>(Field.java:412)
> >         at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
> >         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
> >         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
> >         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
> >         at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
> > [EMAIL PROTECTED]:/data/nutch/trunk$
> >
> >
> > Pulled today's build and got the above error. No problems with
> > running out of disk space or anything like that. This is a single
> > instance on a local file system.
> >
> > Is there any way to recover the crawl/finish the reduce job from
> > where it failed?
> >
> >
> >
