Hi,
I think this issue may be more complex. If I remember my test
correctly, the parse object was not null. parse.getText() was not
null either (it just contained an empty String).
If a document is not parsed correctly, an "empty" parse is returned
instead (parseStatus.getEmptyParse()), which should be OK, but I haven't
had a chance to check whether this can cause any trouble during index
optimization.
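A minimal sketch of the "empty parse" idea (a null-object pattern), written against a hypothetical stand-in: the `SimpleParse` interface and `parseOrEmpty` helper below are illustrations only, not the real Nutch `Parse`/`ParseStatus` API. A failed or null parse is replaced by a shared object whose text is an empty String, so downstream indexing code never dereferences null.

```java
import java.util.function.Function;

public class EmptyParseSketch {
    // Hypothetical stand-in for a parse result; NOT the real Nutch Parse interface.
    interface SimpleParse {
        String getText();
    }

    // Shared "empty" result: a non-null object whose text is "" (null-object pattern).
    static final SimpleParse EMPTY = () -> "";

    // Hypothetical helper: run a parser, falling back to EMPTY on failure or null.
    static SimpleParse parseOrEmpty(byte[] content, Function<byte[], SimpleParse> parser) {
        try {
            SimpleParse result = parser.apply(content);
            return result != null ? result : EMPTY;
        } catch (RuntimeException e) {
            return EMPTY;
        }
    }

    public static void main(String[] args) {
        Function<byte[], SimpleParse> brokenParser = c -> {
            throw new IllegalStateException("unparseable document");
        };
        SimpleParse p = parseOrEmpty(new byte[0], brokenParser);
        System.out.println("[" + p.getText() + "]"); // prints "[]"
    }
}
```

Whether an empty text value is harmless later on (for example during index optimization) is exactly the open question above; the sketch only shows that the null never reaches the indexer.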
Lukas

On 1/12/06, Pashabhai <[EMAIL PROTECTED]> wrote:
> Hi,
>
>    A very similar exception occurs while indexing a
> page which does not have body content (and sometimes
> no title).
>
> 051223 194717 Optimizing index.
> java.lang.NullPointerException
>         at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:75)
>         at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:63)
>         at org.apache.nutch.crawl.Indexer.reduce(Indexer.java:217)
>         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
>         at ...
>
> Looking into the source code of BasicIndexingFilter,
> it is trying to
> doc.add(Field.UnStored("content", parse.getText()));
>
> I guess adding a null check on the parse object,
> if (parse != null), should solve the problem.
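A minimal sketch of the guard suggested above, assuming simplified stand-ins: the `SimpleParse` interface and the `Map` used as a document are hypothetical placeholders, not the Nutch or Lucene API. Because the parse object was reportedly non-null in at least one failing case, the sketch checks both the parse object and the text value before adding the field, since a null field value is what triggers "value cannot be null".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GuardedContentField {
    // Hypothetical stand-in for Nutch's Parse; only getText() is modeled.
    interface SimpleParse {
        String getText();
    }

    // Hypothetical helper: adds "content" only when there is a non-null value.
    // The Map stands in for a Lucene Document; it is not the Lucene API.
    static boolean addContent(Map<String, String> doc, SimpleParse parse) {
        if (parse == null) {
            return false; // no parse at all: skip the field
        }
        String text = parse.getText();
        if (text == null) {
            return false; // parse exists but has no text: skip instead of passing null
        }
        doc.put("content", text);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        System.out.println(addContent(doc, null));          // false
        System.out.println(addContent(doc, () -> null));    // false
        System.out.println(addContent(doc, () -> "hello")); // true
        System.out.println(doc);                            // {content=hello}
    }
}
```

Skipping the field keeps the rest of the document indexable; an alternative design is to substitute an empty String, which matches the "empty parse" behaviour described earlier in the thread.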
>
> I can confirm once I have tested it locally.
>
> Thanks
> P
>
>
>
>
> --- Lukas Vlcek <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > I am facing this error as well. I have now located one
> > particular document which is causing it (it is an MS Word
> > document which can't be properly parsed by the parser).
> > I have sent it to Andrzej in a separate email. Let's see
> > if that helps...
> > Lukas
> >
> > On 1/11/06, Dominik Friedrich <[EMAIL PROTECTED]> wrote:
> > > I got this exception a lot, too. I haven't tested the patch by
> > > Andrzej yet; instead I just put the doc.add() lines in the indexer
> > > reduce function in a try-catch block. This way the indexing finishes
> > > even with a null value, and I can see in the log file which documents
> > > haven't been indexed.
> > >
> > > Wouldn't it be a good idea to catch every exception that only affects
> > > one document in loops like this? At least I don't like it when an
> > > indexing process dies after a few hours because one document triggers
> > > such an exception.
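A minimal sketch of the per-document isolation proposed above, assuming a hypothetical `indexAll` loop in which `indexOne` stands in for the per-document work done in the indexer's reduce function (this is not the actual Nutch `Indexer.reduce` code). Each document is processed inside its own try-catch, failures are logged and collected, and the loop continues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Logger;

public class PerDocumentIsolation {
    private static final Logger LOG = Logger.getLogger(PerDocumentIsolation.class.getName());

    // Hypothetical loop: indexOne stands in for the per-document work in reduce().
    // Returns the keys of documents that failed so they can be reported afterwards.
    static List<String> indexAll(List<String> keys, Consumer<String> indexOne) {
        List<String> failed = new ArrayList<>();
        for (String key : keys) {
            try {
                indexOne.accept(key);
            } catch (RuntimeException e) {
                // Log and continue: one bad document does not abort the whole job.
                LOG.warning("Skipping " + key + ": " + e);
                failed.add(key);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        List<String> failed = indexAll(List.of("a", "bad", "c"), key -> {
            if (key.equals("bad")) {
                throw new NullPointerException("value cannot be null");
            }
            indexed.add(key);
        });
        System.out.println(indexed + " " + failed); // [a, c] [bad]
    }
}
```

The sketch catches only `RuntimeException`, so `Error`s such as running out of memory still stop the job; returning the failed keys keeps skipped documents visible rather than silently dropping them.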
> > >
> > > best regards,
> > > Dominik
> > >
> > > Byron Miller wrote:
> > > > 60111 103432 reduce > reduce
> > > > 060111 103432 Optimizing index.
> > > > 060111 103433 closing > reduce
> > > > 060111 103434 closing > reduce
> > > > 060111 103435 closing > reduce
> > > > java.lang.NullPointerException: value cannot be null
> > > >         at org.apache.lucene.document.Field.<init>(Field.java:469)
> > > >         at org.apache.lucene.document.Field.<init>(Field.java:412)
> > > >         at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
> > > >         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
> > > >         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
> > > >         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
> > > > Exception in thread "main" java.io.IOException: Job failed!
> > > >         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
> > > >         at org.apache.nutch.indexer.Indexer.index(Indexer.java:259)
> > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
> > > > [EMAIL PROTECTED]:/data/nutch/trunk$
> > > >
> > > >
> > > > Pulled today's build and got the above error. No problems with
> > > > running out of disk space or anything like that. This is a single
> > > > instance on a local file system.
> > > >
> > > > Is there any way to recover the crawl, or to finish the reduce job
> > > > from where it failed?
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
>
>
