[ http://issues.apache.org/jira/browse/NUTCH-220?page=comments#action_12372275 ]

Richard Braman commented on NUTCH-220:
--------------------------------------

PDFBox-0.7.3 no longer depends on log4j at all, so you should not be
getting any log4j errors from PDFBox!

Ben


On Sun, 26 Mar 2006, Richard Braman wrote:

> > Hi Ben,
> > I noticed that Nutch uses a log4j version of PDFBox.jar.  I don't
> > see this as an ant target in 0.7.3.  I downloaded PDFBox from CVS HEAD.
> >
> > When I tried to use the PDFBox nightly it gave me a bunch of log4j
> > errors, so I guess nutch expects the log4j version.
> >
> > I am trying to upgrade my nutch to 0.7.3 to see if I can get rid of the
> > NPE error.
> >
> > The NPE bug I told you about a few weeks ago has a much worse effect in
> > Nutch 0.8, as it seems to cause the fetcher to abort.
> >
> > 060326 142450 fetch of
> > http://www.state.sd.us/drr2/reg/bank/Trust%20Fee%20Calculation.pdf
> > failed with: java.lang.NullPointerException
> > java.lang.NullPointerException
> >     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:180)
> >     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:171)
> >     at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
> >     at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:245)
> >     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:185)
> > 060326 142450 SEVERE fetcher caught:java.lang.NullPointerException
> >
> > --
> > Richard L Braman, Jr., CPA
> > Tax Code Software Foundation, Inc.
> > Open Source Tax Software
> > http://www.taxcodesoftware.org
> > [EMAIL PROTECTED]
> >


> PDF Box can't parse document: java.lang.NullPointerException
> ------------------------------------------------------------
>
>          Key: NUTCH-220
>          URL: http://issues.apache.org/jira/browse/NUTCH-220
>      Project: Nutch
>         Type: Bug
>  Environment: PDFBox 0.7.2
>     Reporter: Richard Braman

>
> This error was fixed in the latest build of PDFBox, which should be tested
> with nutch.
> >> 060228 160354 fetch okay, but can't parse
> >> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
> >> failed(2,0): Can't be handled as pdf document. 
> >> java.lang.NullPointerException
> Yes, the NPE should be fixed.
>  Ben
> Richard Braman wrote:
> > Hi Ben,
> >
> > We actually got to the bottom of all of them except for one.  The
> > content truncation was due to an inconsistency bug in the nutch config.
> > The "no permission to extract text" error is actually true: the author, the
> > NC Department of Revenue, put this restriction on all of their files (I have
> > asked them to remove it, as it hampers public accessibility).  The null
> > pointer exception is the only one left to deal with, and it may be due to
> > the parsing bug.  Is this the one that you are referring to?
> >
> > -----Original Message-----
> > From: Ben Litchfield [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, March 02, 2006 4:07 PM
> > To: Richard Braman
> > Cc: [email protected]; [email protected];
> > [EMAIL PROTECTED]
> > Subject: Re: [PDFBox-user] PDF Parse Error
> >
> >
> >
> > I believe these errors are due to a parsing bug in PDFBox that has 
> > been fixed since the 0.7.2 release.  Please give the nightly 
> > build (should be a drop-in replacement) a try from 
> > http://www.pdfbox.org/dist and let me know if you are still having 
> > issues.
> >
> > Ben

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers