Re: [Nutch-general] Nutch changes 0.9.txt

Paul Liddelow Fri, 06 Apr 2007 03:59:58 -0700

Hi

Thanks for that. I had completely missed the part where you need to include
the plugins in the Nutch configuration. I have now updated my nutch-site.xml to
include the parse-pdf plugin, as well as several others that I discovered
also need to be added! :P
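
For anyone hitting the same "parser not found" error, here is a rough sketch
of what the two nutch-site.xml overrides discussed in this thread might look
like. The plugin list is only an assumption (it is not my exact list), and
the content limit value is illustrative, so adjust both for your own crawl:

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: overrides for values in nutch-default.xml -->
<configuration>

  <!-- Regex of plugin ids to load. The important part is adding "pdf"
       to the parse-(...) group; the rest of this list is an assumed
       example and should match whatever plugins your crawl needs. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|pdf)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
  </property>

  <!-- Maximum bytes fetched per document over HTTP. Larger PDFs get
       truncated at this limit and may then fail to parse. A negative
       value is assumed here to disable truncation. -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>

</configuration>
```

The mimeType mapping in parse-plugins.xml only says which parser to prefer
for application/pdf; the plugin still has to be matched by plugin.includes
before Nutch will load it.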


Paul

On 4/6/07, rubdabadub <[EMAIL PROTECTED]> wrote:
> Could be ..
>
> 1. The parse-pdf plugin is not enabled in nutch-site.xml .. you
> need to enable it..
> 2. The PDF file is over the content limit .. you need to increase the
> content limit value in nutch-site.xml.
> 3. Something else that I don't know..
>
> Regards
>
> On 4/6/07, Paul Liddelow <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > Does anybody know what this means exactly:
> >
> > 8. NUTCH-338 - Remove the text parser as an option for parsing PDF files
> >     in parse-plugins.xml (Chris A. Mattmann via siren)
> >
> > In my crawl log file it says:
> >
> > Error parsing: 
> > http://www.site.com/quick%20reference%20guide%202/$FILE/Law_v2.4_02122006.pdf:
> > failed(2,200): org.apache.nutch.parse.ParseException: parser not found
> > for contentType=application/pdf
> > url=http://www.site.com/quick%20reference%20guide%202/$FILE/Law_v2.4_02122006.pdf
> >
> > This may be a stupid question, but does the Nutch crawler only retrieve
> > and index links, i.e. URLs, and not PDFs? The .pdf extension isn't listed
> > in the crawl-urlfilter.txt file either. And I can see it in the
> > parse-plugins.xml file:
> >
> > <mimeType name="application/pdf">
> >     <plugin id="parse-pdf" />
> > </mimeType>
> >
> > Thanks
> > Paul
> >
>

