Re: try to parse pdf

Andrzej Bialecki Mon, 13 Mar 2006 09:53:24 -0800

Richard Braman wrote:

That error is actually not from the http content limit, but I would
recommend setting the content limit to -1.  For some reason this error

I would recommend against it - you may inadvertently fetch gigabyte-sized files if you skip content limits... but you can set it sufficiently high so that it still makes sense, e.g. 2-10 MB.
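
As an illustration of the 2-10 MB suggestion, here is a minimal sketch that converts a size in megabytes to bytes and renders an override for the http.content.limit property, which Nutch reads as a byte count. It assumes the override is placed in conf/nutch-site.xml; the helper name content_limit_property is hypothetical and not part of Nutch.

```python
def content_limit_property(megabytes: int) -> str:
    """Render a <property> override for http.content.limit.

    Hypothetical helper: Nutch expects the value in bytes, so the
    megabyte figure is converted before it is written out.
    """
    limit_bytes = megabytes * 1024 * 1024
    return (
        "<property>\n"
        "  <name>http.content.limit</name>\n"
        f"  <value>{limit_bytes}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Print a 10 MB limit, ready to paste inside <configuration> in nutch-site.xml.
    print(content_limit_property(10))
```

The printed element goes inside the configuration element of the site config file; a value of -1 would disable the limit entirely, which is what the reply above advises against.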

seems to happen sometimes even after you add the pdf parsing plug-in like
you did.  I think nutch must cache the plug-in properties in
nutch-default.  It will start to parse pdfs at some point.

Nutch doesn't cache plugin properties in any place except the currently running process. All properties are read anew from the config files whenever you start any nutch processing.


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
