In nutch-site.xml, raise http.content.limit to something like

<property>
<name>http.content.limit</name>
<value>655360</value>
</property>
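
That only addresses the "Content truncated" failures: the fetcher stops
at the content limit (65536 bytes by default in nutch-default.xml, as
far as I recall), so the parser gets an incomplete PDF. The
NullPointerException and the permission error are separate problems. If
I remember the property description correctly, a negative value turns
truncation off altogether. A sketch, assuming that holds in your
version (check the description in your nutch-default.xml first):

<property>
<name>http.content.limit</name>
<!-- assumption: a negative value means no truncation -->
<value>-1</value>
</property>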

Jeff.


Richard Braman wrote:

I get the following errors regarding pdf:

060228 160518 fetch okay, but can't parse
http://taxpros.marylandtaxes.com/publications/revenews/archives/spr05_hi.pdf,
reason: failed(2,202): Content truncated at 66005 bytes. Parser
can't handle incomplete pdf file.

060228 160354 fetch okay, but can't parse
http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
failed(2,0): Can't be handled as pdf document.
java.lang.NullPointerException

060228 160518 fetch okay, but can't parse
http://www.dor.state.nc.us/downloads/corp_archive/03archive/NC478_Instructions.pdf,
reason: failed(2,0): Can't be handled as pdf document.
java.io.IOException: You do not have permission to extract text

I have a number of errors like this in my log, mostly the content
truncated one.

The thing is, these files all open fine in Acrobat.



Richard Braman
mailto:[EMAIL PROTECTED]
561.748.4002 (voice)
http://www.taxcodesoftware.org
Free Open Source Tax Software





