I assume I'm missing a configuration option here. I think pdftotext is
taking the PDF files htdig finds and trying to fetch them over an HTTP
connection. I thought htdig downloads the files somewhere, say /tmp/htdig,
and the script should read them from there, but I'm probably wrong. Any hints?
Ron
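To make the setup I'm assuming concrete, here is a minimal sketch of a converter that never touches HTTP. It assumes the caller passes the path of an already-downloaded local copy (for example something under /tmp/htdig) as the first argument. The script layout and function names are hypothetical and not htdig's actual interface. The only real detail relied on is that pdftotext writes to stdout when the output file is given as `-`.

```python
#!/usr/bin/env python3
# Hypothetical converter wrapper: runs pdftotext on a LOCAL file path,
# assuming the caller has already downloaded the PDF (no HTTP fetch here).
import subprocess
import sys


def build_command(pdf_path: str) -> list[str]:
    # "-" as the output file makes pdftotext write the text to stdout.
    return ["pdftotext", pdf_path, "-"]


def convert(pdf_path: str) -> bytes:
    # Run pdftotext on the local copy and return its raw stdout bytes.
    result = subprocess.run(build_command(pdf_path), capture_output=True, check=True)
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed argument layout: first argument is the local temp file path.
    sys.stdout.buffer.write(convert(sys.argv[1]))
```

If the parser is really being handed a URL instead of a file path, pdftotext would fail to open it. That would be a quick way to tell which of the two guesses above is right.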
Header line: HTTP/1.1 200 OK
Header line: Server: Netscape-Enterprise/3.5.1
Header line: Date: Thu, 18 Jan 2001 23:14:05 GMT
Header line: Content-type: application/pdf
Header line: Link: <http://hera.kycc.cypress.com/kycc/Classes/EE564_S01/class_01.pdf?PageServices>; rel="PageServices"
Header line: Last-modified: Wed, 10 Jan 2001 22:19:54 GMT
Translated Wed, 10 Jan 2001 22:19:54 GMT to 2001-01-10 22:19:54 (101)
And converted to Wed, 10 Jan 2001 22:19:54
Header line: Content-length: 1064641
Header line: Accept-ranges: bytes
Header line:
returnStatus = 0
doc2html: http://www.kycc.cypress.com/kycc/Classes/EE564_S01/class_01.pdf
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Top-level pages object is wrong type (null)
Error: Couldn't read page catalog
TEXT PDF (0)
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from docume
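The server reports Content-length: 1064641, and pdftotext then says the file is damaged and tries to reconstruct the xref table. One common cause of that error is that the file handed to the parser is incomplete. The sketch below checks a local copy for that, assuming you can get at the file the converter actually reads. The function name is hypothetical. It relies only on a PDF starting with `%PDF-` and normally ending with a `%%EOF` marker near the end.

```python
import os


def check_local_pdf(path: str, content_length: int) -> list[str]:
    """Return a list of problems found in a local PDF copy (empty if none)."""
    problems = []
    size = os.path.getsize(path)
    # Compare the bytes on disk with the server's Content-length header.
    if size != content_length:
        problems.append(f"size {size} != Content-length {content_length}")
    with open(path, "rb") as f:
        head = f.read(5)
        # Look for the end-of-file marker in the last 1024 bytes.
        f.seek(max(0, size - 1024))
        tail = f.read()
    if head != b"%PDF-":
        problems.append("missing %PDF- header")
    if b"%%EOF" not in tail:
        problems.append("no %%EOF marker near end of file (truncated?)")
    return problems
```

For example, `check_local_pdf("/tmp/htdig/class_01.pdf", 1064641)` (hypothetical path) returning a size mismatch plus a missing `%%EOF` would point at truncation rather than at pdftotext itself.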
_______________________________________________
htdig-general mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-general