Hi All,
I'm trying to index a set of PDF files using htdig. I've successfully
indexed other PDF files with the same installation, but a new person is now
producing our PDFs, and the files they produce are not being indexed. We are
using acroread for parsing.
If I execute:
rundig -vvv
I see a message like the following for each of the PDF files:
Header line: HTTP/1.1 200 OK
Header line: Date: Fri, 14 Jul 2000 04:00:36 GMT
Header line: Server: Apache/1.3.12 (Unix)
Header line: Last-Modified: Fri, 02 Jun 2000 04:38:16 GMT
Translated Fri, 02 Jun 2000 04:38:16 GMT to 02 Jun 2000 04:38:16 (100)
And converted to Fri, 02 Jun 2000 04:38:16
Header line: ETag: "129d1-4f24-39373a38"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 20260
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 3876 from document
Read a total of 20260 bytes
PDF::setContents(20260 bytes)
PDF::parse(http://tango.uac.edu.au/htdig/course/mq/300114.pdf)
But, later on, I see the following:
Deleted, no excerpt: 109/http://tango.uac.edu.au/htdig/course/mq/i/300114.pdf
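To list every affected file at once, the verbose output could be filtered with something like the sketch below. It only assumes the message format quoted above (possibly wrapped between "no" and "excerpt"); the function name is made up for illustration, and treating the leading number as a document ID is a guess on my part.

```python
import re

# Matches the "Deleted, no excerpt" report from the verbose output,
# tolerating a line wrap between "no" and "excerpt".
NO_EXCERPT = re.compile(r"Deleted, no\s+excerpt:\s+(\d+)/(\S+)")


def urls_without_excerpt(log_text):
    """Return (number, url) pairs for documents reported as having no excerpt.

    The number is whatever precedes the first slash in the message;
    it is assumed, not confirmed, to be a document ID.
    """
    return [(int(number), url) for number, url in NO_EXCERPT.findall(log_text)]


# Example using the lines quoted in this message.
sample = (
    "PDF::parse(http://tango.uac.edu.au/htdig/course/mq/300114.pdf)\n"
    "Deleted, no\n"
    "excerpt: 109/http://tango.uac.edu.au/htdig/course/mq/i/300114.pdf\n"
)
for number, url in urls_without_excerpt(sample):
    print(number, url)
```

In practice the text passed to the function would be the captured output of rundig -vvv rather than a hard-coded sample.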
None of my files are actually being indexed. Does anyone have any
suggestions?
- The PDFs are not excluded in robots.txt
- The server_max_docs parameter is not in use
Cheers,
Paul