Deleted, no excerpt: 50/http://www.domain.com/download/xxxx.pdf
It appears that the files are being read:
pick: www.domain.com, # servers = 1
50:50:2:http://www.domain.com/download/xxxx.pdf: Retrieval command for http://www.domain.com/download/xxxx.pdf: GET /download/xxxx.pdf HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Referer: http://www.domain.com/download/pdf.html
Host: www.domain.com
Header line: HTTP/1.1 200 OK
Header line: Date: Thu, 15 Apr 2004 20:17:47 GMT
Header line: Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) mod_ssl/2.8.12 OpenSSL/0.9.6b DAV/1.0.3 PHP/4.1.2
Header line: Last-Modified: Fri, 19 Dec 2003 16:51:20 GMT
Converted Fri, 19 Dec 2003 16:51:20 GMT to Fri, 19 Dec 2003 16:51:20
Header line: ETag: "136619f-c76f-3fe32c88"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 51055
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 1903 from document
Read a total of 51055 bytes
size = 51055
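As a sanity check on the output above, the individual chunk reads do add up to the Content-Length header, so the download itself looks complete. Below is a minimal sketch of that check in Python. It assumes the debug output has been saved as text; the function name `check_read_total` and the shortened `sample_log` are made up for illustration and are not part of ht://Dig.

```python
import re

def check_read_total(log_text):
    """Return (sum of 'Read N from document' chunks, Content-Length value)."""
    # Per-chunk reads; the 'Read a total of ...' summary line does not match.
    chunks = [int(n) for n in
              re.findall(r"^Read (\d+) from document$", log_text, re.MULTILINE)]
    # Content-Length as reported in the logged response headers.
    match = re.search(r"^Header line: Content-Length: (\d+)$",
                      log_text, re.MULTILINE)
    content_length = int(match.group(1)) if match else None
    return sum(chunks), content_length

# Shortened copy of the relevant lines from the debug output above.
sample_log = """Header line: Content-Length: 51055
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 1903 from document
Read a total of 51055 bytes
"""

total, expected = check_read_total(sample_log)
print(total, expected, total == expected)
```

Both numbers come out equal for this log, which suggests the problem is in the conversion step rather than in the retrieval.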
I've confirmed that pdf2html.pl and pdftotext both work from the command line.
doc2html.pl, however, just spits out garbage between the HTML tags when I try to convert a PDF with it on the command line.
I have the following line in htdig.conf:
external_parsers: application/pdf->text/html /path/to/convertor/htdig/scripts/doc2html.pl
I've also tried to call pdf2html.pl directly in the conf file to no avail.
Any ideas? Am I missing some configuration somewhere? I'm not getting any errors in the doc2html log file, so I don't know where to look.

