Hi,

I've been trying for some time, but I can't get htdig to index PDF files. I've read the mailing list archives, but no dice! I hope someone can spot the problem, because I can't.
These are the relevant parts of the configuration files:

-----
*** www-zbbrabant-nl.conf:
-----
application/pdf->text/html /usr/local/bin/doc2html_31/doc2html.pl \
application/octet-stream->text/html /usr/local/bin/doc2html_31/doc2html.pl
...etc

-----
*** doc2html:
-----
# PDF to HTML conversion script:
my $PDF2HTML = '/usr/local/bin/doc2html_31/pdf2html.pl';

-----
*** pdf2html:
-----
# Full paths of pdtotext and pdfinfo
# (get them from the xpdf package at http://www.foolabs.com/xpdf/):
my $PDFTOTEXT = "/usr/X11R6/bin/pdftotext";
my $PDFINFO = "/usr/X11R6/bin/pdfinfo";
-----

If I run htdig by hand with this command:

/usr/local/htdig/bin/htdig -vvvvvvv -s -a -c /opt/www/htdig/conf/www-zbbrabant-nl.conf > logPaul

I get this in the log:

-----
757:1251:5:http://zbg-brabant.beethoven/php/bibliotheek/download.php?id=114&bestand=test.pdf:
Retrieval command for http:/$
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Referer: http://zbg-brabant.beethoven/index.php?p=57&s=2&document_id=114
Host: zbg-brabant.beethoven
Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 28 Aug 2007 14:17:01 GMT
Header line: Server: Apache/1.3.33 (Unix) mod_ssl/2.8.22 OpenSSL/0.9.7d PHP/4.4.0
Header line: X-Powered-By: PHP/4.4.0
Header line: Content-Length: 144718
Header line: Content-Disposition: attachment; filename="test.pdf"
Header line: Expires: 0
Header line: Cache-Control: must-revalidate, post-check=0, pre-check=0
Header line: Pragma: public
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line: Content-Type: application/pdf
Header line: returnStatus = 0
Read 8192 from document
.. more like that ...
Read 5454 from document
Read a total of 144718 bytes
PDF::setContents(144718 bytes)
PDF::parse(http://zbg-brabant.beethoven/php/bibliotheek/download.php?id=114&bestand=test.pdf)
size = 144718
pick: zbg-brabant.beethoven, # servers = 1
-----

This seems to be correct, but nothing gets indexed! I can't find any of the words that appear in the PDF.
(I made a special PDF containing one word that does not appear anywhere else on the site.)

Can someone give me some insight?

Thanks!
Paul

