Hi,

I've been trying for some time, but I can't get htdig to index PDF files.
I've read the mailing list archives, but no dice! I hope someone can
spot the problem, because I can't.

These are the relevant parts of the configuration files:
-----
***www-zbbrabant-nl.conf:
-----
application/pdf->text/html /usr/local/bin/doc2html_31/doc2html.pl \
application/octet-stream->text/html /usr/local/bin/doc2html_31/doc2html.pl
...etc
-----
*** doc2html:
-----
# PDF to HTML conversion script:
my $PDF2HTML = '/usr/local/bin/doc2html_31/pdf2html.pl';

-----
*** pdf2html:
-----
# Full paths of pdftotext and pdfinfo
# (get them from the xpdf package at http://www.foolabs.com/xpdf/):
my $PDFTOTEXT = "/usr/X11R6/bin/pdftotext";
my $PDFINFO = "/usr/X11R6/bin/pdfinfo";

-----
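
As a sanity check on the paths quoted above, a minimal shell sketch like
the following reports whether each script and xpdf binary exists and is
executable. The helper name check_exec is made up for illustration; the
paths are the ones from the configuration excerpts.

```shell
#!/bin/sh
# check_exec is a hypothetical helper: it reports whether a path
# exists and is executable by the current user.
check_exec() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or not executable: $1"
    fi
}

# Paths taken from the configuration excerpts above.
for f in \
    /usr/local/bin/doc2html_31/doc2html.pl \
    /usr/local/bin/doc2html_31/pdf2html.pl \
    /usr/X11R6/bin/pdftotext \
    /usr/X11R6/bin/pdfinfo
do
    check_exec "$f"
done
```

This is best run as the same user that runs htdig, since permissions may
differ between accounts.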

If I run htdig by hand with this command:
/usr/local/htdig/bin/htdig -vvvvvvv -s -a -c /opt/www/htdig/conf/www-zbbrabant-nl.conf > logPaul

I get this in the log:
-----
757:1251:5:http://zbg-brabant.beethoven/php/bibliotheek/download.php?id=114&bestand=test.pdf:
Retrieval command for http:/$
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Referer: http://zbg-brabant.beethoven/index.php?p=57&s=2&document_id=114
Host: zbg-brabant.beethoven

Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 28 Aug 2007 14:17:01 GMT
Header line: Server: Apache/1.3.33 (Unix) mod_ssl/2.8.22 OpenSSL/0.9.7d PHP/4.4.0
Header line: X-Powered-By: PHP/4.4.0
Header line: Content-Length: 144718
Header line: Content-Disposition: attachment; filename="test.pdf"
Header line: Expires: 0
Header line: Cache-Control: must-revalidate, post-check=0, pre-check=0
Header line: Pragma: public
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
.. more like that ...
Read 5454 from document
Read a total of 144718 bytes
PDF::setContents(144718 bytes)
PDF::parse(http://zbg-brabant.beethoven/php/bibliotheek/download.php?id=114&bestand=test.pdf)
 size = 144718
pick: zbg-brabant.beethoven, # servers = 1
-----

This seems to be correct, but nothing gets indexed!
I can't find any words that appear in the PDF. (I made a special PDF
containing one word that does not occur anywhere else on the site.)
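
To separate a converter problem from an htdig problem, one option is to
run pdftotext by hand on a local copy of the PDF and count the lines
containing the test word. This is only a sketch: count_word is a
hypothetical helper, and test.pdf and myuniqueword are placeholders for
the local copy and the actual word.

```shell
#!/bin/sh
# count_word is a hypothetical helper: it reads text on stdin and prints
# the number of lines containing the given word (case-insensitive,
# whole-word match).
count_word() {
    grep -c -i -w -- "$1"
}

# Example invocation (commented out; "test.pdf" and "myuniqueword" are
# placeholders). "-" tells pdftotext to write the text to stdout.
# /usr/X11R6/bin/pdftotext test.pdf - | count_word "myuniqueword"
```

A count of 0 would suggest that the text extraction itself fails, before
htdig ever sees the words.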

Can someone give me some insight?

Thanks!

Paul

