On 23 Oct 2002 at 20:53, Tom Sawyer wrote:

> i'm trying to get ht://dig configured and working. but for the life of
> me i can't get it to index my pdf and djvu documents.
> 
> i'm running debian woody so i thought the default configuration would
> work for at least the pdfs. here's the relevent parts of my config
> file:
> 
> 
> max_doc_size:     9999999
> 
> external_parsers: application/msword /usr/share/htdig/parse_doc.pl \
>                   application/postscript /usr/share/htdig/parse_doc.pl
>                   \ application/pdf /usr/share/htdig/parse_doc.pl \
>                   application/djvu->text/plain /usr/local/bin/djvutxt
> 
> debian_pdf_parser: xpdf
> 


I am not sure what parse_doc.pl is, but in my htdig.conf I have the
following for PDF:
external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl

The script "doc2html.pl" lists the programs used to convert various document
types (PDF, Word, Excel, etc.) to text or HTML.

Within "doc2html.pl" I have for PDF conversion:
--------------
# PDF to HTML conversion script
# Full pathname of Perl script pdf2html.pl
my $PDF2HTML = '/usr/local/bin/pdf2html.pl';
--------------

The script "/usr/local/bin/pdf2html.pl" contains the following declarations:
--------------
####--- Configuration ---####
# Full paths of pdftotext and pdfinfo
# (get them from the xpdf package at http://www.foolabs.com/xpdf/):

#### YOU MUST SET THESE  ####

my $PDFTOTEXT = "/usr/local/bin/pdftotext";
my $PDFINFO = "/usr/local/bin/pdfinfo";
--------------

pdftotext and pdfinfo were installed when I installed xpdf.
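
If you want to confirm that the paths set in pdf2html.pl really point at
executables on your machine, something like the following might help. It is
only a rough sketch: the function name check_tools is made up for
illustration, and the paths in the usage note are the ones from my setup,
which may differ from yours.

```shell
# Report whether each given path is an executable regular file.
# Prints "ok <path>" or "missing <path>" for every argument and
# returns non-zero if at least one path is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if [ -f "$tool" ] && [ -x "$tool" ]; then
            echo "ok $tool"
        else
            echo "missing $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: check_tools /usr/local/bin/pdftotext /usr/local/bin/pdfinfo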


> WHEN I RUN:
> 
> rundig -i -v
> 
> I GET THIS:
> 
> New server: localhost, 80
> 3:3:1:http://localhost/files/?S=A: **+*-**** size = 1340
> 4:4:1:http://localhost/files/?D=A: ***+-**** size = 1340
> 5:5:1:http://localhost/files/test2.djvu:  not HTML
> 6:6:1:http://localhost/files/text1.djvu:  not HTML
> 7:7:1:http://localhost/files/tty.pdf:  not found
> 8:8:1:http://localhost/files/word.rhtml:  size = 796
> 9:9:2:http://localhost/files/?N=A: ****-**** size = 1340
> 10:10:2:http://localhost/files/?M=D: ****-**** size = 1340

> 
> Deleted, no excerpt: 5/http://localhost/files/test2.djvu
> Deleted, no excerpt: 6/http://localhost/files/text1.djvu
> Deleted, no excerpt: 7/http://localhost/files/tty.pdf
> htmerge: 10
> 
> WHAT AM I DOING WRONG? IS THERE SOMETHING I HAVE TO DO TO GET MY
> CONFIG FILE TO REGISTER EACH TIME I CHANGE IT? PLEASE HELP. THANKS.
> 


Perhaps the PDF and DjVu files are not being converted?
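
One way to test that, outside of htdig, is to run the converter by hand on a
file and see whether any text comes out. The helper below is just a sketch
under that assumption; the name produces_text is hypothetical, and it only
checks that the command writes something to standard output.

```shell
# Run a converter command and report whether it wrote any text to stdout.
# Usage: produces_text COMMAND [ARGS...]
# Returns 0 if at least one byte was produced, 1 otherwise.
produces_text() {
    # Count the bytes written to stdout; converter error messages are discarded.
    bytes=$(( $("$@" 2>/dev/null | wc -c) ))
    if [ "$bytes" -gt 0 ]; then
        echo "text produced: $bytes bytes"
        return 0
    fi
    echo "no text produced"
    return 1
}
```

With xpdf installed you could try, for instance,
"produces_text pdftotext /path/to/tty.pdf -" (the trailing "-" asks pdftotext
to write to standard output), and similarly
"produces_text /usr/local/bin/djvutxt /path/to/test2.djvu". The /path/to
parts are placeholders for wherever the files live on disk.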

Perhaps you could try changing the line:
debian_pdf_parser: xpdf
to:
debian_pdf_parser: pdf2html.pl

We use Debian potato, and I do not have the "debian_pdf_parser"
declaration at all.

I do not know all of the nitty-gritty of htdig, but I hope this helps
in some way.

cheers,

adrian



