On 23 Oct 2002 at 20:53, Tom Sawyer wrote:
> i'm trying to get ht://dig configured and working. but for the life of
> me i can't get it to index my pdf and djvu documents.
>
> i'm running debian woody so i thought the default configuration would
> work for at least the pdfs. here's the relevant parts of my config
> file:
>
>
> max_doc_size: 9999999
>
> external_parsers: application/msword /usr/share/htdig/parse_doc.pl \
>     application/postscript /usr/share/htdig/parse_doc.pl \
>     application/pdf /usr/share/htdig/parse_doc.pl \
>     application/djvu->text/plain /usr/local/bin/djvutxt
>
> debian_pdf_parser: xpdf
>

I am not sure what parse_doc.pl is, but in my htdig.conf I have the following for PDF:

external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl

The script "doc2html.pl" lists the programs used to convert various document types (PDF, Word, Excel, etc.) to text or HTML. Within "doc2html.pl" I have for PDF conversion:

--------------
# PDF to HTML conversion script
# Full pathname of Perl script pdf2html.pl
my $PDF2HTML = '/usr/local/bin/pdf2html.pl';
--------------

The script "/usr/local/bin/pdf2html.pl" contains the following declarations:

--------------
####--- Configuration ---####
# Full paths of pdftotext and pdfinfo
# (get them from the xpdf package at http://www.foolabs.com/xpdf/):
#### YOU MUST SET THESE ####
my $PDFTOTEXT = "/usr/local/bin/pdftotext";
my $PDFINFO = "/usr/local/bin/pdfinfo";
--------------

pdftotext and pdfinfo were installed when I installed xpdf.
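A quick way to check that this chain is complete on your machine is to confirm that each script and binary in it exists and is executable. The following is only a rough sketch: check_tools is a made-up helper name, not part of ht://Dig, and the paths are simply the ones quoted in this thread, so adjust them to wherever the files live on your system.

```shell
#!/bin/sh
# Report whether each converter in the chain exists and is executable.
# check_tools is a hypothetical helper, not something shipped with ht://Dig.
check_tools() {
    for tool in "$@"; do
        if [ -x "$tool" ]; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
        fi
    done
    return 0
}

# Paths as quoted in this thread (assumed; change them to match your install).
check_tools /usr/local/bin/doc2html.pl \
            /usr/local/bin/pdf2html.pl \
            /usr/local/bin/pdftotext \
            /usr/local/bin/pdfinfo \
            /usr/local/bin/djvutxt
```

Anything reported as missing could explain a document being skipped. If everything is present, running the converter by hand on one of the files (for example "pdftotext tty.pdf -", which writes the extracted text to the terminal) should show whether the conversion itself works.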
> WHEN I RUN:
>
> rundig -i -v
>
> I GET THIS:
>
> New server: localhost, 80
> 3:3:1:http://localhost/files/?S=A: **+*-**** size = 1340
> 4:4:1:http://localhost/files/?D=A: ***+-**** size = 1340
> 5:5:1:http://localhost/files/test2.djvu: not HTML
> 6:6:1:http://localhost/files/text1.djvu: not HTML
> 7:7:1:http://localhost/files/tty.pdf: not found
> 8:8:1:http://localhost/files/word.rhtml: size = 796
> 9:9:2:http://localhost/files/?N=A: ****-**** size = 1340
> 10:10:2:http://localhost/files/?M=D: ****-**** size = 1340
>
> Deleted, no excerpt: 5/http://localhost/files/test2.djvu
> Deleted, no excerpt: 6/http://localhost/files/text1.djvu
> Deleted, no excerpt: 7/http://localhost/files/tty.pdf
> htmerge: 10
>
> WHAT AM I DOING WRONG? IS THERE SOMETHING I HAVE TO DO TO GET MY
> CONFIG FILE TO REGISTER EACH TIME I CHANGE IT? PLEASE HELP. THANKS.
>

Perhaps the PDF and DjVu files are not being converted? Perhaps you should change the line

debian_pdf_parser: xpdf

to

debian_pdf_parser: pdf2html.pl

We use Debian potato, and I do not have the "debian_pdf_parser" declaration at all.

I do not know all of the nitty-gritty of htdig, but I hope this helps in some way.

cheers,
adrian

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

