Robert Isaac wrote:
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On
>> Behalf Of Mark Grieveson
>> Sent: 04 September 2006 04:36
>> To: [email protected]
>> Subject: [htdig] parsers?
>>
>> Hello.  When I used htdig with Debian Sarge, there was an
>> option to set up an external parser for pdf and doc files.
>> It used xpdf for the pdf files.  I don't see such an option
>> with the version of htdig that comes with Debian Etch in the
>> htdig.conf file (the version being 3.2.0b6-1).
>> I do see a parse_doc.pl file in the /usr/share/htdig
>> directory; so, I'm thinking that there might be a way to set
>> up parsing of such files.
>>
>> Anyway, if anyone can enlighten me, I would be grateful.
>>
>> Mark
>>
>> --------------------------------------------------------------
>>
> Have you tried this setup:
>
> doc2html.cfg
>
> doc2html.pl
> With this variable (edit path to your pdf2html.pl)
> # PDF to HTML conversion script
> # Full pathname of Perl script pdf2html.pl
> my $PDF2HTML = '/var/www/cgi-bin/pdf2html.pl';
>
> pdf2html.pl
> With this variable (These 2 files from the xpdf package)
> my $PDFTOTEXT = "/usr/local/bin/pdftotext";
> my $PDFINFO = "/usr/local/bin/pdfinfo";
>
> Then this in htdig.conf (edit path to your doc2html.pl)
>
> external_parsers: application/pdf->text/html /var/www/cgi-bin/doc2html.pl
>
> Bob
> Volvo Owners Club UK
>

Thanks for your answer.  After messing around, I did find an answer for
parsing pdf and Word files (no luck with WordPerfect, rtf, or
OpenOffice.org yet, but that's less of a concern).  Adding the following
lines to htdig.conf worked:

external_parsers: application/pdf->text/html /usr/share/htdig/parse_doc.pl \
                  application/msword->text/html /usr/share/htdig/parse_doc.pl

I had also found a doc2html.pl file in the examples, which I struggled
to set up.  It did not work; apparently it is not the file to use on
Debian Etch.

Mark

_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
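
Both setups in this thread depend on a parser script and the xpdf tools
being present at the paths written into the config files.  As a rough
sketch (not part of htdig; the function name check_executables is made
up for illustration, and the paths are simply the ones quoted above, so
they may not match your system), something like this could confirm
that each path exists and is executable before re-running the indexer:

```python
import os


def check_executables(paths):
    """Map each path to True if it is a regular file with execute permission."""
    return {p: os.path.isfile(p) and os.access(p, os.X_OK) for p in paths}


if __name__ == "__main__":
    # Paths quoted in this thread (assumptions); adjust to your own install.
    candidates = [
        "/usr/share/htdig/parse_doc.pl",
        "/usr/local/bin/pdftotext",
        "/usr/local/bin/pdfinfo",
    ]
    for path, ok in check_executables(candidates).items():
        print(("ok       " if ok else "MISSING  ") + path)
```

A path reported as MISSING only means the file is absent or not
executable at that location; the tool may still be installed elsewhere
(for example under /usr/bin), in which case the path in the script or
in htdig.conf would need editing.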

