Robert Isaac wrote:
>> -----Original Message-----
>> From: [EMAIL PROTECTED] 
>> [mailto:[EMAIL PROTECTED] On 
>> Behalf Of Mark Grieveson
>> Sent: 04 September 2006 04:36
>> To: [email protected]
>> Subject: [htdig] parsers?
>>
>> Hello.  When I used htdig with Debian Sarge, there was an 
>> option to set up an external parser for pdf and doc files.  
>> It used xpdf for the pdf files.  I don't see such an option 
>> with the version of htdig that comes with Debian Etch in the 
>> htdig.conf file (the version being 3.2.0b6-1).  
>> I do see a parse_doc.pl file in the /usr/share/htdig 
>> directory; so, I'm thinking that there might be a way to set 
>> up parsing of such files.
>>
>> Anyway, if anyone can enlighten me, I would be grateful.
>>
>> Mark
>>
>> --------------------------------------------------------------
>>     
> Have you tried this setup:
>
> doc2html.cfg
>
> doc2html.pl
> With this variable (edit path to your pdf2html.pl)
> # PDF to HTML conversion script
> # Full pathname of Perl script pdf2html.pl
> my $PDF2HTML = '/var/www/cgi-bin/pdf2html.pl';
>
>
>
> pdf2html.pl
> With this variable (These 2 files from the xpdf package)
> my $PDFTOTEXT = "/usr/local/bin/pdftotext";
> my $PDFINFO = "/usr/local/bin/pdfinfo";
>
> Then this in htdig.conf (edit path to your doc2html.pl)
>
> external_parsers: application/pdf->text/html /var/www/cgi-bin/doc2html.pl
>
> Bob
> Volvo Owners Club UK
>   
Thanks for your answer.  After some experimenting, I found a way to 
parse pdf and Word files (no luck yet with WordPerfect, rtf, or 
OpenOffice.org files, but that's less of a concern).
 
Adding the following lines to htdig.conf worked...

external_parsers: application/pdf->text/html /usr/share/htdig/parse_doc.pl \
application/msword->text/html /usr/share/htdig/parse_doc.pl
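
For readers curious what a converter named in external_parsers has to do, 
here is a minimal sketch in Python.  It is not parse_doc.pl or doc2html.pl, 
and it is not taken from the htdig distribution.  It assumes that htdig 
passes the name of the fetched file as the first argument (followed by the 
content type, the URL, and the configuration file) and reads the converted 
HTML from standard output; check the external_parsers documentation for 
your version before relying on that.  The PDFTOTEXT path and the function 
names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical PDF-to-HTML converter for an htdig external_parsers entry."""
import html
import subprocess
import sys

# Assumed location of pdftotext (from xpdf); adjust for your system.
PDFTOTEXT = "/usr/bin/pdftotext"


def text_to_html(title, text):
    """Wrap plain text in a minimal HTML page, escaping markup characters."""
    return (
        "<html><head><title>{}</title></head>\n"
        "<body><pre>{}</pre></body></html>\n"
    ).format(html.escape(title), html.escape(text))


def convert(path):
    """Run pdftotext on path and return the extracted text as HTML."""
    # "pdftotext FILE -" writes the extracted text to standard output.
    result = subprocess.run(
        [PDFTOTEXT, path, "-"],
        stdout=subprocess.PIPE,
        check=True,
    )
    return text_to_html(path, result.stdout.decode("utf-8", errors="replace"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the first argument is the file htdig wants converted.
    sys.stdout.write(convert(sys.argv[1]))
```

Such a script would be named in htdig.conf the same way as parse_doc.pl 
above, with its own full path after application/pdf->text/html.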


I had found, in the examples file, a doc2html.pl file, which I struggled 
to set up.  Apparently this is not the file to use on Debian Etch; it 
did not work for me.

Mark

_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general