> At the moment I'm using conv_doc.pl with catdoc, pdftotext and pdfinfo as
> external parsers, but I would like to extend the number of document types I
> can handle. I downloaded doc2html and read the docs, and now I'm confused
> (too much choice). Can anyone recommend a parser set that works? My
> priorities are Word 2000, PDF, Excel, PowerPoint and Flash (with Flash
> very low on my list).

Ciao,

Here is what I use:

    external_parsers: \
        application/pdf->text/html ${scripts_dir}/doc2html.pl \
        application/rtf->text/html ${scripts_dir}/doc2html.pl \
        application/msword->text/html ${scripts_dir}/doc2html.pl

where scripts_dir has previously been set to point to the directory where I
put the doc2html script (and its related files).

Then, in doc2html.pl I put:

    my $CATDOC = '/usr/bin/catdoc';
    my $PDF2HTML = '/opt/htdig/web/scripts/pdf2html.pl';

Of course, change the locations to match your system. doc2html.pl should be
the wrapper script, i.e. the one named in external_parsers; it calls the
other converters itself.

Hope this helps. Have a look also at the FAQ:

    http://www.htdig.org/FAQ.html#q4.8
    http://www.htdig.org/FAQ.html#q4.9

Ciao
-Gabriele
-- 
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
http://www.po-net.prato.it/