>At the moment I'm using conv_doc.pl with catdoc, pdftotext and pdfinfo as 
>external parsers, but I would like to extend the number of document types I 
>can handle. I downloaded doc2html and read the docs, and now I'm confused 
>(too much choice). Can anyone recommend a parser set that works? My 
>priorities are Word 2000, PDF, Excel, PowerPoint and Flash (with Flash 
>very low on my list).

Ciao,

Here is what I use:

external_parsers: \
    application/pdf->text/html ${scripts_dir}/doc2html.pl \
    application/rtf->text/html ${scripts_dir}/doc2html.pl \
    application/msword->text/html ${scripts_dir}/doc2html.pl

where scripts_dir was set earlier in the configuration file to point to the 
directory where I put the doc2html script (and its related files).
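
As a rough sketch, assuming the scripts live in /opt/htdig/web/scripts (the 
directory I use for pdf2html.pl; yours will probably differ), the definition 
could look like this, placed before the external_parsers attribute:

scripts_dir: /opt/htdig/web/scripts

scripts_dir is just a name I chose for my own setup, not a built-in 
attribute. Any attribute defined this way can be referenced elsewhere in the 
file as ${name}.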

Then, in doc2html.pl, I set:

my $CATDOC = '/usr/bin/catdoc';
my $PDF2HTML = '/opt/htdig/web/scripts/pdf2html.pl';

Of course, change the locations to match your system. doc2html.pl should be 
the wrapper script that htdig calls.

Hope this helps. Also take a look at the FAQs:

http://www.htdig.org/FAQ.html#q4.8
http://www.htdig.org/FAQ.html#q4.9

Ciao
-Gabriele

--
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
[EMAIL PROTECTED] | http://www.po-net.prato.it/

  The nice thing about Windows is - It does not just crash,
  it displays a dialog box and lets you press 'OK' first.

