On Thu, 8 Jun 2000 09:12:32 -0500 (CDT) Gilles Detillieux wrote:

> According to Andre Reuber:
> > I am a beginner with htdig.  Is there any possibility
> > to make an index of .doc, .pdf, .xls, ... files? Do I need any extra
> > source? Where can I get this source?
> See http://www.htdig.org/FAQ.html#q4.8
> and http://www.htdig.org/FAQ.html#q4.9
> The .xls files may be a bit more of a challenge.  I'd recommend using
> doc2html for .doc & .pdf, and if you find and install the Excel to HTML
> converter, xlHtml, you could probably add it to doc2html as an extra
> converter fairly easily (if you have at least a minor understanding
> of Perl).

I don't think it is quite so simple: doc2html.pl (and 
parse_doc and conv_doc) only use the "magic number" of the 
file to decide which utility to use for conversion.

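MS Word and Excel files can have the same magic number: 
both are normally stored as OLE2 compound documents, which 
begin with the bytes D0 CF 11 E0 A1 B1 1A E1, so the magic 
number alone cannot tell them apart.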
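
To illustrate the problem, here is a minimal Python sketch of 
magic-number sniffing.  It is not taken from doc2html.pl; the 
function name sniff() and the sample headers are hypothetical, 
and the only facts assumed are the OLE2 signature bytes and the 
"%PDF-" prefix.

```python
# Signature shared by OLE2 compound documents (Word .doc and Excel .xls alike).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
PDF_MAGIC = b"%PDF-"


def sniff(header: bytes) -> str:
    """Classify a file by its leading bytes only (hypothetical helper)."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"      # could be Word OR Excel: the magic cannot say which
    if header.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"


# Stand-ins for the first bytes of a .doc and an .xls file.
word_header = OLE2_MAGIC + b"\x00" * 8
excel_header = OLE2_MAGIC + b"\x00" * 8

print(sniff(word_header), sniff(excel_header))
```

Both headers are classified identically, which is why a script 
that looks only at the magic number would hand an Excel file to 
the Word converter.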

The easy solution is a separate conversion script for Excel 
files.  The more sophisticated solution is a more advanced 
script which also uses the MIME type information passed to it.
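
A rough sketch of the second approach, in Python rather than 
Perl and not based on the actual doc2html.pl code: choose the 
converter from the MIME type instead of the magic number.  The 
table below is an assumption for illustration only; the MIME 
type strings are the usual ones for these formats, and the 
program names (catdoc, xls2csv, pdftotext) are simply examples 
of converters one might have installed.

```python
# Hypothetical mapping from MIME type to converter program name.
CONVERTERS = {
    "application/msword": "catdoc",
    "application/vnd.ms-excel": "xls2csv",
    "application/pdf": "pdftotext",
}


def choose_converter(mime_type):
    """Return the converter name for a MIME type, or None if unknown.

    Parameters such as "; charset=..." are ignored and the comparison
    is case-insensitive.
    """
    key = mime_type.split(";", 1)[0].strip().lower()
    return CONVERTERS.get(key)


print(choose_converter("application/vnd.ms-excel"))
```

This only works if the web server reports a distinct MIME type 
for .xls files; if it sends the same type for everything, the 
script is back to guessing.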

I hadn't heard of xlHtml and would like to know more.  
As an alternative, there is a simple .xls to .csv 
conversion program available from the same site as catdoc.

David Adams

