According to shams khan:
> I've used conv_doc.pl (with XPDF) to index PDF documents.  I am now 
> trying to index MS Word documents, but am having problems.
> 
> I've copied the conv_doc.pl script into /usr/local/bin, which contains 
> the line:
> 
> $CATDOC = "/usr/local/bin/catdoc";
> 
> I've installed the CATDOC package (which has placed the catdoc binary in 
> /usr/local/bin and /usr/local/lib)
> 
> I've placed the following line within the htdig.conf file:
> 
> application/msword->text/html /usr/local/bin/conv_doc.pl
> 
> But when I try to re-index my website (this time, with the hope of 
> indexing Word documents too), I get the following error message, which 
> appears next to the Word documents:
> 
> test.doc: can't determine type of file /var/www/html/htdig/dv/htdex.8KvYOL; 
> content-type: application/msword; URL: 
> http://10.5.1.35/sme/micro/management_self_assessment_guide/test/doc size = 11264

I suggest you try doc2html.pl instead of conv_doc.pl.  conv_doc.pl accepts
only one "magic number" for recognizing Word documents, whereas I think
doc2html.pl accepts a few different ones.  Not all Word documents have the
same identifying byte sequence at the start.
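
To see which byte sequence a given file actually starts with, a quick
check along the following lines may help.  This is only a minimal sketch
in Python, not code taken from conv_doc.pl or doc2html.pl: the SIGNATURES
table and the identify() helper are illustrative assumptions, and the
list of signatures is deliberately short and incomplete.

```python
#!/usr/bin/env python3
"""Print the leading bytes of each file and guess its container format."""
import sys

# Illustrative signatures only (an assumption for this sketch); this table
# is NOT copied from conv_doc.pl or doc2html.pl and is far from complete.
SIGNATURES = [
    (bytes.fromhex("d0cf11e0a1b11ae1"), "OLE2 compound file"),
    (b"{\\rtf", "RTF text"),
    (b"%PDF-", "PDF"),
]


def identify(head: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def main(paths):
    for path in paths:
        # Only the first few bytes are needed to compare against a signature.
        with open(path, "rb") as fh:
            head = fh.read(8)
        hex_bytes = " ".join(f"{b:02x}" for b in head)
        print(f"{path}: {hex_bytes} -> {identify(head)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as magic_check.py, it could be run
as "python3 magic_check.py test.doc" against a copy of the document.  A
result of "unknown", or "RTF text" for a file named .doc, would be
consistent with a converter that checks for a single signature failing
to determine the file type.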

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
