According to shams khan:
> I've used conv_doc.pl (with XPDF) to index PDF documents. I am now
> trying to index MS Word documents, but am having problems.
>
> I've copied the conv_doc.pl script into /usr/local/bin, which contains
> the line:
>
> $CATDOC = "/usr/local/bin/catdoc";
>
> I've installed the CATDOC package (which has placed the catdoc binary in
> /usr/local/bin and /usr/local/lib)
>
> I've placed the following line within the htdig.conf file:
>
> application/msword->text/html /usr/local/bin/conv_doc.pl
>
> But when I try and re-index my website (this time, with the hope of
> indexing word documents too), I get the following error message which
> appears next to the word documents:
>
> test.doc: can't determine type of file /var/www/html/htdig/dv/htdex.8KvYOL;
> content-type: application/msword; URL:
> http://10.5.1.35/sme/micro/management_self_assessment_guide/test/doc size = 11264

I suggest you try doc2html.pl instead of conv_doc.pl. conv_doc.pl accepts
only one "magic number" when recognizing Word documents, whereas I think
doc2html.pl accepts a few different ones. Not all Word documents have the
same identifying byte sequence at the start.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
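
To check whether a particular .doc file starts with an unexpected byte sequence, its leading bytes can be inspected directly. The following is a minimal Python sketch, not a copy of what conv_doc.pl or doc2html.pl actually test for. The OLE2 compound-file signature is the well-known one used by Word 97-2003 files; the other two entries are signatures commonly listed for older Word formats and are included here as assumptions. The names `KNOWN_SIGNATURES`, `identify`, and `describe` are hypothetical.

```python
# Sketch only: compare the first bytes of a file against a few signatures.
# The table below is illustrative and is NOT taken from conv_doc.pl or
# doc2html.pl; entries marked "assumed" should be verified before relying on them.
KNOWN_SIGNATURES = {
    bytes.fromhex("d0cf11e0a1b11ae1"): "OLE2 compound file (e.g. Word 97-2003)",
    bytes.fromhex("dba52d00"): "Word 2.0 for Windows (assumed)",
    bytes.fromhex("31be0000"): "Word for DOS / Windows Write (assumed)",
}


def identify(header: bytes):
    """Return a label if header starts with a known signature, else None."""
    for signature, label in KNOWN_SIGNATURES.items():
        if header.startswith(signature):
            return label
    return None


def describe(path: str) -> str:
    """Read the first 8 bytes of a file and report them with any match."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    label = identify(header) or "unrecognized"
    return f"{path}: {header.hex()} -> {label}"
```

Calling `describe("test.doc")` on a copy of the problem file shows its first eight bytes in hex. If the result is "unrecognized", the file starts with a byte sequence outside this small table, which is consistent with a converter that checks for only one signature rejecting it.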

