Forgive me if I'm misjudging the case, but it sounds as though you have NOT read the instructions. If you had, you would know that doc2html does not require the commercial converter wp2html. It can be used with the freeware catdoc, though wp2html should give better results.
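Before changing the htdig configuration, it may be worth confirming that catdoc is installed and actually produces text from a Word file. What follows is only a minimal sketch in Python: the helper names `find_converter` and `extract_text` are hypothetical and not part of ht://Dig, doc2html or catdoc, and it assumes catdoc is invoked simply as `catdoc <file>` and writes plain text to standard output.

```python
import shutil
import subprocess
from typing import Optional


def find_converter(name: str = "catdoc") -> Optional[str]:
    """Return the full path of the converter if it is on PATH, else None."""
    return shutil.which(name)


def extract_text(doc_path: str, converter: str = "catdoc") -> str:
    """Run the converter on a Word file and return whatever text it prints.

    Raises FileNotFoundError if the converter is not installed, and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    exe = find_converter(converter)
    if exe is None:
        raise FileNotFoundError(f"{converter} not found on PATH")
    result = subprocess.run(
        [exe, doc_path], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Report where catdoc lives, or say plainly that it is missing.
    path = find_converter()
    print(path or "catdoc is not installed or not on PATH")
```

If the script reports that catdoc is missing, neither parse_doc.pl nor the catdoc route through doc2html can be expected to work until it is installed.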
You should also know that parse_doc.pl requires catdoc and does not work without it. Have you installed catdoc? 'Fraid I'm not familiar with htparsedoc, but the same probably applies.

--
David Adams
Information Systems Services
Southampton University

----- Original Message -----
From: "Keith Pettit" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, December 09, 2002 10:38 PM
Subject: [htdig] Won't index word doc's

> I am so boggled with this. I've followed all the instructions, tried
> different versions of htdig and different parsers, and nothing seems to
> work, and I can't tell where the failure is.
>
> Basically I'm running htdig on an index page I created. All this page has
> is a bunch of links to Word documents. But it doesn't search through any
> of the docs.
>
> This is what I get when I run it:
> htdig: Run complete
> htdig: 1 server seen:
> htdig: www.drgutah.com:80 1 document
>
> I've tried using htparsedoc, parse_doc.pl, and doc2html. htparsedoc
> and parse_doc.pl work if I just execute them by themselves
> and point them at a Word file. I can't get doc2html to work, and I assume
> it's because I won't buy the commercial converter. So I'm assuming there
> is some sort of issue in my config. I've got it pointing to the right
> places; it just seems like it's ignoring the .doc files. Maybe there is
> some way I can force it to go through them.
>
> Thanks for any help.
>
> Keith
> [EMAIL PROTECTED]
>
> external_parsers: application/msword /opt/www/htdig/bin/htparsedoc \
>                   application/postscript /opt/www/htdig/bin/htparsedoc \
>                   application/pdf /opt/www/htdig/bin/htparsedoc
>
> database_dir: /opt/www/htdig/db
> start_url: http://myurl.com
> limit_urls_to: ${start_url}
> exclude_urls: /cgi-bin/ .cgi
> maintainer: [EMAIL PROTECTED]
> max_head_length: 10000
> max_doc_size: 2000000
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html

