Yes, catdoc is installed and works great. I've put the correct path for catdoc in all the parsers, and like I said, htparsedoc and parse_doc.pl work just fine. I can point them at a Word doc and it works, but doc2html doesn't.
That's why it's so confusing. I've followed the docs, catdoc is installed and works, and 2 of the 3 parsers work from the command line. I've put what I've seen in the examples into my config. Beyond that I don't know why it's not working.

Keith

On Tue, 2002-12-10 at 02:22, David Adams wrote:
> Forgive me if I'm misjudging the case, but it sounds as though you have NOT
> read the instructions. If you had, you would know that doc2html does not
> require the commercial converter wp2html. It can be used with the freeware
> catdoc, though wp2html should give better results.
>
> You should also know that parse_doc.pl requires catdoc and does not work
> without it. Have you installed catdoc?
>
> 'Fraid I'm not familiar with htparsedoc, but the same probably applies.
>
> --
> David Adams
> Information Systems Services
> Southampton University
>
>
> ----- Original Message -----
> From: "Keith Pettit" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Monday, December 09, 2002 10:38 PM
> Subject: [htdig] Won't index word doc's
>
>
> > I am so boggled with this. I've followed all the instructions, tried
> > different versions of htdig and different parsers, and nothing seems to
> > work, and I can't tell where the failure is.
> >
> > Basically I'm running htdig on an index page I created. All this page has
> > is a bunch of links to Word documents. But it doesn't search through any
> > of the docs.
> >
> > This is what I get when I run it:
> > htdig: Run complete
> > htdig: 1 server seen:
> > htdig: www.drgutah.com:80 1 document
> >
> > I've tried using htparsedoc, parse_doc.pl, and doc2html. htparsedoc
> > and parse_doc.pl work if I just execute them by themselves
> > and point them at a Word file. I can't get doc2html to work, and I assume
> > it's because I won't buy the commercial converter. So I'm assuming there
> > is some sort of issue in my config. I've got it pointing to the right
> > places; it just seems like it's ignoring the .doc files. Maybe there is
> > some way I can force it to go through them.
> >
> > Thanks for any help.
> >
> > Keith
> > [EMAIL PROTECTED]
> >
> > external_parsers: application/msword /opt/www/htdig/bin/htparsedoc \
> >                   application/postscript /opt/www/htdig/bin/htparsedoc \
> >                   application/pdf /opt/www/htdig/bin/htparsedoc
> >
> > database_dir: /opt/www/htdig/db
> > start_url: http://myurl.com
> > limit_urls_to: ${start_url}
> > exclude_urls: /cgi-bin/ .cgi
> > maintainer: [EMAIL PROTECTED]
> > max_head_length: 10000
> > max_doc_size: 2000000

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
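
For comparison with the external_parsers block quoted in this thread, which routes every type through htparsedoc, here is a rough sketch of how doc2html is usually wired in as a converter instead of a parser. This is an illustration only, not a confirmed fix for the problem described: the doc2html.pl location is an assumed path, and the `->text/html` form relies on the converter syntax of the external_parsers attribute, so check that your htdig version supports it.

```
# Sketch only. /opt/www/htdig/bin/doc2html.pl is an assumed install path.
# "type->text/html" tells htdig the program converts the input to HTML,
# which htdig then parses with its own HTML parser.
external_parsers: application/msword->text/html /opt/www/htdig/bin/doc2html.pl \
                  application/pdf->text/html /opt/www/htdig/bin/doc2html.pl
```

doc2html.pl itself still has to be edited so that its internal path to catdoc (or wp2html, if used) points at the real binary.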

