I am running htdig with parse_doc.pl (2000/01/12), pdftotext and catdoc.
When I index my site, the htdig system indeed finds and indexes my .doc
files (and also my .pdf files) properly. They are "findable" by their
content. Fine.
However, I notice that my htdig.conf file is ALSO being indexed, and it can
be found simply by searching on a keyword or two from htdig.conf! The hits
returned for htdig.conf carry the titles and links of the .doc files that
rundig has indexed. For example, with two .doc files indexed, if I search on
"virtual web trees or database"
(a phrase that appears inside htdig.conf), I get two hits whose titles and
links point to the .doc files, but whose excerpts are text from htdig.conf.
Not fine.
I also notice that during rundig I get two of these errors for each .doc
file being indexed:
catdoc: No such file or directory
catdoc: No such file or directory
even though each .doc file still ends up indexed just fine.
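One thing that can be checked quickly is whether the helper programs are
actually reachable. The following is only a minimal sketch, assuming (and it
is only an assumption) that the error means the catdoc binary cannot be found
where the parser script looks for it. The function name find_helpers is made
up for illustration; the tool names are the ones mentioned above.

```python
import shutil


def find_helpers(names=("catdoc", "pdftotext")):
    """Map each helper name to its resolved path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print where each helper was found, if anywhere.
    for name, path in find_helpers().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If parse_doc.pl refers to its helpers by fixed paths, those paths would need
to match whatever this reports; note that this only checks the PATH of the
user running it, which may differ from the environment rundig runs in.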
I am new to the htdig family, so do not think that I know what I am doing.
(I am trying, I am trying.) Any help would be greatly appreciated.
Logan