I am running htdig with parse_doc.pl (2000/01/12), pdftotext and catdoc.

When I index my site, the htdig system indeed finds and indexes my .doc 
files (and also my .pdf files) properly. They are "findable" by their 
content. Fine.

However, I notice that for some reason my htdig.conf file is ALSO indexed, 
and it is findable simply by searching on a keyword or two from the 
htdig.conf file! The hits returned for the htdig.conf content carry the 
titles and links of all of the .doc files rundig has indexed. That is, if I 
have two .doc files and I search on, say,

"virtual web trees or database"

(this is a phrase from inside htdig.conf), then I get two returned hits 
whose titles & links point to the .doc files, but whose excerpts come from 
the htdig.conf file. Not fine.

I notice that during rundig, I get two of these errors for each .doc file 
being indexed, e.g.

catdoc: No such file or directory
catdoc: No such file or directory

but indeed each .doc file is indexed just fine.

I am new to the htdig family, so do not think that I know what I am doing. 
(I am trying, I am trying.) Any help would be greatly appreciated.

Logan 


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev
