Is it true that if I run htdig to update my db (without the -i option),
htdig will ignore "limit_urls_to" and try all the URLs already in the db?
Here is what I did and what I found:
1) I ran htdig with -i to create the initial db for the mycompany.com intranet.
start_url: http://www3.mydept.mycompany.com
limit_urls_to: .mycompany.com
2) A week later, I ran htdig to update my db (without the -i option) for
the mydept.mycompany.com sub-domain only.
start_url: http://www3.mydept.mycompany.com
limit_urls_to: .mydept.mycompany.com
For some reason, this update run ignored "limit_urls_to" and
went through all servers in .mycompany.com! It also tried to get
all documents on all servers instead of only getting the new
documents. (I checked the www log file on a small server with
only several html files and found a GET for every file, although
nothing had changed on this small www server between the initial
htdig run and the update run.)
Another strange thing is that although I deleted some html files
on the server http://www3.mydept.mycompany.com BEFORE the update
run of htdig, the deleted URLs were still left in the db. A subsequent
search still finds them as matches. I don't set "remove_bad_urls"
in my htdig.conf file, which means it should take its default value, true.
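Setting the attribute explicitly, instead of relying on the default,
would be a single line in htdig.conf. This is only a sketch of the
setting in question, and it is an assumption that stating it explicitly
behaves the same as the default:

```
# htdig.conf: ask htdig to drop URLs it can no longer retrieve
# (assumed to be equivalent to leaving the attribute unset)
remove_bad_urls: true
```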
Thanks for any hints!
Frank