> 
> Is it true that if I run htdig to update my db (without the -i option),
> htdig will ignore "limit_urls_to" and try all the URLs already in the db?
> 
> Here is what I did and what I found:
> 1) Ran htdig with -i to create the initial db for the mycompany.com intranet.
>    start_url: http://www3.mydept.mycompany.com
>    limit_urls_to:  .mycompany.com
> 2) A week later, I ran htdig to update my db (without the -i option) for
>    the mydept.mycompany.com sub-domain only (both runs are sketched below).
>    start_url: http://www3.mydept.mycompany.com
>    limit_urls_to:  .mydept.mycompany.com
> 
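> In other words, the two runs looked roughly like this (the conf file
> names and the use of -c are only for illustration; the attribute values
> are exactly the ones quoted above):
> 
>    # initial.conf -- full initial dig of the whole intranet
>    start_url:      http://www3.mydept.mycompany.com
>    limit_urls_to:  .mycompany.com
> 
>    # update.conf -- update dig, restricted to the sub-domain
>    start_url:      http://www3.mydept.mycompany.com
>    limit_urls_to:  .mydept.mycompany.com
> 
>    # first the initial dig (with -i), then the update a week later:
>    htdig -i -c initial.conf
>    htdig -c update.conf
> 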
> For some reason, this update run ignored "limit_urls_to" and went
> through all the servers in .mycompany.com! It also tried to fetch
> every document on every server instead of only the new or changed
> ones. (I checked the www log file on a small server with only a few
> HTML files and found a GET for every file, even though nothing on
> that server had changed between the initial htdig run and the update
> run.)
> 
> Another strange thing: although I deleted some HTML files on the
> server http://www3.mydept.mycompany.com BEFORE the update run of
> htdig, those deleted URLs are still in the db, and a subsequent
> search still finds them. I have not set "remove_bad_urls" in my
> htdig.conf file, which means it should default to true.
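> (For completeness, the explicit setting would presumably just be:
> 
>    # not present in my htdig.conf, so the documented default applies
>    remove_bad_urls: true
> 
> but since true is the default, I have left it out.)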
> 
> Thanks for any hints!
> 
> Frank
> 