Re: bot-traps and refetching

2005-08-30 Thread Michael Ji
The solutions for bot-traps and refetching in OC might be combined into one. 1) Refetching will look at the FetcherOutput of the last run and queue the URLs by domain name (for the HTTP 1.1 protocol), as your FetcherThread does. 2) We might just count the number of URLs within
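A minimal sketch of point 1, assuming the URLs from the last run are available as a plain list of strings. The function name `queue_by_host` is hypothetical and is not part of Nutch or OC; how FetcherOutput is actually read is not shown here.

```python
from collections import defaultdict
from urllib.parse import urlparse


def queue_by_host(urls):
    """Group URLs into one queue per host, keeping their original order.

    Hypothetical helper: one queue per host lets a single connection
    be reused for all URLs on that host.
    """
    queues = defaultdict(list)
    for url in urls:
        queues[urlparse(url).hostname].append(url)
    return dict(queues)
```

Grouping by host keeps each host's URLs together, so a fetcher thread could work through one queue at a time.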

bot-traps and refetching

2005-08-28 Thread Michael Ji
Hi Kelvin: 1) Bot-traps problem for OC: If we have a crawling depth for each starting host, it seems that the crawl will eventually terminate (we can decrement the depth value each time an outlink falls in the same host domain). Let me know if my thinking is wrong. 2) Refetching: If OC's
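The depth idea in point 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not OC's code: `crawl`, `get_outlinks`, and `max_depth` are hypothetical names, and outlinks to other hosts are simply skipped to keep the sketch short.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(seeds, get_outlinks, max_depth):
    """Breadth-first crawl with a depth budget per starting host.

    Each seed starts with `max_depth`. Every same-host outlink that is
    followed spends one unit of that budget, so a page that keeps
    generating new same-host links cannot trap the crawler forever.
    """
    seen = set(seeds)
    queue = deque((url, max_depth) for url in seeds)
    fetched = []
    while queue:
        url, depth = queue.popleft()
        fetched.append(url)
        host = urlparse(url).hostname
        for link in get_outlinks(url):
            if link in seen:
                continue
            if urlparse(link).hostname != host:
                continue  # assumption: off-host links are ignored in this sketch
            if depth - 1 < 0:
                continue  # budget exhausted: stop descending on this host
            seen.add(link)
            queue.append((link, depth - 1))
    return fetched
```

With a simulated trap where every page links to one new page on the same host, the crawl stops once the budget reaches zero instead of running indefinitely.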

Re: bot-traps and refetching

2005-08-28 Thread Kelvin Tan
Michael, On Sun, 28 Aug 2005 07:31:06 -0700 (PDT), Michael Ji wrote: Hi Kelvin: 1) bot-traps problem for OC If we have a crawling depth for each starting host, it seems that the crawling will be finalized in the end (we can decrement depth value in each time the outlink falls in same host