Hi Guys,

Following some comments on the user list, I recently started digging into HTTP redirects and stumbled across NUTCH-1042. Although redirects and crawl delays are separate issues, I think they are certainly linked. What is interesting is that users usually don't consider them to be interlinked, and therefore struggle to debug how and why either the redirected pages or the pages subject to a crawl delay are not being fetched.
Doing some more digging, I found the now rather old and tatty NUTCH-475, which got me thinking about how we maintain the AdaptiveFetchSchedule for custom refetching. Now I am starting to think about the following:

- Regardless of whether we implement an AdaptiveCrawlDelay, NUTCH-1042 still needs to be fixed, as it is obviously becoming a bit of a pain for some users.
- Can someone shed some light on what happened to the Fetcher2.java that Dogacan refers to? I was only ever accustomed to OldFetcher and Fetcher :0)
- For you guys managing/running/maintaining your own (and possibly clients') web servers, what are your perceptions of maintaining your own AdaptiveCrawlDelay? Pros and cons (apart from the obvious)?

I can't really think of anything else at the moment!

Thanks
Lewis

--
*Lewis*