We're running a crawl using Nutch, and the latest crawl seemed to be taking a long time. Looking at the output, it appears to have gone into AOL's search and is actually crawling search results pages (it's also crawling a cgi-bin search results page on another site). This looks like it could go on forever.

Admittedly we haven't looked at this very deeply yet (I'm not sure why it has found so many AOL search pages to crawl), but if this is how the crawler normally behaves, it seems likely to be a common problem. Is there something we should be doing to prevent this situation?
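
To make the question concrete, here is a rough sketch of the kind of filtering we imagine would help. It is plain Python, not Nutch code: the function name and both rules (skip anything with a query string, skip anything under a cgi-bin directory) are our own assumptions for illustration, and we don't know whether Nutch offers an equivalent.

```python
from urllib.parse import urlsplit


def looks_like_crawler_trap(url: str) -> bool:
    """Hypothetical helper: guess whether a URL is a dynamically generated page.

    Assumed rules (ours, not Nutch's):
      1. any URL carrying a query string, e.g. ...?q=foo
      2. any URL whose path contains a cgi-bin directory
    """
    parts = urlsplit(url)
    if parts.query:
        return True
    # Compare whole path segments so "/my-cgi-bin-notes/" is not matched.
    segments = [s.lower() for s in parts.path.split("/") if s]
    return "cgi-bin" in segments


# Example: keep only the URLs that do not look like search results.
candidates = [
    "http://example.com/docs/index.html",
    "http://search.example.com/results?q=nutch",
    "http://example.org/cgi-bin/search",
]
to_fetch = [u for u in candidates if not looks_like_crawler_trap(u)]
```

The obvious cost of rules this blunt is that they would also skip legitimate pages that happen to use query strings, so something more selective may be needed.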

Thanks.
