Hi, I'm using Nutch 0.8.1 and noticed the following: when pageA redirects to pageB (HTTP 3xx), pageA remains unfetched in the crawlDB, while pageB is fetched.
As a result, pageA shows up again in every generate/fetch/updatedb iteration. Is this a bug? I found a previous thread on this list that describes the same issue: http://www.mail-archive.com/[email protected]/msg04599.html

Mathijs
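To make the reported loop concrete, here is a toy model in Java. It is not Nutch code: the class `RedirectLoopDemo`, its two statuses, and the methods `generate`, `fetch`, and `updateDb` are hypothetical stand-ins for the generate, fetch, and updatedb steps. It assumes, as the report suggests, that updatedb records a status only for the URL whose content was actually fetched (pageB), so the redirecting URL (pageA) never leaves the unfetched state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a generate/fetch/updatedb loop. NOT Nutch code: all class,
// method, and status names here are hypothetical.
public class RedirectLoopDemo {

    enum Status { UNFETCHED, FETCHED }

    // Outcome of fetching one URL: the URL we asked for and the URL whose
    // content we actually got (different when a 3xx redirect was followed).
    static final class FetchResult {
        final String requested;
        final String finalUrl;

        FetchResult(String requested, String finalUrl) {
            this.requested = requested;
            this.finalUrl = finalUrl;
        }
    }

    // "generate": select every URL that is still unfetched.
    static List<String> generate(Map<String, Status> db) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Status> e : db.entrySet()) {
            if (e.getValue() == Status.UNFETCHED) {
                due.add(e.getKey());
            }
        }
        return due;
    }

    // "fetch": follows at most one redirect hop.
    static FetchResult fetch(String url, Map<String, String> redirects) {
        return new FetchResult(url, redirects.getOrDefault(url, url));
    }

    // "updatedb": the URL whose content was fetched is marked FETCHED.
    // If recordSource is false, a redirecting URL keeps its old status,
    // which is the behaviour reported for pageA.
    static void updateDb(Map<String, Status> db, List<FetchResult> results, boolean recordSource) {
        for (FetchResult r : results) {
            db.put(r.finalUrl, Status.FETCHED);
            if (recordSource) {
                db.put(r.requested, Status.FETCHED);
            }
        }
    }

    // Runs several cycles and returns the URLs generated in each cycle.
    static List<List<String>> simulate(int cycles, boolean recordSource) {
        Map<String, Status> db = new LinkedHashMap<>();
        db.put("pageA", Status.UNFETCHED);
        Map<String, String> redirects = Map.of("pageA", "pageB"); // pageA -> 3xx -> pageB

        List<List<String>> generatedPerCycle = new ArrayList<>();
        for (int i = 0; i < cycles; i++) {
            List<String> due = generate(db);
            generatedPerCycle.add(due);
            List<FetchResult> results = new ArrayList<>();
            for (String url : due) {
                results.add(fetch(url, redirects));
            }
            updateDb(db, results, recordSource);
        }
        return generatedPerCycle;
    }

    public static void main(String[] args) {
        System.out.println("source not recorded: " + simulate(3, false));
        System.out.println("source recorded:     " + simulate(3, true));
    }
}
```

Running `main` prints `[[pageA], [pageA], [pageA]]` for the first case (pageA is generated in every cycle) and `[[pageA], [], []]` for the second. Marking the redirect source as fetched is only the simplest way to break the loop in this model; a real fix might instead record a distinct redirect status for pageA.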
