Hi all, I'm a new user of Nutch. I use it primarily to crawl blog and news sites, but I noticed that Nutch refetches pages only after a fixed refresh interval (30 days by default).
Blog and news sites have the distinctive characteristic that some of their pages are updated very frequently (e.g. the main page) and so need to be refetched often, while other pages never need to be refetched at all (e.g. individual news article pages, which eventually become 'obsolete').

Is there any way to force an update of some URLs? Can I just 're-inject' the URLs to set their next fetch date to 'immediately'?

Thank you,
--
Arie Karhendana

_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
