As far as I know, this is not currently possible. But if I remember correctly, a patch has been applied in trunk that adapts the refetch frequency of a page to how often it actually changes. You could try it from a nightly build, or wait for the next release.
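As a minimal sketch of what enabling that adaptive behavior could look like: the snippet below assumes the trunk patch is the AdaptiveFetchSchedule class and uses the property names it introduces; the interval values are illustrative, not recommendations.

```xml
<!-- nutch-site.xml: replace the default fixed-interval schedule with the
     adaptive one, which shortens the refetch interval for pages that change
     and lengthens it for pages that stay the same. -->
<property>
  <name>db.fetch.schedule.class</name>
  <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
</property>

<property>
  <!-- never refetch more often than once a day (seconds) -->
  <name>db.fetch.schedule.adaptive.min_interval</name>
  <value>86400</value>
</property>

<property>
  <!-- never wait longer than 30 days between refetches (seconds) -->
  <name>db.fetch.schedule.adaptive.max_interval</name>
  <value>2592000</value>
</property>
```

With settings along these lines, the very dynamic sites (501-1000) would converge toward the daily minimum, while the static sites (1-100) would drift toward the maximum interval, without maintaining per-group schedules by hand.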
Regards,
Marcin

> Hi,
>
> I am trying to solve a problem, but I am unable to find any feature in
> Nutch that lets me solve it.
>
> Let's say there are 1000 sites in my intranet.
>
> Sites 1 to 100 have pages that are never going to change, i.e. they
> are static, so I don't need to crawl them again and again. However,
> new pages may be added to these sites.
>
> Sites 101 to 500 have fairly dynamic content, in which I can expect
> the content to change significantly every 7 days.
>
> Sites 501 to 1000 are very dynamic, and almost any page can change
> every day.
>
> So, how can I set up recrawls so that
>
> 1) the existing pages of the first group (sites 1-100) are not
> recrawled, but new pages that have appeared on those sites are crawled;
>
> 2) all pages of the second group are recrawled at an interval of 7 days;
>
> 3) all pages of the third group are recrawled every day;
>
> 4) any new URLs injected into the crawl db during the recrawl are crawled?

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
