As far as I know, this is currently not possible out of the box. But if I'm
correct, a patch has been applied in trunk that adapts the re-fetch frequency
of a page to how often it actually changes. You could try a nightly build, or
wait for the next release.
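If the trunk patch in question is the adaptive re-fetch schedule (I believe it
is AdaptiveFetchSchedule from NUTCH-61, but please verify), it should be
configurable through conf/nutch-site.xml roughly like the sketch below. The
property names are taken from trunk's nutch-default.xml and may change before
the release; the interval values (in seconds) are just illustrative choices
for the scenario you describe, not recommendations:

```xml
<!-- Sketch for conf/nutch-site.xml, assuming the trunk patch is the
     AdaptiveFetchSchedule. Property names may change before release. -->
<property>
  <name>db.fetch.schedule.class</name>
  <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
  <description>Use the adaptive schedule instead of the default
  fixed-interval one.</description>
</property>
<property>
  <name>db.fetch.schedule.adaptive.min_interval</name>
  <value>86400</value>
  <description>Never re-fetch a page more often than once a day
  (covers the very dynamic sites, group 501-1000).</description>
</property>
<property>
  <name>db.fetch.schedule.adaptive.max_interval</name>
  <value>2592000</value>
  <description>Re-fetch even seemingly static pages at least every
  30 days, so newly added pages on those sites still get
  discovered via their links (group 1-100).</description>
</property>
```

With this in place the schedule shortens the interval for pages that change
between fetches and lengthens it for pages that don't, so your three groups
should converge toward roughly the intervals you listed without per-site
configuration. New URLs injected into the crawldb are picked up on the next
generate/fetch cycle as usual.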

Regards,
Marcin

> Hi,
> 
> I am trying to solve a problem but I am unable to find any feature in
> Nutch that lets me solve this problem.
> 
> Let's say in my intranet there are 1000 sites.
> 
> Sites 1 to 100 have pages that are never going to change, i.e. they
> are static. So I don't need to crawl them again and again. But extra
> pages may be added to these sites.
> 
> Sites 101 to 500 have pretty dynamic content in which I can expect the
> content to change significantly every 7 days.
> 
> Sites 501 to 1000 are very dynamic and content change can happen in
> any page almost every day.
> 
> So, how can I do recrawls on them in a manner that
> 
> 1) it doesn't crawl the existing pages of the first group (1-100)
> sites but crawl the new pages that have come up.
> 
> 2) re-crawl all pages of the second group at an interval of 7 days.
> 
> 3) re-crawl all pages of the third group every day
> 
> 4) it crawls any new URLs injected into the crawl db during recrawl.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
