Thanks for the reply. Using cron solves half of my problem: it takes care of the scheduled crawling. But how do I do incremental crawling? With cron, the same command runs once a week, but it crawls all 1 million documents each time. Any ideas on how to crawl only the new or changed pages?
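
To make the goal concrete, here is a rough sketch of the selection logic I mean by "incremental". It is plain Python, not Nutch code, and everything in it is an assumption for illustration: the function name `select_for_recrawl`, the example URLs, and the idea that a last-modified time is available for each page (for example from an HTTP `Last-Modified` header).

```python
from datetime import datetime


def select_for_recrawl(last_modified, last_fetched):
    """Return the URLs that are new or were modified since the last fetch.

    last_modified: dict mapping URL -> time the page last changed (hypothetical input)
    last_fetched:  dict mapping URL -> time we last crawled that URL
    """
    to_fetch = []
    for url, modified in last_modified.items():
        fetched = last_fetched.get(url)
        # New page (never fetched) or changed after our last fetch.
        if fetched is None or modified > fetched:
            to_fetch.append(url)
    return sorted(to_fetch)


# Hypothetical data: "a" is unchanged, "b" was modified, "c" is new.
site = {
    "http://example.com/a": datetime(2008, 8, 1),
    "http://example.com/b": datetime(2008, 8, 20),
    "http://example.com/c": datetime(2008, 8, 21),
}
previous_crawl = {
    "http://example.com/a": datetime(2008, 8, 15),
    "http://example.com/b": datetime(2008, 8, 15),
}

print(select_for_recrawl(site, previous_crawl))
# ['http://example.com/b', 'http://example.com/c']
```

In this example only 2 of the 3 pages would be fetched again; what I am looking for is the equivalent behaviour in Nutch for a site with 1 million pages.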

rameshgalla wrote:
>
> I want to do scheduled crawling in Nutch.
> E.g.: I have crawled a site which has 1 million pages,
> and I want to crawl the same site for updates once per week
> automatically (scheduled and incremental crawling).
> It has to crawl only modified or newly added content.
>
> Is it possible with Nutch?
>
> If possible, how can I achieve it?
>

--
View this message in context: http://www.nabble.com/scheduled-crawling-in-nutch-tp19087524p19088069.html
Sent from the Nutch - User mailing list archive at Nabble.com.
