Jérôme Charron wrote:
Then, the question is: where does the TheDateToCheck value come from?
1. From the previously indexed document (I know that this information is stored): It certainly consumes more processing time than the second solution. My knowledge of Nutch internals is not enough to know how to quickly retrieve this information from the document's URL... can someone help us on this point?

The most efficient place to store this would be in the pagedb. What's stored there currently is the nextFetch date and the fetchInterval. This could be changed to lastModified and fetchInterval, with nextFetch calculated as lastModified+fetchInterval. In UpdateDatabaseTool.java these can both be updated. If the lastModified has not changed, then the fetchInterval can be increased accordingly.
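
A rough sketch of that bookkeeping, in case it helps the discussion. This is not the actual pagedb record or UpdateDatabaseTool code: the PageSchedule class, the doubling factor, and the one-year cap are all assumptions made up for illustration. Only the lastModified, fetchInterval, and nextFetch names come from the proposal above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical stand-in for a pagedb entry; NOT the real Nutch Page class.
final class PageSchedule {
    // Assumed upper bound so the interval cannot grow without limit.
    static final Duration MAX_INTERVAL = Duration.ofDays(365);

    final Instant lastModified;
    final Duration fetchInterval;

    PageSchedule(Instant lastModified, Duration fetchInterval) {
        this.lastModified = lastModified;
        this.fetchInterval = fetchInterval;
    }

    // nextFetch is derived rather than stored: lastModified + fetchInterval.
    Instant nextFetch() {
        return lastModified.plus(fetchInterval);
    }

    // Called after a fetch with the modification date observed for the page.
    PageSchedule afterFetch(Instant observedLastModified) {
        if (observedLastModified.equals(lastModified)) {
            // Page unchanged: lengthen the interval (doubling is an assumption).
            Duration longer = fetchInterval.multipliedBy(2);
            if (longer.compareTo(MAX_INTERVAL) > 0) {
                longer = MAX_INTERVAL;
            }
            return new PageSchedule(lastModified, longer);
        }
        // Page changed: record the new date and keep the interval as it was.
        return new PageSchedule(observedLastModified, fetchInterval);
    }
}
```

With this shape, a page that keeps returning the same lastModified gets revisited less and less often, and the stored date is also available as the value to compare against when deciding whether a document needs re-indexing.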


Doug
