[ http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12361131 ]
raghavendra prabhu commented on NUTCH-61:
-----------------------------------------

Will the same approach work for a filesystem? For a file system, we can directly get the modification date and store it in the db. The plugins will then look at the content's date: if it differs from the stored one, they will fetch and index the file; otherwise they will not fetch it. This could be a solution for file-based content. (Note that it does away entirely with the fetch interval and decides based only on the file modification date.)

> Adaptive re-fetch interval. Detecting unmodified content
> --------------------------------------------------------
>
>          Key: NUTCH-61
>          URL: http://issues.apache.org/jira/browse/NUTCH-61
>      Project: Nutch
>         Type: New Feature
>   Components: fetcher
>     Reporter: Andrzej Bialecki
>     Assignee: Andrzej Bialecki
>  Attachments: 20050606.diff
>
> Currently Nutch doesn't automatically adjust its re-fetch period, no matter
> whether individual pages change seldom or frequently. The goal of these changes
> is to extend the current codebase to support various possible adjustments to
> re-fetch times and intervals, and specifically a re-fetch schedule which
> tries to adapt the period between consecutive fetches to the period of
> content changes.
> Also, these patches implement checking whether the content has changed since
> the last fetch; protocol plugins are also changed to make use of this
> information, so that if content is unmodified it doesn't have to be fetched
> and processed.
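The modification-date check proposed in the comment could look roughly like the following. This is a minimal sketch under stated assumptions, not Nutch code: the class `ModifiedDateCheck` and its methods are hypothetical, and the "db" is modeled as an in-memory map from absolute path to the last seen modification time in milliseconds rather than Nutch's actual crawl database. A file is fetched if it has never been seen or if its modification date differs from the stored one; otherwise it is skipped, with no fetch interval involved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: decide whether a local file needs re-fetching by
 * comparing its modification time against the value stored on the last run.
 * The map below is an assumed stand-in for the crawl db, not a Nutch API.
 */
public class ModifiedDateCheck {

    // Assumed "db": absolute path -> last recorded modification time (ms).
    private final Map<String, Long> storedModified = new HashMap<>();

    /** Pure decision: fetch if never seen before or if the date differs. */
    public static boolean needsFetch(long currentModified, Long previousModified) {
        return previousModified == null || previousModified != currentModified;
    }

    /**
     * Reads the file's modification time, decides whether to fetch it,
     * and records the new date so a later run can skip the unchanged file.
     */
    public boolean shouldFetch(Path file) throws IOException {
        long current = Files.getLastModifiedTime(file).toMillis();
        String key = file.toAbsolutePath().toString();
        boolean fetch = needsFetch(current, storedModified.get(key));
        if (fetch) {
            storedModified.put(key, current);
        }
        return fetch;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nutch-61", ".txt");
        ModifiedDateCheck check = new ModifiedDateCheck();
        System.out.println(check.shouldFetch(tmp)); // true: never seen before
        System.out.println(check.shouldFetch(tmp)); // false: date unchanged
        Files.delete(tmp);
    }
}
```

Storing the date only when a fetch is triggered keeps the db in step with the last content actually indexed; a real implementation would likely record it only after the fetch and indexing succeed.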