[ http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12449332 ]
            
Armel Nene commented on NUTCH-61:
---------------------------------

In the fetcher source code (src\java\org\apache\nutch\fetcher.java) there is a
condition that checks the status of the file or URL (I'm not sure which) to
see whether it has been modified (around line 212: case
ProtocolStatus.NOTMODIFIED:). The implementation checks whether the file is
unmodified and, if so, does nothing, right? It seems that the system already
checks the modification data and behaves accordingly (correct me if I'm
wrong). Therefore, the patch here would not be useful for checking the file
and providing the appropriate data. One use for this patch would be when the
user wants the system to give lower priority to unmodified files by increasing
their re-fetch time. So it seems that the fetcher can already identify
unmodified content. I'll run a few tests and post the results here.
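
To make the behaviour I am describing concrete, here is a minimal sketch of a
status switch in which the "not modified" case skips all further work. Every
name in it (StatusDispatchSketch, FetchStatus, handle) is made up for
illustration; it is not the actual Nutch fetcher code, and I have not checked
it against the real source.

```java
// Hypothetical sketch only: these names are not Nutch's actual classes.
public class StatusDispatchSketch {

    // Stand-in for the protocol status codes a fetcher might receive.
    enum FetchStatus { SUCCESS, NOTMODIFIED, FAILED }

    /** Returns a short description of the action taken for each status. */
    static String handle(FetchStatus status) {
        switch (status) {
            case SUCCESS:
                // New or changed content: it has to be parsed and stored.
                return "parse and store";
            case NOTMODIFIED:
                // Content unchanged since the last fetch: nothing to do.
                return "skip";
            default:
                // Anything else is treated as a failure in this sketch.
                return "log and retry later";
        }
    }

    public static void main(String[] args) {
        for (FetchStatus s : FetchStatus.values()) {
            System.out.println(s + " -> " + handle(s));
        }
    }
}
```

If the fetcher really behaves like the NOTMODIFIED branch above, then detecting
unmodified content is already covered and only the scheduling part is new.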

> Adaptive re-fetch interval. Detecting unmodified content
> --------------------------------------------------------
>
>                 Key: NUTCH-61
>                 URL: http://issues.apache.org/jira/browse/NUTCH-61
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Andrzej Bialecki 
>         Assigned To: Andrzej Bialecki 
>         Attachments: 20050606.diff, 20051230.txt, 20060227.txt, 
> nutch-61-417287.patch
>
>
> Currently Nutch doesn't automatically adjust its re-fetch period, no matter
> whether individual pages change seldom or frequently. The goal of these
> changes is to extend the current codebase to support various possible
> adjustments to re-fetch times and intervals, and specifically a re-fetch
> schedule which tries to adapt the period between consecutive fetches to the
> period of content changes.
> Also, these patches implement checking whether the content has changed since
> the last fetch; protocol plugins are also changed to make use of this
> information, so that if content is unmodified it doesn't have to be fetched
> and processed.
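
The adaptive schedule described in the issue can be illustrated with a small
sketch: lengthen the interval when a page comes back unmodified, shorten it
when the page has changed, and keep the result within fixed bounds. The class
name, the rates and the bounds below are all assumptions chosen for
illustration; they are not taken from the attached patches.

```java
// Hypothetical sketch of an adaptive re-fetch interval; not the patch's code.
public class AdaptiveIntervalSketch {

    // Illustrative constants (assumptions, not values from NUTCH-61).
    static final long MIN_INTERVAL_SEC = 60L;                // one minute
    static final long MAX_INTERVAL_SEC = 365L * 24 * 3600;   // one year
    static final double INC_RATE = 0.2;  // growth when content is unmodified
    static final double DEC_RATE = 0.2;  // shrink when content has changed

    /**
     * Computes the next re-fetch interval in seconds from the current one
     * and from whether the last fetch found modified content.
     */
    static long nextInterval(long currentSec, boolean modified) {
        double factor = modified ? (1.0 - DEC_RATE) : (1.0 + INC_RATE);
        long next = Math.round(currentSec * factor);
        // Clamp so the interval never collapses to zero or grows unbounded.
        return Math.max(MIN_INTERVAL_SEC, Math.min(MAX_INTERVAL_SEC, next));
    }

    public static void main(String[] args) {
        long interval = 24L * 3600;  // start at one day
        boolean[] history = { false, false, true, false };  // made-up outcomes
        for (boolean modified : history) {
            interval = nextInterval(interval, modified);
            System.out.println("modified=" + modified + " next=" + interval + "s");
        }
    }
}
```

With a rule like this, pages that rarely change drift toward the maximum
interval and get lower priority, while frequently changing pages are fetched
more often.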

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
