pages that serverside forwards will be refetched every time
-----------------------------------------------------------

                 Key: NUTCH-353
                 URL: http://issues.apache.org/jira/browse/NUTCH-353
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 0.8.1, 0.9.0
            Reporter: Stefan Groschupf
            Priority: Blocker
             Fix For: 0.8.1
         Attachments: doNotRefecthForwarderPagesV1.patch

Pages that do a server-side forward are not written back into the crawlDb 
with a status change, and their nextFetchTime is not updated either. 
As a result, the same page is refetched again and again. Nutch is therefore 
not polite: it refetches both the forwarding page and the target page in 
every segment iteration. It also affects the scoring, since the forwarding 
page contributes its score to all outlinks.
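
The effect described above can be illustrated with a small toy model. This is 
only a sketch and not Nutch code: Entry, generate, update, crawl, the 
FORWARDS table, the example URLs and the 30-day interval are all hypothetical 
names and values chosen for illustration. It assumes the generator selects 
every entry whose next fetch time has been reached.

```python
from collections import Counter
from dataclasses import dataclass

# Toy model only: none of these names are Nutch classes or methods.
FETCH_INTERVAL = 30  # assumed number of days between fetches
FORWARDS = {"http://example.org/old": "http://example.org/new"}  # assumed forward


@dataclass
class Entry:
    status: str = "unfetched"
    next_fetch_time: int = 0  # day on which the page becomes due


def generate(crawl_db, now):
    """Select every URL whose next fetch time has been reached."""
    return sorted(url for url, e in crawl_db.items() if e.next_fetch_time <= now)


def update(crawl_db, url, status, now, write_back_forwards):
    """Write a fetch result back; skipping forwards mimics the reported bug."""
    if status == "forwarded" and not write_back_forwards:
        return  # no status change and no new next_fetch_time
    entry = crawl_db[url]
    entry.status = status
    entry.next_fetch_time = now + FETCH_INTERVAL


def crawl(write_back_forwards, iterations=3):
    """Run a few generate/fetch/update rounds and count requests per URL."""
    crawl_db = {"http://example.org/old": Entry()}
    requests = Counter()
    for day in range(iterations):
        for url in generate(crawl_db, day):
            requests[url] += 1
            target = FORWARDS.get(url)
            if target is not None:
                requests[target] += 1  # the forward target is fetched as well
                update(crawl_db, url, "forwarded", day, write_back_forwards)
            else:
                update(crawl_db, url, "fetched", day, True)
    return dict(requests)
```

In this model, crawl(False) requests both the forwarding page and its target 
in each of the three iterations, while crawl(True) requests each of them only 
once, because the entry is no longer due after the first update.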



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers