Sebastian Nagel created NUTCH-2946:
--------------------------------------

             Summary: Fetcher: optionally slow down fetching from hosts with repeated exceptions
                 Key: NUTCH-2946
                 URL: https://issues.apache.org/jira/browse/NUTCH-2946
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.18
            Reporter: Sebastian Nagel
            Assignee: Sebastian Nagel
             Fix For: 1.19


For every fetch queue, the fetcher holds a counter of the "exceptions" observed 
when fetching from the host (or domain or IP, respectively) bound to this queue.

To make the crawler more polite, the counter value could be used to dynamically 
increase the fetch delay for hosts where requests fail repeatedly with 
exceptions or with HTTP status codes mapped to ProtocolStatus.EXCEPTION (HTTP 
403 Forbidden, 429 Too Many Requests, 5xx server errors, etc.). Of course, this 
should be optional. The aim is to reduce the load on such hosts even before the 
configured maximum number of exceptions (property 
fetcher.max.exceptions.per.queue) is reached.
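
One possible way to derive the delay from the counter is sketched below. It is 
only an illustration: the class, the method and all parameters (growth factor, 
upper bound) are hypothetical and do not refer to existing Nutch code or 
properties. Exponential growth with a cap is an assumption; a linear increase 
would serve the same purpose.

```java
/**
 * Hypothetical helper, not part of Nutch: computes a per-queue fetch delay
 * that grows with the number of exceptions counted for that queue.
 */
final class ExceptionBackoffSketch {

  private ExceptionBackoffSketch() {}

  /**
   * @param baseDelayMs    configured delay between requests to one queue
   * @param exceptionCount exceptions counted so far for this queue
   * @param factor         assumed growth factor per exception (e.g. 2.0)
   * @param maxDelayMs     assumed upper bound for the resulting delay
   * @return the delay to apply before the next request to this queue
   */
  static long delayMillis(long baseDelayMs, int exceptionCount,
      double factor, long maxDelayMs) {
    // No exceptions yet, or slow-down effectively disabled: keep the base delay.
    if (exceptionCount <= 0 || factor <= 1.0) {
      return Math.min(baseDelayMs, maxDelayMs);
    }
    // Exponential backoff: base * factor^exceptionCount.
    double delay = baseDelayMs * Math.pow(factor, exceptionCount);
    // For large counts the product may become infinite; the comparison
    // still holds, so the result is capped at maxDelayMs.
    return delay >= maxDelayMs ? maxDelayMs : (long) delay;
  }
}
```

With a base delay of 5 seconds, a factor of 2.0 and a cap of 60 seconds, the 
delay would be 10 seconds after the first exception, 40 seconds after the 
third, and 60 seconds from the fourth exception on.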




--
This message was sent by Atlassian Jira
(v8.20.7#820007)
