[ 
https://issues.apache.org/jira/browse/NUTCH-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579742#comment-17579742
 ] 

ASF GitHub Bot commented on NUTCH-2947:
---------------------------------------

sebastian-nagel merged PR #729:
URL: https://github.com/apache/nutch/pull/729




> Fetcher: keep state of empty fetch queues unless queue feeder is finished
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-2947
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2947
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.19
>
>
> If a fetch queue is empty (containing no fetch items), it may be removed from
> the list of queues. This also removes the state of the fetch queue, namely the
> next fetch time and the exception counter. If the queue feeder is still
> active, it may happen that a queue removed earlier (i.e. one associated with
> the same host/domain/IP) is created again. In this case, certain
> aspects of fetcher politeness can no longer be guaranteed:
> - the fetch delay (via earliest next fetch time) and
> - the mechanism to block fetching from the same host/domain/IP with too many 
> exceptions (NUTCH-769).
> The issue was observed while verifying NUTCH-2946 in the fetcher logs:
> {noformat}
> ... 10:19:16,912 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:20:16,250 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
> ... 10:21:52,675 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:25:40,931 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:27:45,066 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
> ... 10:29:40,407 * queue foo.bar >> delayed next fetch by 100000 ms after 3 exceptions in queue
> ... 10:41:48,870 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:47:54,946 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:52:46,792 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:57:43,470 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:01:12,220 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:04:24,621 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:18:40,398 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:21:09,437 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:34:36,052 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:39:17,898 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:40:35,472 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:50:34,224 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:51:27,547 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:53:04,783 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:54:04,404 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
> ... 11:55:38,232 * queue foo.bar >> delayed next fetch by 100000 ms after 3 exceptions in queue
> ... 11:57:37,942 * queue foo.bar >> delayed next fetch by 116096 ms after 4 exceptions in queue
> ... 12:01:08,619 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 12:02:35,985 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> {noformat}
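
For readers of the archive, the behaviour named in the issue title can be sketched roughly as below. This is a minimal illustration only, not the code merged in PR #729: the class `FetchQueueStateSketch`, its methods, and its fields are hypothetical names and do not correspond to the Nutch API. The sketch assumes that a queue's politeness state (next fetch time, exception counter) lives in the queue object, and that empty queues are dropped only after the queue feeder has finished, so a queue that is refilled while the feeder is still active keeps its counters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names are illustrative and not part of the Nutch API. */
class FetchQueueStateSketch {

  /** Per-queue politeness state that should outlive the queued fetch items. */
  static final class QueueState {
    int pendingItems;   // fetch items currently waiting in this queue
    long nextFetchTime; // earliest time (ms) at which the next fetch may start
    int exceptionCount; // exceptions observed so far for this host/domain/IP
  }

  private final Map<String, QueueState> queues = new HashMap<>();
  private boolean feederAlive = true;

  /** The feeder adds an item; the queue is created only if it does not exist yet. */
  void addItem(String queueId) {
    queues.computeIfAbsent(queueId, k -> new QueueState()).pendingItems++;
  }

  /** A fetcher thread takes an item; the queue may become empty but is kept. */
  void takeItem(String queueId) {
    QueueState s = queues.get(queueId);
    if (s != null && s.pendingItems > 0) {
      s.pendingItems--;
    }
  }

  /** Record a failed fetch and push back the earliest next fetch time. */
  void recordException(String queueId, long nowMs, long delayMs) {
    QueueState s = queues.get(queueId);
    if (s != null) {
      s.exceptionCount++;
      s.nextFetchTime = nowMs + delayMs;
    }
  }

  /** Signal that no further items will be added. */
  void feederFinished() {
    feederAlive = false;
  }

  /** Drop empty queues, but only once the feeder is finished. Returns the number removed. */
  int reapEmptyQueues() {
    if (feederAlive) {
      return 0; // keep state: the same host/domain/IP may be fed again
    }
    int before = queues.size();
    queues.values().removeIf(s -> s.pendingItems == 0);
    return before - queues.size();
  }

  /** Exception counter of a queue, or 0 if the queue does not exist. */
  int exceptionCount(String queueId) {
    QueueState s = queues.get(queueId);
    return s == null ? 0 : s.exceptionCount;
  }
}
```

With this arrangement, an exception counter such as the one for `foo.bar` in the log above would keep increasing across refills instead of starting again at 1, as long as the feeder is still running.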



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
