[ https://issues.apache.org/jira/browse/NUTCH-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2947.
------------------------------------
    Resolution: Fixed

> Fetcher: keep state of empty fetch queues unless queue feeder is finished
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-2947
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2947
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.19
>
>
> If a fetch queue is empty (containing no fetch items) it may be removed from
> the list of queues. This also removes the state of the fetch queue, namely the
> next fetch time and the exception counter. If the queue feeder is still
> active it may happen that a queue removed earlier (i.e. one associated with
> the same host/domain/IP) is created again. In this case, certain aspects of
> fetcher politeness can no longer be guaranteed:
> - the fetch delay (via the earliest next fetch time) and
> - the mechanism to block fetching from the same host/domain/IP with too many
> exceptions (NUTCH-769).
> The issue was observed in the fetcher logs while verifying NUTCH-2946:
> {noformat}
> ... 10:19:16,912 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:20:16,250 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
> ... 10:21:52,675 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:25:40,931 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:27:45,066 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
> ... 10:29:40,407 * queue foo.bar >> delayed next fetch by 100000 ms after 3 exceptions in queue
> ... 10:41:48,870 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:47:54,946 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:52:46,792 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 10:57:43,470 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:01:12,220 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:04:24,621 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:18:40,398 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:21:09,437 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:34:36,052 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:39:17,898 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:40:35,472 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:50:34,224 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:51:27,547 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:53:04,783 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 11:54:04,404 * queue foo.bar >> delayed next fetch by 79248 ms after 2 exceptions in queue
> ... 11:55:38,232 * queue foo.bar >> delayed next fetch by 100000 ms after 3 exceptions in queue
> ... 11:57:37,942 * queue foo.bar >> delayed next fetch by 116096 ms after 4 exceptions in queue
> ... 12:01:08,619 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> ... 12:02:35,985 * queue foo.bar >> delayed next fetch by 50000 ms after 1 exceptions in queue
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
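
The behaviour named in the issue title (keep the state of empty fetch queues unless the queue feeder is finished) can be illustrated with a minimal sketch. This is not the Nutch implementation: the class, field, and method names below (`FetchQueueStateSketch`, `QueueState`, `Queues`, `removeEmptyQueues`, `setFeederFinished`) are hypothetical and chosen only for illustration. The assumption is that an empty queue is dropped only after the feeder has finished, so a queue that receives new items later still carries its next fetch time and exception counter.

```java
import java.util.HashMap;
import java.util.Map;

public class FetchQueueStateSketch {

  /** Hypothetical per-queue state: the two values named in the issue plus an item count. */
  static final class QueueState {
    int pendingItems = 0;     // fetch items currently waiting in the queue
    long nextFetchTime = 0L;  // earliest time the next fetch is allowed (ms)
    int exceptionCount = 0;   // exceptions observed for this host/domain/IP
  }

  /** Hypothetical container of queues, keyed by host/domain/IP. */
  static final class Queues {
    private final Map<String, QueueState> queues = new HashMap<>();
    private boolean feederAlive = true;

    /** Returns the existing queue state, creating a fresh one only if none is known. */
    QueueState getOrCreate(String queueId) {
      return queues.computeIfAbsent(queueId, id -> new QueueState());
    }

    /** Signals that the feeder will not add any further items. */
    void setFeederFinished() {
      feederAlive = false;
    }

    /**
     * Removes empty queues, but only once the feeder has finished.
     * While the feeder is alive, empty queues keep their state.
     * Returns the number of queues removed.
     */
    int removeEmptyQueues() {
      if (feederAlive) {
        return 0;
      }
      int before = queues.size();
      queues.values().removeIf(q -> q.pendingItems == 0);
      return before - queues.size();
    }

    int size() {
      return queues.size();
    }
  }

  public static void main(String[] args) {
    Queues queues = new Queues();
    QueueState q = queues.getOrCreate("foo.bar");
    q.exceptionCount = 2; // two failed fetches, queue is now empty

    // Feeder still active: the empty queue and its counter are kept.
    queues.removeEmptyQueues();
    System.out.println(queues.getOrCreate("foo.bar").exceptionCount); // prints 2

    // Feeder finished: the empty queue may now be dropped.
    queues.setFeederFinished();
    System.out.println(queues.removeEmptyQueues()); // prints 1
  }
}
```

With this guard, the exception counter for `foo.bar` would keep increasing across refills instead of restarting at 1 as in the log excerpt above.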