Sebastian Nagel created NUTCH-3058:
--------------------------------------

             Summary: Fetcher: counter for hung threads
                 Key: NUTCH-3058
                 URL: https://issues.apache.org/jira/browse/NUTCH-3058
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.20
            Reporter: Sebastian Nagel
            Assignee: Sebastian Nagel
             Fix For: 1.21

The Fetcher class defines a "hard" timeout, set by default to 50% of the MapReduce task timeout; see {{mapreduce.task.timeout}} and {{fetcher.threads.timeout.divisor}}. If fetcher threads are still running but make no progress during the timeout period (measured by newly started fetch items), Fetcher shuts down so that the task timeout is not reached and the fetcher job does not fail. The "hung threads" are logged together with the URL being fetched and, at DEBUG level, the Java stack trace.

In addition to logging, a job counter should indicate the number of hung threads. This would make it possible to see at the job level whether hung threads are a problem. Tracing the cause still requires looking into the Hadoop task logs.
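
The mechanism described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the actual Fetcher code or the eventual patch: the class {{HungThreadCounterSketch}}, its method names, and the use of an {{AtomicLong}} in place of a Hadoop job counter are all hypothetical. Only the two property names and the 50% ratio come from the description; the 600000 ms fallback is the stock Hadoop default for {{mapreduce.task.timeout}}.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: hard timeout check plus a counter for hung fetcher threads. */
public class HungThreadCounterSketch {

  /** Hard timeout in ms: the task timeout divided by the configured divisor. */
  static long hardTimeoutMillis(Map<String, String> conf) {
    // Fallback values are assumptions: 600000 ms (Hadoop default) and a divisor of 2 (= 50%).
    long taskTimeout = Long.parseLong(conf.getOrDefault("mapreduce.task.timeout", "600000"));
    int divisor = Integer.parseInt(conf.getOrDefault("fetcher.threads.timeout.divisor", "2"));
    return taskTimeout / divisor;
  }

  /** True if threads are active but no new fetch item was started within the timeout. */
  static boolean timeoutReached(long lastFetchStartMillis, long nowMillis,
      long timeoutMillis, int activeThreads) {
    return activeThreads > 0 && (nowMillis - lastFetchStartMillis) > timeoutMillis;
  }

  /**
   * Logs every hung thread with its URL and increments the counter once per thread.
   * In a real MapReduce task the AtomicLong would be replaced by a job counter,
   * e.g. context.getCounter(group, name).increment(1), with group and name to be chosen.
   */
  static long reportHungThreads(List<String> urlsInProgress, AtomicLong hungThreadCounter) {
    for (String url : urlsInProgress) {
      System.err.println("Thread hung while fetching " + url);
      hungThreadCounter.incrementAndGet();
    }
    return hungThreadCounter.get();
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of("mapreduce.task.timeout", "600000");
    long timeout = hardTimeoutMillis(conf); // 300000 ms with a divisor of 2

    AtomicLong hungThreads = new AtomicLong();
    long lastFetchStart = 0L;
    long now = 400_000L; // pretend 400 s have passed without a newly started fetch
    List<String> inProgress = List.of("https://example.org/a", "https://example.org/b");

    if (timeoutReached(lastFetchStart, now, timeout, inProgress.size())) {
      reportHungThreads(inProgress, hungThreads);
    }
    System.out.println("hung threads: " + hungThreads.get());
  }
}
```

Incrementing once per hung thread, rather than once per timeout event, matches the request for "the number of hung threads" and lets the job-level total be compared directly with the number of per-thread log entries in the task logs.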