Sebastian Nagel created NUTCH-3058:
--------------------------------------

             Summary: Fetcher: counter for hung threads
                 Key: NUTCH-3058
                 URL: https://issues.apache.org/jira/browse/NUTCH-3058
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.20
            Reporter: Sebastian Nagel
            Assignee: Sebastian Nagel
             Fix For: 1.21


The Fetcher class defines a "hard" timeout of 50% of the MapReduce task 
timeout, see {{mapreduce.task.timeout}} and 
{{fetcher.threads.timeout.divisor}}. If fetcher threads are still running but 
make no progress during the timeout period (measured by newly started fetch 
items), Fetcher shuts down so that the task timeout is not reached and the 
fetcher job does not fail. The "hung threads" are logged together with the URL 
being fetched and, at DEBUG level, the Java stack.
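
The timeout arithmetic described above can be sketched as follows. This is a 
minimal, self-contained illustration and not the actual Fetcher code: the class 
and method names are hypothetical, and the example values assume a task timeout 
of 600000 ms (the Hadoop default for {{mapreduce.task.timeout}}) and a divisor 
of 2, which gives the 50% mentioned above.

```java
// Hypothetical sketch, not taken from the Nutch source.
class FetcherTimeoutSketch {

    /** Hard timeout = MapReduce task timeout divided by the configured divisor. */
    static long hardTimeoutMillis(long taskTimeoutMillis, int divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        return taskTimeoutMillis / divisor;
    }

    /**
     * True if threads are still active but no new fetch item has been
     * started for longer than the hard timeout.
     */
    static boolean noProgress(int activeThreads, long lastItemStartMillis,
                              long nowMillis, long hardTimeoutMillis) {
        return activeThreads > 0
            && (nowMillis - lastItemStartMillis) > hardTimeoutMillis;
    }

    public static void main(String[] args) {
        // Assumed values: 10 minute task timeout, divisor 2.
        long timeout = hardTimeoutMillis(600_000L, 2);
        System.out.println("hard timeout (ms): " + timeout); // 300000

        // Three threads active, last fetch item started 6 minutes ago.
        System.out.println("shut down: " + noProgress(3, 0L, 360_000L, timeout));
    }
}
```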

In addition to logging, a job counter should indicate the number of hung 
threads. This would make it possible to see at the job level whether there are 
issues with hung threads. Tracing the issues would still require looking into 
the Hadoop task logs.
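
A rough sketch of the proposed counting is shown below. It is only an 
illustration under stated assumptions: the class, the counter name 
"hungThreads" and the in-memory counter map are hypothetical stand-ins so that 
the example runs without Hadoop. In a real job the increment would presumably 
go through the Hadoop counter API, for example 
{{context.getCounter(group, name).increment(n)}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not taken from the Nutch source.
class HungThreadCounterSketch {

    /** Assumed counter name, chosen for illustration only. */
    static final String HUNG_THREADS = "hungThreads";

    /** In-memory stand-in for the job counters. */
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    void increment(String name, long delta) {
        counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }

    long get(String name) {
        AtomicLong c = counters.get(name);
        return c == null ? 0L : c.get();
    }

    /**
     * Logs every thread that is still active when the hard timeout fires,
     * together with the URL it is fetching, and adds their number to the
     * counter. The map goes from thread name to URL.
     */
    int reportHungThreads(Map<String, String> activeThreadUrls) {
        for (Map.Entry<String, String> e : activeThreadUrls.entrySet()) {
            System.err.println("Thread " + e.getKey()
                + " hung while processing " + e.getValue());
        }
        int hung = activeThreadUrls.size();
        if (hung > 0) {
            increment(HUNG_THREADS, hung);
        }
        return hung;
    }

    public static void main(String[] args) {
        Map<String, String> active = new LinkedHashMap<>();
        active.put("FetcherThread-3", "https://example.org/a");
        active.put("FetcherThread-7", "https://example.org/b");

        HungThreadCounterSketch sketch = new HungThreadCounterSketch();
        sketch.reportHungThreads(active);
        System.out.println(HUNG_THREADS + " = " + sketch.get(HUNG_THREADS)); // 2
    }
}
```

The counter only shows that hung threads occurred. The per-thread log lines 
remain the place to find out which URLs were involved.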



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
