What makes you think that the fetcher is hung?
Doug
Marko Bauhardt wrote:
Hi all,
I am using nutch-mapred from the SVN branch. Sometimes the reduce task of the
fetch process hangs. The thread dump shows that RegexURLFilter is running.
In regex-urlfilter.txt I uncommented the line
[EMAIL PROTECTED]
because I want to fetch dynamic URLs such as JSP pages.
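To make the effect of that edit concrete, here is a minimal sketch of how a regex-urlfilter.txt-style rule list is evaluated. It assumes the semantics described in the comments of Nutch's stock file: each rule is `+` (accept) or `-` (reject) followed by a regex, the first pattern found in the URL decides, and a URL that matches no pattern is dropped. It uses `java.util.regex` rather than the Jakarta ORO `Perl5Matcher` seen in the dump. `UrlRuleSketch` and its methods are hypothetical, and the `-[?&=]` rule is an illustrative example, not necessarily the redacted line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of a regex-urlfilter.txt rule list.
 * NOT Nutch's RegexURLFilter code; it only mirrors the assumed semantics:
 * first matching rule wins, no match means the URL is rejected.
 */
public class UrlRuleSketch {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses one config line; blank lines and '#' comments are ignored. */
    public void addLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return;
        }
        char sign = trimmed.charAt(0);
        if (sign != '+' && sign != '-') {
            throw new IllegalArgumentException("bad rule: " + line);
        }
        rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
    }

    /** Returns the URL if accepted, or null if rejected. */
    public String filter(String url) {
        for (Rule r : rules) {
            // find() = "pattern occurs somewhere in the URL", like Perl5Matcher.contains
            if (r.pattern.matcher(url).find()) {
                return r.accept ? url : null;
            }
        }
        return null; // no rule matched: drop the URL
    }

    public static void main(String[] args) {
        String dynamicUrl = "http://example.com/show.jsp?id=42";

        UrlRuleSketch strict = new UrlRuleSketch();
        strict.addLine("-[?&=]"); // hypothetical rule: reject query-style URLs
        strict.addLine("+.");     // accept everything else
        System.out.println(strict.filter(dynamicUrl)); // prints "null"

        UrlRuleSketch relaxed = new UrlRuleSketch();
        relaxed.addLine("# -[?&=]"); // the same rule, commented out
        relaxed.addLine("+.");
        System.out.println(relaxed.filter(dynamicUrl)); // prints the URL
    }
}
```

Because the first match decides, whether the query rule is active is exactly what determines if `show.jsp?id=42` is kept. Every URL still has to be run through each pattern in order until one matches.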
Here is the thread dump:
051017 151123 reduce > reduce
Full thread dump Java HotSpot(TM) Client VM (1.4.2_08-b03 mixed mode):
"MultiThreadedHttpConnectionManager cleanup" daemon prio=1 tid=0x08249fa0 nid=0x7645 in Object.wait() [6d489000..6d489868]
	at java.lang.Object.wait(Native Method)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
	- locked <0x753a19c0> (a java.lang.ref.ReferenceQueue$Lock)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1100)

"Thread-1" prio=1 tid=0x082149b0 nid=0x7645 runnable [6efc3000..6efc3868]
	at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.__tryExpression(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.__interpret(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
	at org.apache.nutch.net.RegexURLFilter.filter(RegexURLFilter.java:114)
	- locked <0x753d8cc8> (a org.apache.nutch.net.RegexURLFilter)
	at org.apache.nutch.net.URLFilters.filter(URLFilters.java:77)
	at org.apache.nutch.crawl.ParseOutputFormat$1.write(ParseOutputFormat.java:71)
	at org.apache.nutch.crawl.FetcherOutputFormat$1.write(FetcherOutputFormat.java:78)
	at org.apache.nutch.mapred.ReduceTask$2.collect(ReduceTask.java:247)
	at org.apache.nutch.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
	at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
	at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)

"Signal Dispatcher" daemon prio=1 tid=0x080a6ff8 nid=0x7645 waiting on condition [0..0]

"Finalizer" daemon prio=1 tid=0x080933e8 nid=0x7645 in Object.wait() [70159000..70159868]
	at java.lang.Object.wait(Native Method)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
	- locked <0x75350780> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x08091978 nid=0x7645 in Object.wait() [701da000..701da868]
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:429)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
	- locked <0x753507e8> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0805c0d8 nid=0x7645 waiting on condition [bfffb000..bfffb41c]
	at java.lang.Thread.sleep(Native Method)
	at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:294)
	at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:333)
	at org.apache.nutch.crawl.Fetcher.main(Fetcher.java:362)

"VM Thread" prio=1 tid=0x08090718 nid=0x7645 runnable

"VM Periodic Task Thread" prio=1 tid=0x6fb01420 nid=0x7645 waiting on condition

"Suspend Checker Thread" prio=1 tid=0x080a65f0 nid=0x7645 runnable
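Regarding Doug's question: a single dump only shows where "Thread-1" was at one instant (inside `Perl5Matcher`, called from `RegexURLFilter.filter`). It does not show whether the thread is stuck on one URL or just busy with many. One way to gather evidence is to sample the thread's stack repeatedly. The following is a minimal sketch of that idea. It assumes Java 5+ (`Thread.getStackTrace`), while the VM in this dump is 1.4.2, where the manual equivalent is sending `kill -QUIT <pid>` several times and comparing the dumps. `StackSampler`, `sample`, `spinInFilter` and `runDemo` are hypothetical names, and `spinInFilter` merely stands in for a long-running filter call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: samples a thread's stack to see if it stays in one method. */
public class StackSampler {

    /** For each sample, records whether any frame "Class.method" contains marker. */
    public static List<Boolean> sample(Thread t, String marker, int samples, long intervalMillis)
            throws InterruptedException {
        List<Boolean> hits = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            boolean hit = false;
            for (StackTraceElement e : t.getStackTrace()) {
                if ((e.getClassName() + "." + e.getMethodName()).contains(marker)) {
                    hit = true;
                    break;
                }
            }
            hits.add(hit);
            Thread.sleep(intervalMillis);
        }
        return hits;
    }

    // Hypothetical stand-in for a long-running filter call; loops until told to stop.
    static void spinInFilter(CountDownLatch started, AtomicBoolean running) throws InterruptedException {
        started.countDown();
        while (running.get()) {
            Thread.sleep(1);
        }
    }

    /** Starts a worker stuck in spinInFilter, samples it 5 times, then stops it. */
    static List<Boolean> runDemo() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean running = new AtomicBoolean(true);
        Thread worker = new Thread(() -> {
            try {
                spinInFilter(started, running);
            } catch (InterruptedException ignored) {
                // demo worker: just exit
            }
        }, "demo-worker");
        worker.start();
        started.await(); // make sure the worker is inside spinInFilter before sampling
        List<Boolean> hits = sample(worker, "spinInFilter", 5, 20);
        running.set(false);
        worker.join();
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo()); // prints [true, true, true, true, true]
    }
}
```

If `RegexURLFilter.filter` shows up in every sample over several minutes, the reduce task is at least spending most of its time there. Stack samples alone still cannot tell one pathological URL apart from a steady stream of URLs, so logging the URL being filtered would be the natural next step.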