What makes you think that the fetcher is hung?

Doug

Marko Bauhardt wrote:
Hi all,
I use nutch-mapred from the svn branch. Sometimes the reduce job of the fetch process hangs. The thread dump shows that the RegexURLFilter is running.
In regex-urlfilter.txt I uncommented the line
[EMAIL PROTECTED]

because I want to fetch dynamic URLs such as JSPs.



Here is the thread dump.

051017 151123 reduce > reduce
Full thread dump Java HotSpot(TM) Client VM (1.4.2_08-b03 mixed mode):

"MultiThreadedHttpConnectionManager cleanup" daemon prio=1 tid=0x08249fa0 nid=0x7645 in Object.wait() [6d489000..6d489868]
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
        - locked <0x753a19c0> (a java.lang.ref.ReferenceQueue$Lock)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1100)

"Thread-1" prio=1 tid=0x082149b0 nid=0x7645 runnable  [6efc3000..6efc3868]
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__tryExpression(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.__interpret(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
        at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
        at org.apache.nutch.net.RegexURLFilter.filter(RegexURLFilter.java:114)
        - locked <0x753d8cc8> (a org.apache.nutch.net.RegexURLFilter)
        at org.apache.nutch.net.URLFilters.filter(URLFilters.java:77)
        at org.apache.nutch.crawl.ParseOutputFormat$1.write(ParseOutputFormat.java:71)
        at org.apache.nutch.crawl.FetcherOutputFormat$1.write(FetcherOutputFormat.java:78)
        at org.apache.nutch.mapred.ReduceTask$2.collect(ReduceTask.java:247)
        at org.apache.nutch.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
        at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
        at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)

"Signal Dispatcher" daemon prio=1 tid=0x080a6ff8 nid=0x7645 waiting on condition [0..0]

"Finalizer" daemon prio=1 tid=0x080933e8 nid=0x7645 in Object.wait() [70159000..70159868]
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
        - locked <0x75350780> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x08091978 nid=0x7645 in Object.wait() [701da000..701da868]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:429)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
        - locked <0x753507e8> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0805c0d8 nid=0x7645 waiting on condition [bfffb000..bfffb41c]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:294)
        at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:333)
        at org.apache.nutch.crawl.Fetcher.main(Fetcher.java:362)

"VM Thread" prio=1 tid=0x08090718 nid=0x7645 runnable

"VM Periodic Task Thread" prio=1 tid=0x6fb01420 nid=0x7645 waiting on condition

"Suspend Checker Thread" prio=1 tid=0x080a65f0 nid=0x7645 runnable

