Hello all,
I ran a crawl with parsing disabled, and then later tried to parse the
segment using the parse command. However, there seems to be a bug, as I
keep getting StackOverflowErrors on certain pages. First, I got one on
an XML document that the text parser was trying to parse. I disabled the
text parse plugin, and the job then got past that page with no problem.
Later on, with the text parse plugin still disabled, I got the same
error on an HTML document. This is what happened:
051130 102144 Parsing [http://www.ibcr.org/PAGE_EN/P1_01E.htm] with
[EMAIL PROTECTED]
java.lang.StackOverflowError
java.io.IOException: Job failed!
at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
at org.apache.nutch.crawl.ParseSegment.parse(ParseSegment.java:91)
at org.apache.nutch.crawl.ParseSegment.main(ParseSegment.java:110)
This is with the 0.8-dev version of Nutch, with the default plugins enabled.
Is this a known issue? Is there some way around it? Is the parser
somehow getting into an infinite loop?
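In case it helps with reproducing, the workflow was roughly the
following (the segment path is illustrative, and I'm assuming the usual
fetcher.parse property is how parsing gets disabled during the fetch):

```shell
# Disable parsing at fetch time in conf/nutch-site.xml:
#   <property>
#     <name>fetcher.parse</name>
#     <value>false</value>
#   </property>

# Then parse the fetched segment separately (path is illustrative):
bin/nutch parse crawl/segments/20051130102144
```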
Thanks for any help you can give.
-Matt Zytaruk