Dennis, +1
On 6/25/07 4:42 PM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote:

> If no one has any objections, I will go ahead and commit this.
>
> Dennis Kubes
>
> Dennis Kubes (JIRA) wrote:
>> [ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Dennis Kubes updated NUTCH-497:
>> -------------------------------
>>
>> Attachment: nested-tags-trap3.patch
>>
>> added nested-tags-trap3.patch with apache grant
>>
>>> Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
>>> ----------------------------------------------------------------------------------
>>>
>>>                 Key: NUTCH-497
>>>                 URL: https://issues.apache.org/jira/browse/NUTCH-497
>>>             Project: Nutch
>>>          Issue Type: Bug
>>>          Components: fetcher
>>>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>>>         Environment: all
>>>            Reporter: Dennis Kubes
>>>            Assignee: Dennis Kubes
>>>             Fix For: 1.0.0
>>>
>>>         Attachments: ExtremeNestedTags.patch, nested-tags-trap.patch,
>>>                      nested-tags-trap2.patch, nested-tags-trap3.patch
>>>
>>>
>>> Some webpages have a form of a spider trap that causes a
>>> StackOverflowException in DomContentUtils by having nested tags with
>>> thousands of layers deep. DomContentUtils when trying to get outlinks uses
>>> a recursive method to parse the html. With this type of nesting it errors
>>> out.
>>
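
For anyone following the thread, the failure mode in the issue description (a recursive walk over tags nested thousands of levels deep) can be illustrated with a walk that uses an explicit stack instead of the call stack. This is only a minimal sketch, assuming a standard org.w3c.dom tree. The class and method names (IterativeOutlinkWalker, collectHrefs, buildNestedDocument) are hypothetical and are not taken from the attached patches or from the Nutch source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration; not the code from nested-tags-trap3.patch.
public class IterativeOutlinkWalker {

    // Collects href values of <a> elements without recursion, so the
    // nesting depth of the page is bounded by heap size, not stack size.
    public static List<String> collectHrefs(Node root) {
        List<String> hrefs = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "a".equalsIgnoreCase(node.getNodeName())) {
                String href = ((Element) node).getAttribute("href");
                if (!href.isEmpty()) {
                    hrefs.add(href);
                }
            }
            // Push children last-to-first so they are popped in document order.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return hrefs;
    }

    // Test helper (also hypothetical): one <a href=...> wrapped in
    // `depth` nested <div> elements, built from the inside out.
    public static Document buildNestedDocument(int depth, String href) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element current = doc.createElement("a");
            current.setAttribute("href", href);
            for (int i = 0; i < depth; i++) {
                Element wrapper = doc.createElement("div");
                wrapper.appendChild(current);
                current = wrapper;
            }
            doc.appendChild(current);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A depth limit on the walk would be another way to defuse this kind of trap; the sketch above only shows the explicit-stack variant.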