[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-497:
-------------------------------

    Attachment: nested-tags-trap2.patch

Added nested-tags-trap2.patch with Apache grant.

> Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
> ----------------------------------------------------------------------------------
>
>                 Key: NUTCH-497
>                 URL: https://issues.apache.org/jira/browse/NUTCH-497
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: all
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: ExtremeNestedTags.patch, nested-tags-trap.patch, nested-tags-trap2.patch, nested-tags-trap3.patch
>
>
> Some webpages contain a form of spider trap that causes a
> StackOverflowException in DomContentUtils: tags nested thousands of
> layers deep. When extracting outlinks, DomContentUtils uses a recursive
> method to walk the parsed HTML, so each level of nesting adds a stack
> frame, and with this much nesting the call stack overflows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
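
For readers unfamiliar with the failure mode described in the issue, here is a minimal sketch of one common way to avoid it: walking the DOM with an explicit stack on the heap instead of recursing. This is not the content of the attached patches, which are not shown in this message. The class name `IterativeOutlinkWalker` and the methods `collectHrefs` and `buildNestedDivs` are hypothetical names chosen for illustration; only the standard `org.w3c.dom` and `javax.xml.parsers` APIs are assumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration; not taken from Nutch or from the attached patches.
class IterativeOutlinkWalker {

    // Collects href values of <a> elements in document order.
    // Pending nodes live in a heap-allocated deque, so the depth of the
    // tree does not consume call-stack frames.
    static List<String> collectHrefs(Node root) {
        List<String> hrefs = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "a".equalsIgnoreCase(node.getNodeName())) {
                // getAttribute returns an empty string when the attribute is absent.
                String href = ((Element) node).getAttribute("href");
                if (!href.isEmpty()) {
                    hrefs.add(href);
                }
            }
            // Push children last-to-first so they are popped first-to-last.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return hrefs;
    }

    // Builds <div><div>...<a href="..."/>...</div></div> with the given depth,
    // wrapping from the inside out, to imitate the nesting described in the issue.
    static Node buildNestedDivs(int depth, String href) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element current = doc.createElement("a");
        current.setAttribute("href", href);
        for (int i = 0; i < depth; i++) {
            Element wrapper = doc.createElement("div");
            wrapper.appendChild(current);
            current = wrapper;
        }
        return current;
    }

    public static void main(String[] args) {
        Node root = buildNestedDivs(50_000, "http://example.com/");
        System.out.println(collectHrefs(root));
    }
}
```

A depth limit on the traversal would be another plausible defence against this kind of trap; the sketch above only shows the explicit-stack variant.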