Dennis, +1

On 6/25/07 4:42 PM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote:

> If no one has any objections, I will go ahead and commit this.
> 
> Dennis Kubes
> 
> Dennis Kubes (JIRA) wrote:
>>      [ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>> 
>> Dennis Kubes updated NUTCH-497:
>> -------------------------------
>> 
>>     Attachment: nested-tags-trap3.patch
>> 
>> added nested-tags-trap3.patch with apache grant
>> 
>>> Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
>>> ----------------------------------------------------------------------------------
>>> 
>>>                 Key: NUTCH-497
>>>                 URL: https://issues.apache.org/jira/browse/NUTCH-497
>>>             Project: Nutch
>>>          Issue Type: Bug
>>>          Components: fetcher
>>>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>>>         Environment: all
>>>            Reporter: Dennis Kubes
>>>            Assignee: Dennis Kubes
>>>             Fix For: 1.0.0
>>> 
>>>         Attachments: ExtremeNestedTags.patch, nested-tags-trap.patch,
>>> nested-tags-trap2.patch, nested-tags-trap3.patch
>>> 
>>> 
>>> Some webpages contain a form of spider trap: tags nested thousands of
>>> levels deep. This causes a StackOverflowException (java.lang.StackOverflowError)
>>> in DomContentUtils. When extracting outlinks, DomContentUtils walks the
>>> HTML with a recursive method, so nesting this deep exhausts the call stack
>>> and the parse fails.
>> 
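For illustration, here is a minimal sketch of one common way to avoid this kind of overflow: replace the recursive walk with an explicit stack, so nesting depth consumes heap memory instead of thread-stack frames. This is not taken from the attached patches. The class IterativeOutlinks and its methods collectOutlinks and buildNested are hypothetical names, and the sketch uses the JDK's org.w3c.dom API rather than Nutch's own classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch: non-recursive outlink extraction, not the NUTCH-497 patch.
public class IterativeOutlinks {

  /** Collects href values of <a> elements using an explicit stack instead of recursion. */
  public static List<String> collectOutlinks(Node root) {
    List<String> links = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node node = stack.pop();
      if (node.getNodeType() == Node.ELEMENT_NODE
          && "a".equalsIgnoreCase(node.getNodeName())) {
        String href = ((Element) node).getAttribute("href"); // "" if absent
        if (!href.isEmpty()) {
          links.add(href);
        }
      }
      // Push children last-to-first so they are popped in document order.
      for (Node child = node.getLastChild(); child != null; child = child.getPreviousSibling()) {
        stack.push(child);
      }
    }
    return links;
  }

  /** Hypothetical helper: builds `depth` nested <div> elements with one link at the bottom. */
  public static Document buildNested(int depth) throws ParserConfigurationException {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element current = doc.createElement("html");
    doc.appendChild(current);
    for (int i = 0; i < depth; i++) {
      Element div = doc.createElement("div");
      current.appendChild(div);
      current = div;
    }
    Element a = doc.createElement("a");
    a.setAttribute("href", "http://example.com/trap");
    current.appendChild(a);
    return doc;
  }

  public static void main(String[] args) throws Exception {
    Document doc = buildNested(10_000);
    System.out.println(collectOutlinks(doc));
  }
}
```

The depth of 10,000 is kept modest only so the demo document builds quickly. The traversal itself never adds a call frame per level, so its thread-stack usage does not grow with nesting depth.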

