I have finished and attached a solution for NUTCH-497. It uses an explicit stack instead of recursion in DOMContentUtils to avoid stack overflows on extremely deeply nested tags. It also adds a nested-tags test page to the fetcher tests.
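
For illustration, the general idea of replacing recursion with an explicit stack when walking a DOM tree can be sketched roughly as follows. This is not the attached patch: the class name IterativeTextExtractor and the methods getText and buildNested are hypothetical and are not part of DOMContentUtils. It only assumes the standard org.w3c.dom API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, for illustration only (not the NUTCH-497 patch).
final class IterativeTextExtractor {

    /** Collects all text nodes under root in document order, without recursion. */
    static String getText(Node root) {
        StringBuilder sb = new StringBuilder();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.TEXT_NODE) {
                sb.append(node.getNodeValue());
            }
            // Push children last-to-first so the first child is popped first,
            // which preserves document order.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return sb.toString();
    }

    /** Builds <b><i><b>...text...</b></i></b> nested to the given depth, inside-out. */
    static Node buildNested(int depth, String text) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Node current = doc.createTextNode(text);
        for (int i = 0; i < depth; i++) {
            // Alternate tag names to mimic the <b><i><b><i> pattern.
            Element wrapper = doc.createElement(i % 2 == 0 ? "i" : "b");
            wrapper.appendChild(current);
            current = wrapper;
        }
        return current;
    }

    public static void main(String[] args) {
        Node root = buildNested(10000, "hello");
        System.out.println(getText(root));
    }
}
```

With this approach the traversal depth is limited by heap memory rather than by the thread's call stack, so a page nested thousands of levels deep only grows the Deque.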
Please take a look, and if there are no issues with this patch I will commit it in a day or two.

Dennis Kubes

Dennis Kubes wrote:
> This error is due to a webpage with an extreme nesting of tags. For
> example something like <b><i><b><i>.....</i></b></i></b> but thousands
> of levels deep. It is a form of a spider trap.
>
> I just created NUTCH-497 for this issue and attached a very
> rudimentary patch as a workaround. The patch successfully fixes the
> problem but it is not very robust and has no unit tests as of yet. I
> have run this successfully myself. I will provide a more robust patch
> when time allows but this should help you for now.
>
> Dennis Kubes
>
> djames wrote:
>> Thanks a lot for your help
>> I'll give you a feedback
