This error is caused by a web page with extremely deep tag nesting, for example something like <b><i><b><i>.....</i></b></i></b> but thousands of levels deep. It is a form of spider trap.
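To illustrate the kind of page described above, here is a minimal sketch in Python that measures tag nesting depth iteratively, so the check itself cannot be tripped by deep nesting. It is not the NUTCH-497 patch and not Nutch code. The names `MAX_DEPTH`, `DepthTracker`, `max_nesting_depth` and `is_too_deep` are hypothetical, and the limit of 200 is an arbitrary assumption.

```python
from html.parser import HTMLParser

# Hypothetical limit; not a value taken from Nutch or the NUTCH-497 patch.
MAX_DEPTH = 200

# Elements that never have a closing tag, so they do not add nesting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthTracker(HTMLParser):
    """Records the deepest tag nesting seen, using a counter instead of recursion."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Ignore stray closing tags rather than letting the counter go negative.
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def max_nesting_depth(html):
    """Return the maximum open-tag depth reached while scanning the page."""
    tracker = DepthTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.max_depth


def is_too_deep(html, limit=MAX_DEPTH):
    """True if the page nests tags more deeply than the assumed limit."""
    return max_nesting_depth(html) > limit


if __name__ == "__main__":
    trap = "<b><i>" * 5000 + "text" + "</i></b>" * 5000
    print(max_nesting_depth(trap), is_too_deep(trap))
```

A crawler could run a check like this before handing the page to a recursive DOM walker and skip or truncate pages that exceed the limit. Whether that matches the approach in the attached patch is not something this sketch claims.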
I just created NUTCH-497 for this issue and attached a very rudimentary patch as a workaround. The patch fixes the problem in my own runs, but it is not very robust and has no unit tests yet. I will provide a more robust patch when time allows, but this should help you for now.

Dennis Kubes

djames wrote:
> Thanks a lot for your help
> I'll give you a feedback
