On Tue, Jun 29, 2004 at 10:10:07AM -0600, Vince Taluskie wrote:
> 
> I noticed some odd behavior with it not extracting outlinks within table 
> elements at times... I pulled down a different nightly build and got 
> better behavior - of course, I was also fiddling with the config as well 
> and probably should have quantified the problem a little better...

I have seen all kinds of HTML attributes altered by the CyberNeko HTML parser.
"Not extracting outlinks" may be caused by the 'href=' attribute being changed.
This may have to do with its ability to "fix up" HTML markup.
Interestingly, it does not happen in a single-threaded run.

John


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
