On Tue, Jun 29, 2004 at 10:10:07AM -0600, Vince Taluskie wrote:
>
> I noticed some odd behavior with it not extracting outlinks within table
> elements at times... I pulled down a different nightly build and got
> better behavior - of course, I was also fiddling with the config as well
> and probably should have quantified the problem a little better...

I have seen all kinds of HTML attributes altered by the CyberNeko HTML
parser. "Not extracting outlinks" may be caused by the 'href=' attribute
being changed, which may have to do with the parser's ability to "fix up"
HTML markup. Interestingly, it does not happen in a single-threaded run.

John

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
