I've had fairly good experiences with JTidy, but HTMLParser (http://htmlparser.sourceforge.net/) seems to have the lighter-looking API. It is event-based, and I might need to parse some large HTML documents soon, where building a DOM might be a problem. Does anyone have practical experience with HTMLParser?
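
To make the event-based versus DOM distinction concrete, here is a minimal sketch that uses only the HTML parser bundled with the JDK (javax.swing.text.html.parser.ParserDelegator). It is not the HTMLParser library's API, and the names EventTextExtractor, TextCollector and extractText are made up for this illustration. The callback receives each tag and text run as it is read, and the callback itself keeps no document tree.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical example class; not part of JTidy or HTMLParser.
class EventTextExtractor {

    /** Receives parse events and accumulates only the text it is handed. */
    static class TextCollector extends HTMLEditorKit.ParserCallback {
        private final StringBuilder text = new StringBuilder();
        private int startTags = 0;

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            // Called once per opening tag; here we only count them.
            startTags++;
        }

        @Override
        public void handleText(char[] data, int pos) {
            // Called for each run of character data; separate runs with a space.
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(data);
        }

        String getText() {
            return text.toString();
        }

        int getStartTags() {
            return startTags;
        }
    }

    /** Streams the HTML through the callback and returns the collected text. */
    static String extractText(Reader html) throws IOException {
        TextCollector collector = new TextCollector();
        // Third argument: ignore any charset declared inside the document.
        new ParserDelegator().parse(html, collector, true);
        return collector.getText();
    }

    public static void main(String[] args) throws IOException {
        // Deliberately sloppy input: the first <p> is never closed.
        String html = "<html><body><h1>Title</h1><p>First<p>Second</body></html>";
        System.out.println(extractText(new StringReader(html)));
    }
}
```

A DOM-style parser such as JTidy would instead hand back the complete tree before any of it can be processed, which is the memory cost the large-document case is concerned about.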

Thanks,
Frank

> -----Original Message-----
> From: petite_abeille [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 25 February 2003 19:49
> To: Lucene Users List
> Subject: Re: Best HTML Parser !!
>
> On Monday, Feb 24, 2003, at 20:28 Europe/Zurich, Lukas Zapletal wrote:
>
> > I have some good experiences with JTidy. It works like DOM-XML parser
> > and cleans HTML it by the way.
>
> I use jtidy also. Both for parsing and clean-up. Works pretty nicely.
>
> > This is VERY useful, because EVERY HTML have at least ONE error.
>
> This rule should be tattooed on every parsers head: out of the
> laboratory, nothing is compliant. Which render the race to "more
> compliance" among the different parsers somewhat ridiculous.
>
> Cheers,
>
> PA.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]