Pierre Lacchini wrote:

Hello,

I'm trying to index HTML files with Lucene.
Do you know what the best HTML parser in Java is? The most powerful one?
I need to extract meta tags and many other text fields...


Thanks for your help ;)



I have had good experiences with JTidy. It works like a DOM XML parser and cleans up the HTML along the way.
This is VERY useful, because EVERY HTML document has at least ONE error.


Documents that Neko could not parse were parsed by JTidy without problems.

Writing the indexing program took 2 hours.
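
A minimal sketch of the field extraction, assuming the page has already been parsed into an org.w3c.dom.Document (JTidy's Tidy.parseDOM returns one). The class HtmlFields and its method names are hypothetical, made up for this illustration; they are not part of JTidy or Lucene. The walk sticks to basic DOM calls only, on the assumption that a cleaning parser's DOM may not support the newer convenience methods.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper for illustration; not part of JTidy or Lucene.
//
// Obtaining the Document with JTidy would look roughly like this
// (needs the JTidy jar on the classpath):
//   org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
//   tidy.setQuiet(true);
//   tidy.setShowWarnings(false);
//   Document doc = tidy.parseDOM(inputStream, null);
final class HtmlFields {

    /** Collects <meta name="..." content="..."> pairs, keyed by lower-cased name. */
    static Map<String, String> metaTags(Document doc) {
        Map<String, String> meta = new LinkedHashMap<>();
        collectMeta(doc.getDocumentElement(), meta);
        return meta;
    }

    private static void collectMeta(Node node, Map<String, String> meta) {
        if (node == null) {
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "meta".equalsIgnoreCase(node.getNodeName())) {
            Element element = (Element) node;
            String name = element.getAttribute("name");
            String content = element.getAttribute("content");
            // Skip meta elements without a name (e.g. http-equiv or charset).
            if (name != null && !name.isEmpty()) {
                meta.put(name.toLowerCase(Locale.ROOT), content == null ? "" : content);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectMeta(child, meta);
        }
    }

    /** Joins all non-blank text nodes below the given node with single spaces. */
    static String text(Node node) {
        StringBuilder out = new StringBuilder();
        collectText(node, out);
        return out.toString();
    }

    private static void collectText(Node node, StringBuilder out) {
        if (node == null) {
            return;
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String value = node.getNodeValue();
            String trimmed = value == null ? "" : value.trim();
            if (!trimmed.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(trimmed);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }
}
```

The resulting map entries and the text string can then be added to a Lucene document as separate fields; the exact field API depends on the Lucene version in use.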

--
Lukas Zapletal      [EMAIL PROTECTED]
http://www.tanecni-olomouc.cz/lzap






