You have to do it yourself, or at least find code that does this. The Lucene sample code has an HTML parser, and I've posted (to lucene-dev) an alternative way of using JTidy to do this.
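
To make "do it yourself" concrete, here is a minimal sketch of stripping tags before handing text to Lucene. It is not the parser from the Lucene sample code and not the JTidy approach; the class and method names (`HtmlTagStripper`, `stripTags`) are made up for illustration. It is deliberately naive: it does not decode entities such as `&amp;`, it does not skip the contents of `<script>` or `<style>`, and it splits a word that has a tag in the middle of it. For real-world HTML a proper parser is the safer choice.

```java
// Naive sketch: drops everything between '<' and '>' so that only text
// content is left for indexing. This is NOT a real HTML parser.
final class HtmlTagStripper {

    private HtmlTagStripper() {}

    /** Returns the input with each tag replaced by a space and whitespace collapsed. */
    static String stripTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
                out.append(' '); // keep the words on either side of a tag apart
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                out.append(c);   // ordinary text outside of any tag
            }
        }
        // Collapse the runs of whitespace left behind by removed tags.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        System.out.println(stripTags(html)); // prints "Title Hello world"
    }
}
```

The string returned by `stripTags` is what you would give Lucene as the text of the field you want searched, instead of the raw HTML.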

Erik

----- Original Message -----
From: "Melissa Mifsud" <[EMAIL PROTECTED]>
To: "Lucene User" <[EMAIL PROTECTED]>
Sent: Tuesday, March 05, 2002 9:14 AM
Subject: Indexing HTML with Lucene

Hi,

Is it necessary to strip the HTML tags from HTML documents BEFORE telling Lucene to index them? Does Lucene do this or will it index the tags too?!

Melissa