You have to do it yourself, or at least find code that does it. Lucene
indexes whatever text you hand it, tags included. The Lucene sample code
has an HTML parser, and I've posted (to lucene-dev) an alternative
approach that uses JTidy.
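
To give an idea of what "doing it yourself" means, here is a minimal
sketch, assuming a crude regex-based approach is good enough for your
pages. It is neither the parser from the Lucene sample code nor the JTidy
approach; the class name HtmlText and the method strip are made up for
illustration, and a real HTML parser will cope far better with malformed
markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or JTidy.
final class HtmlText {
    // <script> and <style> elements, including their contents.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // HTML comments, which may span several lines.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Any remaining tag, opening or closing.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    // Runs of whitespace left behind by the removals.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private HtmlText() {}

    // Returns the visible text of an HTML string, suitable for indexing.
    // Tags are replaced by a space so that words in adjacent elements do
    // not run together; this also splits a word that has a tag in the
    // middle of it, which is a known limitation of this sketch.
    static String strip(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a handful of common entities; "&amp;" goes last so
        // that "&amp;lt;" ends up as the literal text "&lt;".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

You would call HtmlText.strip(html) on each page and give the returned
string to Lucene as the field text, instead of the raw HTML.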

    Erik

----- Original Message -----
From: "Melissa Mifsud" <[EMAIL PROTECTED]>
To: "Lucene User" <[EMAIL PROTECTED]>
Sent: Tuesday, March 05, 2002 9:14 AM
Subject: Indexing HTML with Lucene


Hi,

Is it necessary to strip the HTML tags from HTML documents BEFORE telling
Lucene to index them? Does Lucene do this or will it index the tags too?!

Melissa


