Well, let me know if you figure it out and I will do the same.  I don't
quite understand how those classes would help.  Would you somehow use
them to create the Reader object that is passed in to create the
TokenStream object?

-----Original Message-----
From: Craig Walls [mailto:wallsc@;michaels.com]
Sent: Thursday, November 14, 2002 2:39 PM
To: Lucene Users List
Subject: Re: HTML Analyzer?



Ironically, I had to solve this exact problem just 10 minutes ago...

Check into javax.swing.text.html.HTMLEditorKit and
javax.swing.text.html.HTMLDocument. Here's a URL that I found helpful (the
site is Japanese, but the source code is still Java):

http://java-house.jp/ml/archive/j-h-b/037727.html?#_body
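
As a rough illustration of the idea (not the code from the page above), here
is a minimal sketch that uses the Swing HTML parser to drop the tags and keep
only the text. The class name HtmlTextExtractor and both method names are
made up for this example; ParserDelegator and HTMLEditorKit.ParserCallback
are the standard JDK classes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, named for this example only.
final class HtmlTextExtractor {

    // Parses the HTML from 'html' and returns only the text between tags.
    static String extractText(Reader html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text; tags themselves never arrive here.
                text.append(data).append(' ');
            }
        };
        // 'true' tells the parser to ignore any charset declared in the document.
        new ParserDelegator().parse(html, callback, true);
        return text.toString().trim();
    }

    // Wraps the stripped text in a Reader so it can be handed on for indexing.
    static Reader plainTextReader(Reader html) throws IOException {
        return new StringReader(extractText(html));
    }
}
```

The Reader returned by plainTextReader() holds plain text only, so it could
presumably be passed along wherever you would otherwise pass the raw HTML,
for example as the Reader given to an Analyzer. How exactly that is wired in
depends on your Lucene version, so treat that part as an assumption.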



"Lichty, Kent" wrote:

> We have a web application that builds pages "on the fly" by reading
> directly from a database. The database contains both normal content and
> HTML.  We use Lucene as our search engine, but I need to figure out how to
> cause it to NOT include content that is within HTML tags. I assume that
> this entails the creation of a custom Analyzer.  Are there any existing
> Analyzers already out there that work like this? Thanks!
>
> ----------  Internet E-mail Confidentiality Disclaimer  ----------
>
> PRIVILEGED / CONFIDENTIAL INFORMATION may be contained in this message.  If
> you are not the addressee indicated in this message or the employee or agent
> responsible for delivering it to the addressee, you are hereby on notice
> that you are in possession of confidential and privileged information.  Any
> dissemination, distribution, or copying of this e-mail is strictly
> prohibited.  In such case, you should destroy this message and kindly notify
> the sender by reply e-mail.  Please advise immediately if you or your
> employer do not consent to Internet email for messages of this kind.
>
> Opinions, conclusions, and other information in this message that do not
> relate to the official business of my firm shall be understood as neither
> given nor endorsed by it.
>
> --
> To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org>







