Well, let me know if you figure it out and I will do the same. I don't quite understand how those classes would help out. Would you somehow use them to create the Reader object that is passed to create the TokenStream object?
-----Original Message----- From: Craig Walls [mailto:wallsc@;michaels.com] Sent: Thursday, November 14, 2002 2:39 PM To: Lucene Users List Subject: Re: HTML Analyzer? Ironically, I just had to solve this exact problem just 10 minutes ago... Check into javax.swing.text.html.HTMLEditorKit and javax.swing.text.html.HTMLDocument. Here's a URL that I found helpful (the site is Japanese, but the source code is still Java): http://java-house.jp/ml/archive/j-h-b/037727.html?#_body "Lichty, Kent" wrote: > We have a web application that builds pages "on the fly" by reading directly > from a database. The database contains both normal content and HTML. We use > Lucene as our search engine, but I need to figure out how to cause it to NOT > include content that is within HTML tags. I assume that this entails the > creation of a custom Analyzer. Are there any existing Analyzers already out > there that work like this? Thanks! > > ---------- Internet E-mail Confidentiality Disclaimer ---------- > > PRIVILEGED / CONFIDENTIAL INFORMATION may be contained in this message. If > you are not the addressee indicated in this message or the employee or agent > responsible for delivering it to the addressee, you are hereby on notice > that you are in possession of confidential and privileged information. Any > dissemination, distribution, or copying of this e-mail is strictly > prohibited. In such case, you should destroy this message and kindly notify > the sender by reply e-mail. Please advise immediately if you or your > employer do not consent to Internet email for messages of this kind. > > Opinions, conclusions, and other information in this message that do not > relate to the official business of my firm shall be understood as neither > given nor endorsed by it. > > -- > To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org> > For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org> -- To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org> For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org> ---------- Internet E-mail Confidentiality Disclaimer ---------- PRIVILEGED / CONFIDENTIAL INFORMATION may be contained in this message. If you are not the addressee indicated in this message or the employee or agent responsible for delivering it to the addressee, you are hereby on notice that you are in possession of confidential and privileged information. Any dissemination, distribution, or copying of this e-mail is strictly prohibited. In such case, you should destroy this message and kindly notify the sender by reply e-mail. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions, and other information in this message that do not relate to the official business of my firm shall be understood as neither given nor endorsed by it. -- To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org> For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org>