okayndc,

A field configured to use HTMLStripCharFilter as part of its index-time 
analyzer will strip out HTML tags before index terms are created by the 
tokenizer, so HTML tags will not be put into the index.  As a result, queries 
for HTML tags cannot match the original documents' HTML tags (in the field 
configured to use HTMLStripCharFilter, anyway).
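To make the order of operations concrete, here is a toy sketch in plain Java. It involves no Lucene and is not how HTMLStripCharFilter is implemented. A character-level filter runs first, and only then does a tokenizer produce the terms that would go into the index. `stripTags` and `indexTerms` are hypothetical helpers. The regex stripping is far cruder than HTMLStripCharFilter: it does no entity decoding, no comment or script handling, and no offset tracking.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public final class StripThenTokenize {

  // Hypothetical stand-in for a char filter: replaces anything shaped like <...>
  // with a space, so that "foo<br>bar" does not glue two words together.
  static String stripTags(String html) {
    return html.replaceAll("<[^>]*>", " ");
  }

  // Hypothetical stand-in for a tokenizer + lowercase filter: splits on runs of
  // characters that are neither letters nor digits and keeps unique terms in order.
  static Set<String> indexTerms(String text) {
    Set<String> terms = new LinkedHashSet<>();
    for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!t.isEmpty()) {
        terms.add(t);
      }
    }
    return terms;
  }

  public static void main(String[] args) {
    String doc = "<p>Some <strong>bold</strong> text</p>";
    System.out.println(indexTerms(doc));            // [p, some, strong, bold, text]
    System.out.println(indexTerms(stripTags(doc))); // [some, bold, text]
  }
}
```

Without the filter, the tag names `p` and `strong` become index terms. With it, only the visible words do. With an analyzer like StandardAnalyzer, a query for `<strong>` is itself analyzed down to the term `strong`, so after stripping it can still match documents whose visible text contains the word "strong". It just no longer matches the markup.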

So HTMLStripCharFilter should do what you want.
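Here is a minimal sketch of wiring HTMLStripCharFilter into an analyzer. It assumes a recent Lucene (8.x or 9.x) with lucene-core and the common analysis module on the classpath (lucene-analyzers-common in 8.x, lucene-analysis-common in 9.x). The Analyzer API in Lucene 3.0 is different, so this will not compile there as-is. The class name `HtmlStrippingAnalyzer`, the `terms` helper, and the field name `body` are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical analyzer: strip HTML, then tokenize and lowercase what is left.
public final class HtmlStrippingAnalyzer extends Analyzer {

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new LowerCaseFilter(source);
    return new TokenStreamComponents(source, result);
  }

  // Char filters wrap the Reader before the tokenizer reads from it, so tags
  // are removed before any terms are produced. HTMLStripCharFilter also keeps
  // track of offset corrections back to the original HTML.
  @Override
  protected Reader initReader(String fieldName, Reader reader) {
    return new HTMLStripCharFilter(reader);
  }

  // Hypothetical helper: collects the terms the analyzer emits for some text.
  static List<String> terms(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = new HtmlStrippingAnalyzer()) {
      System.out.println(terms(analyzer, "<p>Some <strong>bold</strong> text</p>"));
    }
  }
}
```

The filter is hooked in through `initReader` rather than `createComponents`. That is because char filters operate on the character stream (the Reader), whereas `createComponents` builds the chain of token-level components.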

Steve

From: okayndc [mailto:bodymo...@gmail.com]
Sent: Thursday, April 05, 2012 3:36 PM
To: Steven A Rowe
Cc: java-user@lucene.apache.org
Subject: Re: HTML tags and Lucene highlighting

Hello,

I want HTML tags to be ignored when searching: I should not be able to search
for an HTML tag (e.g. <strong>) and get back the highlighted HTML tag (e.g. <span
class="highlighted"><strong></span>) in a result set.

Thanks

On Thu, Apr 5, 2012 at 3:24 PM, Steven A Rowe
<sar...@syr.edu> wrote:
Hi okayndc,

What *do* you want?

Steve

-----Original Message-----
From: okayndc [mailto:bodymo...@gmail.com]
Sent: Thursday, April 05, 2012 1:34 PM
To: java-user@lucene.apache.org
Subject: HTML tags and Lucene highlighting

Hello,

I currently use Lucene version 3.0, and probably need to upgrade to a more
current version soon.
The problem I have is that when I test a search for an HTML tag (e.g.
<strong>), Lucene returns the highlighted HTML tag, which is what I DO NOT
want. Is there a way to "filter" out HTML tags?
I have read up on HTMLStripCharFilter (packaged with Solr) and wondered whether
this is the way to go.

Any help will be greatly appreciated,
Thanks