Some content I'm indexing contains certain HTML tags, like <p>, <b>, <i>, etc.  What I 
find is that when a term I'm searching for touches one of these tags (which is fairly 
typical), the term isn't recognized and the search fails.  For example, <b>College 
Soccer</b> doesn't match on either "college" or "soccer".  I seem to recall someone 
else bringing up a similar problem with a word that ends a sentence (and is thus 
treated as if the period were part of the word), but I don't recall what the response 
was and I can't find that thread.

Does anyone have ideas on the best way to handle this?  Should I filter out the 
tags while creating the Document for indexing? Or modify the Analyzer (I'm using 
the StandardAnalyzer)? Or something else?
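
To make the first option concrete, here is a rough sketch of the kind of pre-filtering I have in mind: strip the tags from the text before it goes into the Document. The class and method names (TagStripper, stripTags) are made up for illustration and are not part of Lucene. A regex like this only approximates real HTML parsing (it ignores comments, entities, and a ">" inside an attribute value), so treat it as a starting point rather than a proper solution.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: removes HTML-like tags from text
// so that words next to a tag are left as plain, separate words.
public class TagStripper {
    // Anything that looks like a tag: <p>, </b>, <i class="x">, and so on.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Runs of whitespace left behind after the tags are replaced.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Replaces each tag with a space (so "one</p><p>two" does not become
    // "onetwo"), then collapses repeated whitespace and trims the ends.
    public static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<b>College Soccer</b> scores"));
    }
}
```

The cleaned string would then be what gets added to the Document in place of the raw HTML, so the Analyzer never sees the tags at all.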


