Some content I'm indexing contains certain HTML tags, like <p>, <b>, <i>, etc. What I find is that when a term I'm searching for touches one of these tags (which is fairly typical), the term isn't recognized and the search fails. For example, <b>College Soccer</b> doesn't match on either "college" or "soccer". I seem to recall someone else bring up a similar problem with a word that ends a sentence (and is thus treated as if the period was part of the word), but don't recall what the response was and I can't find that thread.
Does anyone have some ideas on what's the best way to handle this? Filter out the tags in the process of creating the Document for indexing? Or through a modification to the Analyzer (I'm using the StandardAnalyzer)? Or something else? TIA, Terry