Is it possible that the Analyzer is stripping <, >, and / characters and leaving you with terms like: bCollege and Soccerb ?
Otis --- Terry Steichen <[EMAIL PROTECTED]> wrote: > Some content I'm indexing contains certain HTML tags, like <p>, <b>, > <i>, etc. What I find is that when a term I'm searching for touches > one of these tags (which is fairly typical), the term isn't > recognized and the search fails. For example, <b>College Soccer</b> > doesn't match on either "college" or "soccer". I seem to recall > someone else bring up a similar problem with a word that ends a > sentence (and is thus treated as if the period was part of the word), > but don't recall what the response was and I can't find that thread. > > Does anyone have some ideas on what's the best way to handle this? > Filter out the tags in the process of creating the Document for > indexing? Or through a modification to the Analyzer (I'm using the > StandardAnalyzer)? Or something else? > > TIA, > > Terry > > __________________________________________________ Do you Yahoo!? Y! Web Hosting - Let the expert host your web site http://webhosting.yahoo.com/ -- To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org> For additional commands, e-mail: <mailto:lucene-user-help@;jakarta.apache.org>