Is it possible that the Analyzer is stripping the <, >, and / characters
and leaving you with terms like bCollege and Soccerb?
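
If that is what is happening, one option is to strip the tags from the
text before it reaches the Analyzer. Below is a minimal sketch using only
java.util.regex; the class name TagStripper and the method stripTags are
hypothetical and not part of Lucene. It assumes simple tags like <p>, <b>,
and <i>. A regex is not a real HTML parser, so comments, scripts, and
attribute values containing ">" would need something more robust.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Matches anything that looks like a tag: '<', any run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Replace each tag with a space (not an empty string) so that words on
    // either side of a tag remain separate tokens, then trim the ends.
    public static String stripTags(String text) {
        return TAG.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<b>College Soccer</b>"));
    }
}
```

The cleaned string would then be what gets passed into the field when the
Document is built, so the Analyzer only ever sees plain words.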

Otis

--- Terry Steichen <[EMAIL PROTECTED]> wrote:
> Some content I'm indexing contains certain HTML tags, like <p>, <b>,
> <i>, etc.  What I find is that when a term I'm searching for touches
> one of these tags (which is fairly typical), the term isn't
> recognized and the search fails.  For example, <b>College Soccer</b>
> doesn't match on either "college" or "soccer".  I seem to recall
> someone else bringing up a similar problem with a word that ends a
> sentence (and is thus treated as if the period was part of the word),
> but don't recall what the response was and I can't find that thread.
> 
> Does anyone have some ideas on what's the best way to handle this? 
> Filter out the tags in the process of creating the Document for
> indexing? Or through a modification to the Analyzer (I'm using the
> StandardAnalyzer)? Or something else?
> 
> TIA,
> 
> Terry
> 
> 

