Thanks for the update. This all sounds right (no bugs). The problem lies in the code you have that translates those < and > characters.

Otis

--- Terry Steichen <[EMAIL PROTECTED]> wrote:
> Otis,
>
> I discovered that the actual text that I was dealing with already had
> the '<' converted to '&lt;', and so forth. So the problem is that with
> something like '&lt;b&gt;College Soccer&lt;/b&gt;', Lucene recognizes
> the trailing semicolon ';' as a word separator, so it can find the term
> 'college', but it does not see the ending of 'soccer'. I did confirm
> that it *will* match on 'soccer&lt' just fine.
>
> I've proceeded to add a string substitution method which replaces
> '&lt;' with '    ' (four spaces, in order to hopefully keep the offsets
> straight). It appears to work, though I believe it slows down the
> indexing.
>
> I don't know enough about the inner design of Lucene to figure this
> out, but it seems logical that there would be a much more efficient way
> to handle this than string operations.
>
> Anyway, thought I'd bring you up to date.
>
> Regards,
>
> Terry
>
> PS: I've had no responses from the list, so perhaps this is a unique
> problem and doesn't justify a formal fix effort.
>
> ----- Original Message -----
> From: "Terry Steichen" <[EMAIL PROTECTED]>
> To: "Lucene Users Group" <[EMAIL PROTECTED]>
> Sent: Friday, October 18, 2002 11:39 AM
> Subject: Tags Screwing up Searches
>
> Some content I'm indexing contains certain HTML tags, like <p>, <b>,
> <i>, etc. What I find is that when a term I'm searching for touches one
> of these tags (which is fairly typical), the term isn't recognized and
> the search fails. For example, <b>College Soccer</b> doesn't match on
> either "college" or "soccer". I seem to recall someone else bringing up
> a similar problem with a word that ends a sentence (and is thus treated
> as if the period was part of the word), but don't recall what the
> response was and I can't find that thread.
>
> Does anyone have some ideas on what's the best way to handle this?
> Filter out the tags in the process of creating the Document for
> indexing? Or through a modification to the Analyzer (I'm using the
> StandardAnalyzer)? Or something else?
>
> TIA,
>
> Terry
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@;jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@;jakarta.apache.org>
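
For illustration, the "filter out the tags before creating the Document" option could be sketched roughly as below. This is only a sketch, not anything from Lucene itself: the class name MarkupBlanker, the method blank, and the regular expression are hypothetical, and the pattern assumes simple tags (literal ones like <b>, or entity-escaped ones without attributes like &lt;b&gt;). It overwrites each tag with the same number of spaces in a single pass, so character offsets in the cleaned text still line up with the original, which is the same idea as the four-space substitution described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): blanks out markup before the
// text is handed to the analyzer, preserving the length of the input.
final class MarkupBlanker {
    // Assumption: tags are either literal (<b>, </p>) or entity-escaped
    // without attributes (&lt;b&gt;, &lt;/b&gt;).
    private static final Pattern MARKUP = Pattern.compile(
        "</?[A-Za-z][^<>]*>|&lt;/?[A-Za-z][A-Za-z0-9]*&gt;");

    // Returns a copy of text in which every matched tag is replaced by an
    // equal number of spaces, so offsets of the remaining words are unchanged.
    static String blank(String text) {
        StringBuilder out = new StringBuilder(text.length());
        Matcher m = MARKUP.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());   // copy text before the tag
            for (int i = m.start(); i < m.end(); i++) {
                out.append(' ');                 // one space per tag character
            }
            last = m.end();
        }
        out.append(text, last, text.length());   // copy the tail
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "&lt;b&gt;College Soccer&lt;/b&gt;";
        System.out.println("[" + blank(raw) + "]");
    }
}
```

The cleaned string would then be used as the field value when building the Document, so the analyzer only ever sees "College Soccer" surrounded by whitespace. Whether this is faster than repeated string substitution in practice has not been measured here.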