On Thursday 17 July 2003 07:20, greg wrote:
> I have several document sections that are being indexed via the
> StandardAnalyzer.  One of these documents has the line "access, the
> manager".  When searching for the phrase "access manager", this document is
> being returned.  I understand why (at least I think I do): "the" is a stop
> word and the "," is being removed by the tokenizer.  My question is, is
> there any way I can avoid having this returned in the results?  My
> thoughts were to create a new analyzer that indexes the word "the" (blick,
> too many of those), or index the "," in some way (also not good).  Any
> suggestions?

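You can also replace all stop words with a "dummy" token ("" might be an OK
candidate?). That would be similar to indexing "the" (which is probably a
better idea than indexing ","): the placeholder keeps a token between
"access" and "manager", so the phrase "access manager" no longer matches.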
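
To make the placeholder idea concrete, here is a rough sketch in plain
Python, not Lucene's analyzer API. The stop list, the "_" placeholder, and
the function names are all invented for the example; in Lucene the same
logic would presumably live in a custom token filter, and queries would
need to go through the same analysis as the documents.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "to"}  # tiny illustrative stop list (assumption)
PLACEHOLDER = "_"  # hypothetical dummy token; the tokenizer below never emits it


def analyze(text):
    """Lowercase, split on non-alphanumerics, and swap stop words for PLACEHOLDER."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [PLACEHOLDER if t in STOP_WORDS else t for t in tokens]


def phrase_match(document, phrase):
    """True if the analyzed phrase occurs as a contiguous token run in the document."""
    doc, query = analyze(document), analyze(phrase)
    if not query:
        return False
    return any(doc[i:i + len(query)] == query
               for i in range(len(doc) - len(query) + 1))


print(analyze("access, the manager"))
print(phrase_match("access, the manager", "access manager"))
print(phrase_match("access, the manager", "access the manager"))
```

One side effect of this sketch: every stop word maps to the same
placeholder, so the phrase "access a manager" would also match the
document "access, the manager".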

I'm planning to do something similar for paragraph breaks (a double linefeed
in plain text, <p> in HTML, etc.), so that phrases do not match across them.
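
A minimal sketch of the paragraph-break variant, again in plain Python
rather than Lucene code. It handles only the plain-text case (blank lines);
the "_p_" marker and the function names are made up for illustration, and
HTML would need a real parser to find the <p> boundaries.

```python
import re

PARA_BREAK = "_p_"  # hypothetical marker token; the word tokenizer never emits "_"


def analyze_with_breaks(text):
    """Tokenize text, inserting PARA_BREAK wherever a blank line separates paragraphs."""
    tokens = []
    for i, para in enumerate(re.split(r"\n\s*\n", text.strip())):
        if i > 0:
            tokens.append(PARA_BREAK)
        tokens.extend(re.findall(r"[a-z0-9]+", para.lower()))
    return tokens


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(tokens[i:i + n] == phrase_tokens
                         for i in range(len(tokens) - n + 1))


doc = analyze_with_breaks("Set up remote access\n\nManager guide follows")
print(doc)
print(contains_phrase(doc, ["access", "manager"]))
```

Without the marker, "access" and "manager" would be adjacent tokens and the
phrase would match even though the words sit in different paragraphs.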

-+ Tatu +-


