Van Nguyen wrote:
I have a field in my index that is being tokenized using the StandardAnalyzer. Let’s say that field was:

TOOLS FOR TRAILER

The word “FOR” is a stop word, so it is not indexed (based on the StandardAnalyzer). When someone types in TOOLS FOR TRAILER, I have a BooleanQuery search for:

+CONTENTS:tools +CONTENTS:for +CONTENTS:trailer

This results in no match because of the “AND” search on “+CONTENTS:for”. Do I need logic to strip any StandardAnalyzer stop words from the BooleanQuery?

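It depends on how you are generating your Lucene query. If you are using the QueryParser, you can just pass in a StandardAnalyzer when you create it. The parser then runs the query text through that analyzer, so stop words are dropped before the clauses are built: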

QueryParser parser = new QueryParser("defaultField", new StandardAnalyzer());

However, if you are generating the BooleanQuery yourself, make sure that you run the text through the StandardAnalyzer and construct the query from the tokens that the analyzer emits, e.g.:

Analyzer a = new StandardAnalyzer();
TokenStream ts = a.tokenStream("fieldName", new StringReader(query));

Token t = ts.next();

while (null != t) {
        String token = t.termText();
        // build your query using this token
        ...
        t = ts.next(); // advance to the next token, otherwise the loop never ends
}

...

Either method will eliminate the stop words from your query string.
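
To make the effect concrete, here is a minimal plain-Java sketch of the idea. It does not call Lucene at all: the class StopWordQuerySketch, its methods, and the short stop list are hypothetical stand-ins for what the analyzer does, and the real StandardAnalyzer tokenizes differently and uses a longer stop list. It only shows why the all-required query fails with the raw words and matches once the same filtering is applied to the query text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java stand-in for the analysis step; it does not call Lucene.
class StopWordQuerySketch {

    // Illustrative subset only (an assumption); the analyzer's real stop list is longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "for", "of", "the", "to"));

    // Lowercase, split on non-alphanumeric characters, and drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Render one required ("+") clause per surviving token.
    static String requiredClauses(String field, List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(field).append(':').append(token);
        }
        return sb.toString();
    }

    // An all-required query can match only if every query term was indexed.
    static boolean matchesAll(Set<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Terms that end up in the index for the field value.
        Set<String> indexed = new HashSet<>(analyze("TOOLS FOR TRAILER"));

        // Query built from the raw words versus from the analyzed tokens.
        List<String> rawTerms = Arrays.asList("tools", "for", "trailer");
        List<String> analyzedTerms = analyze("TOOLS FOR TRAILER");

        System.out.println(matchesAll(indexed, rawTerms));
        System.out.println(matchesAll(indexed, analyzedTerms));
        System.out.println(requiredClauses("CONTENTS", analyzedTerms));
    }
}
```

The point of the sketch is simply that the query text must go through the same filtering as the indexed text; in real code that job belongs to the analyzer itself rather than to a hand-maintained stop list.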

Hope that helps,

Ryan

Van



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
