Re: [BULK] StandardAnalyzer question

Ryan Heinen Fri, 29 Sep 2006 14:12:19 -0700

Van Nguyen wrote:

I have a field in my index that is being tokenized using the StandardAnalyzer. Let’s say that field was:
TOOLS FOR TRAILER
The word “FOR” is a stop word, so it is not being indexed (based on the StandardAnalyzer). When someone types in TOOLS FOR TRAILER, I have a BooleanQuery search for:
+CONTENTS:tools +CONTENTS:for +CONTENTS:trailer
Which will result in no match because of the “AND” search on “+CONTENTS:for”. Do I have to have any logic to strip the BooleanQuery of any stop words used in the StandardAnalyzer?

It depends on how you are generating your Lucene query. If you are using the QueryParser, you can just pass in a StandardAnalyzer when you create it:

QueryParser parser = new QueryParser("defaultField", new StandardAnalyzer());

However, if you are generating the BooleanQuery yourself, you will want to make sure that you run the text through the StandardAnalyzer and construct the query based on the tokens that the analyzer emits, e.g.:


Analyzer a = new StandardAnalyzer();
TokenStream ts = a.tokenStream("fieldName", new StringReader(query));

Token t = ts.next();

while (null != t) {
        String token = t.termText();
        // build your query using this token
        ...
        t = ts.next(); // advance to the next token; without this the loop never ends
}

...

Either method will eliminate the stop words from your query string.
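
To make the second approach concrete, here is a minimal plain-Java sketch of the idea. It deliberately does not use any Lucene classes: analyze() is a hypothetical stand-in for the analyzer, the stop-word set is a small illustrative one (not StandardAnalyzer's actual list), and the class and method names are made up for this example. It only shows that once the stop words are dropped before the query is assembled, the required "for" clause never gets built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordQuerySketch {
    // Illustrative stop words only; an assumption, NOT the analyzer's real list.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "and", "for", "of", "the"));

    // Hypothetical stand-in for analysis: lowercase the text, split on
    // anything that is not a letter or digit, and drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Builds an "all terms required" query string from the surviving tokens.
    static String buildRequiredQuery(String field, String userInput) {
        StringBuilder sb = new StringBuilder();
        for (String token : analyze(userInput)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(field).append(':').append(token);
        }
        return sb.toString();
    }
}
```

With this sketch, buildRequiredQuery("CONTENTS", "TOOLS FOR TRAILER") yields +CONTENTS:tools +CONTENTS:trailer, so the search no longer requires a term that was never indexed. In real code you would take the tokens from the same analyzer you used at index time rather than from a hand-rolled list, so that indexing and querying cannot drift apart.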

Hope that helps,

Ryan

Van



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


