On Jan 17, 2006, at 12:14 AM, jason wrote:
It is adding tokens into the same position as the original token. And then, I used the QueryParser for searching and the snowball analyzer for parsing.

Ok, so you're only using the SynonymAnalyzer for indexing, and the SnowballAnalyzer for QueryParser, correct? If so, that is reasonable.

    public TokenStream tokenStream(String fieldName, Reader reader){

        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        if (stopword != null){
          result = new StopFilter(result, stopword);
        }

        result = new SnowballFilter(result, "Lovins");

        result = new SynonymFilter(result, engine);

        return result;
    }

}
I wrote some code in the SnowballFilter (lines 75-79). If I only use the SnowballFilter, the term "support" can be found in all 17 documents. However, if the line "result = new SynonymFilter(result, engine);" is added, the term "support" cannot be found in some documents.


It looks like you borrowed SynonymAnalyzer from the Lucene in Action code, but you've tweaked some things. One thing that is clearly amiss is that you're looking up synonyms for stemmed words, which is not going to work (unless you stemmed the WordNet words beforehand, but I doubt you did that, and it would be quite odd to do so). You're probably not injecting many synonyms at all.
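
To illustrate why the filter order matters, here is a toy sketch in plain Java. It is not Lucene code: toyStem and the SYNONYMS map are hypothetical stand-ins for the Snowball stemmer and the WordNet-backed engine, and the suffix rules are invented for the example. It assumes the synonym source is keyed by unstemmed dictionary words, as WordNet would be.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FilterOrderDemo {
    // Hypothetical stand-in for a stemmer: strips the first matching suffix.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ly", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Hypothetical synonym source, keyed by unstemmed words.
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("quickly", Arrays.asList("rapidly"));
        SYNONYMS.put("jumps", Arrays.asList("leaps"));
    }

    // Emits each token followed by any synonyms found for it.
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            List<String> synonyms = SYNONYMS.get(token);
            if (synonyms != null) {
                out.addAll(synonyms);
            }
        }
        return out;
    }

    // Applies the toy stemmer to every token.
    static List<String> stemAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(toyStem(token));
        }
        return out;
    }

    // The order in the analyzer above: stem first, then look up synonyms.
    static List<String> stemThenSynonyms(List<String> tokens) {
        return expand(stemAll(tokens));
    }

    // The alternative order: look up synonyms first, then stem everything.
    static List<String> synonymsThenStem(List<String> tokens) {
        return stemAll(expand(tokens));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("quickly", "jumps");
        System.out.println(stemThenSynonyms(tokens));
        System.out.println(synonymsThenStem(tokens));
    }
}
```

With the stem-first order, the lookups are for "quick" and "jump", neither of which is a key, so nothing is injected and the output is [quick, jump]. With the lookup done first, the output is [quick, rapid, jump, leap]. Whether the injected synonyms should then be stemmed as well is a separate choice; it is done here only on the assumption that the query side also stems.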

I encourage you to "analyze your analyzer" by running some utilities such as the Analyzer demo that comes with Lucene in Action's code. You'll have some more insight into this issue when trying this out in isolation from query parsing and other complexities.
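
As a rough idea of what such a utility reports, here is a minimal, self-contained sketch in plain Java. The Tok class is a hypothetical stand-in for a Lucene token, carrying only a term and a position increment; in a real run those values would come from the TokenStream your analyzer returns. It groups tokens by position, so a synonym injected with a position increment of 0 shows up next to the original term.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenPositionDump {
    // Hypothetical minimal token: a term plus its position increment.
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Builds one line per position, listing every term that shares it.
    static List<String> format(List<Tok> tokens) {
        List<String> lines = new ArrayList<>();
        int position = 0;
        StringBuilder current = null;
        for (Tok token : tokens) {
            // A positive increment (or the very first token) starts a new line.
            if (token.positionIncrement > 0 || current == null) {
                if (current != null) {
                    lines.add(current.toString());
                }
                position += token.positionIncrement;
                current = new StringBuilder(position + ":");
            }
            current.append(" [").append(token.term).append("]");
        }
        if (current != null) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("quick", 1));
        tokens.add(new Tok("rapid", 0)); // synonym stacked on the same position
        tokens.add(new Tok("jump", 1));
        for (String line : format(tokens)) {
            System.out.println(line);
        }
    }
}
```

If a dump like this shows only one term per position for text you expected to have synonyms, the synonym lookup is not matching.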

  /** Returns the next input Token, after being stemmed */
  public final Token next() throws IOException {
    Token token = input.next();
    if (token == null)
      return null;
    stemmer.setCurrent(token.termText());
    try {
      stemMethod.invoke(stemmer, EMPTY_ARGS);
    } catch (Exception e) {
      throw new RuntimeException(e.toString());
    }

    Token newToken = new Token(stemmer.getCurrent(),
        token.startOffset(), token.endOffset(), token.type());
    //check the tokens.
    if(newToken.termText().equals("support")){
        System.out.println("the term support is found");
    }

I'm not sure what the exact solution to your dilemma is, but doing more testing with your analyzer will likely shed light on it for you.

        Erik


