Re: One problem of using the lucene

Erik Hatcher Tue, 17 Jan 2006 02:39:37 -0800


On Jan 17, 2006, at 12:14 AM, jason wrote:

It is adding tokens into the same position as the original token. And then, I used the QueryParser for searching and the snowball analyzer for parsing.

Ok, so you're only using the SynonymAnalyzer for indexing, and the SnowballAnalyzer for QueryParser, correct? If so, that is reasonable.

    public TokenStream tokenStream(String fieldName, Reader reader){

        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        if (stopword != null){
          result = new StopFilter(result, stopword);
        }

        result = new SnowballFilter(result, "Lovins");

        result = new SynonymFilter(result, engine);

        return result;
    }

}

I wrote some code in the SnowballFilter (lines 75-79). If I use only the SnowballFilter, the term "support" can be found in all 17 documents. However, if the line "result = new SynonymFilter(result, engine);" is used, the term "support" cannot be found in some documents.

It looks like you borrowed SynonymAnalyzer from the Lucene in Action code, but you've tweaked some things. One thing that is clearly amiss is that you're looking up synonyms for stemmed words, which is not going to work (unless you stemmed the WordNet words beforehand, but I doubt you did that, and it would be quite odd to do so). You're probably not injecting many synonyms at all.
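To make the mismatch concrete, here is a minimal, Lucene-free sketch. Everything in it is an assumption for illustration: the synonym table is a hypothetical stand-in for the WordNet index, and the `stem` method is a toy suffix stripper, not the Lovins algorithm. It only shows that a table keyed on whole words is never hit when the lookup key has already been stemmed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SynonymOrderDemo {
    // Hypothetical synonym table keyed on whole (unstemmed) words,
    // the way a WordNet-style index would be.
    static final Map<String, String[]> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("supporting", new String[] {"backing"});
    }

    // Toy stemmer (assumption): strips a trailing "ing". Not Lovins.
    static String stem(String word) {
        if (word.endsWith("ing") && word.length() > 5) {
            return word.substring(0, word.length() - 3);
        }
        return word;
    }

    // Order used in the analyzer above: stem first, then look up synonyms.
    static List<String> stemThenLookup(String word) {
        List<String> out = new ArrayList<>();
        String stemmed = stem(word);
        out.add(stemmed);
        String[] syns = SYNONYMS.get(stemmed); // key is a stem, table holds whole words
        if (syns != null) {
            for (String s : syns) {
                out.add(s);
            }
        }
        return out;
    }

    // Alternative order: look up the original word, then stem everything.
    static List<String> lookupThenStem(String word) {
        List<String> out = new ArrayList<>();
        out.add(stem(word));
        String[] syns = SYNONYMS.get(word);
        if (syns != null) {
            for (String s : syns) {
                out.add(stem(s));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stemThenLookup("supporting")); // prints [support]
        System.out.println(lookupThenStem("supporting")); // prints [support, back]
    }
}
```

Whether reordering the filters is the right fix for a real index depends on how the query side is analyzed, so treat this as an illustration of the lookup miss rather than a recommended configuration.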

I encourage you to "analyze your analyzer" by running some utilities such as the Analyzer demo that comes with Lucene in Action's code. You'll gain more insight into this issue by trying it out in isolation from query parsing and other complexities.
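As a rough idea of what such a utility reports, the sketch below groups terms by token position, so that injected synonyms (position increment 0) show up on the same line as the original term. It does not use Lucene at all: `Tok` is a hypothetical stand-in for a Lucene token carrying only the term text and position increment, and the output format is an assumption, not that of any particular tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenPositionDemo {
    // Hypothetical stand-in for a token: term text plus position increment.
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Returns one line per position; an increment of 0 keeps the token
    // on the same line as the previous one.
    static List<String> describe(List<Tok> tokens) {
        List<String> lines = new ArrayList<>();
        int position = 0;
        StringBuilder current = null;
        for (Tok t : tokens) {
            if (t.positionIncrement > 0 || current == null) {
                if (current != null) {
                    lines.add(current.toString());
                }
                // A leading token with increment 0 is treated as position 1.
                position += Math.max(t.positionIncrement, 1);
                current = new StringBuilder(position + ":");
            }
            current.append(" [").append(t.term).append("]");
        }
        if (current != null) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
            new Tok("support", 1),
            new Tok("back", 0),   // synonym stacked on "support"
            new Tok("team", 1));
        for (String line : describe(tokens)) {
            System.out.println(line);
        }
    }
}
```

If a term that should carry synonyms appears alone on its line, the synonym lookup is not matching for that term.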

  /** Returns the next input Token, after being stemmed */
  public final Token next() throws IOException {
    Token token = input.next();
    if (token == null)
      return null;
    stemmer.setCurrent(token.termText());
    try {
      stemMethod.invoke(stemmer, EMPTY_ARGS);
    } catch (Exception e) {
      throw new RuntimeException(e.toString());
    }

    Token newToken = new Token(stemmer.getCurrent(),
                               token.startOffset(), token.endOffset(), token.type());

    //check the tokens.
    if(newToken.termText().equals("support")){
        System.out.println("the term support is found");
    }

    return newToken;
  }

I'm not sure what the exact solution to your dilemma is, but doing more testing with your analyzer will likely shed light on it for you.


        Erik


