Hi, the following code is the SynonymFilter I wrote.

    import org.apache.lucene.analysis.*;
    import java.io.*;
    import java.util.*;

    /**
     * @author JIANG XING
     *
     * Jan 15, 2006
     */
    public class SynonymFilter extends TokenFilter {

        public static final String TOKEN_TYPE_SYNONYM = "SYNONYM";

        private Stack synonymStack;
        private WordNetSynonymEngine engine;

        public SynonymFilter(TokenStream in, WordNetSynonymEngine engine) {
            super(in);
            synonymStack = new Stack();
            this.engine = engine;
        }

        public Token next() throws IOException {
            // Emit any pending synonyms before reading more input.
            if (synonymStack.size() > 0) {
                return (Token) synonymStack.pop();
            }

            Token token = input.next();
            if (token == null) {
                return null;
            }

            addAliasesToStack(token);
            return token;
        }

        private void addAliasesToStack(Token token) throws IOException {
            String[] synonyms = engine.getSynonyms(token.termText());
            if (synonyms == null) return;

            for (int i = 0; i < synonyms.length; i++) {
                Token synToken = new Token(synonyms[i], token.startOffset(),
                                           token.endOffset(), TOKEN_TYPE_SYNONYM);
                // An increment of 0 places the synonym at the original token's position.
                synToken.setPositionIncrement(0);
                synonymStack.push(synToken);
            }
        }
    }

It adds the synonym tokens at the same position as the original token (position increment of 0). I used the QueryParser for searching and the Snowball analyzer for parsing. The following is the SynonymAnalyzer I wrote.

    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.*;
    import org.apache.lucene.analysis.snowball.*;
    import java.io.*;
    import java.util.*;

    /**
     * @author JIANG XING
     *
     * Jan 15, 2006
     */
    public class SynonymAnalyzer extends Analyzer {

        private WordNetSynonymEngine engine;
        private Set stopword;

        public SynonymAnalyzer(String[] word) {
            try {
                engine = new WordNetSynonymEngine(
                        new File("C:\\PDF2Text\\SearchEngine\\WordNetIndex"));
                stopword = StopFilter.makeStopSet(word);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = new StandardTokenizer(reader);
            result = new StandardFilter(result);
            result = new LowerCaseFilter(result);
            if (stopword != null) {
                result = new StopFilter(result, stopword);
            }
            // Stemming runs first, so the synonym lookup sees the stemmed term.
            result = new SnowballFilter(result, "Lovins");
            result = new SynonymFilter(result, engine);
            return result;
        }
    }

I added some code to the SnowballFilter (lines 75-79) to check the tokens. If I use only the SnowballFilter, the term "support" is found in all 17 documents. However, if the line "result = new SynonymFilter(result, engine);" is added, the term "support" cannot be found in some of the documents.

    public class SnowballFilter extends TokenFilter {

        private static final Object[] EMPTY_ARGS = new Object[0];

        private SnowballProgram stemmer;
        private Method stemMethod;

        /** Construct the named stemming filter.
         *
         * @param in the input tokens to stem
         * @param name the name of a stemmer
         */
        public SnowballFilter(TokenStream in, String name) {
            super(in);
            try {
                Class stemClass =
                        Class.forName("net.sf.snowball.ext." + name + "Stemmer");
                stemmer = (SnowballProgram) stemClass.newInstance();
                // why doesn't the SnowballProgram class have an (abstract?) stem method?
                stemMethod = stemClass.getMethod("stem", new Class[0]);
            } catch (Exception e) {
                throw new RuntimeException(e.toString());
            }
        }

        /** Returns the next input Token, after being stemmed */
        public final Token next() throws IOException {
            Token token = input.next();
            if (token == null)
                return null;
            stemmer.setCurrent(token.termText());
            try {
                stemMethod.invoke(stemmer, EMPTY_ARGS);
            } catch (Exception e) {
                throw new RuntimeException(e.toString());
            }

            Token newToken = new Token(stemmer.getCurrent(),
                    token.startOffset(), token.endOffset(), token.type());

            // check the tokens.
            if (newToken.termText().equals("support")) {
                System.out.println("the term support is found");
            }

            newToken.setPositionIncrement(token.getPositionIncrement());
            return newToken;
        }
    }

On 1/16/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> Could you share the details of your SynonymFilter?  Is it adding
> tokens into the same position as the original tokens (position
> increment of 0)?  Are you using QueryParser for searching?  If so,
> try TermQuery to eliminate the parser's analysis from the picture for
> the time being while troubleshooting.
>
> If you are using QueryParser, are you using the same analyzer?  If
> this is the case, what is the .toString of the generated Query?
>
>     Erik
>
>
> On Jan 16, 2006, at 3:54 AM, jason wrote:
>
> > Hi,
> >
> > I have a problem using Lucene.
> >
> > I wrote a SynonymFilter which can add synonyms from WordNet.
> > Meanwhile, I used the SnowballFilter for term stemming. However,
> > I ran into a problem when combining the two filters.
> >
> > For instance, I have 17 documents containing the term "support",
> > and the following is the SynonymAnalyzer I wrote.
> >
> >     /**
> >      *
> >      */
> >     public TokenStream tokenStream(String fieldName, Reader reader) {
> >
> >         TokenStream result = new StandardTokenizer(reader);
> >         result = new StandardFilter(result);
> >         result = new LowerCaseFilter(result);
> >         if (stopword != null) {
> >             result = new StopFilter(result, stopword);
> >         }
> >
> >         result = new SnowballFilter(result, "Lovins");
> >
> >         result = new SynonymFilter(result, engine);
> >
> >         return result;
> >     }
> >
> > If I use only the SnowballFilter, I can find "support" in the 17
> > documents. However, after adding the SynonymFilter, "support" can
> > only be found in 10 documents. It seems the term "support" cannot
> > be found in the remaining 7 documents. I don't know what's wrong
> > with it.
> >
> > regards
> >
> > jiang xing
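
For anyone following the thread, the stack mechanism in the SynonymFilter at the top of this message can be modelled without Lucene. The sketch below is only an illustration under assumptions: the class SynonymStackDemo, the Tok type, and the synonym map are hypothetical stand-ins (not Lucene classes and not the WordNetSynonymEngine), and every original term is simply given a position increment of 1. It shows the order in which tokens come out when synonyms are pushed onto a stack with a position increment of 0.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Lucene-free model of a stack-based synonym filter (all names hypothetical). */
class SynonymStackDemo {

    /** Minimal stand-in for a token: term text plus position increment. */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /** Emits each input term, then its synonyms at the same position (increment 0). */
    static List<Tok> expand(List<String> terms, Map<String, String[]> synonyms) {
        List<Tok> out = new ArrayList<>();
        Deque<Tok> stack = new ArrayDeque<>();
        for (String term : terms) {
            // The original term advances the position by one.
            out.add(new Tok(term, 1));
            String[] syns = synonyms.get(term);
            if (syns != null) {
                for (String s : syns) {
                    stack.push(new Tok(s, 0));
                }
            }
            // Drain pending synonyms before moving on to the next input term.
            while (!stack.isEmpty()) {
                out.add(stack.pop());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String[]> synonyms = new HashMap<>();
        synonyms.put("support", new String[] {"back", "help"});
        // prints [we(+1), support(+1), help(+0), back(+0), lucene(+1)]
        System.out.println(expand(Arrays.asList("we", "support", "lucene"), synonyms));
    }
}
```

Because a stack is last-in, first-out, the synonyms come out in the reverse of the order in which they were pushed. In the analyzer chain posted above, the lookup key would be the already-stemmed term, since the SnowballFilter runs before the SynonymFilter.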