I'm using Lucene 4.10.4 and trying to build shingles (combinations of adjacent
tokens).
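As a point of reference for what "shingles" means here, the following is a minimal plain-Java sketch with no Lucene involved. The class `ShingleSketch` and the method `shingles` are hypothetical names used only for illustration. It assumes whitespace-separated tokens and emits each token followed by the word n-grams that start at it, which only approximates the ordering a shingle filter produces when unigrams are kept.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    // Hypothetical helper (not a Lucene API): for every token, emit the token
    // itself and then the word n-grams of size minSize..maxSize starting there.
    static List<String> shingles(String text, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            out.add(tokens[i]); // the unigram
            StringBuilder sb = new StringBuilder(tokens[i]);
            // Grow the n-gram one token at a time while it still fits.
            for (int size = 2; size <= maxSize && i + size <= tokens.length; size++) {
                sb.append(' ').append(tokens[i + size - 1]);
                if (size >= minSize) {
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("cup board", 2, 3)); // [cup, cup board, board]
    }
}
```

For the input "cup board" with sizes 2 to 3, this yields the three terms cup, cup board, and board.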
Code:
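    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.shingle.ShingleFilter;
    import org.apache.lucene.analysis.standard.ClassicFilter;

    public class CustomAnalyzer extends Analyzer {
        @Override
        protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
            // Split on whitespace, then emit unigrams plus shingles of 2 to 3 tokens.
            final WhitespaceTokenizer src = new WhitespaceTokenizer(getVersion(), reader);
            TokenStream tok = new ShingleFilter(src, 2, 3);
            tok = new ClassicFilter(tok);
            tok = new LowerCaseFilter(tok);
            // tok = new SynonymFilter(tok, SynonymDictionary.getSynonymMap(), true);
            return new Analyzer.TokenStreamComponents(src, tok);
        }
    }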
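    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;

    public class Test {
        public static void main(String[] args) throws Exception {
            CustomAnalyzer analyzer = new CustomAnalyzer();
            String queryStr = "cup board";

            // Run the analyzer directly on the whole string and print each token.
            TokenStream ts = new CustomAnalyzer().tokenStream("n", new StringReader(queryStr));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            System.out.println("Tokens are :");
            while (ts.incrementToken()) {
                System.out.print(term + ", ");
            }
            ts.end();
            ts.close();

            // Parse the same string with QueryParser using the same analyzer type.
            QueryParser parser = new QueryParser("n", analyzer);
            Query query = parser.parse(queryStr);
            System.out.println("\nQuery is");
            System.out.print(query.toString());
        }
    }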
> Output:
> Tokens are :
> cup, cup board, board
> Query is
> n:cup n:board
>
The tokens are printed as expected, and I expected the resulting query to be
*n:cup n:board n:cup board*. However, the tokens produced by the shingle filter
are not added to the query; I get only *n:cup n:board*. Where is my mistake?
Thanks.