On 07/15/2013 07:50 PM, Malgorzata Urbanska wrote:
> Hi,
>
> I've been trying to figure out how to use ngrams in Lucene 4.3.0.
> I found some examples for an earlier version, but I'm still confused.
> As I understand it, I should:
> 1. create a new analyzer which uses ngrams
> 2. apply it to my indexer
> 3. search using the same analyzer
>
> I found NGramTokenFilter and NGramTokenizer in the documentation, but I
> do not understand the difference between them.

This should be helpful:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Tokenizers

In short: a Tokenizer reads the raw character stream from the Reader and breaks it into tokens, while a TokenFilter consumes an existing token stream and transforms it. So NGramTokenizer produces n-grams directly from the character stream, whereas NGramTokenFilter produces n-grams from tokens emitted by an upstream tokenizer.
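For intuition, here is a plain-Java sketch (no Lucene dependency; the class and method names are mine) of the character n-grams that a tokenizer configured with minGram = maxGram = 3 would emit:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration (no Lucene): the character n-grams that an
// NGramTokenizer configured with minGram = maxGram = 3 would emit.
public class CharNGrams {
    static List<String> ngrams(String text, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            // slide a window of length n over the text
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3, 3));
        // prints [luc, uce, cen, ene]
    }
}
```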
Here is an example of an n-gram analyzer:

public class NGramAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // character 3-grams straight from the Reader (minGram = 3, maxGram = 3)
        Tokenizer src = new NGramTokenizer(reader, 3, 3);
        TokenStream tok = new StandardFilter(Version.LUCENE_43, src);
        tok = new LowerCaseFilter(Version.LUCENE_43, tok);
        return new TokenStreamComponents(src, tok);
    }
}

If, for example, you want to remove stop words from a document before breaking it into n-grams, then you would need:

    reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter

Regards,
  Ivan Krišto
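A rough plain-Java simulation of that pipeline may make the data flow concrete. No Lucene here: the whitespace split, the stop set, and all names are stand-ins I made up for the real tokenizer and filters, chosen only to show what each stage contributes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java illustration (no Lucene) of the pipeline:
//   reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter
// The whitespace split and the stop set are toy stand-ins for real components.
public class NGramPipeline {
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of");

    static List<String> analyze(String document, int min, int max) {
        List<String> result = new ArrayList<>();
        for (String token : document.toLowerCase().split("\\s+")) { // "SomeTokenizer"
            if (STOP_WORDS.contains(token)) continue;               // "StopFilter"
            for (int n = min; n <= max; n++) {                      // "NGramTokenFilter"
                for (int i = 0; i + n <= token.length(); i++) {
                    result.add(token.substring(i, i + n));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(analyze("the Lucene index", 3, 3));
        // prints [luc, uce, cen, ene, ind, nde, dex]
    }
}
```

Note how "the" never reaches the n-gram stage, which is exactly why the StopFilter must sit before the NGramTokenFilter rather than after it.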