Thanks!

On Thu, Jul 18, 2013 at 5:30 PM, Allison, Timothy B. <talli...@mitre.org> wrote:

> Need to set outputUnigrams = false with something like:
>
> StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);
> TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);
> tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);
>
> TokenFilter sf = new ShingleFilter(tokenStream, 3,3);
> ((ShingleFilter)sf).setOutputUnigrams(false);
>
> sf = new StopFilter(Version.LUCENE_43,sf,StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>
> return new Analyzer.TokenStreamComponents(source, sf);
>
>
> Not sure the stopFilter will do you any good if you're extracting only
> trigrams.
>
> -----Original Message-----
> From: murba...@rams.colostate.edu [mailto:murba...@rams.colostate.edu] On Behalf Of Malgorzata Urbanska
> Sent: Thursday, July 18, 2013 6:02 PM
> To: java-user@lucene.apache.org
> Subject: ShingleFilter
>
> Hello,
>
> For some time I have been trying to apply ShingleFilter.
> I have a string:
> "The users get program in the User RPC API in Apache Rave"
>
> and I would like to get:
>
> [the users get] [users get program] [get program in] [program in the]
> [in the user] [the user rpc] [user rpc api] [rpc api in] [api in apache]
> [in apache rave] [apache rave 0.11]
>
> however, I'm getting:
>
> [the users get] [users] [users get program] [get] [get program in]
> [program] [program in the] [in the user] [the user rpc] [user]
> [user rpc api] [rpc] [rpc api in] [api] [api in apache] [in apache rave]
> [apache] [apache rave 0.11] [rave]
>
> Part of my code:
>
> protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>     StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);
>     TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);
>     tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);
>     tokenStream = new ShingleFilter(tokenStream,3,3);
>     tokenStream = new StopFilter(Version.LUCENE_43,tokenStream,StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>
>     return new TokenStreamComponents(source, tokenStream);
> }
>
> Could somebody please explain why I'm getting single-word shingles
> when I set the min size to 3?
>
> Thanks,
> --
> gosia
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Malgorzata Urbanska (Gosia)
Graduate Assistant
Colorado State University
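
For anyone reading this thread later, here is a minimal plain-Java sketch of the behaviour discussed above. It does not use Lucene at all: the class `ShingleSketch`, the methods `tokenize`, `shingles` and `dropStopWords`, and the short `STOP_WORDS` list are hypothetical names made up for illustration, and whitespace splitting is only a rough stand-in for StandardTokenizer plus LowerCaseFilter. It shows that emitting unigrams alongside 3-word shingles gives the mixed output from the question, that switching unigrams off leaves only trigrams, and that a stop filter applied after shingling compares whole tokens, so it can drop a unigram such as "the" but never a three-word shingle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ShingleSketch {

    // Hypothetical, abbreviated stop list for illustration only;
    // it is NOT Lucene's StopAnalyzer.ENGLISH_STOP_WORDS_SET.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "in", "a", "an", "of"));

    // Lowercase and split on whitespace (assumed stand-in for the tokenizer chain).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // At each position, optionally emit the single token, then the
    // shingle of `size` tokens starting there if enough tokens remain.
    static List<String> shingles(List<String> tokens, int size, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) {
                out.add(tokens.get(i));
            }
            if (i + size <= tokens.size()) {
                out.add(String.join(" ", tokens.subList(i, i + size)));
            }
        }
        return out;
    }

    // Drop entries whose full text equals a stop word (whole-token match).
    static List<String> dropStopWords(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (!STOP_WORDS.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens =
                tokenize("The users get program in the User RPC API in Apache Rave");

        // Unigrams on, stop filter afterwards: trigrams mixed with leftover unigrams.
        System.out.println(dropStopWords(shingles(tokens, 3, true)));

        // Unigrams off: trigrams only; the stop filter then removes nothing.
        System.out.println(dropStopWords(shingles(tokens, 3, false)));
    }
}
```

If stop words should be kept out of the trigrams themselves, the filtering would have to happen before shingling rather than after; in Lucene that also affects token positions, so it is worth checking the resulting shingles before relying on it.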