Thanks!

On Thu, Jul 18, 2013 at 5:30 PM, Allison, Timothy B. <talli...@mitre.org> wrote:
> You need to set outputUnigrams to false (it defaults to true), with something like:
>
>       StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);
>       TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);
>       tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);
>
>       TokenFilter sf = new ShingleFilter(tokenStream, 3,3);
>       ((ShingleFilter)sf).setOutputUnigrams(false);
>
>       sf = new StopFilter(Version.LUCENE_43, sf, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>
>       return new Analyzer.TokenStreamComponents(source, sf);
>
>
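> Not sure the StopFilter will do you any good if you're extracting only
> trigrams: placed after the ShingleFilter, it sees whole shingles such as
> "the users get", which never match a single stop word.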
>
> -----Original Message-----
> From: murba...@rams.colostate.edu [mailto:murba...@rams.colostate.edu] On Behalf Of Malgorzata Urbanska
> Sent: Thursday, July 18, 2013 6:02 PM
> To: java-user@lucene.apache.org
> Subject: ShingleFilter
>
> Hello,
>
> For some time I have been trying to apply ShingleFilter. I have a string:
> "The users get program in the User RPC API in Apache Rave"
>
> and I would like to get:
>
> [the users get]  [users get program]  [get program in] [program in
> the] [in the user] [the user rpc] [user rpc api] [rpc api in] [api in
> apache] [in apache rave] [apache rave 0.11]
>
> However, I'm getting:
>
> [the users get] [users] [users get program] [get] [get program in]
> [program] [program in the] [in the user] [the user rpc] [user] [user
> rpc api] [rpc] [rpc api in] [api] [api in apache] [in apache rave]
> [apache] [apache rave 0.11] [rave]
>
> Part of my code:
>
> protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>
>         StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);
>
>         TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);
>
>         tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);
>
>         tokenStream = new ShingleFilter(tokenStream, 3, 3);
>
>         tokenStream = new StopFilter(Version.LUCENE_43, tokenStream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>
>         return new TokenStreamComponents(source, tokenStream);
> }
>
> Could somebody please explain why I'm getting single-word shingles
> when I set the min size to 3?
> Thanks,
> --
> gosia
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
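
For illustration, the behaviour discussed in the quoted messages can be mimicked in plain Java without Lucene on the classpath. This is only a sketch under stated assumptions, not Lucene's implementation: the class ShingleSketch and its methods tokenize, shingles and dropStopWords are hypothetical stand-ins for the tokenizer/lowercase step, ShingleFilter and StopFilter, the stop set {"the", "in"} is a tiny stand-in for the real English stop word set, and the order in which tokens are emitted is not meant to match Lucene's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ShingleSketch {

    // Hypothetical stand-in for tokenizing + lowercasing: lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Hypothetical stand-in for a shingle filter with min size == max size == `size`.
    // When outputUnigrams is true, every single token is emitted as well.
    static List<String> shingles(List<String> tokens, int size, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) {
                out.add(tokens.get(i));
            }
            if (i + size <= tokens.size()) {
                out.add(String.join(" ", tokens.subList(i, i + size)));
            }
        }
        return out;
    }

    // Hypothetical stand-in for a stop filter: a token is dropped only when the
    // whole token equals a stop word, so multi-word shingles always pass through.
    static List<String> dropStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens =
                tokenize("The users get program in the User RPC API in Apache Rave");
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "in"));

        // Unigrams switched on: single words appear between the trigrams.
        System.out.println(shingles(tokens, 3, true));

        // Unigrams switched off: only trigrams remain.
        List<String> trigrams = shingles(tokens, 3, false);
        System.out.println(trigrams);

        // Stop filtering after shingling leaves a trigram-only stream unchanged.
        System.out.println(dropStopWords(trigrams, stopWords).equals(trigrams));
    }
}
```

In this sketch, stop filtering only has an effect when single words are in the stream, which is consistent with the output in the original question, where "the" and "in" never show up as single-word tokens.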



-- 
Malgorzata Urbanska (Gosia)
Graduate Assistant
Colorado State University
