Re: Stop words filter

Rebecca Watson Tue, 22 Jun 2010 20:21:11 -0700

i guess you are using lucene 2.9 or below if you're talking about
Tokens still...


here's some old code i used to use (not sure if i wrote it or grabbed it from
online examples - it's been a while since i used it!).
it returns the array of tokens for a given field name + text to analyse,
for any class that extends it (e.g. you can use it for a per field
analyzer too):

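import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public abstract class GenAnalyzer extends Analyzer {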
        
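        /**
         * the wrapped lucene Analyzer object; it is not initialised here,
         * so subclasses must assign it before any method is called.
         * @see org.apache.lucene.analysis.Analyzer
         */
        protected Analyzer gan;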
        
        /**
         * A method to split text into tokens which are returned in the form of
         * a TokenStream object. The text is read in using the java.io.Reader
         * object. As analysers can be field specific the name of the field
         * is also provided to the method.
         *
         * @see org.apache.lucene.analysis.Analyzer#tokenStream(java.lang.String, java.io.Reader)
         * @param fieldName the name of the lucene field
         * @param reader A Reader object containing string to split into tokens
         * @return a TokenStream that represents the string split into tokens
         *         based on the field name (maybe field specific analyser).
         */
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
                return gan.tokenStream(fieldName, reader);
        }
        
        /**
         * A method to split text into tokens which are returned in the form of
         * a Token[]. The text is read in as a string.
         * As analysers can be field specific the name of the field
         * is also provided to the method.
         *
         * similar to tokenStream method except that the parameters
         * and return type differ.
         *
         * @param fieldName the name of the lucene field
         * @param text the text to be split into tokens
         * @return a Token[] which represents the split text tokens.
         * @throws IOException may be thrown by the stream.next(token) call.
         *
         * @see org.apache.lucene.analysis.Token
         */
        public Token[] getTokens(String fieldName, String text)
        throws IOException {
                TokenStream stream = gan.tokenStream(fieldName, new StringReader(text));
                ArrayList<Token> tokenList = new ArrayList<Token>();
                Token token = new Token();
                while (true) {
                        // next(Token) may reuse the instance passed in, so clone before storing
                        token = stream.next(token);
                        if (token == null) break;
                        tokenList.add((Token) token.clone());
                }
                //stream.end();
                stream.close(); // release the underlying reader
                return tokenList.toArray(new Token[0]);
        }
}

hope that helps, i haven't used this code for a while but it worked
when i used it last!

in lucene 2.9 the stream.next(token) method is deprecated in favour of
incrementToken() and the attribute-based api, and in lucene 3 the deprecated
Token-based next() methods are gone, so all this code will need to be ported...
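
if the api churn between versions is a worry, the stop word step on its own is
small enough to sketch in plain java. this is only an illustration of the idea
(lowercase, split, drop stop words) and does not use lucene at all: the class
name StopWordSketch, the method removeStopWords and the tiny stop list are all
made up for this example, and it does no stemming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of lucene.
final class StopWordSketch {

    // Tiny illustrative stop list (an assumption, not lucene's default set).
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "and", "the", "is", "of"));

    /** Lowercases the text, splits on runs of non-letters and drops stop words. */
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<String>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            // split() can yield an empty first element when the text starts with a separator
            if (term.length() > 0 && !STOP_WORDS.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [plot, thin, acting, great]
        System.out.println(removeStopWords("The plot is thin and the acting is great"));
    }
}
```

a real analyzer chain does more than this (proper tokenisation, stemming,
position handling), so treat it as a way to check what you expect the output
to look like rather than a replacement.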

thanks :)

bec

On 23 June 2010 10:49, Vinicius Carvalho <viniciusccarva...@gmail.com> wrote:
> Hello there! I've been using lucene as a Full Text Search solution for some
> time, and although I'm familiar with Analyzers and Stemmers I never used
> them directly.
>
> I'm running a few experiments on Sentiment Analysis and our implementation
> needs to perform stemming and stop word removal. I thought of using lucene's
> built-in support to spare me some coding time.
>
> Is there any example? I'm trying
>
> TokenStream stream = analyzer.tokenStream("", new StringReader(inputStr));
>
> Problem is that I could not find a way to get the result tokens. I was
> expecting something like stream.getTokens:Token[] :P
>
> Could someone point me in the right direction?
>
> Regards
>
> --
> The intuitive mind is a sacred gift and the
> rational mind is a faithful servant. We have
> created a society that honors the servant and
> has forgotten the gift.
>

