I guess you're using Lucene 2.9 or below if you're still talking about
Tokens...
Here's some old code I used to use (not sure if I wrote it or grabbed it from
online examples, it's been a while since I used it!).
It grabs the set of tokens given a field name and the text to analyse.
Any class that extends it gets this, so you can use it for a per-field
analyzer too:
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public abstract class GenAnalyzer extends Analyzer {

    /**
     * The wrapped Lucene Analyzer that does the actual work.
     * Subclasses must assign it (e.g. in their constructor).
     *
     * @see org.apache.lucene.analysis.Analyzer
     */
    protected Analyzer gan;

    /**
     * Splits text into tokens, returned as a TokenStream. The text is read
     * from the java.io.Reader. As analysers can be field specific, the name
     * of the field is also passed in.
     *
     * @see org.apache.lucene.analysis.Analyzer#tokenStream(java.lang.String, java.io.Reader)
     * @param fieldName the name of the lucene field
     * @param reader a Reader containing the string to split into tokens
     * @return a TokenStream that represents the string split into tokens,
     *         based on the field name (maybe a field-specific analyser)
     */
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return gan.tokenStream(fieldName, reader);
    }

    /**
     * Splits text into tokens, returned as a Token[]. The text is passed in
     * as a String. As analysers can be field specific, the name of the field
     * is also passed in.
     *
     * Similar to the tokenStream method, except that the parameters
     * and return type differ.
     *
     * @param fieldName the name of the lucene field
     * @param text the text to be split into tokens
     * @return a Token[] which represents the split text tokens
     * @throws IOException may be thrown by the stream.next(token) call
     *
     * @see org.apache.lucene.analysis.Token
     */
    public Token[] getTokens(String fieldName, String text) throws IOException {
        TokenStream stream = gan.tokenStream(fieldName, new StringReader(text));
        ArrayList<Token> tokenList = new ArrayList<Token>();
        Token token = new Token();
        try {
            while (true) {
                // next(token) may refill and return the token it was handed
                token = stream.next(token);
                if (token == null) break;
                // so store a copy, not the reusable instance
                tokenList.add((Token) token.clone());
            }
            //stream.end();
        } finally {
            stream.close();
        }
        return tokenList.toArray(new Token[0]);
    }
}
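In case the clone() in getTokens looks odd: next(token) is allowed to refill and
hand back the very Token you passed in, so without the copy every list entry
could end up pointing at the same object. Here's a tiny sketch of that effect.
MiniToken and MiniStream are made-up stand-ins (not Lucene classes), and the
whitespace splitting is just an assumption so it runs without the Lucene jar:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

    /** Hypothetical stand-in for a mutable, reusable token (not a Lucene class). */
    static class MiniToken implements Cloneable {
        String term;

        @Override
        public MiniToken clone() {
            try {
                return (MiniToken) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen, we implement Cloneable
            }
        }
    }

    /** Hypothetical stand-in for a stream that refills the token it is handed. */
    static class MiniStream {
        private final String[] words;
        private int pos = 0;

        MiniStream(String text) {
            words = text.trim().split("\\s+");
        }

        MiniToken next(MiniToken reusable) {
            if (pos >= words.length) return null; // end of stream
            reusable.term = words[pos++];         // overwrite the caller's token
            return reusable;
        }
    }

    /** Collects the terms; with copy == false every list entry is the same object. */
    public static List<String> collect(String text, boolean copy) {
        MiniStream stream = new MiniStream(text);
        List<MiniToken> tokens = new ArrayList<MiniToken>();
        MiniToken token = new MiniToken();
        while ((token = stream.next(token)) != null) {
            tokens.add(copy ? token.clone() : token);
        }
        List<String> terms = new ArrayList<String>();
        for (MiniToken t : tokens) {
            terms.add(t.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(collect("quick brown fox", true));  // [quick, brown, fox]
        System.out.println(collect("quick brown fox", false)); // [fox, fox, fox]
    }
}
```

Without the copy, all three entries show the last term, because the list holds
three references to the one token that kept getting overwritten.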
Hope that helps. I haven't used this code for a while, but it worked
when I used it last!
In Lucene 2.9 the stream.next(token) method is deprecated, and
if you move to Lucene 3 I think that's where the attribute-based API
(incrementToken() plus attributes) replaces Tokens completely,
so all this code will need to be ported...
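To give a rough idea of what the ported loop looks like: as far as I know, in
the newer API you ask the stream for an attribute once (e.g. a TermAttribute),
then call incrementToken() until it returns false and read the attribute each
time round. The sketch below only mirrors that shape. MiniAttributeStream and
MiniTermAttribute are made-up stand-ins (not Lucene classes), and the
whitespace splitting is an assumption, so it runs without the Lucene jar:

```java
import java.util.ArrayList;
import java.util.List;

public class AttributeLoopDemo {

    /** Hypothetical stand-in for a term attribute: a view onto the current term. */
    static class MiniTermAttribute {
        String term;

        String term() {
            return term;
        }
    }

    /** Hypothetical stand-in for an attribute-based stream (not a Lucene class). */
    static class MiniAttributeStream {
        private final String[] words;
        private final MiniTermAttribute termAtt = new MiniTermAttribute();
        private int pos = 0;

        MiniAttributeStream(String text) {
            words = text.trim().split("\\s+");
        }

        /** Hands out the single attribute instance that the stream keeps updating. */
        MiniTermAttribute termAttribute() {
            return termAtt;
        }

        /** Advances to the next token; returns false when the stream is exhausted. */
        boolean incrementToken() {
            if (pos >= words.length) return false;
            termAtt.term = words[pos++];
            return true;
        }
    }

    /** Fetch the attribute once, then read it after each incrementToken() call. */
    public static List<String> getTerms(String text) {
        MiniAttributeStream stream = new MiniAttributeStream(text);
        MiniTermAttribute termAtt = stream.termAttribute();
        List<String> terms = new ArrayList<String>();
        while (stream.incrementToken()) {
            terms.add(termAtt.term()); // Strings are immutable, so no clone needed
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(getTerms("stemming and stop words"));
    }
}
```

The main difference from getTokens above is that you collect plain term strings
instead of cloned Token objects, so check the Lucene javadocs for the exact
attribute classes in the version you end up on.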
thanks :)
bec
On 23 June 2010 10:49, Vinicius Carvalho <[email protected]> wrote:
> Hello there! I've been using lucene as a Full Text Search solution for some
> time. And although I'm familiar with Analyzers and Stemmers I never used
> them directly.
>
> I'm testing a few experiments on Sentiment Analysis and our implementation
> needs to perform stemming and stop word removal. I thought using lucene
> built-in support to spare me some coding time.
>
> Is there any example? I'm trying
>
> TokenStream stream = analyzer.tokenStream("", new StringReader(inputStr));
>
> Problem is that I could not find a way to get the result tokens. I was
> expecting something like stream.getTokens:Token[] :P
>
> Could someone point me in the right direction?
>
> Regards
>
> --
> The intuitive mind is a sacred gift and the
> rational mind is a faithful servant. We have
> created a society that honors the servant and
> has forgotten the gift.
>