On 8/18/11 12:24 PM, Olivier Grisel wrote:
> Is this better, or does it cover more languages than what's already
> provided by Apache Lucene? Maybe it would be better contributed to the
> Lucene project, making it easy to use the generic, battle-tested Lucene
> analyzer/tokenizer infrastructure to generate features in OpenNLP.

The OpenNLP APIs are not designed to work on token streams; instead,
a user usually has to provide an entire sentence at once, so that is
not a good fit.
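
To illustrate the point, here is a minimal sketch of typical OpenNLP usage
(assuming the standard en-token.bin and en-pos-maxent.bin model files are
available locally; the file names and class name are just for illustration).
Each tool is handed a complete sentence in a single call, rather than pulling
tokens incrementally the way a Lucene TokenStream does:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class SentenceAtOnce {
    public static void main(String[] args) throws Exception {
        try (InputStream tokIn = new FileInputStream("en-token.bin");
             InputStream posIn = new FileInputStream("en-pos-maxent.bin")) {

            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokIn));
            POSTaggerME tagger = new POSTaggerME(new POSModel(posIn));

            // The whole sentence is passed in at once ...
            String[] tokens = tokenizer.tokenize("OpenNLP works on whole sentences.");
            // ... and the tagger likewise sees the full token array in one call.
            String[] tags = tagger.tag(tokens);

            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "/" + tags[i]);
            }
        }
    }
}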

And since we are an NLP library, I believe it is absolutely fine to implement
our own stemming here.

Jörn
