On 8/18/11 12:24 PM, Olivier Grisel wrote:
Is this better, or does it cover more languages, than what is already provided by Apache Lucene? Maybe it would be better to contribute it to the Lucene project and make it easy to use the generic, battle-tested Lucene analyzer / tokenizer infrastructure to generate features in OpenNLP.
The OpenNLP APIs are not designed to work on token streams; instead, the user usually has to provide an entire sentence at once, so that would not be a good fit. And since we are an NLP library, I believe it is absolutely fine to implement our own stemming here.

Jörn
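
[Editor's note: a minimal sketch, not from the thread, contrasting the two API styles Jörn refers to: OpenNLP components consume a whole sentence as an array, while Lucene analyzers emit tokens incrementally through a TokenStream. The model file name is a placeholder, and the StandardAnalyzer constructor may need a Version argument depending on the Lucene release.]

    import java.io.FileInputStream;
    import java.io.StringReader;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class ApiStyleComparison {
        public static void main(String[] args) throws Exception {
            String sentence = "OpenNLP works on whole sentences at once";

            // OpenNLP style: the tagger expects the complete, already
            // tokenized sentence as a String[] and returns a parallel
            // array of tags. "en-pos-maxent.bin" is a placeholder path.
            POSModel model = new POSModel(new FileInputStream("en-pos-maxent.bin"));
            POSTaggerME tagger = new POSTaggerME(model);
            String[] tokens = sentence.split(" ");
            String[] tags = tagger.tag(tokens);

            // Lucene style: an Analyzer produces an incremental TokenStream;
            // the consumer pulls one token at a time via incrementToken().
            Analyzer analyzer = new StandardAnalyzer();
            TokenStream stream = analyzer.tokenStream("text", new StringReader(sentence));
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                System.out.println(term.toString());
            }
            stream.end();
            stream.close();
            analyzer.close();
        }
    }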
