On 8/18/11 12:38 PM, Olivier Grisel wrote:
True, but working on a generic API adapter would make it possible to
benefit from the huge set of existing tokenizers / analyzers from the
Lucene community. I am aware, though, that most of the time Lucene
analyzers drop punctuation information, which is mostly useless for
information retrieval but often critical for NLP.

As far as I know, Lucene redistributes the Snowball stemmers; that
could also be an option for us, since then we would directly have
stemmers for all the languages we currently support.
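
For illustration only, calling the Snowball code bundled with Lucene's
analyzers module directly might look roughly like the sketch below. The
class name org.tartarus.snowball.ext.EnglishStemmer and its
setCurrent/stem/getCurrent methods reflect my assumption of how Lucene
packages the generated stemmers, one class per language:

    import org.tartarus.snowball.ext.EnglishStemmer;

    public class SnowballDirect {
        public static void main(String[] args) {
            // Assumes the Snowball classes as redistributed in Lucene's
            // analyzers jar; there is one generated stemmer per language.
            EnglishStemmer stemmer = new EnglishStemmer();
            stemmer.setCurrent("tokenizing");
            stemmer.stem();
            System.out.println(stemmer.getCurrent()); // prints the stemmed form
        }
    }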

I do not really see a benefit in adapting the Lucene analyzers. If
someone wants to use a Lucene tokenizer instead of an OpenNLP one, he
can simply do that and then provide the tokenized text to OpenNLP.
That is already supported.
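
As a rough sketch of that workflow (this assumes Lucene's TokenStream
API and OpenNLP's POSTaggerME; the tokenizer constructor differs
between Lucene versions, and "en-pos-maxent.bin" is only an example
model path):

    import java.io.FileInputStream;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;

    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class LuceneTokensToOpenNLP {
        public static void main(String[] args) throws Exception {
            String text = "Lucene tokenizes the text, OpenNLP tags it.";

            // Tokenize with a Lucene tokenizer.
            List<String> tokens = new ArrayList<String>();
            StandardTokenizer tokenizer = new StandardTokenizer();
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();
            while (tokenizer.incrementToken()) {
                tokens.add(term.toString());
            }
            tokenizer.end();
            tokenizer.close();

            // Hand the resulting tokens to an OpenNLP component,
            // here the POS tagger.
            POSModel model = new POSModel(new FileInputStream("en-pos-maxent.bin"));
            POSTaggerME tagger = new POSTaggerME(model);
            String[] tags = tagger.tag(tokens.toArray(new String[0]));

            for (int i = 0; i < tags.length; i++) {
                System.out.println(tokens.get(i) + "/" + tags[i]);
            }
        }
    }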

Jörn
