Adding stemmers would be nice, and it could be a fairly easy path to
bringing in new developers since it is pretty much independent of other
components and easy to test.

However, I would also note that it would be great to get real morphological
analysis in there. There is a lot of recent interest in the NLP research
community in learning morphological analyzers, and perhaps that can
eventually make its way into OpenNLP.

Jason

On Thu, Aug 18, 2011 at 5:52 AM, Jörn Kottmann <[email protected]> wrote:

> On 8/18/11 12:38 PM, Olivier Grisel wrote:
>
>> True but working on a generic API adapter would make it possible to
>> benefit from the huge set of existing tokenizers / analyzers from the
>> Lucene community. I am aware, though, that most of the time Lucene
>> analyzers drop the punctuation information, which is mostly useless for
>> Information Retrieval but often critical for NLP.
>>
>
> As far as I know, Lucene is redistributing the Snowball stemmers;
> that could also be an option for us, and then we would directly have
> stemmers for all languages we currently support.
>
> I do not really see a benefit in adapting Lucene analyzers.
> If someone wants to use a Lucene tokenizer instead of an OpenNLP
> one, they can simply do that and then provide the
> tokenized text to OpenNLP. That is already supported.
>
> Jörn
>



-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
