Re: Lemmatizer for Solr

2020-02-14 Thread Nicolas Franck
Also try looking at the Hunspell Stem Filter:

https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html

Dictionaries (.dic and .aff) can be found here:

https://cgit.freedesktop.org/libreoffice/dictionaries

or via the git repo:

https://anongit.freedesktop.org/git/libreoffice/dictionaries.git

It is actually a spell-checking tool that works by applying rules (from the
affix file) to each individual token until it finds a word in the dictionary.
Luckily, there are a lot of dictionaries available (from LibreOffice).
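
To make the rule-then-lookup idea concrete, here is a minimal sketch in
Python. It is a toy illustration only, not Hunspell's actual algorithm or
file format: the suffix rules, the word list and the function names are all
made up for this example, and real .aff files also carry flags, conditions
and prefixes.

```python
# Toy illustration only: real Hunspell .aff/.dic files are far richer
# (flags, conditions, prefixes, compounding). All data below is made up.

# Hypothetical suffix rules: (suffix to strip, text to add back).
SUFFIX_RULES = [
    ("ies", "y"),   # ponies -> pony
    ("ing", ""),    # walking -> walk
    ("ing", "e"),   # making -> make
    ("ed", ""),     # walked -> walk
    ("s", ""),      # cats -> cat
]

# Hypothetical dictionary of known base words.
DICTIONARY = {"pony", "walk", "make", "cat", "bus"}


def stem(token: str) -> str:
    """Return the first dictionary word reachable from token via one rule."""
    word = token.lower()
    if word in DICTIONARY:
        return word
    for strip, add in SUFFIX_RULES:
        if word.endswith(strip):
            candidate = word[: len(word) - len(strip)] + add
            if candidate in DICTIONARY:
                return candidate
    # No rule led to a dictionary word: leave the token unchanged.
    return token


def stem_all(tokens):
    """Stem each token independently; no surrounding context is used."""
    return [stem(t) for t in tokens]
```

Each token is handled on its own here, so the sketch has no way to tell a
noun from a verb. Its output is also only as good as the rules and the word
list it is given.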

OpenNLP looked promising, but, as with Hunspell, the quality
depends on the dictionary, and I could not find any dictionaries
beyond the English ones (does anyone know of any?):

http://opennlp.sourceforge.net/models-1.5/
https://github.com/richardwilly98/elasticsearch-opennlp-auto-tagging/tree/master/src/main/resources/models

I guess English is the only language you are looking for?
I would use OpenNLP myself if it weren't for the lack of other dictionaries,
as it thoroughly inspects the semantic context before
trying to match any word (Hunspell matches words without
knowing any context, due to the way it is called).

On 14 Feb 2020, at 21:21, Shamik Bandopadhyay <sham...@gmail.com> wrote:

Hi,
 I'm trying to replace the Porter stemmer with an English lemmatizer in my
analysis chain. Just wondering what
is the recommended way of achieving this. I've come across a few different
implementations, which are listed below:

OpenNLP -->
https://lucene.apache.org/solr/guide/7_5/language-analysis.html#opennlp-lemmatizer-filter

https://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.lemmatizer

KStem Filter -->
https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#kstem-filter

There are a couple of third-party libraries, but I'm not sure if they are being
maintained or are compatible with the Solr version I'm using (7.5).

https://github.com/nicholasding/solr-lemmatizer
https://github.com/bejean/solr-lemmatizer

Currently, I'm looking for English-only lemmatization. Also, I need to have
the ability to update the lemma dictionary to add custom terms specific to
our organization (not sure if the KStem filter can do that).

Any pointers will be appreciated.

Regards,
Shamik



Lemmatizer for Solr

2020-02-14 Thread Shamik Bandopadhyay
Hi,
  I'm trying to replace the Porter stemmer with an English lemmatizer in my
analysis chain. Just wondering what
is the recommended way of achieving this. I've come across a few different
implementations, which are listed below:

OpenNLP -->
https://lucene.apache.org/solr/guide/7_5/language-analysis.html#opennlp-lemmatizer-filter

https://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.lemmatizer

KStem Filter -->
https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#kstem-filter

There are a couple of third-party libraries, but I'm not sure if they are being
maintained or are compatible with the Solr version I'm using (7.5).

https://github.com/nicholasding/solr-lemmatizer
https://github.com/bejean/solr-lemmatizer

Currently, I'm looking for English-only lemmatization. Also, I need to have
the ability to update the lemma dictionary to add custom terms specific to
our organization (not sure if the KStem filter can do that).

Any pointers will be appreciated.

Regards,
Shamik