You could also look at the Hunspell Stem Filter:
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html
Dictionaries (.dic and .aff files) can be found here:
https://cgit.freedesktop.org/libreoffice/dictionaries
or via the git repo:
https://anongit.freedesktop.org/git/libreoffice/dictionaries.git
It is actually a spelling tool: it applies the rules from the affix (.aff) file
to each individual token until it finds a word in the dictionary.
Luckily, there are a lot of dictionaries available (from LibreOffice).
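To make the "apply affix rules until a dictionary word is found" idea concrete, here is a toy Python sketch. The dictionary, the suffix rules and the `stem` function are made up for illustration. Real .aff files use flags, conditions and prefix/suffix combinations that this ignores, so treat it as a rough model of the idea, not of how the Lucene filter is implemented.

```python
# Toy model of dictionary + affix-rule stemming, loosely in the spirit of
# Hunspell. Everything here is hypothetical and much simpler than a real
# .aff/.dic pair (no flags, no conditions, no prefixes).

# Hypothetical dictionary: the stems a .dic file would list.
DICTIONARY = {"walk", "cat", "berry", "make"}

# Hypothetical suffix rules: (suffix to strip, text to add back).
SUFFIX_RULES = [
    ("ies", "y"),  # "berries" -> "berry"
    ("ing", "e"),  # "making"  -> "make"
    ("ing", ""),   # "walking" -> "walk"
    ("ed", ""),    # "walked"  -> "walk"
    ("s", ""),     # "cats"    -> "cat"
]


def stem(token):
    """Return every dictionary word reachable by undoing one suffix rule.

    The token itself counts if it is already in the dictionary. Each token
    is handled on its own, without any surrounding context.
    """
    word = token.lower()
    candidates = []
    if word in DICTIONARY:
        candidates.append(word)
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            base = word[: -len(suffix)] + replacement
            if base in DICTIONARY and base not in candidates:
                candidates.append(base)
    return candidates


if __name__ == "__main__":
    for token in ["walking", "cats", "making", "berries", "xyzzy"]:
        print(token, "->", stem(token))
```

Note that a token with no matching rule and no dictionary entry simply yields nothing, which is why the result quality depends so heavily on the dictionary.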
OpenNLP looked promising, but, as with Hunspell, the quality
depends on the dictionary, and I could not find any dictionaries
beyond the English ones (does anyone know of others?):
http://opennlp.sourceforge.net/models-1.5/
https://github.com/richardwilly98/elasticsearch-opennlp-auto-tagging/tree/master/src/main/resources/models
But I guess English is all you were looking for anyway?
I would use OpenNLP if it weren't for the lack of dictionaries for other
languages, as it thoroughly inspects the semantic context before
trying to match any word (Hunspell works without knowing any
context, due to the way it is called: one token at a time).
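A toy Python sketch of why context matters, assuming the context reaches the lemmatizer as a part-of-speech tag per token (Penn Treebank-style tags here). Both lookup tables and both functions are hypothetical; the point is only that "saw" in "I saw the saw" needs two different lemmas, and a per-token lookup cannot choose between them.

```python
# Toy contrast between context-free lookup (one token at a time) and
# POS-aware lookup (the lemmatizer also sees a tag derived from the
# sentence). Both tables are hypothetical.

# Context-free: one entry per surface form, whatever the sentence says.
CONTEXT_FREE = {"saw": ["saw", "see"], "meeting": ["meeting", "meet"]}

# POS-aware: (surface form, POS tag) -> lemma.
POS_AWARE = {
    ("saw", "NN"): "saw",
    ("saw", "VBD"): "see",
    ("meeting", "NN"): "meeting",
    ("meeting", "VBG"): "meet",
}


def lemmatize_without_context(token):
    """Return all candidate lemmas; the caller cannot tell which one fits."""
    return CONTEXT_FREE.get(token, [token])


def lemmatize_with_pos(token, pos):
    """Return the single lemma for this token given its POS tag."""
    return POS_AWARE.get((token, pos), token)


if __name__ == "__main__":
    # "I saw the saw": same surface form, two different lemmas.
    tagged = [("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN")]
    for token, pos in tagged:
        print(token, lemmatize_without_context(token), lemmatize_with_pos(token, pos))
```

The trade-off is that the POS-aware route needs a tagger (and a model for it) in front of the lemmatizer, which is exactly where the lack of non-English models bites.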
On 14 Feb 2020, at 21:21, Shamik Bandopadhyay <sham...@gmail.com> wrote:
Hi,
I'm trying to replace the Porter stemmer with an English lemmatizer in my
analysis chain, and I'm wondering what the recommended way of achieving
this is. I've come across a few different implementations, listed below:
OpenNLP -->
https://lucene.apache.org/solr/guide/7_5/language-analysis.html#opennlp-lemmatizer-filter
https://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.lemmatizer
KStem Filter -->
https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#kstem-filter
There are also a couple of third-party libraries, but I'm not sure whether they
are maintained or compatible with the Solr version I'm using (7.5):
https://github.com/nicholasding/solr-lemmatizer
https://github.com/bejean/solr-lemmatizer
Currently, I'm looking for English-only lemmatization. I also need the
ability to update the lemma dictionary with custom terms specific to
our organization (not sure if the KStem filter can do that).
Any pointers would be appreciated.
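For the custom-terms requirement, a dictionary-based approach means maintaining an extra file of entries that extend or override the base dictionary. Below is a minimal Python sketch assuming a tab-separated word/POS/lemma layout; the format, the function names and the example terms are all hypothetical, so check the format your chosen lemmatizer actually reads.

```python
# Sketch of a dictionary-based lemmatizer whose entries can be extended with
# organization-specific terms. The "word<TAB>pos<TAB>lemma" layout is an
# assumption for this example, not a documented format.


def load_lemma_dictionary(lines):
    """Parse 'word<TAB>pos<TAB>lemma' lines into a {(word, pos): lemma} map."""
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, pos, lemma = line.split("\t")
        table[(word, pos)] = lemma
    return table


def merge_custom_terms(base, custom):
    """Return a new table where custom entries override base entries."""
    merged = dict(base)
    merged.update(custom)
    return merged


def lemmatize(table, token, pos):
    """Look up the lemma; unknown tokens are returned unchanged."""
    return table.get((token, pos), token)


BASE = "walked\tVBD\twalk\ndata\tNNS\tdatum\n"
# Hypothetical organization-specific overrides and additions.
CUSTOM = "# keep 'data' as-is\ndata\tNNS\tdata\nwidgetizing\tVBG\twidgetize\n"

if __name__ == "__main__":
    table = merge_custom_terms(
        load_lemma_dictionary(BASE.splitlines()),
        load_lemma_dictionary(CUSTOM.splitlines()),
    )
    print(lemmatize(table, "data", "NNS"))
    print(lemmatize(table, "widgetizing", "VBG"))
```

Keeping the custom entries in a separate file means the base dictionary can be upgraded without losing the organization-specific terms.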
Regards,
Shamik