On Fri, Feb 12, 2021 at 7:05 AM Peter Gromov
<[email protected]> wrote:

>
> Robert, for n=20 the speedup is quite small, 2-8% for me depending on the
> language. Unfortunately Hunspell dictionaries don't have stop word
> information, it'd be quite useful.
>
>
OK, maybe with a cache that small the stopwords won't stay cached; I
don't know. I was just mentioning it as an aside. We do have stopword
lists for a lot of languages as resource files in Lucene:

https://github.com/apache/lucene-solr/tree/master/lucene/analysis/common/src/resources/org/apache/lucene/analysis

Some users will remove stopwords before they reach Hunspell; some users
won't.

But we also have a way in the analysis chain to override stemming for
particular words. It stems them the way you want and then sets a keyword
marker so that Hunspell is never even called on them:

https://lucene.apache.org/core/8_8_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.html
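
As a rough illustration of that mechanism, here is a toy model in plain
Java. It does not use Lucene: the class and method names are made up for
the sketch, and the stand-in stemmer just strips a trailing "s". In Lucene
itself the two stages would be StemmerOverrideFilter followed by
HunspellStemFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model of "override, mark as keyword, skip the stemmer".
// This is NOT Lucene's API; all names here are hypothetical.
final class OverrideBeforeStemmerSketch {

    /** A term plus the keyword flag that tells later stages to leave it alone. */
    static final class Token {
        final String term;
        final boolean keyword;

        Token(String term, boolean keyword) {
            this.term = term;
            this.keyword = keyword;
        }
    }

    /** Stage 1: if the term has an override, use it and mark the token as a keyword. */
    static Token applyOverride(String term, Map<String, String> overrides) {
        String replacement = overrides.get(term);
        return replacement != null ? new Token(replacement, true) : new Token(term, false);
    }

    /** Stage 2: call the expensive stemmer only for tokens that are not keywords. */
    static String stem(Token token, UnaryOperator<String> expensiveStemmer) {
        return token.keyword ? token.term : expensiveStemmer.apply(token.term);
    }

    /** Runs both stages over a list of terms. */
    static List<String> analyze(List<String> terms,
                                Map<String, String> overrides,
                                UnaryOperator<String> expensiveStemmer) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add(stem(applyOverride(term, overrides), expensiveStemmer));
        }
        return out;
    }

    public static void main(String[] args) {
        // Map each stopword to itself so it passes through unchanged.
        Map<String, String> overrides = new HashMap<>();
        overrides.put("the", "the");
        overrides.put("was", "was");

        int[] calls = {0}; // counts how often the stand-in stemmer is invoked
        UnaryOperator<String> fakeStemmer = t -> {
            calls[0]++;
            return t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
        };

        List<String> out = analyze(Arrays.asList("the", "dogs", "was", "cats"),
                                   overrides, fakeStemmer);
        System.out.println(out + ", stemmer calls: " + calls[0]);
    }
}
```

The point of the sketch is only the ordering: the override lookup runs
first, so the stemmer never sees the overridden words.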

So if the user really wants to keep the stopwords, they could put a
StemmerOverrideFilter in front of the Hunspell filter to keep them from
slowing things down.
