[
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195820#comment-13195820
]
Jan Høydahl edited comment on SOLR-2764 at 1/29/12 10:09 PM:
-------------------------------------------------------------
Thanks Christian. I further refined stuff:
- I think the MinimalStemmer is more or less good to go; it seems to do what
it's supposed to.
- For LightStemmer, we now do "two-pass" removal for the -dom and -het endings.
This means that the word "kristendom" is first stemmed to "kristen", and then
all the general rules apply, so it is further stemmed to "krist". The effect is
that "kristen", "kristendom", "kristendommen" and "kristendommens" are all
stemmed to "krist" (because the -en in "kristen" is, incorrectly in this case,
interpreted as the singular definite ending).
- Added some more tests to highlight this
What do you think: is this -dom/-het handling a reasonable improvement, or
could there be side effects?
Are there some other general rules that could easily be incorporated to catch
semi-regular conjugations for the light stemmer?
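
To make the two-pass idea concrete, here is a minimal, self-contained sketch.
It is not the code from the attached patch: the class name, the suffix lists
and the minimum stem length are assumptions picked only to reproduce the
"kristendom" example above, and it works on plain Strings rather than on the
char[] buffers an analysis filter would use.

```java
public class TwoPassStemSketch {

    // Pass one (assumed list): -dom and -het, plus their definite forms.
    // Longer suffixes come first so "dommen" wins over "dom".
    private static final String[] FIRST_PASS = {"dommen", "heten", "dom", "het"};

    // Pass two (assumed list): a few general noun endings, longest first.
    private static final String[] GENERAL = {"ene", "en", "er", "et", "e", "a"};

    // Hypothetical guard: never cut a word down to fewer than 3 characters.
    private static final int MIN_STEM = 3;

    /** Removes the first suffix that matches, or returns the word unchanged. */
    static String stripFirstMatch(String word, String[] suffixes) {
        for (String suffix : suffixes) {
            int stemLength = word.length() - suffix.length();
            if (word.endsWith(suffix) && stemLength >= MIN_STEM) {
                return word.substring(0, stemLength);
            }
        }
        return word;
    }

    static String stem(String word) {
        String w = word;
        // Drop a trailing genitive -s first (kristendommens -> kristendommen).
        if (w.length() > 4 && w.endsWith("s")) {
            w = w.substring(0, w.length() - 1);
        }
        // Pass one: kristendom -> kristen.
        w = stripFirstMatch(w, FIRST_PASS);
        // Pass two: the general rules then see "kristen" and strip -en.
        return stripFirstMatch(w, GENERAL);
    }

    public static void main(String[] args) {
        String[] words = {"kristen", "kristendom", "kristendommen", "kristendommens"};
        for (String word : words) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

Under these toy rules all four words end up as "krist", which also shows the
side effect in question: the second pass cannot tell that the -en in "kristen"
is part of the word rather than an inflectional ending.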
was (Author: janhoy):
Thanks Christian. I further refined stuff:
- For MinimalStemmer, we now do two-pass removal for the -dom and -het endings.
This means that the word kristendom will first be stemmed to kristen, and then
all the general rules apply so it will be further stemmed to krist. The effect
of this is that both "kristen,kristendom,kristendommen,kristendommens" will all
be stemmed to "krist" (due to in this case incorrect interpretation of -en as
plural ending), but when stopping at -dom removal, kristendom would not match
inflections of kristen.
What do you think, is this a reasonable improvement or could there be side
effects? I've not added these rules to the MinimalStemmer, to keep it simpler.
> Create a NorwegianLightStemmer and NorwegianMinimalStemmer
> ----------------------------------------------------------
>
> Key: SOLR-2764
> URL: https://issues.apache.org/jira/browse/SOLR-2764
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Jan Høydahl
> Fix For: 3.6, 4.0
>
> Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch,
> SOLR-2764.patch
>
>
> We need a simple light-weight stemmer and a minimal stemmer (plural/singular
> only) for Norwegian