(solr) branch branch_9x updated: Add info about the Spanish Plural Stemmer (#3040)

epugh Mon, 20 Jan 2025 08:01:00 -0800

This is an automated email from the ASF dual-hosted git repository.

epugh pushed a commit to branch branch_9x
in repository https://gitbox.apache.org/repos/asf/solr.git



The following commit(s) were added to refs/heads/branch_9x by this push:
     new d1bda56c431 Add info about the Spanish Plural Stemmer (#3040)
d1bda56c431 is described below

commit d1bda56c43180aa89c9a9c6423e370b3687c00f2
Author: Corrado Fiore <[email protected]>
AuthorDate: Mon Jan 20 23:40:29 2025 +0800

    Add info about the Spanish Plural Stemmer (#3040)
    
    (cherry picked from commit 83141bbb099efd5cc5126267f8d5e52510a57d58)
---
 .../indexing-guide/pages/language-analysis.adoc    | 41 ++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc b/solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc
index cad5782584a..91bd8c3a678 100644
--- a/solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc
+++ b/solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc
@@ -3223,14 +3223,15 @@ With class name (legacy)::
 
 === Spanish
 
-Solr includes two stemmers for Spanish: one in the `solr.SnowballPorterFilterFactory language="Spanish"`, and a lighter stemmer called `solr.SpanishLightStemFilterFactory`.
+Solr includes three stemmers for Spanish: the `solr.SnowballPorterFilterFactory language="Spanish"`, a lighter stemmer called `solr.SpanishLightStemFilterFactory` and a plural stemmer called `solr.SpanishPluralStemFilter` (https://mices.co/mices2021/slides/Xavier-Sanchez_Spanish-Stemmers-Solr.pdf[slides], https://medium.com/inside-wallapop/spanish-plural-stemmer-matching-plural-and-singular-forms-in-spanish-using-lucene-93e005e38373[article]) that implements the rules described in http:/ [...]
+
 Lucene includes an example stopword list.
 
 *Factory class:* `solr.SpanishStemFilterFactory`
 
 *Arguments:* None
 
-*Example:*
+*Example 1:*
 
 [tabs#lang-spanish]
 ======
@@ -3267,6 +3268,42 @@ With class name (legacy)::
 
 *Out:* "tor", "tor", "tor"
 
+*Example 2:*
+
+[tabs#lang-spanish]
+======
+With name::
++
+====
+[source,xml]
+----
+<analyzer>
+  <tokenizer name="standard"/>
+  <filter name="lowercase"/>
+  <filter name="spanishPluralStem"/>
+</analyzer>
+----
+====
+
+With class name (legacy)::
++
+====
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.StandardTokenizerFactory"/>
+  <filter class="solr.LowerCaseFilterFactory"/>
+  <filter class="solr.SpanishPluralStemFilterFactory"/>
+</analyzer>
+----
+====
+======
+
+*In:* "ases esprais paces bits amigos cantar caries"
+
+*Tokenizer to Filter:* "ases", "esprais", "paces", "bits", "amigos", "cantar", "caries"
+
+*Out:* "as", "espray", "paz", "bit", "amigo", "cantar", "caries"
 
 === Swedish
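
For readers who want to try the filter documented in Example 2, the analyzer might be wired into a schema roughly as follows. This is a minimal sketch and not part of the commit: the field type name `text_es_plural` and the field name `title_es` are hypothetical, and `positionIncrementGap="100"` is only a commonly used setting, not something this change requires.

```xml
<!-- Hypothetical field type wrapping the analyzer from Example 2 -->
<fieldType name="text_es_plural" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer name="standard"/>
    <filter name="lowercase"/>
    <!-- Reduces Spanish plurals to singular forms, e.g. "amigos" to "amigo" -->
    <filter name="spanishPluralStem"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using that type -->
<field name="title_es" type="text_es_plural" indexed="true" stored="true"/>
```

Because a single `<analyzer>` is declared, the same chain would apply at both index and query time, so a query for "amigo" should match documents containing "amigos" and vice versa.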
