Re: Stop Words in SpellCheckComponent
> Also, generally, you should have a separate field and field type for the
> spellcheck field **so that normal text fields can use stop words.**

I've now found a solution, although I'm not sure whether it's what you meant: I'm using a special fieldType WITHOUT stop words for the spellcheck field. So, I think, the SpellCheckComponent no longer finds "better" matches for stop words, because the stop words themselves are now indexed in the spellcheck field.

Thanks for your help,
Matthias

schema.xml . solrconfig.xml . textSpell default spellcheckField
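The setup described above might look roughly like the following sketch. The original XML did not survive in the archive, so this is a reconstruction under assumptions: the names `textSpell`, `default` and `spellcheckField` are the values that remain in the message, while the tokenizer and filter chain, the `copyField` source `name`, and the `spellcheckIndexDir` value are illustrative guesses, not Matthias's actual configuration.

```xml
<!-- schema.xml fragment (sketch): a spellcheck-only field type with NO StopFilterFactory,
     so stop words such as "the" end up in the spellcheck dictionary and are treated as
     correctly spelled. The analyzer chain here is an assumption. -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Dedicated spellcheck field, filled by copying from a normal text field.
     The source field "name" is an assumption taken from the example webapp. -->
<field name="spellcheckField" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="spellcheckField"/>

<!-- solrconfig.xml fragment (sketch): point the spellchecker at the dedicated field. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellcheckField</str>
    <str name="spellcheckIndexDir">spellchecker</str> <!-- hypothetical directory name -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

The normal search fields can keep their StopFilterFactory. Only the spellcheck field needs to see the stop words, and the index has to be rebuilt after the schema change.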
Re: Stop Words in SpellCheckComponent
You forgot to give us the field definition for "name". Is it the same as in the 3.6 example, or has it been changed?

Make sure that you delete all existing data after you change the schema/config.

Do a direct query on the spellcheck field (name:the) to verify whether "the" is being indexed or not.

Also, generally, you should have a separate field and field type for the spellcheck field so that normal text fields can use stop words.

-- Jack Krupansky

-----Original Message-----
From: Matthias Müller
Sent: Friday, June 01, 2012 4:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent

> But your most recent email referred to "stopword.txt".
>
> So, either add "the" to german_stop_long.txt, or change the "words" option
> of your stopfilter to refer to "stopwords.txt".

Sorry for the confusion: the stop filter refers to stopwords.txt.

I'm now talking only about the Solr example webapp (apache-solr-3.6.0.tgz/example), which I modified slightly (as described in my last mail). In this example, Solr also makes suggestions for stop words. I can't see a mistake in my configuration.

1. The stop filter refers to stopwords.txt: ... ... ... ...
2. The SpellCheckComponent refers to the field "name": name
Re: Stop Words in SpellCheckComponent
> But your most recent email referred to "stopword.txt".
>
> So, either add "the" to german_stop_long.txt, or change the "words" option
> of your stopfilter to refer to "stopwords.txt".

Sorry for the confusion: the stop filter refers to stopwords.txt.

I'm now talking only about the Solr example webapp (apache-solr-3.6.0.tgz/example), which I modified slightly (as described in my last mail). In this example, Solr also makes suggestions for stop words. I can't see a mistake in my configuration.

1. The stop filter refers to stopwords.txt: ... ... ... ...
2. The SpellCheckComponent refers to the field "name": name
Re: Stop Words in SpellCheckComponent
Your earlier email had this option in your spellcheck.de field type analyzer for the StopFilterFactory:

    words="german_stop_long.txt"

But your most recent email referred to "stopword.txt".

So, either add "the" to german_stop_long.txt, or change the "words" option of your stopfilter to refer to "stopwords.txt".

BTW, I think you can actually have a comma-separated list of stopword files, so you can write:

    words="german_stop_long.txt,stopwords.txt"

-- Jack Krupansky

-----Original Message-----
From: Matthias Müller
Sent: Friday, June 01, 2012 1:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent

> spellcheck_de
>
> That should reference a field, not a field type.

Thanks for your help. But I did that, too.

Here I'll show that even the Solr example webapp makes suggestions for stop words. I have:

1. added "the" to stopwords.txt
2. added "thex" to an example document (field name)
3. started Solr
4. indexed the example files (sh post.sh *.xml)
5. searched for "the solr":
   http://myhost:8983/solr/select?q=the+solr&spellcheck=true&wt=json
6. got the desired result, but also the wrong suggestion "thex":

    {
      "response" : {
        "docs" : [ {... "name" : "Solr, thex Enterprise Search Server", .. } ],
        "numFound" : 1, ...
      },
      ...
      "spellcheck" : {
        "suggestions" : [ "the", {..."suggestion" : [ "thex" ] } ]
      }
    }

Here's the complete diff between the original download and my 3 modifications:

    diff -r apache-solr-3.6.0/example/exampledocs/solr.xml apache-solr-3.6.0x/example/exampledocs/solr.xml
    21c21
    < Solr, the Enterprise Search Server
    ---
    > Solr, thex Enterprise Search Server
    diff -r apache-solr-3.6.0/example/solr/conf/solrconfig.xml apache-solr-3.6.0x/example/solr/conf/solrconfig.xml
    781a782,785
    >
    >spellcheck
    >
    >
    1122a1127
    > true
    diff -r apache-solr-3.6.0/example/solr/conf/stopwords.txt apache-solr-3.6.0x/example/solr/conf/stopwords.txt
    14a15,16
    >
    > the
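As a rough illustration of the comma-separated `words` option Jack mentions, a stop filter that reads both files could be declared as in the sketch below. Only the `words` value comes from the message. The field type name `spellcheck_de` is the one used elsewhere in the thread, and the tokenizer, the lowercase filter and the `ignoreCase` setting are assumptions.

```xml
<!-- schema.xml fragment (sketch): one StopFilterFactory reading two stop word files.
     Both files are expected in the core's conf/ directory. -->
<fieldType name="spellcheck_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Comma-separated list: words from both files are merged into one stop set -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="german_stop_long.txt,stopwords.txt"/>
  </analyzer>
</fieldType>
```

Placing the stop filter after the lowercase filter means the entries in both files should be lowercase. Reindexing is needed before the change takes effect.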
Re: Stop Words in SpellCheckComponent
> spellcheck_de
>
> That should reference a field, not a field type.

Thanks for your help. But I did that, too.

Here I'll show that even the Solr example webapp makes suggestions for stop words. I have:

1. added "the" to stopwords.txt
2. added "thex" to an example document (field name)
3. started Solr
4. indexed the example files (sh post.sh *.xml)
5. searched for "the solr":
   http://myhost:8983/solr/select?q=the+solr&spellcheck=true&wt=json
6. got the desired result, but also the wrong suggestion "thex":

    {
      "response" : {
        "docs" : [ {... "name" : "Solr, thex Enterprise Search Server", .. } ],
        "numFound" : 1, ...
      },
      ...
      "spellcheck" : {
        "suggestions" : [ "the", {..."suggestion" : [ "thex" ] } ]
      }
    }

Here's the complete diff between the original download and my 3 modifications:

    diff -r apache-solr-3.6.0/example/exampledocs/solr.xml apache-solr-3.6.0x/example/exampledocs/solr.xml
    21c21
    < Solr, the Enterprise Search Server
    ---
    > Solr, thex Enterprise Search Server
    diff -r apache-solr-3.6.0/example/solr/conf/solrconfig.xml apache-solr-3.6.0x/example/solr/conf/solrconfig.xml
    781a782,785
    >
    >spellcheck
    >
    >
    1122a1127
    > true
    diff -r apache-solr-3.6.0/example/solr/conf/stopwords.txt apache-solr-3.6.0x/example/solr/conf/stopwords.txt
    14a15,16
    >
    > the
Re: Stop Words in SpellCheckComponent
Spellcheck wants a field, not a field type. You have a spellcheck_de field type, but you need a field as well.

    spellcheck_de

That should reference a field, not a field type.

-- Jack Krupansky

-----Original Message-----
From: Matthias Müller
Sent: Thursday, May 31, 2012 3:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent

>> is it possible to configure a stopword list to the SpellCheckComponent?
> Add a stopwordfilter to your spellcheck field.

Hmm, I did. Could it be another mistake?

This is the schema definition:

This is the solrconfig:
edismax 10 text_de title_de^5 text_de title_de^5 true 0 spellcheck_de textSpell default spellcheck_de spellchecker_de true true
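To make Jack's distinction concrete, the sketch below declares a field on top of the `spellcheck_de` field type and references that field from the spellchecker. The field name `spell_de` is hypothetical, the analyzer chain is an assumption, and `text_de` and `title_de` are assumed to be schema fields because they appear among the surviving config values. `default` and `spellchecker_de` are also taken from those surviving values. The original XML is not available, so this is not the configuration that was actually posted.

```xml
<!-- schema.xml fragment (sketch): the field TYPE only describes analysis... -->
<fieldType name="spellcheck_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- ...while a FIELD of that type actually holds indexed terms.
     "spell_de" is a hypothetical name chosen for this example. -->
<field name="spell_de" type="spellcheck_de" indexed="true" stored="false" multiValued="true"/>
<copyField source="text_de" dest="spell_de"/>
<copyField source="title_de" dest="spell_de"/>

<!-- solrconfig.xml fragment (sketch): "field" must name the field, not the field type. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spellcheck_de</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell_de</str>
    <str name="spellcheckIndexDir">spellchecker_de</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

Note that `queryAnalyzerFieldType` does take a field type name, which is probably why the two settings are easy to confuse.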
Re: Stop Words in SpellCheckComponent
>> is it possible to configure a stopword list to the SpellCheckComponent?
> Add a stopwordfilter to your spellcheck field.

Hmm, I did. Could it be another mistake?

This is the schema definition:

This is the solrconfig:
edismax 10 text_de title_de^5 text_de title_de^5 true 0 spellcheck_de textSpell default spellcheck_de spellchecker_de true true
RE: Stop Words in SpellCheckComponent
Add a stopwordfilter to your spellcheck field.

-----Original message-----
> From: Matthias Müller
> Sent: Thu 31-May-2012 18:39
> To: solr-user@lucene.apache.org
> Subject: Stop Words in SpellCheckComponent
>
> Hi,
>
> is it possible to configure a stopword list to the SpellCheckComponent?
>
> For example:
> When searching for "the indexs" "the" is filtered, because it is a stopword.
> The SpellCheckComponent gives me a false suggestion for "the".
> But the SpellCheckComponent should only give a suggestion for "index"
> because "the" is a stopword.
>
> Kind Regards
>
> Matthias