Re: Stop Words in SpellCheckComponent
> Also, generally, you should have a separate field and field type for
> the spellcheck field **so that normal text fields can use stop words.**

I've now found a solution, although I'm not sure whether it is what you meant: I'm using a special fieldType WITHOUT stop words for the spellcheck field. I think the SpellCheckComponent no longer finds "better" matches for stop words because the stop words themselves are now indexed in the spellcheck field.

Thanks for your help,
Matthias

schema.xml:

    <fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="spellcheckField" type="spellcheckType" indexed="true" stored="false"/>

solrconfig.xml:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spellcheckField</str>
        ...
Re: Stop Words in SpellCheckComponent
Your earlier email had this option in the StopFilterFactory of your spellcheck_de field type analyzer:

    words="german_stop_long.txt"

But your most recent email referred to stopwords.txt. So either add "the" to german_stop_long.txt, or change the words option of your stop filter to refer to stopwords.txt.

BTW, I think you can actually have a comma-separated list of stop word files, so you can write:

    words="german_stop_long.txt,stopwords.txt"

-- Jack Krupansky

-----Original Message-----
From: Matthias Müller
Sent: Friday, June 01, 2012 1:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent
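
For context, here is a sketch of how the comma-separated variant might look inside the full filter element. The other attributes are copied from the spellcheck_de field type posted elsewhere in this thread. It assumes both files sit in the core's conf directory, and, as Jack says, the comma-separated form is something to verify against your Solr version rather than take as given.

```xml
<!-- schema.xml (sketch): one stop filter reading two stop word files. -->
<!-- Assumption: german_stop_long.txt and stopwords.txt both exist in conf/. -->
<filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="german_stop_long.txt,stopwords.txt"
        enablePositionIncrements="true"/>
```

After changing the filter, the index (and the spellcheck index built from it) has to be rebuilt for the new stop word list to take effect.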
Re: Stop Words in SpellCheckComponent
> But your most recent email referred to stopword.txt. So, either add
> the to german_stop_long.txt, or change the words option of your
> stopfilter to refer to stopwords.txt.

Sorry for the confusion: the stop filter refers to stopwords.txt.

I'm now only talking about the Solr example webapp (apache-solr-3.6.0.tgz/example), which I modified slightly (as described in my last mail). Even in this example, Solr makes suggestions for stop words. I can't see a mistake in my configuration.

1. The stop filter refers to stopwords.txt:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        ...
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        ...
      </analyzer>
      <analyzer type="query">
        ...
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        ...
      </analyzer>
    </fieldType>

2. The SpellCheckComponent refers to the field "name":

    <str name="field">name</str>
Re: Stop Words in SpellCheckComponent
You forgot to give us the field definition for "name". Is it the same as in the 3.6 example, or has it been changed?

Make sure that you delete all existing data after you change the schema/config. Do a direct query on the spellcheck field (name:the) to verify whether "the" is being indexed or not.

Also, generally, you should have a separate field and field type for the spellcheck field so that normal text fields can use stop words.

-- Jack Krupansky

-----Original Message-----
From: Matthias Müller
Sent: Friday, June 01, 2012 4:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent
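
The "separate field and field type" advice could be sketched roughly as follows. Every name here (text_search, text_spell, body, body_spell) is hypothetical and chosen only for illustration. The analyzer chains are minimal assumptions, not the configuration of anyone in this thread.

```xml
<!-- schema.xml (sketch). All type and field names below are hypothetical. -->

<!-- Search field type: stop words are removed, so normal queries ignore them. -->
<fieldType name="text_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Spellcheck field type: minimal analysis and no stop filter, -->
<!-- so stop words end up in the spellcheck dictionary as ordinary terms. -->
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body" type="text_search" indexed="true" stored="true"/>
<field name="body_spell" type="text_spell" indexed="true" stored="false"/>

<!-- Feed the spellcheck field from the search field at index time. -->
<copyField source="body" dest="body_spell"/>
```

With a layout like this, the spellchecker would be pointed at body_spell while queries keep running against body, so each field can use the analysis that suits its purpose.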
RE: Stop Words in SpellCheckComponent
Add a stop word filter to your spellcheck field.

-----Original message-----
From: Matthias Müller mm4...@googlemail.com
Sent: Thu 31-May-2012 18:39
To: solr-user@lucene.apache.org
Subject: Stop Words in SpellCheckComponent

Hi,

is it possible to configure a stop word list for the SpellCheckComponent?

For example: when searching for "the indexs", "the" is filtered out because it is a stop word, yet the SpellCheckComponent gives me a false suggestion for "the". The SpellCheckComponent should only give a suggestion for "index", because "the" is a stop word.

Kind regards,
Matthias
Re: Stop Words in SpellCheckComponent
> Is it possible to configure a stop word list for the SpellCheckComponent?

> Add a stop word filter to your spellcheck field.

Hmm, I did. Could there be another mistake?

This is the schema definition:

    <fieldType name="spellcheck_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent-nouml.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="^(.*)[\.\-\']$" replacement="$1"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="german_stop_long.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

This is the solrconfig:

    <requestHandler name="search_de" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <int name="rows">10</int>
        <str name="qf">text_de title_de^5</str>
        <str name="pf">text_de title_de^5</str>
        <str name="spellcheck">true</str>
        <str name="mm">0</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck_de</str>
      </arr>
    </requestHandler>

    <searchComponent name="spellcheck_de" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spellcheck_de</str>
        <str name="spellcheckIndexDir">spellchecker_de</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="buildOnOptimize">true</str>
      </lst>
    </searchComponent>
Re: Stop Words in SpellCheckComponent
Spellcheck wants a field, not a field type. You have a spellcheck_de field type, but you need a field as well.

    <str name="field">spellcheck_de</str>

That should reference a field, not a field type.

-- Jack Krupansky

-----Original Message-----
From: Matthias Müller
Sent: Thursday, May 31, 2012 3:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Stop Words in SpellCheckComponent
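
To illustrate the field versus field type distinction, a sketch along these lines might help. The field name spell_de is hypothetical. text_de and title_de are taken from the qf parameter in the posted request handler, and using them as copy sources is an assumption about what the poster wants suggestions for.

```xml
<!-- schema.xml (sketch): declare a field that USES the spellcheck_de field type. -->
<!-- "spell_de" is a hypothetical name; multiValued because two sources are copied in. -->
<field name="spell_de" type="spellcheck_de" indexed="true" stored="false" multiValued="true"/>
<copyField source="text_de" dest="spell_de"/>
<copyField source="title_de" dest="spell_de"/>

<!-- solrconfig.xml (sketch): inside the spellchecker definition, reference the field. -->
<str name="field">spell_de</str>
```

The field type only describes how text is analyzed; the spellchecker builds its dictionary from the terms actually indexed in a concrete field, which is why it needs a field name here.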
Re: Stop Words in SpellCheckComponent
> <str name="field">spellcheck_de</str>
> That should reference a field, not a field type.

Thanks for your help. But I did that, too. Here I'll show that even the Solr example webapp makes suggestions for stop words. I've ...

1. added "the" to stopwords.txt
2. added "thex" to an example document (field "name")
3. started Solr
4. indexed the example files (sh post.sh *.xml)
5. searched for "the solr":
   http://myhost:8983/solr/select?q=the+solr&spellcheck=true&wt=json
6. got the desired result, but also the wrong suggestion "thex":

    {
      "response": {
        "docs": [
          {... "name": "Solr, thex Enterprise Search Server", .. }
        ],
        "numFound": 1,
        ...
      },
      ...
      "spellcheck": {
        "suggestions": [
          "the", {... "suggestion": [ "thex" ] }
        ]
      }
    }

Here's the complete diff between the original download and my 3 modifications:

    diff -r apache-solr-3.6.0/example/exampledocs/solr.xml apache-solr-3.6.0x/example/exampledocs/solr.xml
    21c21
    <   <field name="name">Solr, the Enterprise Search Server</field>
    ---
    >   <field name="name">Solr, thex Enterprise Search Server</field>
    diff -r apache-solr-3.6.0/example/solr/conf/solrconfig.xml apache-solr-3.6.0x/example/solr/conf/solrconfig.xml
    781a782,785
    >   <arr name="last-components">
    >     <str>spellcheck</str>
    >   </arr>
    1122a1127
    >   <str name="buildOnCommit">true</str>
    diff -r apache-solr-3.6.0/example/solr/conf/stopwords.txt apache-solr-3.6.0x/example/solr/conf/stopwords.txt
    14a15,16
    > the