Wildcard malfunctioning
Hi all! Sorry in advance if this question has already been posted, but I was unable to find it with search engines. The SpanishLightStemFilterFactory filter is not working properly with wildcards, or I'm misunderstanding something.

I have the field:

  <field name="cultivo_es" type="text_es" indexed="true" stored="true"/>

With this type:

  <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball"/>
      <filter class="solr.SpanishLightStemFilterFactory"/>
      <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
    </analyzer>
  </fieldType>

But I'm getting these results:

  q = cultivo_es:uva       -> 50 correct results
  q = cultivo_es:uva*      -> the same 50 correct results
  q = cultivo_es:naranja   -> the 50 correct results for "naranja"
  q = cultivo_es:naranja*  -> 0 results!

It works fine if I remove the SpanishLightStemFilterFactory filter, but I need it to normalize diacritics according to Spanish rules. Thank you!!
Re: Wildcard malfunctioning
Hi Roman,

What you are experiencing is expected and known behavior. Stemming and wildcard searches can be counterintuitive, because indexed terms pass through the stemmer and wildcard terms do not. Luckily, there is a remedy. Use the following filters, and your wildcard searches will be happy:

  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.SpanishLightStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

Please note that this change requires a Solr restart and a re-index.

Regarding diacritics, please see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
and
http://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet

On Monday, May 5, 2014 2:01 PM, Román González rgonza...@normagricola.com wrote:
[...]
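The following small Python model shows why "naranja*" finds nothing while "uva*" works. It makes two assumptions. First, it uses a simplified stand-in for SpanishLightStemFilter: terms shorter than five characters are left alone, and otherwise a final a, e or o is dropped. This is a rough reading of Lucene's light stemmer, not its actual code. Second, it splits on whitespace instead of using StandardTokenizer.

Under those assumptions:
- "naranja" is indexed as "naranj".
- The plain query "naranja" still matches, because the query is stemmed too.
- The unstemmed wildcard prefix "naranja" matches no indexed term.
- "uva" is never stemmed, so "uva*" matches.

Keeping an unstemmed copy of each token, which is the role KeywordRepeatFilterFactory plays in the chain above, gives the prefix something to match. The Python set stands in for RemoveDuplicatesTokenFilterFactory.

```python
# Simplified model of index-time stemming versus an unstemmed wildcard prefix.
# light_stem is a hypothetical approximation of SpanishLightStemFilter.


def light_stem(term: str) -> str:
    """Leave short terms alone; otherwise drop a final a/e/o."""
    if len(term) < 5:
        return term
    if term[-1] in "aeo":
        return term[:-1]
    return term


def index_terms(text: str, keyword_repeat: bool) -> set[str]:
    """Index-time chain: whitespace split (stand-in for StandardTokenizer),
    lowercase, stem. With keyword_repeat, the unstemmed token is kept too;
    collecting into a set mimics RemoveDuplicatesTokenFilter."""
    terms: set[str] = set()
    for token in text.lower().split():
        if keyword_repeat:
            terms.add(token)          # original, keyword-protected copy
        terms.add(light_stem(token))  # stemmed copy
    return terms


def wildcard_matches(prefix_query: str, terms: set[str]) -> bool:
    """Prefix query: lowercased but NOT stemmed, then compared to index terms."""
    prefix = prefix_query.rstrip("*").lower()
    return any(t.startswith(prefix) for t in terms)


doc = "Naranja Uva"
plain = index_terms(doc, keyword_repeat=False)
repeated = index_terms(doc, keyword_repeat=True)

print(sorted(plain))                           # ['naranj', 'uva']
print(wildcard_matches("naranja*", plain))     # False
print(wildcard_matches("uva*", plain))         # True
print(sorted(repeated))                        # ['naranj', 'naranja', 'uva']
print(wildcard_matches("naranja*", repeated))  # True
```

Indexing both forms costs some index size. In exchange, exact and stemmed matching keep working and wildcard prefixes have an unstemmed term to hit.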
Re: Wildcard malfunctioning
Generally, stemming filters are not supported when wildcards are present. Only a small subset of filters, such as the case-conversion filters, are applied to wildcard terms.

But you say that you are using the stemmer to remove diacritical marks... you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory for that instead.

-- Jack Krupansky

-----Original Message-----
From: Román González
Sent: Monday, May 5, 2014 7:00 AM
To: solr-user@lucene.apache.org
Subject: Wildcard malfunctioning
[...]
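The "small subset" of filters can be modeled as a filter list in which each filter carries a flag saying whether it is safe to run on a wildcard term. The chain used for wildcard terms then keeps only the flagged filters.

This is a hypothetical Python sketch. The Filter class, the multiterm_aware flag and light_stem are illustrative names, not Solr API. The flag loosely mirrors Solr's MultiTermAwareComponent marker interface, and the stemmer is a rough approximation, not Lucene's implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model: a wildcard term only goes through filters flagged as
# multiterm-aware (loosely mirroring Solr's MultiTermAwareComponent marker).


@dataclass
class Filter:
    name: str
    apply: Callable[[str], str]
    multiterm_aware: bool


def light_stem(term: str) -> str:
    # Rough, assumed approximation of the Spanish light stemmer.
    return term[:-1] if len(term) >= 5 and term[-1] in "aeo" else term


CHAIN = [
    Filter("LowerCaseFilterFactory", str.lower, multiterm_aware=True),
    Filter("SpanishLightStemFilterFactory", light_stem, multiterm_aware=False),
]


def analyze(term: str, filters: list[Filter]) -> str:
    """Run a single term through the given filters in order."""
    for f in filters:
        term = f.apply(term)
    return term


def multiterm_chain(filters: list[Filter]) -> list[Filter]:
    """Keep only the filters that are safe to apply to a wildcard prefix."""
    return [f for f in filters if f.multiterm_aware]


print(analyze("NARANJA", CHAIN))                   # naranj   (plain term query)
print(analyze("NARANJA", multiterm_chain(CHAIN)))  # naranja  (prefix of NARANJA*)
print([f.name for f in multiterm_chain(CHAIN)])    # ['LowerCaseFilterFactory']
```

A plain term query runs the full chain and becomes "naranj", which matches the stemmed index term. The wildcard prefix keeps its final "a" and therefore does not match.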
RE: Wildcard malfunctioning
SOLVED! The first solution I tried (Ahmet's) worked fine! Thank you!

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, May 5, 2014 13:19
To: solr-user@lucene.apache.org; rgonza...@normagricola.com
Subject: Re: Wildcard malfunctioning
[...]
Re: Wildcard malfunctioning
On 5/5/2014 5:19 AM, Jack Krupansky wrote:

  But you say that you are using the stemmer to remove diacritical marks... you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.

I like ICUFoldingFilterFactory for this, but it does require additional contrib jars (included in the Solr download). It also lowercases.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory

Thanks,
Shawn
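Unlike stemmers, folding filters such as ASCIIFoldingFilterFactory are, as far as I know, among the filters Solr also applies to wildcard terms. As a result, accented and unaccented prefixes end up comparable with the indexed terms.

The Python sketch below approximates ICU-style folding (lowercasing plus diacritic removal) with the standard unicodedata module: it casefolds, decomposes to NFKD and drops combining marks. It is only a stand-in. The real ICUFoldingFilter applies its own Unicode folding rules and covers far more cases.

```python
import unicodedata

# Approximation of "fold + lowercase": casefold, decompose to NFKD,
# then drop combining marks (accents, tildes, diaereses).


def fold(term: str) -> str:
    decomposed = unicodedata.normalize("NFKD", term.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


indexed = {fold(w) for w in ["Limón", "Plátano", "Naranja"]}
print(sorted(indexed))  # ['limon', 'naranja', 'platano']

# The same folding is applied to the wildcard prefix, so accented and
# unaccented query prefixes reach the same indexed terms.
for q in ["LIMÓ*", "lim*", "plát*"]:
    prefix = fold(q.rstrip("*"))
    print(q, [t for t in sorted(indexed) if t.startswith(prefix)])
```

This sketch also turns "ñ" into "n" (for example, "año" becomes "ano"). Check whether the filter you choose does the same, and whether that is acceptable for Spanish data.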
Re: Wildcard malfunctioning
I mark all the filters that support wildcards with (multi) on my list: http://www.solr-start.com/info/analyzers/ . I use the actual interface markers to derive that list, so it should be the most up to date.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Mon, May 5, 2014 at 6:19 PM, Jack Krupansky j...@basetechnology.com wrote:
[...]