Wildcard malfunctioning

2014-05-05 Thread Román González
Hi all!

Sorry in advance if this question has already been posted, but I was unable
to find it with search engines.

The SpanishLightStemFilterFactory filter is not working properly with
wildcards, or I’m misunderstanding something. I have this field:

 

   <field name="cultivo_es" type="text_es" indexed="true" stored="true" />

With this type:

   <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" />
       <filter class="solr.SpanishLightStemFilterFactory"/>
       <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
     </analyzer>
   </fieldType>

 

But I’m getting these results:

q = cultivo_es:uva
Getting 50 correct results

q = cultivo_es:uva*
Getting the same 50 correct results

q = cultivo_es:naranja
Getting the 50 correct results for “naranja”

q = cultivo_es:naranja*
Getting 0 results!

 

It works fine if I remove the SpanishLightStemFilterFactory filter, but I
need it in order to handle diacritics according to Spanish rules.

 

Thank you!!

 



Re: Wildcard malfunctioning

2014-05-05 Thread Ahmet Arslan


Hi Roman,

What you are experiencing is expected and known. Stemming and wildcard
searches can be counterintuitive together. Luckily, a remedy is available:
use the following filters and your wildcard searches will work. Please note
that this change requires a Solr restart and a re-index.

 <filter class="solr.KeywordRepeatFilterFactory"/>
 <filter class="solr.SpanishLightStemFilterFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
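
For illustration, here is how the text_es type from the original post might look with these three filters in place. This is a sketch, not a tested configuration: keeping the tokenizer, lowercase, and stop filters unchanged, and placing KeywordRepeatFilterFactory directly before the stemmer, are assumptions. KeywordRepeatFilterFactory emits each token twice, once flagged as a keyword so the stemmer leaves it alone, and RemoveDuplicatesTokenFilterFactory drops the extra copy when stemming changed nothing.

```xml
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_es.txt" format="snowball"/>
    <!-- Emit every token twice: one copy flagged as a keyword, one plain -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <!-- Stems only the plain copy; the keyword copy keeps its original form -->
    <filter class="solr.SpanishLightStemFilterFactory"/>
    <!-- Drop the second copy when the stem equals the original token -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain the unstemmed form of each word is indexed alongside its stem, so a prefix query such as cultivo_es:naranja* should have a term to match. The likely reason uva* worked before is that the stemmer leaves very short tokens untouched, though that is my reading of the stemmer's behavior rather than something stated in this thread.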

Regarding diacritics, please see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
and http://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet




Re: Wildcard malfunctioning

2014-05-05 Thread Jack Krupansky
Generally, stemming filters are not supported when wildcards are present. 
Only a small subset of filters work with wildcards, such as the case 
conversion filters.


But you say that you are using the stemmer to remove diacritical marks...
you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.
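
A sketch of what this suggestion could look like, assuming diacritic folding is the only thing needed from the stemmer. The type name text_es_folded and the mapping file name mapping-es.txt are hypothetical. Note that ASCIIFoldingFilterFactory folds every accented Latin character to an ASCII equivalent, including ñ to n; if ñ must stay distinct, a MappingCharFilterFactory with a hand-written mapping file is the more selective option.

```xml
<fieldType name="text_es_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Option A (selective): map only chosen characters before tokenizing.
         mapping-es.txt is a hypothetical file with lines such as "á" => "a" -->
    <!-- <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-es.txt"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_es.txt" format="snowball"/>
    <!-- Option B (broad): fold all accented characters to ASCII equivalents -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

As far as I know, lowercasing and both folding options are among the components that Solr also applies to wildcard terms, which is why they avoid the mismatch that stemming causes.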


-- Jack Krupansky






RE: Wildcard malfunctioning

2014-05-05 Thread Román González
SOLVED!

The first solution I tried (Ahmet's) worked fine!

Thank you!






Re: Wildcard malfunctioning

2014-05-05 Thread Shawn Heisey
On 5/5/2014 5:19 AM, Jack Krupansky wrote:
> But you say that you are using the stemmer to remove diacritical
> marks... you can/should use ASCIIFoldingFilterFactory or
> MappingCharFilterFactory.

I like ICUFoldingFilterFactory for this, but it does require additional
contrib jars (included in the Solr download).  It lowercases too.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
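
A minimal sketch of an analyzer using ICUFoldingFilterFactory, assuming the ICU contrib jars are already on Solr's classpath. The type name text_es_icu is hypothetical. Because the filter lowercases as well, LowerCaseFilterFactory is left out here; the stop filter keeps ignoreCase="true" so it still matches mixed-case input.

```xml
<fieldType name="text_es_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_es.txt" format="snowball"/>
    <!-- Folds accents and case in one step; requires the ICU contrib jars -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```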

Thanks,
Shawn



Re: Wildcard malfunctioning

2014-05-05 Thread Alexandre Rafalovitch
I mark all the filters that support wildcards with (multi) on my list:
http://www.solr-start.com/info/analyzers/ . I use the actual interface
markers to derive that list, so it should be the most up to date.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

