Re: Solr 3.4 problem with words separated by coma without space

2011-12-12 Thread elisabeth benoit
Thanks for the answer.

Yes, in fact, when I look at the debugQuery output, I notice that name and
number are never treated as single entries.

I have

(((text:name text:number)) (text:ru) (text:tain) (text:paris)))

So name and number are in the same parentheses, but not exactly treated as a
phrase, as far as I know, since a phrase would be more like text:"name number".

Could you tell me what the difference is between (text:name text:number)
and (text:"name number")?
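
As a rough illustration of the distinction being asked about, here is a minimal Python sketch. It assumes the usual Lucene semantics: a parenthesized group of bare terms is a boolean query whose optional clauses match independently, while a quoted phrase requires the terms at consecutive positions. The function names and sample documents are hypothetical and are not part of Solr.

```python
# Toy model only: documents are lists of already-analyzed tokens.
# matches_boolean and matches_phrase are hypothetical helpers, not Solr APIs.

def matches_boolean(doc_tokens, terms):
    """(text:name text:number): any one of the optional terms is enough."""
    return any(term in doc_tokens for term in terms)

def matches_phrase(doc_tokens, terms):
    """text:"name number": the terms must appear adjacent and in order."""
    n = len(terms)
    return any(doc_tokens[i:i + n] == terms
               for i in range(len(doc_tokens) - n + 1))

query_terms = ["name", "number"]
adjacent_doc = ["name", "number", "ru", "tain", "pari"]
scattered_doc = ["number", "ru", "name"]

print(matches_boolean(adjacent_doc, query_terms), matches_phrase(adjacent_doc, query_terms))
print(matches_boolean(scattered_doc, query_terms), matches_phrase(scattered_doc, query_terms))
```

In this sketch the scattered document satisfies the boolean form but not the phrase form, which is the practical difference between the two outputs.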

I'll check autoGeneratePhraseQueries.

Best regards,

2011/12/8 Chris Hostetter

 : If I check in the solr.admin.analyzer, I get the same analysis for the
 : different requests. But it seems, in fact, that the lacking space after
 : comma prevents name and number from matching.

 Query analysis is only part of the picture ... Did you look at the
 debugQuery output? ...  I believe you are seeing the effects of the
 QueryParser analyzing name, distinctly from number in one case, vs.
 analyzing the entire string name,number in the second case, and treating
 the latter as a phrase query (because one input clause produces multiple

 there is a recently added autoGeneratePhraseQueries option that affects


Solr 3.4 problem with words separated by coma without space

2011-12-08 Thread elisabeth benoit

I'm using Solr 3.4, and I'm having a problem with a request returning
different results depending on whether or not there is a space after a comma.

The request "name, number rue taine paris" returns results with 4 words out
of 5 matching (name, number, rue, paris).

The request "name,number rue taine paris" (no space between the comma and
number) returns no results unless I set mm=3, and then the matching words
are rue, taine, paris.

If I check in the solr.admin.analyzer, I get the same analysis for the two
different requests. But it seems, in fact, that the missing space after the
comma prevents name and number from matching.

My field type is

  <analyzer type="query">
    <!-- standard tokenization -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- normalize accents, cedillas, the oe ligature, ... -->
    <charFilter class="solr.MappingCharFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- remove dots (I.B.M. = IBM) -->
    <filter class="solr.StandardFilterFactory"/>
    <!-- lowercase -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- strip punctuation -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <!-- drop empty tokens and oversized words -->
    <filter class="solr.LengthFilterFactory" min="1" max="100"/>
    <!-- split compound words -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" preserveOriginal="1"/>
    <!-- remove elisions (l', qu', ...) -->
    <filter class="solr.ElisionFilterFactory"/>
    <!-- remove stop words -->
    <filter class="solr.StopFilterFactory" ignoreCase="1"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- stemming (plurals, ...) -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <!-- remove any duplicates -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

Does anyone have a clue?


Re: Solr 3.4 problem with words separated by coma without space

2011-12-08 Thread elisabeth benoit
same problem with Solr 4.0



Re: Solr 3.4 problem with words separated by coma without space

2011-12-08 Thread darren
This would seem to indicate that you are using a whitespace analyzer on
the default search field. I believe other analyzers will properly tokenize
around the comma.



Re: Solr 3.4 problem with words separated by coma without space

2011-12-08 Thread Chris Hostetter

: If I check in the solr.admin.analyzer, I get the same analysis for the two
: different requests. But it seems, in fact, that the lacking space after
: comma prevents name and number from matching.

Query analysis is only part of the picture ... Did you look at the
debugQuery output? ...  I believe you are seeing the effects of the
QueryParser analyzing name, distinctly from number in one case, vs.
analyzing the entire string name,number in the second case, and treating
the latter as a phrase query (because one input clause produces multiple

there is a recently added autoGeneratePhraseQueries option that affects 
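
The behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual QueryParser: it assumes the parser splits the query string on whitespace first and only then hands each chunk to the field's analyzer, and that the autoGeneratePhraseQueries setting decides whether a chunk yielding several tokens becomes a phrase. The `analyze` and `parse` functions are hypothetical stand-ins, and the regex is only a crude substitute for the real analysis chain.

```python
import re

def analyze(chunk):
    """Hypothetical stand-in for the query analyzer: split on punctuation, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", chunk)]

def parse(query, auto_generate_phrase_queries=True):
    """Toy parser: whitespace splitting happens BEFORE analysis."""
    clauses = []
    for chunk in query.split():
        tokens = analyze(chunk)
        if not tokens:
            continue  # chunk was pure punctuation
        if len(tokens) == 1:
            clauses.append(("term", tokens))
        elif auto_generate_phrase_queries:
            # one input clause produced several tokens: treat as a phrase
            clauses.append(("phrase", tokens))
        else:
            # same tokens, but as independent optional terms
            clauses.append(("boolean", tokens))
    return clauses

with_space = parse("name, number rue taine paris")
without_space = parse("name,number rue taine paris")
without_space_no_phrase = parse("name,number rue taine paris",
                                auto_generate_phrase_queries=False)

for label, clauses in [("with space", with_space),
                       ("without space", without_space),
                       ("without space, no auto phrase", without_space_no_phrase)]:
    print(label, clauses)
```

Under these assumptions the analyzer emits the same tokens for both requests, which is consistent with the analysis page showing no difference, yet the parsed query has five clauses in one case and four in the other.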
