Re: Solr 3.4 problem with words separated by coma without space
Thanks for the answer.

Yes, in fact, when I look at the debugQuery output, I notice that name and number are never treated as single entries. I have

    (((text:name text:number)) (text:ru) (text:tain) (text:paris)))

so name and number are in the same parentheses, but not exactly treated as a phrase, as far as I know, since a phrase would look more like text:"name number".

Could you tell me what the difference is between (text:name text:number) and (text:"name number")?

I'll check autoGeneratePhraseQueries.

Best regards,
Elisabeth
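On the question of how a group of optional term clauses such as (text:name text:number) differs from a phrase query, which Lucene prints as text:"name number", here is a minimal Python sketch. It is a toy model, not Lucene code: the function names are made up for illustration, a document is just a list of tokens, and scoring, slop and minimum-should-match are ignored.

```python
def matches_any_term(doc_tokens, terms):
    """Toy model of (text:name text:number): at least one term occurs anywhere."""
    return any(term in doc_tokens for term in terms)


def matches_phrase(doc_tokens, terms):
    """Toy model of text:"name number": all terms occur, adjacent and in order."""
    n = len(terms)
    return any(doc_tokens[i:i + n] == terms
               for i in range(len(doc_tokens) - n + 1))


terms = ["name", "number"]
adjacent = ["name", "number", "rue", "paris"]   # terms side by side, in order
scattered = ["number", "rue", "name", "paris"]  # both terms, but not as a phrase
partial = ["name", "rue", "paris"]              # only one of the two terms

for doc in (adjacent, scattered, partial):
    print(doc, matches_any_term(doc, terms), matches_phrase(doc, terms))
```

In this model the term group accepts all three documents, while the phrase accepts only the first one.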
Solr 3.4 problem with words separated by coma without space
Hello,

I'm using Solr 3.4, and I'm having a problem with a request returning different results depending on whether or not there is a space after a comma.

The request

    name, number rue taine paris

returns results with 4 words out of 5 matching (name, number, rue, paris).

The request

    name,number rue taine paris

(no space between the comma and number) returns no results, unless I set mm=3, and then the matching words are rue, taine, paris.

If I check in the solr.admin.analyzer, I get the same analysis for the two different requests. But it seems, in fact, that the missing space after the comma prevents name and number from matching.

My field type is:

    <analyzer type="query">
      <!-- standard tokenization -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- normalization of accents, cedillas, o-e ligatures, ... -->
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <!-- removal of dots (I.B.M. = IBM) -->
      <filter class="solr.StandardFilterFactory"/>
      <!-- lowercasing -->
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- removal of punctuation -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
      <!-- removal of empty tokens and oversized words -->
      <filter class="solr.LengthFilterFactory" min="1" max="100"/>
      <!-- splitting of compound words -->
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
      <!-- removal of elisions (l', qu', ...) -->
      <filter class="solr.ElisionFilterFactory" articles="elisionwords.txt"/>
      <!-- removal of stop words -->
      <filter class="solr.StopFilterFactory" ignoreCase="1" words="stopwords.txt" enablePositionIncrements="true"/>
      <!-- stemming (plurals, ...) -->
      <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
      <!-- removal of possible duplicates -->
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>

Does anyone have a clue?

Thanks,
Elisabeth
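For readers unsure what the PatternReplaceFilterFactory pattern in this field type does to a single token, here is a rough Python sketch. It is an approximation, not the Solr filter: Python's `re` module has no `\p{Punct}`, so `string.punctuation` (the ASCII punctuation characters) is used as a stand-in, and `strip_edge_punctuation` is a hypothetical helper name.

```python
import re
import string

# Stand-in for Java's \p{Punct}: the ASCII punctuation characters.
PUNCT = re.escape(string.punctuation)

# Same shape as the pattern in the field type: leading punctuation,
# a lazily matched middle, trailing punctuation.
EDGE_PUNCT = re.compile(rf"^([{PUNCT}]*)(.*?)([{PUNCT}]*)$")


def strip_edge_punctuation(token):
    """Keep only the middle group, as replacement="$2" does."""
    return EDGE_PUNCT.sub(r"\2", token)


for token in ["name,", "(paris)", "name,number", "..."]:
    print(repr(token), "->", repr(strip_edge_punctuation(token)))
```

Only punctuation at the edges of a token is removed; a comma inside a token, as in name,number, is left alone by this pattern, and a token made only of punctuation becomes empty.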
Re: Solr 3.4 problem with words separated by coma without space
Same problem with Solr 4.0.
Re: Solr 3.4 problem with words separated by coma without space
This would seem to indicate that you are using a whitespace analyzer on the default search field. I believe other analyzers will properly tokenize around the comma.
Re: Solr 3.4 problem with words separated by coma without space
: If I check in the solr.admin.analyzer, I get the same analysis for the two
: different requests. But it seems, in fact, that the lacking space after
: comma prevents name and number from matching.

Query analysis is only part of the picture ... Did you look at the debugQuery output?

I believe you are seeing the effects of the QueryParser analyzing "name," distinctly from "number" in one case, vs. analyzing the entire string "name,number" in the second case, and treating the latter as a phrase query (because one input clause produces multiple tokens).

There is a recently added autoGeneratePhraseQueries option that affects this.

-Hoss
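The behaviour described in this reply can be sketched with a small Python toy model. It is not the Lucene QueryParser: `analyze` is a crude stand-in for the field's analysis chain (no stemming, stop words or word-delimiter handling), and `parse` and its `auto_generate_phrase_queries` parameter are hypothetical names that only mirror the option mentioned above.

```python
import re


def analyze(text):
    """Crude stand-in for the analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def parse(query, field="text", auto_generate_phrase_queries=True):
    """Split on whitespace first, then analyze each chunk on its own."""
    clauses = []
    for chunk in query.split():
        tokens = analyze(chunk)
        if not tokens:
            continue
        if len(tokens) == 1:
            clauses.append(f"{field}:{tokens[0]}")
        elif auto_generate_phrase_queries:
            # One input chunk produced several tokens: treat it as a phrase.
            phrase = " ".join(tokens)
            clauses.append(f'{field}:"{phrase}"')
        else:
            # Otherwise group the tokens as separate optional terms.
            terms = " ".join(f"{field}:{t}" for t in tokens)
            clauses.append(f"({terms})")
    return clauses


print(parse("name, number rue taine paris"))
print(parse("name,number rue taine paris"))
print(parse("name,number rue taine paris", auto_generate_phrase_queries=False))
```

With the space, the model produces five single-term clauses. Without it, "name,number" reaches the analyzer as one chunk and yields either a phrase or a grouped pair of terms, so the query has four top-level clauses instead of five. That difference may also matter for a minimum-should-match setting such as mm, although the toy model does not cover that.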