Re: Solr 3.4 problem with words separated by comma without space

2011-12-12 Thread elisabeth benoit
Thanks for the answer.

Yes, in fact, when I look at the debugQuery output, I notice that name and
number are never treated as a single entry.

I have

(((text:name text:number)) (text:ru) (text:tain) (text:paris)))

so name and number are in the same parentheses, but not exactly treated as
a phrase, as far as I know, since a phrase would be more like
text:"name number".

Could you tell me what the difference is between (text:name text:number)
and (text:"name number")?
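
For readers following the thread, the difference between the two forms can be sketched with a toy positional index. This is a minimal illustration only, assuming the default OR operator and ignoring scoring, slop and mm; the helper names (positions, matches_any, matches_phrase) are hypothetical and are not Solr or Lucene APIs. The boolean form (text:name text:number) matches a document containing either term anywhere, while the phrase form text:"name number" requires the terms at consecutive positions.

```python
def positions(doc):
    """Map each whitespace-separated, lowercased token to its positions."""
    index = {}
    for pos, tok in enumerate(doc.lower().split()):
        index.setdefault(tok, []).append(pos)
    return index


def matches_any(doc, terms):
    """Toy version of (text:a text:b): at least one term occurs anywhere."""
    idx = positions(doc)
    return any(t in idx for t in terms)


def matches_phrase(doc, terms):
    """Toy version of text:"a b": the terms occur at consecutive positions."""
    idx = positions(doc)
    for start in idx.get(terms[0], []):
        if all(start + i in idx.get(t, []) for i, t in enumerate(terms)):
            return True
    return False


terms = ["name", "number"]
for doc in ["name number rue taine", "number name rue taine", "name rue taine"]:
    print(doc, "|", matches_any(doc, terms), matches_phrase(doc, terms))
```

In this sketch only the first document satisfies the phrase form, while all three satisfy the boolean form.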

I'll check autoGeneratePhraseQueries.

Best regards,
Elisabeth




2011/12/8 Chris Hostetter hossman_luc...@fucit.org


 : If I check in the solr.admin.analyzer, I get the same analysis for the
 : two different requests. But it seems, in fact, that the missing space
 : after the comma prevents name and number from matching.

 query analysis is only part of the picture ... Did you look at the
 debugQuery output? ...  I believe you are seeing the effects of the
 QueryParser analyzing "name," distinctly from "number" in one case, vs
 analyzing the entire string "name,number" in the second case, and treating
 the latter as a phrase query (because one input clause produces multiple
 tokens)

 there is a recently added autoGeneratePhraseQueries option that affects
 this.


 -Hoss



Solr 3.4 problem with words separated by comma without space

2011-12-08 Thread elisabeth benoit
Hello,

I'm using Solr 3.4, and I'm having a problem with a request returning
different results depending on whether or not there is a space after a
comma.

The request "name, number rue taine paris" returns results with 4 words out
of 5 matching (name, number, rue, paris).

The request "name,number rue taine paris" (no space between the comma and
number) returns no results unless I set mm=3, and then the matching words
are rue, taine, paris.

If I check in the solr.admin.analyzer, I get the same analysis for the two
different requests. But it seems, in fact, that the missing space after the
comma prevents name and number from matching.


My field type is


  <analyzer type="query">
    <!-- standard tokenization -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- normalize accents, cedillas, the oe ligature, ... -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- remove periods (I.B.M. = IBM) -->
    <filter class="solr.StandardFilterFactory"/>
    <!-- convert to lowercase -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- strip punctuation -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <!-- remove empty tokens and excessively long words -->
    <filter class="solr.LengthFilterFactory" min="1" max="100"/>
    <!-- split compound words -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="1" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" preserveOriginal="1"/>
    <!-- remove elisions (l', qu', ...) -->
    <filter class="solr.ElisionFilterFactory"
            articles="elisionwords.txt"/>
    <!-- remove stop words -->
    <filter class="solr.StopFilterFactory" ignoreCase="1"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- stemming (plurals, ...) -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"
            protected="protwords.txt"/>
    <!-- remove any duplicates -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
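
As an aside on the punctuation-stripping step in the analyzer above, here is a minimal Python sketch of what that pattern does to a single token. It is an approximation under stated assumptions: Java's \p{Punct} is modelled with Python's string.punctuation (ASCII punctuation), and strip_edge_punct is a hypothetical helper name, not part of Solr.

```python
import re
import string

# Assumption: string.punctuation stands in for Java's \p{Punct}.
PUNCT = "[" + re.escape(string.punctuation) + "]"
EDGE_PUNCT = re.compile(rf"^({PUNCT}*)(.*?)({PUNCT}*)$")


def strip_edge_punct(token):
    """Keep only group 2: the token without leading/trailing punctuation."""
    return EDGE_PUNCT.sub(r"\2", token)


for token in ["name,", "(paris)", "name,number", ","]:
    print(repr(token), "->", repr(strip_edge_punct(token)))
```

Only punctuation at the edges of a token is removed; punctuation inside a token is left alone. A token made up entirely of punctuation becomes empty, which is the case the LengthFilterFactory with min=1 is there to drop.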

Does anyone have a clue?

Thanks,
Elisabeth


Re: Solr 3.4 problem with words separated by comma without space

2011-12-08 Thread elisabeth benoit
same problem with Solr 4.0




Re: Solr 3.4 problem with words separated by comma without space

2011-12-08 Thread darren
This would seem to indicate that you are using a whitespace analyzer on
the default search field. I believe other analyzers will properly tokenize
around the comma.






Re: Solr 3.4 problem with words separated by comma without space

2011-12-08 Thread Chris Hostetter

: If I check in the solr.admin.analyzer, I get the same analysis for the two
: different requests. But it seems, in fact, that the missing space after
: the comma prevents name and number from matching.

query analysis is only part of the picture ... Did you look at the
debugQuery output? ...  I believe you are seeing the effects of the
QueryParser analyzing "name," distinctly from "number" in one case, vs
analyzing the entire string "name,number" in the second case, and treating
the latter as a phrase query (because one input clause produces multiple
tokens)

there is a recently added autoGeneratePhraseQueries option that affects
this.
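
The behaviour described above can be sketched as a toy model in Python. This is not Lucene's or Solr's actual code: analyze is a crude stand-in for the field's query analyzer, and parse and its auto_generate_phrase_queries flag are hypothetical names chosen to mirror the option mentioned. The assumption being illustrated is that the parser splits the input on whitespace first, analyzes each chunk separately, and turns a chunk that yields several tokens into a phrase query when the option is on.

```python
import re


def analyze(text):
    """Crude stand-in for the query analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def parse(query, field="text", auto_generate_phrase_queries=True):
    """Toy query parser: whitespace split first, then per-chunk analysis."""
    clauses = []
    for chunk in query.split():
        tokens = analyze(chunk)
        if not tokens:
            continue
        if len(tokens) == 1:
            clauses.append(f"{field}:{tokens[0]}")
        elif auto_generate_phrase_queries:
            # One input clause produced several tokens: treat as a phrase.
            phrase = " ".join(tokens)
            clauses.append(f'{field}:"{phrase}"')
        else:
            # Same tokens, but as independent optional terms.
            terms = " ".join(f"{field}:{t}" for t in tokens)
            clauses.append(f"({terms})")
    return clauses


print(parse("name, number rue taine paris"))
print(parse("name,number rue taine paris"))
print(parse("name,number rue taine paris", auto_generate_phrase_queries=False))
```

In this model the query with a space produces five independent term clauses, the query without a space produces a phrase clause for name and number, and turning the flag off yields a grouped pair of terms instead of a phrase.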


-Hoss