Re: Multi-words synonyms matching

2012-06-05 Thread O. Klein
The reason multi-word synonyms work better with LUCENE_33 is that Solr
then uses SlowSynonymFilter instead of the newer FST-based SynonymFilter
(FSTSynonymFilterFactory).

I don't know whether the difference between the two is a bug or not. Maybe
someone has more insight?






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-words-synonyms-matching-tp3898950p3987728.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-words synonyms matching

2012-06-05 Thread Bernd Fehling
Do you have test cases?

What are you sending to your SynonymFilterFactory?

What do you expect it to return?

What does it return when set to Version.LUCENE_33?

What does it return when set to Version.LUCENE_36?
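
One way to collect such test cases is to run the same index text and query text through Solr's field analysis handler under each luceneMatchVersion and compare the token output. The sketch below only builds the request URL. It assumes a FieldAnalysisRequestHandler is registered at /analysis/field in solrconfig.xml; the base URL and the field type name `text` are placeholders, not values taken from this thread.

```python
from urllib.parse import urlencode


def analysis_url(base_url, field_type, index_text, query_text):
    """Build a field-analysis request URL (handler path is an assumption)."""
    params = {
        "analysis.fieldtype": field_type,    # field type whose analyzers are run
        "analysis.fieldvalue": index_text,   # text analyzed with the index analyzer
        "analysis.query": query_text,        # text analyzed with the query analyzer
        "analysis.showmatch": "true",        # highlight index tokens matching the query
    }
    return base_url.rstrip("/") + "/analysis/field?" + urlencode(params)


url = analysis_url("http://localhost:8983/solr", "text", "mairie", "hotel de ville")
print(url)
```

Running the same URL once with LUCENE_33 and once with LUCENE_36 configured would give the two outputs to compare.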





Re: Multi-words synonyms matching

2012-06-01 Thread Bernd Fehling

Are you sure about LUCENE_33 (use of BitVector)?




Re: Multi-words synonyms matching

2012-06-01 Thread O. Klein
Looking for some more background information, I stumbled upon
https://issues.apache.org/jira/browse/LUCENE-3668. The last comment there
confirms my issue. So maybe this is a bug?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-words-synonyms-matching-tp3898950p3987241.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-words synonyms matching

2012-05-31 Thread O. Klein
I have been struggling with this as well and found that using LUCENE_33 gives
the best results.

But since it will be deprecated, this is not a lasting solution. Maybe somebody
knows one?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-words-synonyms-matching-tp3898950p3987048.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-words synonyms matching

2012-05-29 Thread Bernd Fehling

-- 
*
Bernd Fehling                Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                        Universitätsstr. 25
Tel. +49 521 106-4060                   Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de                33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Multi-words synonyms matching

2012-05-29 Thread elisabeth benoit



Re: Multi-words synonyms matching

2012-05-29 Thread Lance Norskog




-- 
Lance Norskog
goks...@gmail.com


Re: Multi-words synonyms matching

2012-05-23 Thread elisabeth benoit
  question is are you absolutely sure that your CATEGORY_ANALYZED field
  has the correct content? How does it get populated?

  Nothing jumps out at me here.

  Best
  Erick

  On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
  elisaelisael...@gmail.com wrote:

  Yes, thanks, but this is NOT my question.

  I was wondering why I get multiple matches with q="hotel de ville" and
  no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both
  cases I'm searching in the same Solr fieldType.

  Why does the q parameter behave differently in that case? Why do the
  quotes work in one case and not in the other?

  Does anyone know?

  Thanks,
  Elisabeth

  2012/4/24 Jeevanandam je...@myjeeva.com

  Usage of q and fq:

  q is typically the main query for the search request.

  fq is the filter query, generally used to restrict the superset of
  documents without influencing the score (more info:
  http://wiki.apache.org/solr/CommonQueryParameters#q).

  For example:

  q=hotel de ville ===> returns 100 documents

  q=hotel de ville&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===>
  returns 40 documents from the superset of 100 documents

  Hope this helps!

  - Jeevanandam
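
The q and fq example above can be turned into an actual request by URL-encoding each parameter and repeating fq once per filter. This is a minimal sketch, not part of the original reply: the base URL and the helper name `select_url` are placeholders, and the price and roomType fields are simply the ones from the example.

```python
from urllib.parse import urlencode, parse_qs


def select_url(base_url, q, filters):
    """Build a /select URL with one q and any number of fq parameters."""
    params = [("q", q)] + [("fq", f) for f in filters]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = select_url(
    "http://localhost:8983/solr",
    "hotel de ville",
    ["price:[100 TO *]", 'roomType:"King size Bed"'],
)

# Decoding the query string gives back the original parameter values.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"])
print(decoded["fq"])
```

Each fq is applied as a separate filter, so the result set is the intersection of the main query and all filters.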
 
 
 
 
  2012/4/12 elisabeth benoit elisaelisael...@gmail.com

  Oh, that's right.

  Thanks a lot,
  Elisabeth

  2012/4/11 Jeevanandam Madanagopal je...@myjeeva.com

  Elisabeth -

  As you described, the mapping below might suit your need:

  mairie => hotel de ville, mairie

  "mairie" gets expanded to "hotel de ville" and "mairie" at index time,
  so both "mairie" and "hotel de ville" are searchable on the document.

  However, the whitespace tokenizer splitting at query time will still
  be a problem, as described by Markus.

  --Jeevanandam

  On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:

    Have you tried the '=>' mapping instead? Something like
    hotel de ville => mairie
    might work for you.

  Yes, thanks, I've tried it, but from what I understand it doesn't solve
  my problem, since this means "hotel de ville" will be replaced by
  "mairie" at index time (I use synonyms only at index time). So when a
  user asks for "hôtel de ville", it won't match.

  In fact, at index time I have "mairie" in my data, but I want the user
  to be able to request "mairie" or "hôtel de ville" and get "mairie" as
  an answer, and not get "mairie" as an answer when requesting "hôtel".

    To map `mairie` to `hotel de ville` as a single token you must escape
    your white space.

    mairie, hotel\ de\ ville

    This results in a problem if your tokenizer splits on white space at
    query time.

  OK, I guess this means I have a problem. There is no simple solution,
  since at query time my tokenizer does split on white space.

  I guess my problem is more or less one of the problems discussed in

  http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215

  Thanks a lot for your answers,
  Elisabeth
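
The two rule styles discussed in this exchange can be compared with a small model of how one synonyms.txt line is read: a comma-separated list is an equivalence class (expanded to every term with expand=true, reduced to the first term with expand=false), while an explicit `=>` rule maps the left side to the right side regardless of expand. `parse_rule` is a hypothetical helper written for illustration, not Solr's parser; it ignores comments, backslash escapes, and ignoreCase.

```python
def parse_rule(line, expand=True):
    """Return {input phrase: [output phrases]} for one synonyms.txt line."""
    def phrases(text):
        # Split on commas and normalise inner whitespace of each phrase.
        return [" ".join(p.split()) for p in text.split(",") if p.strip()]

    if "=>" in line:
        # Explicit mapping: every phrase on the left is replaced by the right side.
        left, right = line.split("=>", 1)
        outputs = phrases(right)
        return {src: outputs for src in phrases(left)}

    terms = phrases(line)
    if expand:
        # Equivalent terms: each one expands to the whole list.
        return {term: terms for term in terms}
    # expand=false: every term is reduced to the first one in the list.
    return {term: [terms[0]] for term in terms}


print(parse_rule("hotel de ville => mairie"))
print(parse_rule("mairie => hotel de ville, mairie"))
print(parse_rule("mairie, hotel de ville"))
```

Under this model, only the explicit rule with `mairie` on the left leaves "hotel de ville" itself unmapped, which is why it is suggested for index-time-only synonyms.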
 
 
 
 
 
  2012/4/10 Erick Erickson erickerick...@gmail.com

  Have you tried the '=>' mapping instead? Something like
  hotel de ville => mairie
  might work for you.

  Best
  Erick
 

Re: Multi-words synonyms matching

2012-05-15 Thread Bernd Fehling

Re: Multi-words synonyms matching

2012-05-14 Thread elisabeth benoit



Re: Multi-words synonyms matching

2012-04-25 Thread elisabeth benoit



Re: Multi-words synonyms matching

2012-04-25 Thread Erick Erickson
  
   On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
   elisaelisael...@gmail.com wrote:

   Hello,

   I've read several posts on this issue, but can't find a real solution
   to my multi-word synonyms matching problem.

   I have in my synonyms.txt an entry like

   mairie, hotel de ville

   and my index-time analyzer is configured as follows for synonyms:

   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
   ignoreCase="true" expand="true"/>

   The problem I have is that now "mairie" matches with "hotel", and I
   would only want "mairie" to match with "hotel de ville" and "mairie".

   When I look into the analyzer, I see that "mairie" is mapped into
   "hotel", and the words "de" and "ville" are added in second and third
   position. To change that, I tried

   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
   ignoreCase="true" expand="true"
   tokenizerFactory="solr.KeywordTokenizerFactory"/>

   (as I read in one post), and I can now see in the analyzer that
   "mairie" is mapped to "hotel de ville", but now when I query
   "hotel de ville", it doesn't match "mairie" at all.

   Does anyone have a clue what I'm doing wrong?

   I'm using Solr 3.4.

   Thanks,
   Elisabeth
  
 
 
 
 
 



Re: Multi-words synonyms matching

2012-04-24 Thread elisabeth benoit
Hello,

I'd like to come back to this thread.

The only way I found to keep multi-word synonyms in synonyms.txt from being
split into words is to use the line

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"
tokenizerFactory="solr.KeywordTokenizerFactory"/>

in schema.xml, where tokenizerFactory="solr.KeywordTokenizerFactory"
instructs SynonymFilterFactory not to break synonyms into words on white
space when parsing the synonyms file.

So now it works fine: "mairie" is mapped to "hotel de ville", and when I
send the request q="hotel de ville" (the quotes are mandatory to prevent the
analyzer from splitting hotel de ville on white space), I get answers
containing the word "mairie".

But when I use the fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
doesn't work!

CATEGORY_ANALYZED has the same field type as the default search field. This
means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
ville", Solr uses the same analyzer, the one with the line

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"
tokenizerFactory="solr.KeywordTokenizerFactory"/>

Does anyone have a clue what differs between q analysis behaviour and fq
analysis behaviour?

Thanks a lot
Elisabeth
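
One thing that may be worth ruling out for the q versus fq difference is the
quotes getting lost on the way to Solr. A minimal sketch, assuming a
hypothetical host and select path (neither is given in this thread), that
builds both parameters with the Python standard library and checks that the
quoted phrases survive URL encoding:

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical host and path; only the parameter handling matters here.
base_url = "http://localhost:8983/solr/select"

params = [
    ("q", '"hotel de ville"'),
    ("fq", 'CATEGORY_ANALYZED:"hotel de ville"'),
]

# urlencode percent-encodes the double quotes and the colon.
query_string = urlencode(params)
print(base_url + "?" + query_string)

# Round trip: decoding gives back the quoted phrases unchanged.
decoded = parse_qs(query_string)
print(decoded["q"][0])
print(decoded["fq"][0])
```

If the decoded values still carry their quotes, the request itself is
probably not the cause, and the difference has to come from how the two
parameters are parsed or analyzed.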


Re: Multi-words synonyms matching

2012-04-24 Thread Jeevanandam


Usage of q and fq:

q is typically the main query for the search request.

fq is the filter query; it is generally used to restrict the superset of
documents without influencing the score (more info:
http://wiki.apache.org/solr/CommonQueryParameters#q).


For example:

q=hotel de ville ==> returns 100 documents

q=hotel de ville&fq=price:[100 TO *]&fq=roomType:"King size Bed" ==>
returns 40 documents from the superset of 100 documents



Hope this helps!

- Jeevanandam



Re: Multi-words synonyms matching

2012-04-24 Thread elisabeth benoit
Yes, thanks, but this is NOT my question.

I was wondering why I get multiple matches with q="hotel de ville" and no
match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm
searching in the same Solr fieldType.

Why does the q parameter behave differently in that case? Why do the quotes
work in one case and not in the other?

Does anyone know?

Thanks,
Elisabeth


Re: Multi-words synonyms matching

2012-04-24 Thread Erick Erickson
Elisabeth:

What shows up in the debug section of the response when you add
debugQuery=on? There should be some part of that section like
parsed_filter_queries.

My other question: are you absolutely sure that your
CATEGORY_ANALYZED field has the correct content? How does it
get populated?

Nothing jumps out at me here.

Best
Erick
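
In case it helps, here is a rough sketch of that check in Python. The host
and path are hypothetical, and the response layout (a top-level "debug"
object holding "parsed_filter_queries" when wt=json is used) is an
assumption to verify against your own Solr version:

```python
import json
from urllib.parse import urlencode


def debug_url(base_url, q, fq):
    """Build a select URL that asks Solr for debug output as JSON."""
    params = [("q", q), ("fq", fq), ("debugQuery", "on"), ("wt", "json")]
    return base_url + "?" + urlencode(params)


def parsed_filter_queries(response_text):
    """Pull debug.parsed_filter_queries out of a JSON response body,
    returning [] when the section is absent (assumed layout)."""
    body = json.loads(response_text)
    return body.get("debug", {}).get("parsed_filter_queries", [])


# Hypothetical host and path.
print(debug_url("http://localhost:8983/solr/select",
                '"hotel de ville"',
                'CATEGORY_ANALYZED:"hotel de ville"'))
```

Comparing the parsed main query with the parsed filter query in that output
should show whether the two end up as the same kind of query on the same
field.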


Re: Multi-words synonyms matching

2012-04-12 Thread elisabeth benoit
Oh, that's right.

Thanks a lot,
Elisabeth


Re: Multi-words synonyms matching

2012-04-11 Thread elisabeth benoit
> Have you tried the '=>' mapping instead? Something
> like
> hotel de ville => mairie
> might work for you.

Yes, thanks, I've tried it, but from what I understand it doesn't solve my
problem, since this means "hotel de ville" will be replaced by "mairie" at
index time (I use synonyms only at index time). So when a user asks for
"hôtel de ville", it won't match.

In fact, at index time I have "mairie" in my data, but I want the user to be
able to request "mairie" or "hôtel de ville" and get "mairie" as an answer,
and not get "mairie" as an answer when requesting "hôtel".


> To map `mairie` to `hotel de ville` as single token you must escape your
> white space.
>
> mairie, hotel\ de\ ville
>
> This results in a problem if your tokenizer splits on white space at
> query time.

OK, I guess this means I have a problem. There is no simple solution, since
at query time my tokenizer does split on white space.

I guess my problem is more or less one of the problems discussed in

http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215


Thanks a lot for your answers,
Elisabeth






Re: Multi-words synonyms matching

2012-04-11 Thread Jeevanandam Madanagopal
Elisabeth -

As you described, the mapping below might suit your need:
mairie => hotel de ville, mairie

"mairie" gets expanded to "hotel de ville" and "mairie" at index time, so
both "mairie" and "hotel de ville" are searchable on the document.

However, the white space tokenizer splitting at query time will still be a
problem, as described by Markus.

--Jeevanandam
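
To make the query-time problem concrete, here is a toy model in Python. It
is not Solr code: parse_explicit_rule and whitespace_tokenize are
hypothetical helpers, and the sketch assumes each right-hand entry of the
rule is indexed as one whole token.

```python
def parse_explicit_rule(line):
    """Split 'lhs => rhs1, rhs2' into (lhs, [rhs...]); each rhs entry is kept
    as a single token, white space included."""
    lhs, rhs = line.split("=>")
    return lhs.strip(), [entry.strip() for entry in rhs.split(",")]


def whitespace_tokenize(text):
    """Stand-in for a query-time tokenizer that splits on white space."""
    return text.split()


source, targets = parse_explicit_rule("mairie => hotel de ville, mairie")
indexed_tokens = set(targets)  # tokens assumed to be stored for "mairie"

for query in ("mairie", "hotel de ville"):
    tokens = whitespace_tokenize(query)
    hits = [t for t in tokens if t in indexed_tokens]
    print(query, "->", tokens, "hits:", hits)
```

The second query is split into three tokens, and none of them equals the
single indexed token "hotel de ville", so nothing matches.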


Multi-words synonyms matching

2012-04-10 Thread elisabeth benoit
Hello,

I've read several posts on this issue, but can't find a real solution to my
multi-word synonyms matching problem.

I have in my synonyms.txt an entry like

mairie, hotel de ville

and my index-time analyzer is configured as follows for synonyms:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>

The problem I have is that now "mairie" matches "hotel", and I would
only want "mairie" to match "hotel de ville" and "mairie".

When I look into the analyzer, I see that "mairie" is mapped to "hotel",
and the words "de" and "ville" are added in second and third position. To
change that, I tried

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"
tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post)

and I can now see in the analyzer that "mairie" is mapped to "hotel de
ville", but now when I query "hotel de ville", it doesn't match "mairie"
at all.

Does anyone have a clue what I'm doing wrong?

I'm using Solr 3.4.

Thanks,
Elisabeth
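
The analyzer output described above can be reproduced with a toy model in
Python. This only mimics what the analysis page shows for the entry
"mairie, hotel de ville" with expand=true; it is not Solr's implementation,
and expand_at_index_time is a hypothetical helper.

```python
def expand_at_index_time(text, synonym_groups):
    """Return (position, token) pairs for whitespace-tokenized text, adding
    the other entries of any synonym group a token belongs to."""
    out = []
    pos = 0
    for token in text.lower().split():
        pos += 1
        out.append((pos, token))
        for group in synonym_groups:
            if token not in group:
                continue
            for entry in group:
                if entry == token:
                    continue
                # Each word of a multi-word synonym gets its own position.
                for offset, word in enumerate(entry.split()):
                    out.append((pos + offset, word))
    return out


groups = [["mairie", "hotel de ville"]]
indexed = expand_at_index_time("mairie", groups)
print(indexed)
# [(1, 'mairie'), (1, 'hotel'), (2, 'de'), (3, 'ville')]

# A single-term query for "hotel" now finds the document.
print("hotel" in {token for _, token in indexed})
# True
```

Because "hotel" is indexed as a term of its own, a query for "hotel" alone
hits every document that contains "mairie", which is the unwanted match.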


Re: Multi-words synonyms matching

2012-04-10 Thread Erick Erickson
Have you tried the '=>' mapping instead? Something
like
hotel de ville => mairie
might work for you.

Best
Erick


Re: Multi-words synonyms matching

2012-04-10 Thread Markus Jelsma
To map `mairie` to `hotel de ville` as a single token you must escape your
white space:

mairie, hotel\ de\ ville

This results in a problem if your tokenizer splits on white space at query
time.

-- 
Markus Jelsma - CTO - Openindex
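
The effect of escaping the white space can be sketched with a small Python
function. This is a simplified illustration of the idea, not Solr's
synonyms parser, and parse_synonym_line is a hypothetical name.

```python
import re


def parse_synonym_line(line):
    """Split a comma-separated synonyms line into entries, and each entry
    into tokens on white space that is NOT preceded by a backslash."""
    entries = []
    for entry in line.split(","):
        tokens = re.split(r"(?<!\\)\s+", entry.strip())
        # Drop the escaping backslash from the tokens that carried one.
        entries.append([t.replace("\\ ", " ") for t in tokens])
    return entries


print(parse_synonym_line("mairie, hotel de ville"))
# [['mairie'], ['hotel', 'de', 'ville']]
print(parse_synonym_line(r"mairie, hotel\ de\ ville"))
# [['mairie'], ['hotel de ville']]
```

With the escaped form the synonym stays one token, which is why a query
tokenizer that splits on white space can no longer produce a matching term.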