Re: Multi-words synonyms matching
The reason multi-word synonyms work better with LUCENE_33 is that Solr then uses the SlowSynonymFilter instead of the FST-based implementation that SynonymFilterFactory otherwise uses (FSTSynonymFilterFactory). But I don't know whether the difference between them is a bug or not. Maybe someone has more insight?

Bernd Fehling-2 wrote:
Are you sure about LUCENE_33 (use of BitVector)?
Re: Multi-words synonyms matching
Do you have test cases?

What are you sending to your SynonymFilterFactory?
What do you expect it to return?
What does it return when set to Version.LUCENE_33?
What does it return when set to Version.LUCENE_36?

On 05.06.2012 10:56, O. Klein wrote:
The reason multi-word synonyms work better with LUCENE_33 is that Solr then uses the SlowSynonymFilter instead of SynonymFilterFactory (FSTSynonymFilterFactory). But I don't know whether the difference between them is a bug or not. Maybe someone has more insight?
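One way to collect the test cases asked for above is to run the same text through Solr's field analysis handler under each luceneMatchVersion setting and compare the token output. The sketch below only builds the request URL. It assumes the handler is registered at `/analysis/field` in solrconfig.xml; the base URL and the field type name `text_syn` are hypothetical placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def analysis_url(core_url, field_type, index_text, query_text):
    """Build a request URL for Solr's field analysis handler.

    Assumption: the handler is mapped to /analysis/field in solrconfig.xml.
    """
    params = [
        ("analysis.fieldtype", field_type),    # field type to analyze with
        ("analysis.fieldvalue", index_text),   # text run through the index analyzer
        ("analysis.query", query_text),        # text run through the query analyzer
    ]
    return core_url.rstrip("/") + "/analysis/field?" + urlencode(params)


# Hypothetical base URL and field type name.
url = analysis_url("http://localhost:8983/solr", "text_syn", "mairie", "hotel de ville")
print(url)
```

Requesting this URL once with LUCENE_33 and once with LUCENE_36 configured would give two token listings that can be posted side by side.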
Re: Multi-words synonyms matching
Are you sure about LUCENE_33 (use of BitVector)?

On 31.05.2012 17:20, O. Klein wrote:
I have been struggling with this as well and found that using LUCENE_33 gives the best results. But as it will be deprecated, this is not a lasting solution. Maybe somebody knows one?
Re: Multi-words synonyms matching
Looking for some more background information, I stumbled upon https://issues.apache.org/jira/browse/LUCENE-3668. If you read the last post there, it confirms my issue. So maybe this is a bug?

Bernd Fehling-2 wrote:
Are you sure about LUCENE_33 (use of BitVector)?
Re: Multi-words synonyms matching
I have been struggling with this as well and found that using LUCENE_33 gives the best results. But as it will be deprecated, this is not a lasting solution. Maybe somebody knows one?
Re: Multi-words synonyms matching
question is: are you absolutely sure that your CATEGORY_ANALYZED field has the correct content? How does it get populated?

Nothing jumps out at me here.

Best
Erick

On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit elisaelisael...@gmail.com wrote:
Yes, thanks, but this is NOT my question.

I was wondering why I have multiple matches with q="hotel de ville" and no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm searching in the same Solr fieldType.

Why is the q parameter behaving differently in that case? Why do the quotes work in one case and not in the other?

Does anyone know?

Thanks,
Elisabeth
Re: Multi-words synonyms matching
Hello,

I'd like to resume this thread.

The only way I found to keep the synonyms in synonyms.txt from being split into words is to use the line

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>

in schema.xml, where tokenizerFactory="solr.KeywordTokenizerFactory" instructs SynonymFilterFactory not to break synonyms into words on white space when parsing the synonyms file.

So now it works fine: mairie is mapped to hotel de ville, and when I send the request q="hotel de ville" (the quotes are mandatory to prevent the analyzer from splitting hotel de ville on white space), I get answers containing the word mairie.

But when I use the fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it doesn't work!

CATEGORY_ANALYZED has the same field type as the default search field. This means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de ville", Solr uses the same analyzer, the one with the line

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>

Does anyone have a clue what is different between q analysis behaviour and fq analysis behaviour?

Thanks a lot,
Elisabeth

2012/4/12 elisabeth benoit elisaelisael...@gmail.com:
Oh, that's right. Thanks a lot,
Elisabeth

2012/4/11 Jeevanandam Madanagopal je...@myjeeva.com:
Elisabeth -

As you described, the mapping below might suit your need:

mairie => hotel de ville, mairie

mairie gets expanded to hotel de ville and mairie at index time, so both mairie and hotel de ville are searchable on the document.

However, the white space tokenizer splitting at query time will still be a problem, as described by Markus.

--Jeevanandam

On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:

> Have you tried the '=>' mapping instead? Something like
> hotel de ville => mairie
> might work for you.

Yes, thanks, I've tried it, but from what I understand it doesn't solve my problem, since this means hotel de ville will be replaced by mairie at index time (I use synonyms only at index time). So when a user asks for hôtel de ville, it won't match.

In fact, at index time I have mairie in my data, but I want users to be able to request mairie or hôtel de ville and get mairie as an answer, and not get mairie as an answer when requesting hôtel.

> To map `mairie` to `hotel de ville` as single token you must escape your white space.
> mairie, hotel\ de\ ville
> This results in a problem if your tokenizer splits on white space at query time.

OK, I guess this means I have a problem. There is no simple solution, since at query time my tokenizer does split on white space.

I guess my problem is more or less one of the problems discussed in

http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215

Thanks a lot for your answers,
Elisabeth

2012/4/10 Erick Erickson erickerick...@gmail.com:
Have you tried the '=>' mapping instead? Something like
hotel de ville => mairie
might work for you.

Best
Erick

On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit elisaelisael...@gmail.com wrote:
Hello,

I've read several posts on this issue, but can't find a real solution to my multi-word synonyms matching problem.

I have in my synonyms.txt an entry like

mairie, hotel de ville

and my index-time analyzer is configured as follows for synonyms:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

The problem I have is that now mairie matches with hotel, and I would only want mairie to match with hotel de ville and mairie.

When I look into the analyzer, I see that mairie is mapped to hotel, and the words de and ville are added in the second and third positions.

To change that, I tried

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>

(as I read in one post), and I can see now in the analyzer that mairie is mapped to hotel de ville, but now when I query hotel de ville, it doesn't match at all with mairie.

Does anyone have a clue what I'm doing wrong?

I'm using Solr 3.4.

Thanks,
Elisabeth
Re: Multi-words synonyms matching
Usage of q and fq:

q is typically the main query for the search request.
fq is the filter query; it is generally used to restrict the superset of documents without influencing the score (more info: http://wiki.apache.org/solr/CommonQueryParameters#q).

For example:

  q="hotel de ville" ===> returns 100 documents
  q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===> returns 40 documents from the superset of 100 documents

Hope this helps!

- Jeevanandam

On 24-04-2012 3:08 pm, elisabeth benoit wrote:
Does anyone have a clue what is different between q analysis behaviour and fq analysis behaviour?
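The q/fq distinction described in this message can be illustrated with a toy model. This is only a sketch, not Solr code: the documents, the `score` function (a plain term count) and the `search` helper are all assumptions made up for the illustration.

```python
# Toy model (not Solr code): q decides the score, fq only removes documents.
docs = [
    {"id": 1, "text": "hotel de ville", "price": 150},
    {"id": 2, "text": "hotel de ville hotel", "price": 80},
    {"id": 3, "text": "mairie", "price": 200},
]


def score(doc, term):
    # Assumption: the score is simply how often the term occurs.
    return doc["text"].split().count(term)


def search(term, filters=()):
    # Main query: keep documents with a non-zero score.
    hits = [(score(d, term), d) for d in docs if score(d, term) > 0]
    # Filter queries: drop documents, leave the scores untouched.
    hits = [(s, d) for s, d in hits if all(f(d) for f in filters)]
    ranked = sorted(hits, key=lambda h: -h[0])
    return [(d["id"], s) for s, d in ranked]


unfiltered = search("hotel")
filtered = search("hotel", filters=[lambda d: d["price"] >= 100])
print(unfiltered)
print(filtered)
```

Document 1 keeps the same score in both result lists; the filter only decides whether it is returned at all.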
Re: Multi-words synonyms matching
Yes, thanks, but this is NOT my question.

I was wondering why I get multiple matches with q="hotel de ville" and no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm searching in the same Solr fieldType.

Why is the q parameter behaving differently in that case? Why do the quotes work in one case and not in the other?

Does anyone know?

Thanks,
Elisabeth

2012/4/24 Jeevanandam je...@myjeeva.com
q is typically the main query for the search request. fq is the filter query; it is generally used to restrict the superset of documents without influencing the score.
Re: Multi-words synonyms matching
Elisabeth:

What shows up in the debug section of the response when you add debugQuery=on? There should be some part of that section like:

  parsed_filter_queries

My other question: are you absolutely sure that your CATEGORY_ANALYZED field has the correct content? How does it get populated?

Nothing jumps out at me here.

Best
Erick

On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit elisaelisael...@gmail.com wrote:
I was wondering why I get multiple matches with q="hotel de ville" and no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm searching in the same Solr fieldType.
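A request with debugQuery=on, as suggested in this message, could be built as sketched below. The base URL is a hypothetical placeholder (host, port and path depend on the installation); only the parameter names q, fq and debugQuery come from the thread. The sketch also shows that the phrase quotes and spaces have to be URL-encoded to reach Solr intact.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical base URL: adjust host, port and path for your own setup.
BASE_URL = "http://localhost:8983/solr/select"

params = [
    ("q", '"hotel de ville"'),
    ("fq", 'CATEGORY_ANALYZED:"hotel de ville"'),
    ("debugQuery", "on"),
]

# urlencode percent-encodes the quotes and the colon and turns spaces into "+".
url = BASE_URL + "?" + urlencode(params)
print(url)

# Decoding the query string again gives back the original parameter values.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["fq"])
```

Comparing the parsed main query with the parsed_filter_queries entry in the debug section of the response should then show whether both parameters are analyzed the same way.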
Re: Multi-words synonyms matching
Oh, that's right.

Thanks a lot,
Elisabeth

2012/4/11 Jeevanandam Madanagopal je...@myjeeva.com
As you described, the mapping below might suit your need: mairie => hotel de ville, mairie
Re: Multi-words synonyms matching
2012/4/10 Erick Erickson erickerick...@gmail.com
Have you tried the => mapping instead? Something like
  hotel de ville => mairie
might work for you.

Yes, thanks, I've tried it, but from what I understand it doesn't solve my problem, since this means hotel de ville will be replaced by mairie at index time (I use synonyms only at index time). So when a user asks for hôtel de ville, it won't match.

In fact, at index time I have mairie in my data, but I want users to be able to request mairie or hôtel de ville and get mairie as an answer, and not get mairie as an answer when requesting hôtel.

Markus Jelsma wrote:
To map `mairie` to `hotel de ville` as a single token you must escape your white space.
  mairie, hotel\ de\ ville
This results in a problem if your tokenizer splits on white space at query time.

OK, I guess this means I have a problem. There is no simple solution, since at query time my tokenizer does split on white space.

I guess my problem is more or less one of the problems discussed in
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215

Thanks a lot for your answers,
Elisabeth
Re: Multi-words synonyms matching
Elisabeth -

As you described, the mapping below might suit your need:

  mairie => hotel de ville, mairie

mairie gets expanded to hotel de ville and mairie at index time, so both mairie and hotel de ville are searchable on the document.

However, the white space tokenizer splitting at query time will still be a problem, as described by Markus.

--Jeevanandam

On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
In fact, at index time I have mairie in my data, but I want users to be able to request mairie or hôtel de ville and get mairie as an answer, and not get mairie as an answer when requesting hôtel.
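The difference between the comma form and the => form discussed in this message can be sketched with a small parser. This is a simplified illustration, not Solr's parser: `parse_rule` is a hypothetical helper, and it ignores escaped white space and escaped commas.

```python
def parse_rule(line, expand=True):
    """Parse one synonyms.txt-style line into {input term: [output terms]}."""
    if "=>" in line:
        # Explicit mapping: every term on the left is replaced by the right side.
        left, right = line.split("=>", 1)
        inputs = [t.strip() for t in left.split(",")]
        outputs = [t.strip() for t in right.split(",")]
        return {term: outputs for term in inputs}
    # Equivalent synonyms: with expand=True each term maps to all terms,
    # otherwise each term is reduced to the first one in the list.
    terms = [t.strip() for t in line.split(",")]
    if expand:
        return {term: terms for term in terms}
    return {term: [terms[0]] for term in terms}


explicit = parse_rule("mairie => hotel de ville, mairie")
reverse = parse_rule("hotel de ville => mairie")
equivalent = parse_rule("mairie, hotel de ville")
print(explicit)
print(reverse)
print(equivalent)
```

With the explicit rule only mairie is rewritten, and it keeps itself as one of the outputs; with the reverse rule hotel de ville is replaced by mairie and is no longer present as its own term.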
Multi-words synonyms matching
Hello,

I've read several posts on this issue, but can't find a real solution to my multi-word synonym matching problem.

I have in my synonyms.txt an entry like

  mairie, hotel de ville

and my index-time analyzer is configured as follows for synonyms:

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

The problem I have is that mairie now matches hotel, and I only want mairie to match hotel de ville and mairie.

When I look at the analyzer output, I see that mairie is mapped to hotel, and the words de and ville are added in the second and third positions.

To change that, I tried

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>

(as I read in one post), and I can now see in the analyzer that mairie is mapped to hotel de ville, but now when I query hotel de ville, it doesn't match mairie at all.

Does anyone have a clue what I'm doing wrong?

I'm using Solr 3.4.

Thanks,
Elisabeth
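The two analyzer outputs described in this post can be reproduced with a toy model. The sketch below is not Solr's implementation: `expand_synonym` and its `split_on_whitespace` switch are assumptions that only mimic how the synonym entry is tokenized when the synonyms file is parsed.

```python
# Toy model of index-time synonym expansion (not Solr code).

def expand_synonym(tokens, synonyms, split_on_whitespace=True):
    """Return (position, term) pairs after expanding synonyms.

    synonyms maps a term to its alternatives,
    for example {"mairie": ["hotel de ville"]}.
    """
    out = []
    pos = 0
    for tok in tokens:
        out.append((pos, tok))
        width = 1
        for alt in synonyms.get(tok, []):
            # Whitespace parsing yields several tokens, keyword parsing one.
            parts = alt.split() if split_on_whitespace else [alt]
            for offset, part in enumerate(parts):
                out.append((pos + offset, part))
            width = max(width, len(parts))
        pos += width
    return out


synonyms = {"mairie": ["hotel de ville"]}

# Default parsing: "hotel" shares the first position with "mairie".
split = expand_synonym(["mairie"], synonyms, split_on_whitespace=True)
# Keyword-style parsing: "hotel de ville" stays a single token.
whole = expand_synonym(["mairie"], synonyms, split_on_whitespace=False)
print(split)
print(whole)
```

In the first case hotel sits at the same position as mairie, which is why a query for hotel alone matches; in the second case the index only contains the single term hotel de ville.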
Re: Multi-words synonyms matching
Have you tried the => mapping instead? Something like

  hotel de ville => mairie

might work for you.

Best
Erick

On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit elisaelisael...@gmail.com wrote:
The problem I have is that mairie now matches hotel, and I only want mairie to match hotel de ville and mairie.
Re: Multi-words synonyms matching
To map `mairie` to `hotel de ville` as a single token you must escape your white space:

  mairie, hotel\ de\ ville

This results in a problem if your tokenizer splits on white space at query time.

On Tuesday 10 April 2012 16:39:21 Erick Erickson wrote:
Have you tried the => mapping instead? Something like hotel de ville => mairie might work for you.

--
Markus Jelsma - CTO - Openindex
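The query-time problem mentioned in this message can be shown with a toy model. This is a sketch, not Solr code: the two tokenizer functions and the all-terms `matches` rule are assumptions chosen to keep the illustration short.

```python
# Toy model (not Solr code): an indexed single token only matches a query
# term that is exactly equal to it.

# "hotel de ville" was indexed as one token.
indexed_terms = {"mairie", "hotel de ville"}


def whitespace_query_terms(query):
    # Assumption: the query-time tokenizer splits on white space.
    return query.split()


def keyword_query_terms(query):
    # Assumption: the whole query string is kept as a single term.
    return [query]


def matches(query_terms, indexed):
    # Assumption: every query term must be present in the index.
    return all(term in indexed for term in query_terms)


split_match = matches(whitespace_query_terms("hotel de ville"), indexed_terms)
whole_match = matches(keyword_query_terms("hotel de ville"), indexed_terms)
print(split_match, whole_match)
```

The split query looks for hotel, de and ville separately, and none of them exists as an indexed term, so only the unsplit query finds the single token.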