RE: commas in synonyms.txt are not escaping
Hah, I knew it was something simple. :) Thanks.
Gary

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, August 28, 2011 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Turns out this isn't a bug - I was just tripped up by the analysis changes to the example server. Gary, you are probably just hitting the same thing. The "text" fieldType is no longer used by any fields by default - for example the "text" field uses the "text_general" fieldType. This fieldType uses the standard tokenizer, which discards stuff like commas (hence the synonym will never match).

-Yonik
http://www.lucidimagination.com
Re: commas in synonyms.txt are not escaping
Turns out this isn't a bug - I was just tripped up by the analysis changes to the example server. Gary, you are probably just hitting the same thing. The "text" fieldType is no longer used by any fields by default - for example the "text" field uses the "text_general" fieldType. This fieldType uses the standard tokenizer, which discards stuff like commas (hence the synonym will never match).

-Yonik
http://www.lucidimagination.com
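A rough illustration of why the synonym never fires, as a minimal Python sketch (not Solr code). It assumes a crude regex tokenizer that keeps only runs of ASCII letters and digits as a stand-in for the standard tokenizer; the real StandardTokenizer follows Unicode UAX #29 word-break rules and may split slightly differently. The names `punctuation_splitting_tokenize`, `keyword_tokenize`, `matching_keys` and the truncated `SYNONYMS` map are hypothetical. The point is only that once punctuation is split away, no single token equals the key `2,4-D-butotyl`, while a keyword-style tokenizer keeps the whole string intact.

```python
import re

# Hypothetical, simplified stand-ins for two tokenizers. This regex only
# crudely approximates StandardTokenizer (which uses UAX #29 word breaks).
def punctuation_splitting_tokenize(text):
    """Keep runs of ASCII letters/digits; commas and hyphens are dropped."""
    return re.findall(r"[A-Za-z0-9]+", text)

def keyword_tokenize(text):
    """Emit the whole input as one token, like a keyword tokenizer."""
    return [text]

# Truncated subset of Gary's mapping, keyed by the unescaped left-hand side
# and lowercased to mimic ignoreCase="true".
SYNONYMS = {"2,4-d-butotyl": ["aqua-kleen", "brn 1996617", "bladex-b"]}

def matching_keys(tokens):
    """Return the tokens that appear as keys in the synonym map."""
    return [t for t in tokens if t.lower() in SYNONYMS]

text = "2,4-D-butotyl"
print(punctuation_splitting_tokenize(text))                 # ['2', '4', 'D', 'butotyl']
print(matching_keys(punctuation_splitting_tokenize(text)))  # []
print(matching_keys(keyword_tokenize(text)))                # ['2,4-D-butotyl']
```

The same reasoning applies to any filter later in the chain that splits on punctuation: the synonym key has to survive analysis as one token for the lookup to succeed.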
RE: commas in synonyms.txt are not escaping
Alexei,
Yes, but no difference. This is apparently an issue introduced in 3.*. Thanks for your help.
-Gary

-----Original Message-----
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
Sent: Friday, August 26, 2011 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, isn't your wordDelimiter removing your commas at query time? Have you tried it in the analyzer?

2011/8/26 Moore, Gary
> Here you go -- I'm just hacking the text field at the moment. Thanks,
> Gary
>
> [XML tags lost in the archive; surviving attributes:]
> synonyms="index_synonyms.txt" tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true" expand="true"/>
> ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> protected="protwords.txt"/>
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> protected="protwords.txt"/>
>
> -----Original Message-----
> From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
> Sent: Friday, August 26, 2011 10:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: commas in synonyms.txt are not escaping
>
> Gary, please post the entire field declaration so I can try to reproduce
> here

--
*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
RE: commas in synonyms.txt are not escaping
Thanks, Yonik.
Gary

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, August 26, 2011 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley wrote:
> On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary wrote:
>>
>> I have a number of chemical names containing commas which I'm mapping in
>> index_synonyms.txt thusly:
>>
>> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562
>>
>> According to the sample synonyms.txt, the comma above should be escaped, i.e.
>> a\,a=>b\,b. The problem is that according to analysis.jsp the commas are
>> not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I
>> paste in 2\,4-D-butotyl, the mappings are done.
>
> I can confirm that this works in 1.4, but no longer works in 3x or
> trunk. Can you open an issue?

Actually, I think I've tracked it to LUCENE-3233, where the parsing rules were moved from Solr to Lucene (and changed the functionality in the process). I'll reopen that since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com
Re: commas in synonyms.txt are not escaping
On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley wrote:
> On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary wrote:
>>
>> I have a number of chemical names containing commas which I'm mapping in
>> index_synonyms.txt thusly:
>>
>> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562
>>
>> According to the sample synonyms.txt, the comma above should be escaped, i.e.
>> a\,a=>b\,b. The problem is that according to analysis.jsp the commas are
>> not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I
>> paste in 2\,4-D-butotyl, the mappings are done.
>
> I can confirm that this works in 1.4, but no longer works in 3x or
> trunk. Can you open an issue?

Actually, I think I've tracked it to LUCENE-3233, where the parsing rules were moved from Solr to Lucene (and changed the functionality in the process). I'll reopen that since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com
Re: commas in synonyms.txt are not escaping
On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary wrote:
>
> I have a number of chemical names containing commas which I'm mapping in
> index_synonyms.txt thusly:
>
> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562
>
> According to the sample synonyms.txt, the comma above should be escaped, i.e.
> a\,a=>b\,b. The problem is that according to analysis.jsp the commas are
> not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I
> paste in 2\,4-D-butotyl, the mappings are done.

I can confirm that this works in 1.4, but no longer works in 3x or trunk. Can you open an issue?

-Yonik
http://www.lucidimagination.com
Re: commas in synonyms.txt are not escaping
Gary, isn't your wordDelimiter removing your commas at query time? Have you tried it in the analyzer?

2011/8/26 Moore, Gary
> Here you go -- I'm just hacking the text field at the moment. Thanks,
> Gary
>
> [XML tags lost in the archive; surviving attributes:]
> synonyms="index_synonyms.txt" tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true" expand="true"/>
> ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> protected="protwords.txt"/>
> words="stopwords.txt"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> protected="protwords.txt"/>
>
> -----Original Message-----
> From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
> Sent: Friday, August 26, 2011 10:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: commas in synonyms.txt are not escaping
>
> Gary, please post the entire field declaration so I can try to reproduce
> here

--
*Alexei Martchenko* | *CEO* | Superdownloads
RE: commas in synonyms.txt are not escaping
Here you go -- I'm just hacking the text field at the moment.
Thanks,
Gary

-----Original Message-----
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
Sent: Friday, August 26, 2011 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, please post the entire field declaration so I can try to reproduce here
Re: commas in synonyms.txt are not escaping
Gary, please post the entire field declaration so I can try to reproduce here.

2011/8/26 Moore, Gary
>
> I have a number of chemical names containing commas which I'm mapping in
> index_synonyms.txt thusly:
>
> 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562
>
> According to the sample synonyms.txt, the comma above should be escaped, i.e.
> a\,a=>b\,b. The problem is that according to analysis.jsp the commas are
> not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I
> paste in 2\,4-D-butotyl, the mappings are done. This is verified by there
> being no mappings in the index. I assume there would be if 2\,4-D-butotyl
> actually appeared in a document.
>
> The filter I'm declaring in the index analyzer looks like this:
>
> [XML tags lost in the archive; surviving attributes:]
> tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true" expand="true"/>
>
> Doesn't seem to matter which tokenizer I use. This must be something
> simple that I'm not doing, but I'm a bit stumped at the moment and would
> appreciate any tips.
> Thanks
> Gary

--
*Alexei Martchenko* | *CEO* | Superdownloads
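The escaping convention quoted from the sample synonyms.txt (`a\,a=>b\,b`) can be illustrated with a minimal Python sketch. `parse_rule` is a hypothetical helper, not the Solr or Lucene parser; it assumes a backslash makes the next character literal, `=>` separates the two sides of an explicit mapping, unescaped commas separate terms, and surrounding whitespace is trimmed. Under those assumptions Gary's escaped line yields the single left-hand term `2,4-D-butotyl`, while the unescaped form splits into `2` and `4-D-butotyl`.

```python
def parse_rule(line):
    """Parse one synonyms.txt-style rule into (lhs_terms, rhs_terms).

    Hypothetical helper: a backslash makes the next character literal,
    "=>" separates the two sides, and unescaped commas separate terms.
    For a line without "=>" (an equivalence list), rhs_terms is None.
    """
    sides, terms, current = [], [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # escaped character is kept literally
            i += 2
        elif line.startswith("=>", i):
            terms.append("".join(current).strip())  # "=>" closes the left side
            sides.append(terms)
            terms, current = [], []
            i += 2
        elif ch == ",":
            terms.append("".join(current).strip())  # unescaped comma ends a term
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    terms.append("".join(current).strip())
    sides.append(terms)
    if len(sides) == 1:
        return sides[0], None
    if len(sides) == 2:
        return sides[0], sides[1]
    raise ValueError("more than one '=>' in rule: " + line)


escaped = r"2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562"
lhs, rhs = parse_rule(escaped)
print(lhs)       # ['2,4-D-butotyl']
print(len(rhs))  # 6
print(parse_rule("2,4-D-butotyl=>x"))  # (['2', '4-D-butotyl'], ['x'])
```

Both separators are handled in a single pass on purpose: splitting on `=>` first and unescaping before splitting on commas would turn `\,` back into a plain comma and split the chemical name anyway.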