Re: Strip special chars like "-"
Erick, you're right. It's working, my schema looks like this:

Thanks for helping me!!

--
View this message in context: http://lucene.472066.n3.nabble.com/Strip-special-chars-like-tp3238942p3248545.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strip special chars like "-"
That's not what I get. This is for Solr 3.3, but there's no reason that I know of that other versions should give different results.

Here's the field def from the 3.3 example; this is just the standard implementation.

At index time, it produces these tokens for manchester-united:

pos 1        pos 2
manchester   united
             manchesterunited

At query time, manchesterunited isn't transformed, so it matches the second row. manchester united and manchester-united both parse to manchester united and match the first row.

So somehow we're not doing the same thing. Try attaching &debugQuery=on to your query and post the results. Also try looking at the admin/analysis page and see what that tells you.

Best
Erick

P.S. Did you re-index after your schema changes?

On Tue, Aug 9, 2011 at 11:03 AM, roySolr wrote:
> OK, there are three query possibilities:
>
> Manchester-united
> Manchester united
> Manchesterunited
>
> The original name of the club is "manchester-united".
>
> generateWordParts fixes two of these possibilities:
>
> "Manchester-united" => "manchester","united"
>
> I can search for "Manchester-united" and "manchester" "united". When I
> search for "manchesterunited" I get no results.
>
> To fix this I could use catenateWords:
>
> "Manchester-united" => "manchesterunited"
>
> In this situation I can search for "Manchester-united" and
> "manchesterunited". When I search for "manchester united" I get no results.
> The catenateWords option also fixes only two situations.
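To illustrate the matching Erick describes, here is a minimal Python sketch. It is a toy model, not Solr's analysis chain: the index positions are copied from the tokens listed in the message above, `analyze_query` is a hypothetical helper that assumes query-time analysis lowercases and splits on non-alphanumerics without catenating, and `matches` is a simple consecutive-position phrase check.

```python
import re

# Index-time tokens for "manchester-united", keyed by position,
# copied from the message above (assumed, not read from a real index).
INDEX = {1: {"manchester"}, 2: {"united", "manchesterunited"}}


def analyze_query(text):
    """Toy query analysis: lowercase and split on non-alphanumerics (no catenation)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def matches(query):
    """Return True if the query tokens occur at consecutive index positions."""
    tokens = analyze_query(query)
    if not tokens:
        return False
    for start in INDEX:
        if all(tok in INDEX.get(start + i, set()) for i, tok in enumerate(tokens)):
            return True
    return False


for q in ("Manchester-united", "Manchester united", "Manchesterunited"):
    print(q, "->", matches(q))
```

Under these assumptions all three query forms match. If one of them fails against a real index, the analysis page and &debugQuery=on output should show where the actual token stream differs from this model.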
Re: Strip special chars like "-"
I have done this using a custom tokenfilter that (among other things) detects hyphenated words and converts them to the three variations. It uses a regex match on the incoming token:

(\w+)-(\w+)

and then runs the following regex transform:

s/(\w+)-(\w+)/$1$2__$1 $2/

It then splits the result on "__" and passes the original token plus the one-word and two-word versions through a SynonymFilter further down the chain (see Lucene in Action, 2nd Edition for code).

-sujit

On Tue, 2011-08-09 at 06:27 -0700, roySolr wrote:
> Hello,
>
> I have some terms in my index with special characters. An example is
> "manchester-united". I want a user to be able to search for
> "manchester-united", "manchester united" and "manchesterunited". What's the
> best way to fix this? I have used the patternReplaceFilter and some
> tokenizers, but they couldn't fix the last situation (manchesterunited). Can
> someone help me?
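The regex step in sujit's message can be sketched in Python as follows. This only illustrates the transform and the split on "__"; it is not the custom tokenfilter itself (that is Java code in a Lucene analysis chain). The function name `hyphen_variants` is made up for this example, and matching the whole token with `fullmatch` is an assumption about how the original filter applies the pattern.

```python
import re

HYPHENATED = re.compile(r"(\w+)-(\w+)")


def hyphen_variants(token):
    """Return [original, joined, spaced] for a hyphenated token, else [token]."""
    # \w also matches "_", so skip tokens that already contain the "__" separator.
    if "__" in token or not HYPHENATED.fullmatch(token):
        return [token]
    # Python equivalent of s/(\w+)-(\w+)/$1$2__$1 $2/
    transformed = HYPHENATED.sub(r"\1\2__\1 \2", token)
    joined, spaced = transformed.split("__")
    return [token, joined, spaced]


print(hyphen_variants("manchester-united"))
```

For "manchester-united" this yields the original token, "manchesterunited" and "manchester united", which are the three forms the message says are passed on to the SynonymFilter.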
Re: Strip special chars like "-"
Hi

I might be wrong, as I've not tried it out to be sure, but from the wiki docs:

These parameters may be combined in any way. Example of generateWordParts="1" and catenateWords="1":

"PowerShot" -> 0:"Power", 1:"Shot" 1:"PowerShot"
(where 0,1,1 are token positions)

Does that fit the bill?

On 9 August 2011 16:03, roySolr wrote:
> OK, there are three query possibilities:
>
> Manchester-united
> Manchester united
> Manchesterunited
>
> The original name of the club is "manchester-united".
>
> generateWordParts fixes two of these possibilities:
>
> "Manchester-united" => "manchester","united"
>
> I can search for "Manchester-united" and "manchester" "united". When I
> search for "manchesterunited" I get no results.
>
> To fix this I could use catenateWords:
>
> "Manchester-united" => "manchesterunited"
>
> In this situation I can search for "Manchester-united" and
> "manchesterunited". When I search for "manchester united" I get no results.
> The catenateWords option also fixes only two situations.
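The wiki example quoted above can be mimicked with a small Python model. This is a rough sketch of the token layout only, assuming both generateWordParts and catenateWords are enabled. It is not Lucene's WordDelimiterFilter, it ignores the separate number handling, and the function name `word_parts_and_catenation` is invented for illustration. It needs Python 3.7 or later, because it splits on a zero-width match.

```python
import re

# Split on runs of non-alphanumerics, and at lower-to-upper case changes.
SPLIT = re.compile(r"[^A-Za-z0-9]+|(?<=[a-z])(?=[A-Z])")


def word_parts_and_catenation(text):
    """Toy model: return (position, token) pairs for one input token."""
    parts = [p for p in SPLIT.split(text) if p]
    tokens = list(enumerate(parts))
    if len(parts) > 1:
        # The catenated token shares the position of the last part,
        # as in the wiki example (positions 0, 1, 1).
        tokens.append((len(parts) - 1, "".join(parts)))
    return tokens


print(word_parts_and_catenation("PowerShot"))
print(word_parts_and_catenation("manchester-united"))
```

Positions here are zero-based, as in the wiki text. The same layout for "manchester-united" would put "manchester" at 0 and both "united" and "manchesterunited" at 1.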
Re: Strip special chars like "-"
OK, there are three query possibilities:

Manchester-united
Manchester united
Manchesterunited

The original name of the club is "manchester-united".

generateWordParts fixes two of these possibilities:

"Manchester-united" => "manchester","united"

I can search for "Manchester-united" and "manchester" "united". When I search for "manchesterunited" I get no results.

To fix this I could use catenateWords:

"Manchester-united" => "manchesterunited"

In this situation I can search for "Manchester-united" and "manchesterunited". When I search for "manchester united" I get no results. The catenateWords option also fixes only two situations.
Re: Strip special chars like "-"
OK, what are the other possibilities that it doesn't fix? Just saying "it won't work" without some examples doesn't leave much to go on...

Best
Erick

On Tue, Aug 9, 2011 at 10:41 AM, roySolr wrote:
> Yes, I understand the difference between generateWordParts and catenateWords.
> But I can't fix my problem with these options; they don't fix all the
> possibilities.
Re: Strip special chars like "-"
Yes, I understand the difference between generateWordParts and catenateWords. But I can't fix my problem with these options; they don't fix all the possibilities.
Re: Strip special chars like "-"
catenateWordParts would club the two words together, as mentioned in the example at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

catenateWords="1" causes maximum runs of word parts to be catenated:
"wi-fi" => "wifi"

Regards,
Jayendra

On Tue, Aug 9, 2011 at 10:25 AM, roySolr wrote:
> The catenateWordParts option has the following effect:
>
> manchester-united => "manchester","united"
>
> The query "manchesterunited" will not match "manchester","united".
> Maybe I'm wrong, but I have tested something similar in the past.
Re: Strip special chars like "-"
The catenateWordParts option has the following effect:

manchester-united => "manchester","united"

The query "manchesterunited" will not match "manchester","united". Maybe I'm wrong, but I have tested something similar in the past.
Re: Strip special chars like "-"
Use the catenateWordParts option

On Tuesday 09 August 2011 16:02:47 roySolr wrote:
> With the worddelimiter I can only fix the first two
> situations ("manchester-united" and "manchester united").
>
> I can use something like generateWordParts, but I think this doesn't fix
> the problem with "manchesterunited".

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Strip special chars like "-"
With the worddelimiter I can only fix the first two situations ("manchester-united" and "manchester united").

I can use something like generateWordParts, but I think this doesn't fix the problem with "manchesterunited".
Re: Strip special chars like "-"
Use http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory, which can generate the tokens you need to match the search patterns.

Regards,
Jayendra

On Tue, Aug 9, 2011 at 9:27 AM, roySolr wrote:
> Hello,
>
> I have some terms in my index with special characters. An example is
> "manchester-united". I want a user to be able to search for
> "manchester-united", "manchester united" and "manchesterunited". What's the
> best way to fix this? I have used the patternReplaceFilter and some
> tokenizers, but they couldn't fix the last situation (manchesterunited). Can
> someone help me?