Re: solr word delimiter
The WordDelimiterFilter is set to generateWordParts=1, generateNumberParts=1, catenateWords=1, catenateNumbers=1 at both index and query time. I now have this synonym mapping:

k-1 => k1 visa

Here is the parsedquery_toString:

+(text:"k (1 k) 1 visa"^0.8 | name:"k (1 k) 1 visa"^2.0)~0.01 (text:"k (1 k) 1 visa"~25^0.8 | name:"k (1 k) 1 visa"~25^2.0)~0.01

Why is Solr grouping the terms this way, as k (1 k) 1 visa? (I mean the "1 k" within brackets.) Also, after k-1 gets split by the WordDelimiterFilter, does catenateWords=1 make k1 a single token?

As far as matching goes, with text:"k (1 k) 1 visa"^0.8 I assume documents that have the exact phrase "k1 visa" would rank higher, and docs with just k1 might rank next. Since I have ps set to 25, would it also match docs that have 'k' and '1' within 25 words of one another? Or k1 and visa within 25 words of one another, because k1 is a single token? I get confused about how Solr matches documents in cases like this.

Yonik Seeley wrote:
>
> On Jan 5, 2008 2:28 PM, anuvenk <[EMAIL PROTECTED]> wrote:
>> That's what I'm thinking too. If I remove the solr.worddelimiter filter
>> from both index and query, the word h1-b will remain as is in the index,
>> correct? So if someone searches for h1b (without the hyphen), would it
>> still return the h1-b doc?
>
> for "h1-b" to match "h1b", it will take either a synonym or something
> like WordDelimiterFilter.
> You can configure WordDelimiterFilter to only catenate too... so h1-b
> would become h1b at both index and query time. The downside is that
> it might catenate things you want.
>
> -Yonik

--
View this message in context: http://www.nabble.com/solr-word-delimiter-tp14630435p14641602.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr word delimiter
On Jan 5, 2008 2:28 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> That's what I'm thinking too. If I remove the solr.worddelimiter filter
> from both index and query, the word h1-b will remain as is in the index,
> correct? So if someone searches for h1b (without the hyphen), would it
> still return the h1-b doc?

for "h1-b" to match "h1b", it will take either a synonym or something like WordDelimiterFilter.
You can configure WordDelimiterFilter to only catenate too... so h1-b would become h1b at both index and query time. The downside is that it might catenate things you want.

-Yonik
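
The catenate-only idea can be illustrated with a toy model. This is only a sketch, not Solr's implementation: `catenate_only` is a hypothetical helper that drops delimiters and joins the remaining parts, and the thread does not say which WordDelimiterFilter attributes would produce exactly this behavior.

```python
import re

def catenate_only(token):
    """Toy model: strip non-alphanumeric delimiters and join what is left.

    Hypothetical helper for illustration; it does not call Solr.
    """
    parts = re.findall(r"[A-Za-z0-9]+", token.lower())
    return "".join(parts)

# Applied at both index and query time, the two spellings meet on one token.
indexed = catenate_only("h1-b")
queried = catenate_only("h1b")
print(indexed, queried, indexed == queried)

# The downside mentioned above: other delimited strings get joined as well.
print(catenate_only("2008-01-05"))
```

Because the same function runs on both sides, a query for h1b and a document containing h1-b end up with the identical token, while any other hyphenated string in the corpus is joined in the same way.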
Re: solr word delimiter
That's what I'm thinking too. If I remove the solr.worddelimiter filter from both index and query, the word h1-b will remain as is in the index, correct? So if someone searches for h1b (without the hyphen), would it still return the h1-b doc?

Otis Gospodnetic wrote:
>
> It sounds like you simply want to drop solr.WordDelimiterFilterFactory
> from your analyzer definition, no?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
> From: anuvenk <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, January 5, 2008 1:24:14 AM
> Subject: solr word delimiter
>
> I have the word delimiter filter factory in the text field definition
> at both index and query time, but it has some negative effects on
> search terms like "h1-b visa". It splits h1-b into three tokens: h, 1, b.
> If I understand right, does Solr look for matches for 'h', '1' and 'b'
> separately because they are three different tokens? This is giving some
> undesired results: docs that have 'h' somewhere, '1' somewhere and 'b'
> somewhere. How do I solve this problem?
> I tried adding a synonym like h1-b => h1b visa
> It does filter some results, but I'm trying to find a global solution
> rather than adding synonyms for all kinds of immigration forms like
> i-94, k-1, etc.

--
View this message in context: http://www.nabble.com/solr-word-delimiter-tp14630435p14637863.html
Re: solr word delimiter
It sounds like you simply want to drop solr.WordDelimiterFilterFactory from your analyzer definition, no?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: anuvenk <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, January 5, 2008 1:24:14 AM
Subject: solr word delimiter

I have the word delimiter filter factory in the text field definition at both index and query time, but it has some negative effects on search terms like "h1-b visa". It splits h1-b into three tokens: h, 1, b. If I understand right, does Solr look for matches for 'h', '1' and 'b' separately because they are three different tokens? This is giving some undesired results: docs that have 'h' somewhere, '1' somewhere and 'b' somewhere. How do I solve this problem?

I tried adding a synonym like

h1-b => h1b visa

It does filter some results, but I'm trying to find a global solution rather than adding synonyms for all kinds of immigration forms like i-94, k-1, etc.
solr word delimiter
I have the word delimiter filter factory in the text field definition at both index and query time, but it has some negative effects on search terms like "h1-b visa". It splits h1-b into three tokens: h, 1, b. If I understand right, does Solr look for matches for 'h', '1' and 'b' separately because they are three different tokens? This is giving some undesired results: docs that have 'h' somewhere, '1' somewhere and 'b' somewhere. How do I solve this problem?

I tried adding a synonym like

h1-b => h1b visa

It does filter some results, but I'm trying to find a global solution rather than adding synonyms for all kinds of immigration forms like i-94, k-1, etc.

--
View this message in context: http://www.nabble.com/solr-word-delimiter-tp14630435p14630435.html
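
The splitting described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not Solr's code: `split_parts` and `all_parts_present` are hypothetical helpers, and the "every part appears somewhere in the doc" check only approximates the matching behavior the message describes.

```python
import re

def split_parts(token):
    """Toy model: split on delimiters and on letter/digit boundaries."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token.lower())

def all_parts_present(query, doc):
    """True if every part of the query occurs somewhere in the doc."""
    query_parts = {p for word in query.split() for p in split_parts(word)}
    doc_parts = {p for word in doc.split() for p in split_parts(word)}
    return query_parts <= doc_parts

print(split_parts("h1-b"))  # ['h', '1', 'b']

# A doc that never mentions h1-b still contains all three parts.
print(all_parts_present("h1-b", "Section H, item 1, plan B"))
```

In this model the second call returns True even though the document has nothing to do with h1-b, which is the kind of undesired hit reported in the message.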