Re: solr word delimiter

2008-01-05 Thread anuvenk

The WordDelimiterFilter is set to
generateWordParts=1, generateNumberParts=1, catenateWords=1, catenateNumbers=1
at both index and query time.

Now I have this synonym mapping: k-1 => k1 visa

Here is the parsedquery_toString:

+(text:"k (1 k) 1 visa"^0.8 | name:"k (1 k) 1 visa"^2.0)~0.01 (text:"k (1 k) 1 visa"~25^0.8 | name:"k (1 k) 1 visa"~25^2.0)~0.01


Why is Solr grouping the tokens this way, as k (1 k) 1 visa? (I mean the
"1 k" within parentheses.)
Also, after k-1 gets split by the WordDelimiterFilter, does catenateWords=1
make k1 a single token?
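
As a rough illustration of the catenation question, here is a simplified
Python sketch of how the four flags above are usually described to behave. It
is not Solr's implementation: the function name word_delimiter and its keyword
arguments are made up for this example, and it ignores case-change splitting
and token positions. Under these assumptions, catenateWords only joins
adjacent letter parts and catenateNumbers only joins adjacent digit parts, so
a mixed token such as k-1 would not be rejoined into k1 by either flag.

```python
import re
from itertools import groupby

def word_delimiter(token, generate_word_parts=True, generate_number_parts=True,
                   catenate_words=True, catenate_numbers=True):
    """Simplified model: return the list of terms emitted for one input token."""
    # Sub-words are maximal runs of letters or digits; delimiters are dropped.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = []
    # Group adjacent parts of the same kind (letters vs. digits).
    for is_digit, run in groupby(parts, key=str.isdigit):
        run = list(run)
        generate = generate_number_parts if is_digit else generate_word_parts
        catenate = catenate_numbers if is_digit else catenate_words
        if generate:
            out.extend(run)
        # Catenation only adds a term when the run has more than one part.
        if catenate and len(run) > 1:
            out.append("".join(run))
    return out

print(word_delimiter("k-1"))     # ['k', '1']
print(word_delimiter("wi-fi"))   # ['wi', 'fi', 'wifi']
print(word_delimiter("500-42"))  # ['500', '42', '50042']
```

In this model, getting k1 back out of k-1 would need a flag that joins all
parts regardless of type, which is what the catenateAll option is documented
to do.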

As for matching, given
(text:"k (1 k) 1 visa"^0.8
documents that have the exact phrase "k1 visa" would rank highest, and docs
with just k1 might rank next.
Since I have ps set to 25, would it also match docs that have 'k' and
'1' within 25 words of one another? Or k1 and visa within 25 words of one
another, because k1 is a single token? I get confused about how Solr
matches documents in cases like this.
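
On the slop question, note that in the parsed query above only the second
(optional) clause carries ~25; the required clause has no slop. For intuition
about what a slop of 25 permits, here is a minimal Python sketch for a
two-term phrase only. The helper sloppy_match is hypothetical, and it assumes
the common description of Lucene slop as the number of position moves needed
to line the terms up, so it is an approximation, not Lucene's matcher.

```python
def sloppy_match(doc_tokens, first, second, slop):
    """True if the phrase `first second` matches within `slop` moves (two terms only)."""
    first_pos = [i for i, t in enumerate(doc_tokens) if t == first]
    second_pos = [i for i, t in enumerate(doc_tokens) if t == second]
    # An exact phrase has `second` at first+1; each position of deviation costs one move.
    return any(abs(b - a - 1) <= slop for a in first_pos for b in second_pos)

doc = "the k1 fiance petition leads to a visa".split()
print(sloppy_match(doc, "k1", "visa", 0))   # False: the terms are not adjacent
print(sloppy_match(doc, "k1", "visa", 25))  # True: five words lie between them
```

Whether 'k' and '1' are matched as separate terms or as a single k1 depends on
what the analyzer actually emitted, which this sketch does not model.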





Yonik Seeley wrote:
> 
> For "h1-b" to match "h1b", it will take either a synonym or something
> like WordDelimiterFilter.
> You can also configure WordDelimiterFilter to only catenate, so h1-b
> would become h1b at both index and query time.  The downside is that
> it might also catenate things you want kept separate.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/solr-word-delimiter-tp14630435p14641602.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr word delimiter

2008-01-05 Thread Yonik Seeley
On Jan 5, 2008 2:28 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> That's what I'm thinking too. If I remove solr.WordDelimiterFilterFactory
> from both the index and query analyzers, the word h1-b will remain as-is in
> the index, correct? So if someone searches for h1b (without the hyphen),
> would it still return the h1-b doc?

For "h1-b" to match "h1b", it will take either a synonym or something
like WordDelimiterFilter.
You can also configure WordDelimiterFilter to only catenate, so h1-b
would become h1b at both index and query time.  The downside is that
it might also catenate things you want kept separate.
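
A minimal sketch of the catenate-only idea, in Python rather than Solr
configuration: if the analyzer at both index and query time reduces a token to
its joined alphanumeric parts, then h1-b and h1b end up as the same term. The
function catenate_only is a hypothetical stand-in, not the filter's real code.

```python
import re

def catenate_only(token):
    """Drop delimiters and join all letter/digit runs into one term."""
    return "".join(re.findall(r"[A-Za-z0-9]+", token))

indexed = catenate_only("h1-b")  # term stored at index time
queried = catenate_only("h1b")   # term produced at query time
print(indexed, queried, indexed == queried)  # h1b h1b True

# The downside: pieces you may have wanted to search separately are glued too.
print(catenate_only("2008-01-05"))  # 20080105
```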

-Yonik


Re: solr word delimiter

2008-01-05 Thread anuvenk

That's what I'm thinking too. If I remove solr.WordDelimiterFilterFactory from
both the index and query analyzers, the word h1-b will remain as-is in the
index, correct? So if someone searches for h1b (without the hyphen), would it
still return the h1-b doc?

Otis Gospodnetic wrote:
> 
> It sounds like you simply want to drop solr.WordDelimiterFilterFactory
> from your analyzer definition, no?
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 



Re: solr word delimiter

2008-01-05 Thread Otis Gospodnetic
It sounds like you simply want to drop solr.WordDelimiterFilterFactory from 
your analyzer definition, no?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




solr word delimiter

2008-01-04 Thread anuvenk

I have the word delimiter filter factory in the text field definition at both
index and query time.
But it has some negative effects on search terms like "h1-b visa".
It splits h1-b into three tokens: h, 1, b. If I understand correctly, Solr then
looks for matches for 'h', '1', and 'b' separately because they are three
different tokens. This gives some undesired results: docs that have 'h'
somewhere, '1' somewhere, and 'b' somewhere. How can I solve this problem?
I tried adding a synonym like h1-b => h1b visa
It does filter out some results, but I'm trying to find a global solution
rather than adding synonyms for all kinds of immigration forms like i-94, k-1,
etc.
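
To see why h1-b turns into three tokens, here is a simplified Python sketch of
the splitting step only. It assumes the filter breaks a token at
non-alphanumeric characters and at letter/digit boundaries; split_subwords is
a made-up name, and real analysis also tracks token positions, which this
ignores.

```python
import re

def split_subwords(token):
    """Split at delimiters and at letter/digit transitions."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

for term in ["h1-b", "i-94", "k-1"]:
    print(term, "->", split_subwords(term))
# h1-b -> ['h', '1', 'b']
# i-94 -> ['i', '94']
# k-1 -> ['k', '1']
```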