Re: a "fair" similarity

Fabrice Robini Mon, 21 Jan 2008 00:01:11 -0800

Hi,

I've tried this "fair" similarity with Lucene 2.2, but it does not seem to
work.


I've attached the custom "MyFair" similarity to both IndexWriter and
IndexSearcher.

Do you have any idea?

Thanks a lot,

Fabrice


Daniel Naber-5 wrote:
> 
> Hi,
> 
> as some of you may have noticed, Lucene prefers shorter documents over 
> longer ones, i.e. shorter documents get a higher ranking, even if the 
> ratio "matched terms / total terms in document" is the same.
> 
> For example, take these two artificial documents:
> 
> doc1: x 2 3 4 5 6 7 8 9 10
> doc2: x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> 
> When searching for "x" doc1 will get a higher ranking, even though "x" 
> makes up 1/10 of the terms in both documents.
> 
> Using this similarity implementation seems to "fix" that:
> 
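> import org.apache.lucene.search.DefaultSimilarity;
> 
> class MySim extends DefaultSimilarity {
> 
>   // Length norm is linear in the field length (no Math.sqrt()).
>   public float lengthNorm(String fieldName, int numTerms) {
>     return (float) (1.0 / numTerms);
>   }
> 
>   // Term frequency factor is the raw frequency (no Math.sqrt()).
>   public float tf(float freq) {
>     return freq;
>   }
> 
> }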
> 
> It's basically just the default implementation with Math.sqrt() removed.
> Is this the correct approach? Are there any problems to expect? I just
> tested it with the documents cited above.
> 
> The use case is that I want to boost fields, e.g. "body:foo^2 title:blah". 
> This could lead to strange results if title is already preferred just 
> because it's shorter.
> 
> Regards
>  Daniel
> 
> -- 
> http://www.danielnaber.de
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 
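For reference, the arithmetic behind the two example documents can be
sketched without Lucene. This is only a rough model, not Lucene's scoring
code: it assumes the default factors are tf = sqrt(freq) and
lengthNorm = 1/sqrt(numTerms), as the quoted message describes, and it
ignores idf, query normalization, boosts and the lossy single-byte storage
of norms in the index. The class and method names (FairSimilarityDemo,
defaultFactor, fairFactor) are hypothetical.

```java
public class FairSimilarityDemo {

    // Default-style factors, as described in the quoted message
    // (both use Math.sqrt()). Assumption: nothing else is modeled.
    static double defaultTf(double freq) {
        return Math.sqrt(freq);
    }

    static double defaultLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // "Fair" factors from MySim: Math.sqrt() removed.
    static double fairTf(double freq) {
        return freq;
    }

    static double fairLengthNorm(int numTerms) {
        return 1.0 / numTerms;
    }

    // Product of the two factors for one document and one query term.
    static double defaultFactor(double freq, int numTerms) {
        return defaultTf(freq) * defaultLengthNorm(numTerms);
    }

    static double fairFactor(double freq, int numTerms) {
        return fairTf(freq) * fairLengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // doc1: "x" occurs once in 10 terms; doc2: twice in 20 terms.
        System.out.println("default doc1: " + defaultFactor(1, 10));
        System.out.println("default doc2: " + defaultFactor(2, 20));
        System.out.println("fair    doc1: " + fairFactor(1, 10));
        System.out.println("fair    doc2: " + fairFactor(2, 20));
    }
}
```

In this simplified model the "fair" factor is exactly freq / numTerms, i.e.
the "matched terms / total terms" ratio from the message. The default
factor is sqrt(freq / numTerms), which is also the same for doc1 and doc2,
so the sketch alone does not reproduce the ranking difference; that would
have to come from the parts left out above.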

-- 
View this message in context: 
http://www.nabble.com/a-%22fair%22-similarity-tp5806739p14992681.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

