Hi,
I've tried this "fair" similarity with Lucene 2.2, but it does not seem to
work.
I've attached the custom "MyFair" similarity to both IndexWriter and
IndexSearcher.
Do you have any idea?
Thanks a lot,
Fabrice
Daniel Naber-5 wrote:
>
> Hi,
>
> as some of you may have noticed, Lucene prefers shorter documents over
> longer ones, i.e. shorter documents get a higher ranking, even if the
> ratio "matched terms / total terms in document" is the same.
>
> For example, take these two artificial documents:
>
> doc1: x 2 3 4 5 6 7 8 9 10
> doc2: x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>
> When searching for "x" doc1 will get a higher ranking, even though "x"
> makes up 1/10 of the terms in both documents.
>
> Using this similarity implementation seems to "fix" that:
>
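> import org.apache.lucene.search.DefaultSimilarity;
>
> class MySim extends DefaultSimilarity {
>
>     // Linear length normalization (the default uses 1 / sqrt(numTerms)).
>     public float lengthNorm(String fieldName, int numTerms) {
>         return (float) (1.0 / numTerms);
>     }
>
>     // Linear term frequency (the default uses sqrt(freq)).
>     public float tf(float freq) {
>         return freq;
>     }
>
> }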
>
> It's basically just the default implementation with Math.sqrt() removed.
> Is this the correct approach? Are there any problems to expect? I just
> tested it with the documents cited above.
>
> The use case is that I want to boost fields, e.g. "body:foo^2 title:blah".
> This could lead to strange results if title is already preferred just
> because it's shorter.
>
> Regards
> Daniel
>
> --
> http://www.danielnaber.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
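
To make the comparison in the quoted message concrete, here is a minimal standalone sketch that evaluates only the tf * lengthNorm product for the two example documents, under both the default square-root formulas and the linear formulas from MySim. The class name LengthNormDemo and its two helper methods are hypothetical and do not use Lucene at all. The sketch deliberately ignores idf, boosts, query normalization, and the lossy single-byte encoding Lucene applies to norms when it stores them, so real scores may well differ from these numbers.

```java
public class LengthNormDemo {

    // Default-style factors: tf = sqrt(freq), lengthNorm = 1 / sqrt(numTerms).
    static float defaultFactor(int freq, int numTerms) {
        return (float) (Math.sqrt(freq) * (1.0 / Math.sqrt(numTerms)));
    }

    // Factors from the quoted MySim: tf = freq, lengthNorm = 1 / numTerms.
    static float fairFactor(int freq, int numTerms) {
        return (float) (freq * (1.0 / numTerms));
    }

    public static void main(String[] args) {
        // doc1: "x" occurs once in 10 terms; doc2: "x" occurs twice in 20 terms.
        System.out.println("default: doc1=" + defaultFactor(1, 10)
                + " doc2=" + defaultFactor(2, 20));
        System.out.println("linear:  doc1=" + fairFactor(1, 10)
                + " doc2=" + fairFactor(2, 20));
    }
}
```

Since lengthNorm is evaluated at indexing time and its result is stored in the index, documents that were indexed before the custom similarity was set on the IndexWriter keep their old norms until they are re-indexed.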
--
View this message in context:
http://www.nabble.com/a-%22fair%22-similarity-tp5806739p14992681.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.