RE: Custom lucene scoring - Dot product between field boost and query boost

Yuval Kesten Tue, 21 Feb 2012 07:28:29 -0800

The same question is formatted nicer here:
http://stackoverflow.com/questions/9380188/custom-lucene-scoring-dot-product-between-field-boost-and-query-boost

Thanks!

-----Original Message-----
From: Yuval Kesten [mailto:[email protected]] 
Sent: Tuesday, February 21, 2012 5:18 PM
To: [email protected]
Subject: Custom lucene scoring - Dot product between field boost and query boost

Hi,
I want to use Lucene with the following scoring logic:
When I index my documents I want to set for each field a score/weight.
When I query my index I want to set for each query term a score/weight.

I will NEVER index or query with multiple instances of the same field: in each 
query (or document) there will be 0 or 1 instances of a given field name.
My fields and query terms are not analyzed; each one is already a single token.

I want the score to be simply the dot product between the field scores of the 
query and the field scores of the document, counting a field only if it has 
the same value in both.

For example:
Query:
Field Name | Field Value | Field Score
1          | AA          | 0.1
7          | BB          | 0.2
8          | CC          | 0.3

Document 1:
Field Name | Field Value | Field Score
1          | AA          | 0.2
2          | DD          | 0.8
7          | CC          | 0.999
10         | FFF         | 0.1

Document 2:
Field Name | Field Value | Field Score
7          | BB          | 0.3
8          | CC          | 0.5

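The scores should be:
Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q * FIELD_8_SCORE_D2
            = (0.2 * 0.3) + (0.3 * 0.5) = 0.21
(Field 7 does not contribute to Score(q,d1) because its value in Document 1, 
CC, differs from the query's value, BB.)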
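
To pin down the arithmetic I am after, here is a minimal sketch in plain Java, outside Lucene. The names DotProductScore, WeightedField, score and the example* helpers are made up for illustration only; this is not a Lucene API, just the target behaviour under the assumptions above (at most one instance per field name, single-token values).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene.
class DotProductScore {

    /** A single-token field value together with its score/weight. */
    static final class WeightedField {
        final String value;
        final double weight;

        WeightedField(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /** Sums queryWeight * docWeight over fields whose name and value both match. */
    static double score(Map<String, WeightedField> query, Map<String, WeightedField> doc) {
        double sum = 0.0;
        for (Map.Entry<String, WeightedField> entry : query.entrySet()) {
            WeightedField q = entry.getValue();
            WeightedField d = doc.get(entry.getKey());
            // A field counts only if the document has it with the same value.
            if (d != null && d.value.equals(q.value)) {
                sum += q.weight * d.weight;
            }
        }
        return sum;
    }

    // The query and the two documents from the tables above.
    static Map<String, WeightedField> exampleQuery() {
        Map<String, WeightedField> m = new HashMap<String, WeightedField>();
        m.put("1", new WeightedField("AA", 0.1));
        m.put("7", new WeightedField("BB", 0.2));
        m.put("8", new WeightedField("CC", 0.3));
        return m;
    }

    static Map<String, WeightedField> exampleDoc1() {
        Map<String, WeightedField> m = new HashMap<String, WeightedField>();
        m.put("1", new WeightedField("AA", 0.2));
        m.put("2", new WeightedField("DD", 0.8));
        m.put("7", new WeightedField("CC", 0.999));
        m.put("10", new WeightedField("FFF", 0.1));
        return m;
    }

    static Map<String, WeightedField> exampleDoc2() {
        Map<String, WeightedField> m = new HashMap<String, WeightedField>();
        m.put("7", new WeightedField("BB", 0.3));
        m.put("8", new WeightedField("CC", 0.5));
        return m;
    }

    public static void main(String[] args) {
        Map<String, WeightedField> q = exampleQuery();
        System.out.println("Score(q,d1) = " + score(q, exampleDoc1()));
        System.out.println("Score(q,d2) = " + score(q, exampleDoc2()));
    }
}
```

Up to floating-point rounding, this gives 0.02 for Document 1 and 0.21 for Document 2, which is what I would like Lucene to return.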

What would be the best way to implement it, in terms of accuracy and 
performance? (I don't need TF and IDF calculations.)

I have currently implemented it by setting boosts on the fields and query terms.
Then I overrode the DefaultSimilarity class:

public class MySimilarity extends DefaultSimilarity {

    @Override
    public float computeNorm(String field, FieldInvertState state) {
        return state.getBoost();
    }

    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return 1;
    }

    @Override
    public float tf(float freq) {
        return 1;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        return 1;
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        return 1;
    }

}

And based on 
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html 
this should work.
Problems:
1. Performance: I am calculating all the TF/IDF stuff and norms for nothing...
2. The score I get from the TopScoreDocCollector is not the same as the one I 
get from the Explanation.
Here is part of my code:

indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
indexSearcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = indexSearcher.doc(docId);
    double score = hits[i].score;
    String id = d.get(FIELD_ID);
    Explanation explanation = indexSearcher.explain(query, docId);
}

Thanks!

