The same question is formatted nicer here: http://stackoverflow.com/questions/9380188/custom-lucene-scoring-dot-product-between-field-boost-and-query-boost
Thanks!

-----Original Message-----
From: Yuval Kesten [mailto:ykes...@yahoo-inc.com]
Sent: Tuesday, February 21, 2012 5:18 PM
To: java-user@lucene.apache.org
Subject: Custom lucene scoring - Dot product between field boost and query boost

Hi,

I want to use Lucene with the following scoring logic:

When I index my documents, I want to set a score/weight for each field.
When I query my index, I want to set a score/weight for each query term.
I will NEVER index or query with multiple instances of the same field: each query (or document) will contain 0 or 1 instances of any given field name.
My fields and query terms are not analyzed; each one already consists of a single token.

I want the score to be simply the dot product between the fields of the query and the fields of the document, counting only the fields that have the same name and the same value in both.

For example:

Query:

    Field Name   Field Value   Field Score
    1            AA            0.1
    7            BB            0.2
    8            CC            0.3

Document 1:

    Field Name   Field Value   Field Score
    1            AA            0.2
    2            DD            0.8
    7            CC            0.999
    10           FFF           0.1

Document 2:

    Field Name   Field Value   Field Score
    7            BB            0.3
    8            CC            0.5

The scores should be:

    Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1
                = 0.1 * 0.2
                = 0.02

    Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q * FIELD_8_SCORE_D2
                = (0.2 * 0.3) + (0.3 * 0.5)
                = 0.21

What would be the best way to implement it, in terms of accuracy and performance? (I don't need the TF and IDF calculations.)

I currently implement it by setting boosts on the fields and on the query terms.
Then I overrode the DefaultSimilarity class:

    import org.apache.lucene.index.FieldInvertState;
    import org.apache.lucene.search.DefaultSimilarity;

    public class MySimilarity extends DefaultSimilarity {

        // Use only the index-time boost as the norm (no length normalization).
        @Override
        public float computeNorm(String field, FieldInvertState state) {
            return state.getBoost();
        }

        // The remaining factors are neutralized by returning 1.
        @Override
        public float queryNorm(float sumOfSquaredWeights) {
            return 1;
        }

        @Override
        public float tf(float freq) {
            return 1;
        }

        @Override
        public float idf(int docFreq, int numDocs) {
            return 1;
        }

        @Override
        public float coord(int overlap, int maxOverlap) {
            return 1;
        }
    }

Based on http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html this should work.

Problems:

1. Performance: I am calculating all the TF/IDF stuff and the norms for nothing...
2. The score I get from the TopScoreDocCollector is not the same as the one I get from the Explanation.

Here is part of my code:

    indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
    TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
    indexSearcher.search(query, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    for (int i = 0; i < hits.length; ++i) {
        int docId = hits[i].doc;
        Document d = indexSearcher.doc(docId);
        double score = hits[i].score;
        String id = d.get(FIELD_ID);
        Explanation explanation = indexSearcher.explain(query, docId);
    }

Thanks!
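
For cross-checking the scores that Lucene returns, the intended scoring can be sketched in plain Java with no Lucene dependency. This only illustrates the arithmetic from the example in the question; it is not a Lucene Similarity and not a suggested replacement for one. The class name DotProductScore, the WeightedField holder, and the map-based representation (field name -> value and weight) are all assumptions made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: computes the expected score by hand.
final class DotProductScore {

    /** One field instance: its single-token value and its weight. */
    static final class WeightedField {
        final String value;
        final double weight;

        WeightedField(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /**
     * Sums queryWeight * docWeight over every field whose name AND value
     * are the same in the query and in the document.
     */
    static double score(Map<String, WeightedField> query,
                        Map<String, WeightedField> doc) {
        double sum = 0.0;
        for (Map.Entry<String, WeightedField> e : query.entrySet()) {
            WeightedField q = e.getValue();
            WeightedField d = doc.get(e.getKey());
            if (d != null && d.value.equals(q.value)) {
                sum += q.weight * d.weight;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Data copied from the example in the question.
        Map<String, WeightedField> q = new HashMap<>();
        q.put("1", new WeightedField("AA", 0.1));
        q.put("7", new WeightedField("BB", 0.2));
        q.put("8", new WeightedField("CC", 0.3));

        Map<String, WeightedField> d1 = new HashMap<>();
        d1.put("1", new WeightedField("AA", 0.2));
        d1.put("2", new WeightedField("DD", 0.8));
        d1.put("7", new WeightedField("CC", 0.999));
        d1.put("10", new WeightedField("FFF", 0.1));

        Map<String, WeightedField> d2 = new HashMap<>();
        d2.put("7", new WeightedField("BB", 0.3));
        d2.put("8", new WeightedField("CC", 0.5));

        System.out.println("Score(q,d1) = " + score(q, d1));
        System.out.println("Score(q,d2) = " + score(q, d2));
    }
}
```

Up to floating-point rounding, the two printed values should be about 0.02 and 0.21. In Document 1, field 7 does not contribute because its value (CC) differs from the query's value (BB). Comparing these hand-computed numbers with hits[i].score and with the Explanation output may help show which of the two Lucene values deviates from the intended score.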