Hi Yuval,

> 1. Performance: I am calculating all the TF/IDF stuff and norms for
> nothing...

You aren't calculating that much, since you declared all those values as
constants. What are you worried about?
> 2. The score I get from the TopScoreDocCollector is not the same as I get
> from the Explanation.
> Here is part of my code:

Could you provide us with the code where you are setting the Similarity,
please?

Kind regards,
Em

Am 21.02.2012 16:18, schrieb Yuval Kesten:
> Hi,
> I want to use Lucene with the following scoring logic:
> When I index my documents I want to set a score/weight for each field.
> When I query my index I want to set a score/weight for each query term.
>
> I will NEVER index or query with many instances of the same field - in each
> query (document) there will be 0-1 instances with the same field name.
> My fields/query terms are not analyzed - they are already made out of one
> token.
>
> I want the score to be simply the dot product between the fields of the
> query and the fields of the document if they have the same value.
>
> For example:
>
> Query:
> Field Name | Field Value | Field Score
> 1          | AA          | 0.1
> 7          | BB          | 0.2
> 8          | CC          | 0.3
>
> Document 1:
> Field Name | Field Value | Field Score
> 1          | AA          | 0.2
> 2          | DD          | 0.8
> 7          | CC          | 0.999
> 10         | FFF         | 0.1
>
> Document 2:
> Field Name | Field Value | Field Score
> 7          | BB          | 0.3
> 8          | CC          | 0.5
>
> The scores should be:
> Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
> Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q *
> FIELD_8_SCORE_D2 = (0.2 * 0.3) + (0.3 * 0.5) = 0.21
>
> What would be the best way to implement it, in terms of accuracy and
> performance (I don't need TF and IDF calculations)?
>
> I currently implemented it by setting boosts to the fields and query terms.
> Then I overrode the DefaultSimilarity class:
>
> public class MySimilarity extends DefaultSimilarity {
>
>     @Override
>     public float computeNorm(String field, FieldInvertState state) {
>         return state.getBoost();
>     }
>
>     @Override
>     public float queryNorm(float sumOfSquaredWeights) {
>         return 1;
>     }
>
>     @Override
>     public float tf(float freq) {
>         return 1;
>     }
>
>     @Override
>     public float idf(int docFreq, int numDocs) {
>         return 1;
>     }
>
>     @Override
>     public float coord(int overlap, int maxOverlap) {
>         return 1;
>     }
> }
>
> And based on
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html
> this should work.
>
> Problems:
> 1. Performance: I am calculating all the TF/IDF stuff and norms for
> nothing...
> 2. The score I get from the TopScoreDocCollector is not the same as I get
> from the Explanation.
>
> Here is part of my code:
>
> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
> TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
> indexSearcher.search(query, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> for (int i = 0; i < hits.length; ++i) {
>     int docId = hits[i].doc;
>     Document d = indexSearcher.doc(docId);
>     double score = hits[i].score;
>     String id = d.get(FIELD_ID);
>     Explanation explanation = indexSearcher.explain(query, docId);
> }
>
> Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
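For reference, the dot-product scoring Yuval describes can be sanity-checked
outside Lucene. This is a minimal plain-Java sketch, not Lucene API: the
`DotProductScore` class and its "fieldName=value" term keys are illustrative
inventions, with the field names, values, and weights taken from the example
tables in the quoted mail.

```java
import java.util.Map;

public class DotProductScore {

    /**
     * score(q, d) = sum of queryWeight * documentWeight over all terms
     * (encoded here as "fieldName=value") that appear in both maps.
     */
    static double score(Map<String, Double> query, Map<String, Double> doc) {
        double score = 0.0;
        for (Map.Entry<String, Double> term : query.entrySet()) {
            Double docWeight = doc.get(term.getKey());
            if (docWeight != null) {
                score += term.getValue() * docWeight;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // weights from the example in the thread
        Map<String, Double> q  = Map.of("1=AA", 0.1, "7=BB", 0.2, "8=CC", 0.3);
        Map<String, Double> d1 = Map.of("1=AA", 0.2, "2=DD", 0.8,
                                        "7=CC", 0.999, "10=FFF", 0.1);
        Map<String, Double> d2 = Map.of("7=BB", 0.3, "8=CC", 0.5);

        // ~0.02: only field 1 matches on value AA (field 7 has CC vs BB)
        System.out.println(score(q, d1));
        // ~0.21: 0.2 * 0.3 + 0.3 * 0.5
        System.out.println(score(q, d2));
    }
}
```

This makes the expected values from the mail concrete, so any custom
Similarity setup can be checked against them.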