Hi Em,

1. Regarding performance: the Similarity class (and my subclass as well) receives the IDF, TF, and sum-of-squared-weights calculations as inputs; the subclass only factors them differently. So even though I ignore those values, they are still being computed.

2. I have written this code:

    static {
        Similarity.setDefault(new MySimilarity());
    }

This means that I am setting the default similarity before indexing, and therefore also before searching.

Thanks!
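As a sanity check on the expected scores, the dot product described in the quoted message below can be sketched in plain Java, independent of Lucene. This is only an illustration under stated assumptions: the class DotProductScore, the helper type ScoredField, and the method score are hypothetical names introduced here, not Lucene API and not the code from the original post. Each query or document is modeled as a map from field name to a single (value, weight) pair, matching the "0-1 instances per field name" constraint.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class DotProductScore {

    /** A single-token field value with its weight (hypothetical helper type). */
    static final class ScoredField {
        final String value;
        final float weight;

        ScoredField(String value, float weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /**
     * Sums queryWeight * docWeight over every field name present in both maps
     * whose values are equal; all other fields contribute nothing.
     */
    static float score(Map<String, ScoredField> query, Map<String, ScoredField> doc) {
        float sum = 0f;
        for (Map.Entry<String, ScoredField> e : query.entrySet()) {
            ScoredField d = doc.get(e.getKey());
            if (d != null && d.value.equals(e.getValue().value)) {
                sum += e.getValue().weight * d.weight;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Data taken from the example in the quoted message.
        Map<String, ScoredField> query = new HashMap<>();
        query.put("1", new ScoredField("AA", 0.1f));
        query.put("7", new ScoredField("BB", 0.2f));
        query.put("8", new ScoredField("CC", 0.3f));

        Map<String, ScoredField> doc1 = new HashMap<>();
        doc1.put("1", new ScoredField("AA", 0.2f));
        doc1.put("2", new ScoredField("DD", 0.8f));
        doc1.put("7", new ScoredField("CC", 0.999f)); // value differs from the query's BB
        doc1.put("10", new ScoredField("FFF", 0.1f));

        Map<String, ScoredField> doc2 = new HashMap<>();
        doc2.put("7", new ScoredField("BB", 0.3f));
        doc2.put("8", new ScoredField("CC", 0.5f));

        // Rounded to two decimals: 0.02 for doc1 and 0.21 for doc2.
        System.out.println(String.format(Locale.ROOT, "%.2f", score(query, doc1)));
        System.out.println(String.format(Locale.ROOT, "%.2f", score(query, doc2)));
    }
}
```

Comparing these reference values with the scores returned by the collector and by the Explanation may help show which of the two deviates from the intended formula.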
-----Original Message-----
From: Em [mailto:mailformailingli...@yahoo.de]
Sent: Tuesday, February 21, 2012 6:07 PM
To: java-user@lucene.apache.org
Subject: Re: Custom lucene scoring - Dot product between field boost and query boost

Hi Yuval,

> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for
> nothing...

You aren't calculating that much, since you declared all those values as constants. What are you worried about?

> 2. The score I get from the TopScoreDocCollector is not the same as I get from the Explanation.
> Here is part of my code:

Could you provide us the code where you are setting the Similarity, please?

Kind regards,
Em

On 21.02.2012 at 16:18, Yuval Kesten wrote:
> Hi,
> I want to use Lucene with the following scoring logic:
> When I index my documents I want to set a score/weight for each field.
> When I query my index I want to set a score/weight for each query term.
>
> I will NEVER index or query with many instances of the same field. In each
> query (document) there will be 0-1 instances with the same field name.
> My fields/query terms are not analyzed; each one already consists of a
> single token.
>
> I want the score to be simply the dot product between the fields of the
> query and the fields of the document, counting only fields that have the
> same value.
>
> For example:
>
> Query:
>
>   Field Name | Field Value | Field Score
>   -----------+-------------+------------
>   1          | AA          | 0.1
>   7          | BB          | 0.2
>   8          | CC          | 0.3
>
> Document 1:
>
>   Field Name | Field Value | Field Score
>   -----------+-------------+------------
>   1          | AA          | 0.2
>   2          | DD          | 0.8
>   7          | CC          | 0.999
>   10         | FFF         | 0.1
>
> Document 2:
>
>   Field Name | Field Value | Field Score
>   -----------+-------------+------------
>   7          | BB          | 0.3
>   8          | CC          | 0.5
>
> The scores should be:
>
>   Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
>   Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2
>               + FIELD_8_SCORE_Q * FIELD_8_SCORE_D2
>               = (0.2 * 0.3) + (0.3 * 0.5)
>
> What would be the best way to implement it, in terms of accuracy and
> performance (I don't need TF and IDF calculations)?
>
> I currently implemented it by setting boosts on the fields and query terms.
> Then I overrode the DefaultSimilarity class:
>
> public class MySimilarity extends DefaultSimilarity {
>
>     @Override
>     public float computeNorm(String field, FieldInvertState state) {
>         return state.getBoost();
>     }
>
>     @Override
>     public float queryNorm(float sumOfSquaredWeights) {
>         return 1;
>     }
>
>     @Override
>     public float tf(float freq) {
>         return 1;
>     }
>
>     @Override
>     public float idf(int docFreq, int numDocs) {
>         return 1;
>     }
>
>     @Override
>     public float coord(int overlap, int maxOverlap) {
>         return 1;
>     }
>
> }
>
> And based on
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html
> this should work.
>
> Problems:
> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for
> nothing...
> 2. The score I get from the TopScoreDocCollector is not the same as I get
> from the Explanation.
> Here is part of my code:
>
> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
> TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
> indexSearcher.search(query, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> for (int i = 0; i < hits.length; ++i) {
>     int docId = hits[i].doc;
>     Document d = indexSearcher.doc(docId);
>     double score = hits[i].score;
>     String id = d.get(FIELD_ID);
>     Explanation explanation = indexSearcher.explain(query, docId);
> }
>
> Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org