Hi Yuval,

You can just extend Similarity directly, rather than DefaultSimilarity - that way you don't burn any CPU cycles on TF/IDF calculations.

Alan

On 22 Feb 2012, at 07:17, Yuval Kesten wrote:

> Hi Em,
>
> 1. Regarding performance: the Similarity class (and my subclass as well)
> receives the IDF, TF and sum-of-squares calculations as inputs; it just
> factors them differently. Even though I ignore the values, they are still
> being computed.
>
> 2. I have written this code:
>
> static {
>     Similarity.setDefault(new MySimilarity());
> }
>
> This means that I am setting the default similarity before doing the
> indexing and, obviously, before the searching.
>
> Thanks!
>
> -----Original Message-----
> From: Em [mailto:mailformailingli...@yahoo.de]
> Sent: Tuesday, February 21, 2012 6:07 PM
> To: java-user@lucene.apache.org
> Subject: Re: Custom lucene scoring - Dot product between field boost and
> query boost
>
> Hi Yuval,
>
>> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for
>> nothing...
> You aren't calculating that much, since you declared all those values as
> constants. What are you worried about?
>
>> 2. The score I get from the TopScoreDocCollector is not the same as I
>> get from the Explanation.
>> Here is part of my code:
> Could you provide us the code where you are setting the Similarity, please?
>
> Kind regards,
> Em
>
> On 21.02.2012 at 16:18, Yuval Kesten wrote:
>> Hi,
>> I want to use Lucene with the following scoring logic:
>> When I index my documents I want to set a score/weight for each field.
>> When I query my index I want to set a score/weight for each query term.
>>
>> I will NEVER index or query with many instances of the same field. In each
>> query (or document) there will be 0-1 instances with the same field name.
>> My fields/query terms are not analyzed - each is already a single token.
>>
>> I want the score to be simply the dot product between the fields of the
>> query and the fields of the document, counting only fields that have the
>> same value.
>>
>> For example:
>>
>> Query:
>> Field Name | Field Value | Field Score
>> 1          | AA          | 0.1
>> 7          | BB          | 0.2
>> 8          | CC          | 0.3
>>
>> Document 1:
>> Field Name | Field Value | Field Score
>> 1          | AA          | 0.2
>> 2          | DD          | 0.8
>> 7          | CC          | 0.999
>> 10         | FFF         | 0.1
>>
>> Document 2:
>> Field Name | Field Value | Field Score
>> 7          | BB          | 0.3
>> 8          | CC          | 0.5
>>
>> The scores should be:
>> Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
>> Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q *
>> FIELD_8_SCORE_D2 = (0.2 * 0.3) + (0.3 * 0.5) = 0.21
>>
>> What would be the best way to implement it, in terms of accuracy and
>> performance? (I don't need TF and IDF calculations.)
>>
>> I currently implemented it by setting boosts on the fields and query terms.
>> Then I overrode the DefaultSimilarity class:
>>
>> import org.apache.lucene.index.FieldInvertState;
>> import org.apache.lucene.search.DefaultSimilarity;
>>
>> public class MySimilarity extends DefaultSimilarity {
>>
>>     // Use the index-time boost as the norm.
>>     @Override
>>     public float computeNorm(String field, FieldInvertState state) {
>>         return state.getBoost();
>>     }
>>
>>     // Every other scoring factor is neutralized to 1.
>>     @Override
>>     public float queryNorm(float sumOfSquaredWeights) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float tf(float freq) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float idf(int docFreq, int numDocs) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float coord(int overlap, int maxOverlap) {
>>         return 1;
>>     }
>>
>> }
>>
>> And based on
>> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html
>> this should work.
>>
>> Problems:
>> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for
>> nothing...
>> 2. The score I get from the TopScoreDocCollector is not the same as I get
>> from the Explanation.
>> Here is part of my code:
>>
>> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
>> TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
>> indexSearcher.search(query, collector);
>> ScoreDoc[] hits = collector.topDocs().scoreDocs;
>> for (int i = 0; i < hits.length; ++i) {
>>     int docId = hits[i].doc;
>>     Document d = indexSearcher.doc(docId);
>>     double score = hits[i].score;
>>     String id = d.get(FIELD_ID);
>>     Explanation explanation = indexSearcher.explain(query, docId);
>> }
>>
>> Thanks!
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
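
For reference, the scoring rule described in the original question can be checked outside Lucene. What follows is a minimal plain-Java sketch, not Lucene code: the class DotProductScore, its Entry type and the example* helpers are hypothetical names introduced here for illustration only. The data is the query and the two documents from the example in the original message, and the sketch assumes at most one instance per field name, as the question states.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: reference computation of the
// dot-product score described in the thread.
class DotProductScore {

    // One field instance: its single-token value and its weight.
    static final class Entry {
        final String value;
        final double weight;

        Entry(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    // Sums queryWeight * docWeight over fields that appear in both the
    // query and the document with the same value.
    static double score(Map<String, Entry> query, Map<String, Entry> doc) {
        double sum = 0.0;
        for (Map.Entry<String, Entry> q : query.entrySet()) {
            Entry d = doc.get(q.getKey());
            if (d != null && d.value.equals(q.getValue().value)) {
                sum += q.getValue().weight * d.weight;
            }
        }
        return sum;
    }

    // Query from the example in the thread.
    static Map<String, Entry> exampleQuery() {
        Map<String, Entry> m = new HashMap<>();
        m.put("1", new Entry("AA", 0.1));
        m.put("7", new Entry("BB", 0.2));
        m.put("8", new Entry("CC", 0.3));
        return m;
    }

    // Document 1 from the example: only field 1 matches the query.
    static Map<String, Entry> exampleDoc1() {
        Map<String, Entry> m = new HashMap<>();
        m.put("1", new Entry("AA", 0.2));
        m.put("2", new Entry("DD", 0.8));
        m.put("7", new Entry("CC", 0.999));
        m.put("10", new Entry("FFF", 0.1));
        return m;
    }

    // Document 2 from the example: fields 7 and 8 both match the query.
    static Map<String, Entry> exampleDoc2() {
        Map<String, Entry> m = new HashMap<>();
        m.put("7", new Entry("BB", 0.3));
        m.put("8", new Entry("CC", 0.5));
        return m;
    }

    public static void main(String[] args) {
        double s1 = score(exampleQuery(), exampleDoc1());
        double s2 = score(exampleQuery(), exampleDoc2());
        // Rounded to two decimals to hide floating-point noise.
        System.out.println(String.format(Locale.ROOT, "Score(q,d1) = %.2f", s1));
        System.out.println(String.format(Locale.ROOT, "Score(q,d2) = %.2f", s2));
    }
}
```

Field 7 of Document 1 contributes nothing because its value (CC) differs from the query's value for that field (BB). Values like these can serve as the expected numbers when comparing what the collector and the Explanation report.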