
I have a question about the score produced by TFIDFSimilarity.
As I understand it, the IDF factor should be squared in the final score. However, looking
at the code, I see this:

    public TFIDFScorer(float boost, Explanation idf, float[] normTable) {
      // TODO: Validate?
      this.idf = idf;
      this.boost = boost;
      this.queryWeight = boost * idf.getValue().floatValue();
      this.normTable = normTable;
    }

    public float score(float freq, long norm) {
      final float raw = tf(freq) * queryWeight; // compute tf(f)*weight
      float normValue = normTable[(int) (norm & 0xFF)];
      return raw * normValue;  // normalize for field
    }

Where does the second idf.getValue() factor come from? In Lucene 6.6.6,
before the patch for https://issues.apache.org/jira/browse/LUCENE-7368 was
applied, the code looked like this:

    TFIDFSimScorer(IDFStats stats, NumericDocValues norms) throws IOException {
      this.stats = stats;
      this.weightValue = stats.value;
      this.norms = norms;
    }

    public float score(int doc, float freq) {
      final float raw = tf(freq) * weightValue; // compute tf(f)*weight
      return norms == null ? raw : raw * decodeNormValue(norms.get(doc));  // normalize for field
    }


    public IDFStats(String field, Explanation idf) {
      // TODO: Validate?
      this.field = field;
      this.idf = idf;
      normalize(1f, 1f);
    }

    public void normalize(float queryNorm, float boost) {
      this.boost = boost;
      this.queryNorm = queryNorm;
      queryWeight = queryNorm * boost * idf.getValue();
      value = queryWeight * idf.getValue();         // idf for document
    }

Did we lose an idf.getValue() factor in this patch? Or was it moved
somewhere else? Could you please point me to the code location where the
score is multiplied by the IDF value a second time?
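
To make the discrepancy concrete, here is a small sketch of the arithmetic as I
read it from the two snippets above. It is my own illustration, not Lucene code:
the class and method names (IdfWeightComparison, weightBeforePatch,
weightAfterPatch) are hypothetical, and the sample idf and boost values are
arbitrary.

```java
public class IdfWeightComparison {

    // Mirrors the quoted 6.6.6 IDFStats.normalize():
    //   queryWeight = queryNorm * boost * idf
    //   value       = queryWeight * idf   (so idf appears twice)
    public static float weightBeforePatch(float queryNorm, float boost, float idf) {
        float queryWeight = queryNorm * boost * idf;
        return queryWeight * idf;
    }

    // Mirrors the quoted TFIDFScorer constructor:
    //   queryWeight = boost * idf          (so idf appears once)
    public static float weightAfterPatch(float boost, float idf) {
        return boost * idf;
    }

    public static void main(String[] args) {
        float idf = 2.0f;   // arbitrary sample value
        float boost = 1.0f; // arbitrary sample value

        // queryNorm is fixed to 1, as in the IDFStats constructor's normalize(1f, 1f)
        System.out.println("before patch: " + weightBeforePatch(1.0f, boost, idf));
        System.out.println("after patch:  " + weightAfterPatch(boost, idf));
    }
}
```

With idf = 2 and boost = 1, the old computation gives a weight of 4 (idf
squared) and the new one gives 2. That missing factor of idf is what I am
trying to locate.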

