sloppyFreq question

Peter Keegan Tue, 03 Mar 2009 11:43:10 -0800

The DefaultSimilarity class defines sloppyFreq as:

public float sloppyFreq(int distance) {
  return 1.0f / (distance + 1);
}


For a SpanNearQuery, this reduces the effect of the term frequency on the
score as the number of terms in the span increases. So, for a simple phrase
query (using spans), the longer the phrase, the lower the TF. Even for a simple
SpanTermQuery, the TF is cut in half: 1.0f / (1 + 1) = 0.5.
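
To make the numbers concrete, here is a small standalone sketch. It copies the
formula quoted above into a hypothetical SloppyFreqDemo class (not part of
Lucene) and assumes, as described above, that the value passed in as distance
is the number of terms in the span.

```java
public class SloppyFreqDemo {
    // Same formula as the DefaultSimilarity.sloppyFreq method quoted above.
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // Assumption: distance equals the number of terms in the span,
        // so 1 models a SpanTermQuery and 2+ models a phrase-like span.
        for (int spanLength = 1; spanLength <= 4; spanLength++) {
            System.out.println("span length " + spanLength
                    + " -> per-match freq " + sloppyFreq(spanLength));
        }
    }
}
```

Under that assumption, a single-term span contributes 0.5 per match and a
three-term span contributes 0.25, so each match counts for less as the span
gets longer.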

I'm just wondering why this is the default behavior. For a SpanTermQuery,
I'd expect the TF to reflect the actual number of occurrences of the term.
For a SpanNearQuery, wouldn't it still be the number of occurrences of the
whole span, not something scaled by the number of terms in the span?

Thanks,
Peter
