The DefaultSimilarity class defines sloppyFreq as: public float sloppyFreq(int distance) { return 1.0f / (distance + 1); }
For a 'SpanNearQuery', this reduces the effect of the term frequency on the score as the number of terms in the span increases. So, for a simple phrase query (using spans), the longer the phrase, the lower the TF. For a simple SpanTermQuery, the TF is reduced in half (1.0f / 1 + 1). I'm just wondering why this is the default behavior. For 'SpanTermQuery', I'd expect the TF to reflect the actual number of occurrences of the term. For a SpanNearQuery, wouldn't it still be the number of occurrences of the whole span, not the number of terms in the span? Thanks, Peter