The reason I asked about Span scoring is that the behavior changed when I switched from TermQuery to BoostingTermQuery to take advantage of payloads.
It seems to me that a SpanTermQuery and a BoostingTermQuery should behave the same as a TermQuery with respect to term frequency. The 'edit distance' isn't really relevant for these queries, is it?

For a SpanNearQuery that contains SpanTermQueries, the score for a match on "the quick brown fox" would be lower than the score for a match on "brown fox" because of the edit distance (4 vs 2). This seems counterintuitive, too. Any comments?

Thanks,
Peter

On Tue, Mar 3, 2009 at 2:42 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> The DefaultSimilarity class defines sloppyFreq as:
>
>   public float sloppyFreq(int distance) {
>     return 1.0f / (distance + 1);
>   }
>
> For a 'SpanNearQuery', this reduces the effect of the term frequency on the
> score as the number of terms in the span increases. So, for a simple phrase
> query (using spans), the longer the phrase, the lower the TF. For a simple
> SpanTermQuery, the TF is cut in half (1.0f / (1 + 1)).
>
> I'm just wondering why this is the default behavior. For 'SpanTermQuery',
> I'd expect the TF to reflect the actual number of occurrences of the term.
> For a SpanNearQuery, wouldn't it still be the number of occurrences of the
> whole span, not the number of terms in the span?
>
> Thanks,
> Peter
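
To make the arithmetic concrete, here is a minimal standalone sketch of the numbers being discussed. It does not use Lucene at all. The sloppyFreq formula is copied from the DefaultSimilarity snippet quoted above; the class name SloppyFreqDemo and the helper spanFreq are hypothetical, and spanFreq rests on my assumption that the span scorer adds sloppyFreq(span length) to the frequency once per matching span.

```java
// Standalone sketch, no Lucene dependency. Only the sloppyFreq formula comes
// from the quoted DefaultSimilarity code; the rest is illustrative.
final class SloppyFreqDemo {

    // Same formula as the quoted DefaultSimilarity.sloppyFreq(int distance).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumption: every matching span contributes sloppyFreq(spanLength)
    // to the accumulated frequency for the document.
    static float spanFreq(int spanLength, int matches) {
        float freq = 0.0f;
        for (int i = 0; i < matches; i++) {
            freq += sloppyFreq(spanLength);
        }
        return freq;
    }

    public static void main(String[] args) {
        // A single term treated as a span of length 1.
        System.out.println("length 1 (single term):         " + sloppyFreq(1));
        // "brown fox" treated as a span of length 2.
        System.out.println("length 2 (brown fox):           " + sloppyFreq(2));
        // "the quick brown fox" treated as a span of length 4.
        System.out.println("length 4 (the quick brown fox): " + sloppyFreq(4));
        // Three occurrences of a single term versus a plain count of 3.
        System.out.println("3 single-term matches:          " + spanFreq(1, 3));
    }
}
```

Under those assumptions, a term that occurs three times contributes a frequency of 1.5 rather than 3, and the four-term phrase contributes less per match than the two-term phrase, which is the behavior I am asking about.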