Re: Reversing norms to the field length

David Smiley Wed, 02 Apr 2025 18:58:41 -0700

I implemented this in https://github.com/apache/lucene/pull/14433/files,
using the newer LongValuesSource API rather than the older ValueSource API.
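
For readers who want to see why decoding a norm gives back a field length, here is a minimal, self-contained sketch of the round trip. It is not Lucene code and not the code in the PR. The class NormRoundTrip and the methods encodeLength and decodeLength are hypothetical stand-ins for SmallFloat.intToByte4 and SmallFloat.byte4ToInt, and the bit layout (a 4-bit significand plus a shift) is an assumption chosen for illustration, so the exact values may differ from what Lucene stores. The point it shows is that the encoding is lossy: short lengths survive exactly, while longer ones come back rounded down.

```java
public class NormRoundTrip {

    // Hypothetical stand-in for SmallFloat.intToByte4: keeps only the 4 most
    // significant bits of the length and records how far they were shifted.
    static int encodeLength(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        int numBits = 32 - Integer.numberOfLeadingZeros(length);
        if (numBits < 4) {
            return length; // lengths 0..7 are stored exactly
        }
        int shift = numBits - 4;
        int mantissa = (length >>> shift) & 0x07; // the leading 1 bit is implicit
        return ((shift + 1) << 3) | mantissa;     // shift + 1 so 0 means "exact"
    }

    // Hypothetical stand-in for SmallFloat.byte4ToInt: rebuilds the
    // (rounded-down) length from the code produced by encodeLength.
    static int decodeLength(int code) {
        int mantissa = code & 0x07;
        int shift = (code >>> 3) - 1;
        if (shift == -1) {
            return mantissa; // exact small value
        }
        return (mantissa | 0x08) << shift; // restore the implicit leading bit
    }

    public static void main(String[] args) {
        // Print the original length, the stored code, and the decoded length.
        for (int length : new int[] {3, 15, 17, 100, 5000}) {
            int code = encodeLength(length);
            System.out.println(length + " -> code " + code
                + " -> " + decodeLength(code));
        }
    }
}
```

In this sketch every code fits in a single byte, which is the property that lets a value of this kind be stored as a norm. A values source built on the same idea would read the stored norm for each document and apply the decode step to it.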


On Wed, Apr 2, 2025 at 3:12 PM David Smiley <dsmi...@apache.org> wrote:

> Actually, I think all that's needed is a new ValueSource that gets the
> norm and calls SmallFloat.byte4ToInt, which inverts Similarity.computeNorm
> (which calls SmallFloat.intToByte4).  All the Similarity impls keep that
> same implementation of computeNorm.  Admittedly I'm unsure about
> NormValueSource's purpose and why it is limited to TFIDFSimilarity.
>
> On Tue, Apr 1, 2025 at 9:03 PM David Smiley <dsmi...@apache.org> wrote:
>
>> A useful relevance "feature" is the number of terms in a field in a
>> document.  Basically the term length discounted for overlaps, or the total
>> number of positions -- the position length.
>> org.apache.lucene.search.similarities.Similarity#computeNorm receives this
>> information, applies a Similarity-dependent formula, and the result is
>> stored into the norms disk format.  The Similarity API provides no way to
>> reverse this, even though it has the formula for the forward direction.
>> Wouldn't such an API be nice -- WDYT?  The ultimate goal would be to
>> provide a ValueSource for accessing the field length.  There is something
>> similar -- NormValueSource -- but that yields the decoded norm, not the
>> term length (AKA position length), and it's limited to TFIDFSimilarity.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>
