Actually, I think all that's needed is a new ValueSource that reads the norm
and calls SmallFloat.byte4ToInt on it.  That inverts Similarity.computeNorm,
which encodes the length with SmallFloat.intToByte4, up to the precision lost
in the one-byte encoding.  All the Similarity impls share that same
implementation of computeNorm.  Admittedly, I'm unsure what NormValueSource
is for and why it is limited to TFIDFSimilarity.
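
To make the round trip concrete, here is a minimal, self-contained sketch of
the encode/decode pair.  FieldLengthCodec is a hypothetical stand-in written
from my understanding of the one-byte scheme, not the Lucene source; real code
should call SmallFloat.intToByte4 and SmallFloat.byte4ToInt directly, and the
ValueSource wiring that would read the norm is not shown.

```java
/** Hypothetical stand-in for SmallFloat's int <-> byte4 codec (an assumption, not Lucene code). */
final class FieldLengthCodec {
    // Largest code the 4-bit-mantissa encoding needs for any non-negative int.
    private static final int MAX_INT4 = longToInt4(Integer.MAX_VALUE);
    // Byte values left over are spent on storing the smallest lengths exactly.
    private static final int NUM_FREE_VALUES = 255 - MAX_INT4;

    private static int longToInt4(long i) {
        if (i < 0) {
            throw new IllegalArgumentException("Only non-negative values are supported: " + i);
        }
        int numBits = 64 - Long.numberOfLeadingZeros(i);
        if (numBits < 4) {
            return Math.toIntExact(i); // subnormal: stored as-is
        }
        int shift = numBits - 4;
        int encoded = Math.toIntExact(i >>> shift); // keep the 4 most significant bits
        encoded &= 0x07;                            // the top bit is implicit
        encoded |= (shift + 1) << 3;                // shift + 1, since 0 marks subnormals
        return encoded;
    }

    private static long int4ToLong(int i) {
        long bits = i & 0x07;
        int shift = (i >>> 3) - 1;
        return shift == -1 ? bits : (bits | 0x08) << shift;
    }

    /** What computeNorm is assumed to store: the field length squeezed into one byte. */
    static byte encode(int fieldLength) {
        if (fieldLength < 0) {
            throw new IllegalArgumentException("Only non-negative values are supported: " + fieldLength);
        }
        if (fieldLength < NUM_FREE_VALUES) {
            return (byte) fieldLength;
        }
        return (byte) (NUM_FREE_VALUES + longToInt4(fieldLength - NUM_FREE_VALUES));
    }

    /** What the proposed ValueSource would apply to the stored norm. */
    static int decode(byte norm) {
        int i = Byte.toUnsignedInt(norm);
        if (i < NUM_FREE_VALUES) {
            return i;
        }
        return Math.toIntExact(NUM_FREE_VALUES + int4ToLong(i - NUM_FREE_VALUES));
    }

    static int roundTrip(int fieldLength) {
        return decode(encode(fieldLength));
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 39, 41, 1000, 100_000}) {
            System.out.println(n + " -> " + roundTrip(n));
        }
    }
}
```

In this sketch short lengths come back exactly, while longer ones are rounded
down to the nearest representable value, so the ValueSource would yield an
approximate position length rather than the exact one.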

On Tue, Apr 1, 2025 at 9:03 PM David Smiley <dsmi...@apache.org> wrote:

> A useful relevance "feature" is the number of terms in a field in a
> document.  Basically the term length discounted for overlaps, or the total
> number of positions -- the position length.
> org.apache.lucene.search.similarities.Similarity#computeNorm receives this
> information, applies a Similarity-dependent formula, and the result is
> stored into the norms disk format.  The Similarity API does not provide an
> API to reverse this, even though it has the formulas to go one direction.
> Wouldn't such an API be nice -- WDYT?  The ultimate goal would be to
> provide a ValueSource for accessing it.  There is something similar --
> NormValueSource, but that yields the decoded norm, not the term length (AKA
> position length), and it's limited to TFIDFSimilarity.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
