I did this: https://github.com/apache/lucene/pull/14433/files, but using the newer LongValuesSource API rather than the older ValueSource API.
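
For readers following the thread, here is a minimal, self-contained sketch of the encode/decode round trip that the quoted messages below rely on. It is not the code in the PR, and it is not Lucene's source. The class NormLengthSketch and the methods encodeLength and decodeNorm are hypothetical names; the bit layout is an assumption modeled on the idea behind SmallFloat.intToByte4 and SmallFloat.byte4ToInt (small values kept exactly, larger ones kept with a 4-bit mantissa).

```java
// Illustrative stand-in only: assumes a SmallFloat-like layout, not Lucene's actual code.
final class NormLengthSketch {
  // Pack a non-negative long into 3 explicit mantissa bits plus a shift.
  static int longToInt4(long i) {
    if (i < 0) throw new IllegalArgumentException("negative value: " + i);
    int numBits = 64 - Long.numberOfLeadingZeros(i);
    if (numBits < 4) return (int) i;           // tiny values are stored as-is
    int shift = numBits - 4;
    int encoded = (int) (i >>> shift) & 0x07;  // drop the implicit leading bit
    return encoded | ((shift + 1) << 3);       // shift field 0 is reserved for tiny values
  }

  // Inverse of longToInt4, returning the smallest value that maps to this code.
  static long int4ToLong(int i) {
    long bits = i & 0x07;
    int shift = (i >>> 3) - 1;
    return shift == -1 ? bits : (bits | 0x08) << shift;
  }

  static final int MAX_INT4 = longToInt4(Integer.MAX_VALUE);
  // Byte codes left over after covering the int range; used for exact small lengths.
  static final int NUM_FREE_VALUES = 255 - MAX_INT4;

  // Plays the role of computeNorm in this sketch: field length -> one byte.
  static byte encodeLength(int length) {
    if (length < 0) throw new IllegalArgumentException("negative length: " + length);
    if (length < NUM_FREE_VALUES) return (byte) length;
    return (byte) (NUM_FREE_VALUES + longToInt4(length - NUM_FREE_VALUES));
  }

  // Plays the role of the proposed value source: norm byte -> approximate length.
  static int decodeNorm(byte norm) {
    int i = Byte.toUnsignedInt(norm);
    if (i < NUM_FREE_VALUES) return i;
    return Math.toIntExact(NUM_FREE_VALUES + int4ToLong(i - NUM_FREE_VALUES));
  }
}
```

The point of the sketch is that the inversion is lossy by design: short lengths come back exactly, while longer ones come back rounded down to the start of their bucket (in this sketch a length of 100 decodes to 96). A value source built this way would therefore expose an approximate term length, not the exact one.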

On Wed, Apr 2, 2025 at 3:12 PM David Smiley <dsmi...@apache.org> wrote:

> Actually, I think all that's needed is a new ValueSource that gets the
> norm and calls SmallFloat.byte4ToInt, which inverts Similarity.computeNorm
> (which calls SmallFloat.intToByte4). All the Similarity impls keep that
> same implementation of computeNorm. Admittedly I'm unsure about
> NormValueSource's purpose and limitation to TFIDFSimilarity.
>
> On Tue, Apr 1, 2025 at 9:03 PM David Smiley <dsmi...@apache.org> wrote:
>
>> A useful relevance "feature" is the number of terms in a field in a
>> document. Basically the term length discounted for overlaps, or the total
>> number of positions -- the position length.
>> org.apache.lucene.search.similarities.Similarity#computeNorm receives this
>> information, applies a Similarity-dependent formula, and the result is
>> stored into the norms disk format. The Similarity API does not provide an
>> API to reverse this, even though it has the formulas to go one direction.
>> Wouldn't such an API be nice -- WDYT? The ultimate goal would be to
>> provide a ValueSource for accessing. There is something similar --
>> NormValueSource but that yields the decoded norm, not the term length (AKA
>> position length), and it's limited to TFIDFSimilarity.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley