A useful relevance "feature" is the number of terms in a field of a
document: either the term count discounted for overlaps, or the total
number of positions (the position length).
org.apache.lucene.search.similarities.Similarity#computeNorm receives this
information, applies a Similarity-dependent formula, and the result is
stored in the norms on disk.  The Similarity API has the formula for that
direction but offers no way to reverse it.  Wouldn't such an API be nice?
WDYT?  The ultimate goal would be to provide a ValueSource for accessing
the term length.  There is something similar, NormValueSource, but it
yields the decoded norm rather than the term length (AKA position length),
and it is limited to TFIDFSimilarity.
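To make the idea concrete, here is a rough sketch of what a two-way API
could look like.  Everything in it is hypothetical: the class
NormRoundTripSketch, the decodeLength method, and the toy 1/sqrt(length)
formula quantized to 1..255 are illustration only, not Lucene's actual API
or norm encoding.  It mainly shows that the reverse direction can only be
approximate, because the stored norm is lossy.

```java
/**
 * Hypothetical sketch, NOT Lucene API: a length-to-norm encoding paired with
 * the inverse that the Similarity API currently lacks. The formula and the
 * 1..255 quantization are assumptions chosen for illustration.
 */
final class NormRoundTripSketch {

    /** Forward direction, analogous in spirit to Similarity#computeNorm. */
    static long computeNorm(int fieldLength) {
        if (fieldLength < 1) {
            throw new IllegalArgumentException("fieldLength must be >= 1");
        }
        // Toy formula: shorter fields get a larger norm.
        double norm = 1.0 / Math.sqrt(fieldLength);
        // Quantize into 1..255; this step loses information.
        return Math.max(1L, Math.round(norm * 255.0));
    }

    /** Hypothetical reverse direction: approximate length from a stored norm. */
    static int decodeLength(long storedNorm) {
        if (storedNorm < 1 || storedNorm > 255) {
            throw new IllegalArgumentException("storedNorm must be in 1..255");
        }
        double norm = storedNorm / 255.0;
        // Invert the toy formula: length = 1 / norm^2, rounded to an integer.
        return (int) Math.round(1.0 / (norm * norm));
    }

    public static void main(String[] args) {
        // Print a few round trips to see how the approximation degrades.
        for (int length : new int[] {1, 4, 100, 5000}) {
            long norm = computeNorm(length);
            System.out.println(length + " -> norm " + norm
                    + " -> approx length " + decodeLength(norm));
        }
    }
}
```

A ValueSource built on something like decodeLength would expose the
approximate length per document; how close it gets depends entirely on how
each Similarity encodes its norm.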

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
