[
https://issues.apache.org/jira/browse/LUCENE-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Ernst resolved LUCENE-5005.
--------------------------------
Resolution: Not A Problem
Lucene encodes each norm into 8 bits, so precision can be lost when
encoding and nearby float values can map to the same byte. This is explained here:
https://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/DefaultSimilarity
> Length norm value of DefaultSimilarity for a few terms
> ------------------------------------------------------
>
> Key: LUCENE-5005
> URL: https://issues.apache.org/jira/browse/LUCENE-5005
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 4.0
> Reporter: Shingo Sasaki
> Priority: Minor
>
> The lengthNorm method of DefaultSimilarity is as follows:
> {noformat}
> public float lengthNorm(FieldInvertState state) {
>   final int numTerms;
>   if (discountOverlaps)
>     numTerms = state.getLength() - state.getNumOverlap();
>   else
>     numTerms = state.getLength();
>   return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
> }
> {noformat}
> The return value is determined by (1.0 / Math.sqrt(numTerms)).
> Its type is float, but the value is encoded into a single byte by
> SmallFloat.floatToByte315.
> ||term count||1/sqrt(numTerms)||1/sqrt(numTerms) to byte||
> |1| 1.000000| 1.0000|
> |2| 0.707107| 0.6250|
> |3| 0.577350| 0.5000|
> |4| 0.500000| 0.5000|
> |5| 0.447214| 0.4375|
> The length norm of 3 terms is the same as that of 4 terms.
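
The collision in the table above can be reproduced without a Lucene
dependency. The following is a minimal sketch, assuming an 8-bit encoding
with 3 mantissa bits and a zero-exponent point of 15 (which is what the name
floatToByte315 suggests). The class NormPrecisionSketch and its methods
floatToByte, byteToFloat and storedNorm are hypothetical names written for
this illustration; they are not Lucene's SmallFloat API, and the field boost
is assumed to be 1.

```java
// Illustrative sketch only: a standalone re-implementation of an 8-bit float
// encoding (3 mantissa bits, zero-exponent point 15). Not Lucene's own code.
class NormPrecisionSketch {

    // Exponent rebias (63 - 15), shifted past the 3 kept mantissa bits.
    private static final int EXPONENT_OFFSET = (63 - 15) << 3;

    // Encode a float into one byte by truncating the mantissa to 3 bits.
    static byte floatToByte(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3); // drop the 21 low mantissa bits
        if (small <= EXPONENT_OFFSET) {
            // Zero and negatives map to 0; tiny positives to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= EXPONENT_OFFSET + 0x100) {
            return -1; // overflow: saturate at the largest code (0xFF)
        }
        return (byte) (small - EXPONENT_OFFSET);
    }

    // Decode a byte produced by floatToByte back into a float.
    static float byteToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Length norm as it would come back after the byte round trip,
    // assuming a boost of 1 and no overlap discounting.
    static float storedNorm(int numTerms) {
        float norm = (float) (1.0 / Math.sqrt(numTerms));
        return byteToFloat(floatToByte(norm));
    }

    public static void main(String[] args) {
        // Columns: term count, exact 1/sqrt(numTerms), value after round trip.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d %.6f %.4f%n", n, 1.0 / Math.sqrt(n), storedNorm(n));
        }
    }
}
```

Under these assumptions, storedNorm(3) and storedNorm(4) both come back as
0.5, because only 3 mantissa bits survive and 0.577350 is truncated down to
the next representable value.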