Hello Folks,

In Amazon product search we have a use case that overrides the term
frequency to hold a custom scoring signal for a small subset of fields in
a document. These fields do not have positions enabled. Support for this
was added to Lucene in
https://issues.apache.org/jira/browse/LUCENE-7854.

Following this change, the *CheckIndex* tool no longer reports the total
token count correctly on our index, because for these fields the term
frequency holds a scoring signal rather than a number of tokens.
We have a simple one-line change in our internal branch that increments
the total positions count by 1 (instead of by the term frequency)
if a field does not have positions.

*Current*:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L1609

*Proposed*: (hasPositions ? freq : 1);
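
To make the effect concrete, here is a minimal standalone sketch of the proposed counting rule. It does not reproduce the actual CheckIndex code: the class and method names (TotalPositionsSketch, positionsToAdd, countPositions) are hypothetical and exist only for illustration, and it assumes the total is accumulated once per posting.

```java
// Illustrative sketch only. These names are hypothetical and are not
// part of Lucene or of CheckIndex.
final class TotalPositionsSketch {

    // Proposed rule: a posting contributes `freq` positions only when the
    // field indexes positions; otherwise it contributes exactly 1, since
    // freq may then carry a custom scoring signal instead of a token count.
    static long positionsToAdd(boolean hasPositions, int freq) {
        return hasPositions ? freq : 1;
    }

    // Sums the contribution of every posting of a field, given the
    // per-posting term frequencies.
    static long countPositions(boolean hasPositions, int[] freqs) {
        long totPos = 0;
        for (int freq : freqs) {
            totPos += positionsToAdd(hasPositions, freq);
        }
        return totPos;
    }
}
```

For example, with per-posting frequencies {5, 120, 7}, countPositions(false, ...) returns 3, whereas countPositions(true, ...) returns 132, the same total that adding freq unconditionally would produce.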

If the community feels this is useful and something that should be changed
in Lucene, I am happy to open a JIRA issue and contribute a patch with
suitable unit test(s).

Thanks
-Ankur
