[ https://issues.apache.org/jira/browse/LUCENE-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259327#comment-16259327 ]

Adrien Grand commented on LUCENE-8053:
--------------------------------------

True, though it requires that the same token appears twice at the same 
position, which is usually not the case? Even though I agree similarities need 
to be able to deal with this case, I was more wondering whether some impls might
degrade in quality if some terms have more occurrences than the document 
length. For the record, we used to round up before 7.0 since we actually 
rounded down {{1/sqrt(len)}}. Happy to close if we think this is a non-issue.
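
A rough illustration of the last point follows. It assumes a hypothetical quantizer that stores {{1/sqrt(len)}} truncated down to a multiple of 1/256. This is not Lucene's actual norm encoding, and {{NormRounding}}, {{encodeNorm}} and {{impliedLength}} are made-up names. The stored value never exceeds {{1/sqrt(len)}}, so the length recovered as {{1/norm^2}} is never smaller than the real length. In other words, rounding the norm down rounds the implied length up.

```java
// Hypothetical illustration only: NOT Lucene's real norm encoding.
// Stores norm = 1/sqrt(len) truncated DOWN to a multiple of 1/STEP, then
// recovers the implied length as 1/norm^2.
final class NormRounding {
    static final double STEP = 256.0;

    // Round 1/sqrt(length) down to the nearest multiple of 1/STEP.
    // Valid for 1 <= length <= STEP * STEP; larger lengths would truncate to 0.
    static double encodeNorm(int length) {
        return Math.floor(STEP / Math.sqrt(length)) / STEP;
    }

    // Length implied by a stored norm: since norm <= 1/sqrt(len),
    // 1/norm^2 >= len, i.e. the length comes back rounded up.
    static double impliedLength(double norm) {
        return 1.0 / (norm * norm);
    }

    public static void main(String[] args) {
        for (int length : new int[] {3, 17, 100, 1000}) {
            double implied = impliedLength(encodeNorm(length));
            System.out.printf("len=%d implied=%.2f%n", length, implied);
        }
    }
}
```

Every printed implied length is at least the original one, which is the "round up" behaviour described for pre-7.0 versions.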

> Similarities should round the length up
> ---------------------------------------
>
>                 Key: LUCENE-8053
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8053
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Priority: Minor
>
> The encoding that we use for lengths currently rounds down in case the length 
> cannot be stored accurately. We should round up instead so that frequencies 
> can never be larger than the length.
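
A minimal sketch of the two rounding directions described in the issue follows. It assumes a hypothetical encoding that keeps only the 4 most significant bits of a length. Lucene's real length encoding differs, and {{LengthRounding}}, {{roundDown}} and {{roundUp}} are illustrative names. Take a length of 17 where a single term makes up the whole field. Rounding down decodes the length as 16, which is smaller than the frequency. Rounding up decodes it as 18.

```java
// Hypothetical lossy length encoding: keeps only the SIGNIFICANT_BITS highest
// bits of a length. NOT Lucene's actual encoding; it only shows the effect of
// the rounding direction. Assumes 0 <= length, well below Integer.MAX_VALUE.
final class LengthRounding {
    static final int SIGNIFICANT_BITS = 4;

    // Number of low-order bits that cannot be stored for this length.
    static int droppedBits(int length) {
        int bitLength = 32 - Integer.numberOfLeadingZeros(length);
        return Math.max(0, bitLength - SIGNIFICANT_BITS);
    }

    // Round down: clear the bits that cannot be stored.
    static int roundDown(int length) {
        int shift = droppedBits(length);
        return (length >>> shift) << shift;
    }

    // Round up: smallest representable value >= length.
    static int roundUp(int length) {
        int shift = droppedBits(length);
        int mask = (1 << shift) - 1;
        return (length + mask) & ~mask;
    }

    public static void main(String[] args) {
        int length = 17; // binary 10001: needs 5 significant bits
        int freq = 17;   // e.g. every token in the field is the same term
        System.out.println("round down: " + roundDown(length)); // 16
        System.out.println("round up:   " + roundUp(length));   // 18
        System.out.println("freq > length (down)? " + (freq > roundDown(length))); // true
        System.out.println("freq > length (up)?   " + (freq > roundUp(length)));   // false
    }
}
```

Rounding up gives a slightly pessimistic length in exchange for keeping {{freq <= decodedLength}} whenever {{freq <= length}} held before encoding. It does not by itself cover the case from the comment above, where the same token appears twice at the same position.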



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
