[ 
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738818#comment-13738818
 ] 

Tom Burton-West commented on LUCENE-5175:
-----------------------------------------

I wondered about that "crazy cache", in that it makes the implementation 
dependent on the norms implementation.  

BTW: It looks to me as though, with Lucene's default norms, there are only 
about 130 or so distinct "document lengths".  If there is no boosting going on, 
the byte value has to get to 124 for a doclength = 1, so there are only 
255-124 = 131 possible different lengths.

i=124 norm=1.0,doclen=1.0
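
A table like the row above can be reproduced with a small standalone sketch.
It assumes norms are decoded the way Lucene 4.x's SmallFloat.byte315ToFloat
does it (re-implemented here from memory so the snippet has no Lucene
dependency) and that the cached document length is 1/norm^2. The class name
NormTable and the helper docLen are hypothetical and are not taken from the
patch.

```java
public class NormTable {

    // Assumed re-implementation of the 3-bit-mantissa / 5-bit-exponent
    // byte-to-float decoding used for Lucene's default norms.
    static float byte315ToFloat(int b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Assumption: with no boost, norm = 1/sqrt(doclen), so doclen = 1/norm^2.
    static float docLen(int b) {
        float norm = byte315ToFloat(b);
        return 1.0f / (norm * norm);
    }

    public static void main(String[] args) {
        // Byte 0 decodes to norm 0 (doclen would be infinite), so start at 1.
        for (int i = 1; i < 256; i++) {
            System.out.println("i=" + i + " norm=" + byte315ToFloat(i)
                    + ",doclen=" + docLen(i));
        }
    }
}
```

Running main prints one row per byte value, so the number of byte values
that map to a usable document length can be counted directly from the output
instead of being estimated.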
                
> Add parameter to lower-bound TF normalization for BM25 (for long documents)
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-5175
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5175
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: LUCENE-5175.patch
>
>
> In the article "When Documents Are Very Long, BM25 Fails!" a fix for the 
> problem is documented.  There was a TODO note in BM25Similarity to add this 
> fix. I will attach a patch that implements the fix shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

