[ https://issues.apache.org/jira/browse/LUCENE-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304061#comment-16304061 ]
Robert Muir commented on LUCENE-8087:
-------------------------------------

{quote}
I guess the only way to avoid recording similarity-specific information is to record all competitive (freq, norm) pairs for every block of X documents. X would likely need to be quite large since we would need to compute the score for every pair to know the best score in the block.
{quote}

Well again, I'm not sure it'd be so huge in practice. For omitTF+omitNorm fields, we don't have to write anything at all; it's implicit. When either TF or norms are omitted, we only have to write a single value. In all other cases, it's only "big terms": it already takes at least N (256?) docs for the term to even get skip data at all. I agree it would be best at higher skip levels only though (your X).

> Record per-term max term frequencies
> ------------------------------------
>
>                 Key: LUCENE-8087
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8087
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8087.patch
>
> I was mostly interested in doing that in order to get better score upper
> bounds for LUCENE-4100. However this doesn't help, at least with the tasks
> that we have for wikimedium10m. I dug into this a bit, and it is due to the fact
> that the upper bound is not much better if we can't make assumptions about
> the value of the length. Ideally we'd need something like the maximum term
> frequency for each norm value. I'll post the patch in case someone has
> another use-case for per-term max term frequencies.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
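The "competitive (freq, norm) pairs" idea discussed above can be sketched as follows. This is a hypothetical illustration, not Lucene code: the `FreqNorm` class and `competitive` method are made up for this sketch, and it assumes a similarity whose score is non-decreasing in term frequency and non-increasing in the norm (length), so a pair dominated by another pair on both axes can never produce the block's maximum score and need not be recorded.

```java
import java.util.ArrayList;
import java.util.List;

public class CompetitivePairs {

  // Hypothetical holder for a (term frequency, norm) pair within one block.
  static final class FreqNorm {
    final int freq;
    final long norm;
    FreqNorm(int freq, long norm) { this.freq = freq; this.norm = norm; }
  }

  /**
   * Returns only the pairs that could yield the block's maximum score,
   * assuming score grows with freq and shrinks with norm. A pair is
   * dominated when some other pair has freq >= its freq and norm <= its
   * norm, with at least one strict inequality.
   */
  static List<FreqNorm> competitive(List<FreqNorm> block) {
    List<FreqNorm> kept = new ArrayList<>();
    for (FreqNorm cand : block) {
      boolean dominated = false;
      for (FreqNorm other : block) {
        if (other != cand
            && other.freq >= cand.freq
            && other.norm <= cand.norm
            && (other.freq > cand.freq || other.norm < cand.norm)) {
          dominated = true;
          break;
        }
      }
      if (!dominated) {
        kept.add(cand);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<FreqNorm> block = new ArrayList<>();
    block.add(new FreqNorm(3, 10)); // dominated by (5, 10): same norm, lower freq
    block.add(new FreqNorm(5, 10)); // competitive: highest freq
    block.add(new FreqNorm(2, 4));  // competitive: lowest norm
    System.out.println(competitive(block).size()); // prints 2
  }
}
```

This also illustrates the size argument in the comment: only the non-dominated frontier of pairs needs to be stored per block, which for many blocks collapses to one pair (or zero bytes when both TF and norms are omitted).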