[ https://issues.apache.org/jira/browse/LUCENE-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304061#comment-16304061 ]
Robert Muir commented on LUCENE-8087:
-------------------------------------

{quote}
I guess the only way to avoid recording similarity-specific information is to record all competitive (freq, norm) pairs for every block of X documents. X would likely need to be quite large since we would need to compute the score for every pair to know the best score in the block.
{quote}

Well again, I'm not sure it'd be so huge in practice. For omitTF+omitNorm fields, we don't have to write anything at all; it's implicit. When either TF or norms are omitted, we only have to write a single value. In all other cases, it's only "big terms": it already takes at least N (256?) docs for the term to even get skip data at all. I agree it would be best at higher skip levels only though (your X).

> Record per-term max term frequencies
> ------------------------------------
>
>                 Key: LUCENE-8087
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8087
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8087.patch
>
> I was mostly interested in doing that in order to get better score upper
> bounds for LUCENE-4100. However this doesn't help, at least with the tasks
> that we have for wikimedium10m. I dug into this a bit, and it is due to the fact
> that the upper bound is not much better if we can't make assumptions about
> the value of the length. Ideally we'd need something like the maximum term
> frequency for each norm value. I'll post the patch in case someone has
> another use-case for per-term max term frequencies.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
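The "competitive (freq, norm) pairs" idea discussed above can be sketched as follows. This is a hypothetical illustration, not Lucene code: the `FreqNorm` class and `competitive` method are made up for this sketch, and it assumes a similarity whose score is non-decreasing in term frequency and non-increasing in the norm (length), so a pair dominated by another pair on both axes can never produce the block's maximum score and need not be recorded.

```java
import java.util.ArrayList;
import java.util.List;

public class CompetitivePairs {

  // Hypothetical holder for a (term frequency, norm) pair within one block.
  static final class FreqNorm {
    final int freq;
    final long norm;
    FreqNorm(int freq, long norm) { this.freq = freq; this.norm = norm; }
  }

  /**
   * Returns only the pairs that could yield the block's maximum score,
   * assuming score grows with freq and shrinks with norm. A pair is
   * dominated when some other pair has freq >= its freq and norm <= its
   * norm, with at least one strict inequality.
   */
  static List<FreqNorm> competitive(List<FreqNorm> block) {
    List<FreqNorm> kept = new ArrayList<>();
    for (FreqNorm cand : block) {
      boolean dominated = false;
      for (FreqNorm other : block) {
        if (other != cand
            && other.freq >= cand.freq
            && other.norm <= cand.norm
            && (other.freq > cand.freq || other.norm < cand.norm)) {
          dominated = true;
          break;
        }
      }
      if (!dominated) {
        kept.add(cand);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<FreqNorm> block = new ArrayList<>();
    block.add(new FreqNorm(3, 10)); // dominated by (5, 10): same norm, lower freq
    block.add(new FreqNorm(5, 10)); // competitive: highest freq
    block.add(new FreqNorm(2, 4));  // competitive: lowest norm
    System.out.println(competitive(block).size()); // prints 2
  }
}
```

This also illustrates the size argument in the comment: only the non-dominated frontier of pairs needs to be stored per block, which for many blocks collapses to one pair (or zero bytes when both TF and norms are omitted).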