[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

Adrien Grand (Jira) Mon, 11 Jul 2022 05:53:05 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564988#comment-17564988
 ]


Adrien Grand commented on LUCENE-10650:
---------------------------------------

Hi Nathan. When we introduced dynamic pruning to Lucene, we also introduced the 
requirement that similarities produce scores that are non-decreasing when tf 
increases or when the length norm decreases (all other things equal). 
Unfortunately, this property could not be retained while keeping DFR 
similarities pluggable as they were so we removed support for the no after 
effect and only retained L and B.

It looks like this specific similarity that you are looking for could still be 
implemented in a way that scores are non-decreasing with increasing tf or 
decreasing norm, so you should be able to re-implement it using a scripted 
similarity for instance 
(https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html#scripted_similarity)
 with something like below (untested):

{code}
    "similarity": {
      "my_dfr_sim": {
        "type": "scripted",
        "weight_script": {
          "source": "return query.boost * 
Math.log((field.docCount+1.0)/(term.docFreq+0.5)) / Math.log(2);"
        },
        "script": {
          "source": "return weight * doc.freq;"
        }
      }
    }
{code}

> "after_effect": "no" was removed what replaces it?
> --------------------------------------------------
>
>                 Key: LUCENE-10650
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10650
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Nathan Meisels
>            Priority: Major
>
> Hi!
> We have been using an old version of elasticsearch with the following 
> settings:
>  
> {code:java}
>         "default": {
>           "queryNorm": "1",
>           "type": "DFR",
>           "basic_model": "in",
>           "after_effect": "no",
>           "normalization": "no"
>         }{code}
>  
> I see [here|https://issues.apache.org/jira/browse/LUCENE-8015] that 
> "after_effect": "no" was removed.
> In 
> [old|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L33]
>  version score was:
> {code:java}
> return tfn * (float)(log2((N + 1) / (n + 0.5)));{code}
> In 
> [new|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.2/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java#L43]
>  version it's:
> {code:java}
> long N = stats.getNumberOfDocuments();
> long n = stats.getDocFreq();
> double A = log2((N + 1) / (n + 0.5));
> // basic model I should return A * tfn
> // which we rewrite to A * (1 + tfn) - A
> // so that it can be combined with the after effect while still guaranteeing
> // that the result is non-decreasing with tfn
> return A * aeTimes1pTfn * (1 - 1 / (1 + tfn));
> {code}
> I tried changing {color:#172b4d}after_effect{color} to "l" but the scoring is 
> different than what we are used to. (We depend heavily on the exact scoring).
> Do you have any advice how we can keep the same scoring as before?
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10650) "after_effect": "no" was removed what replaces it?

Reply via email to