[ https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278354#comment-16278354 ]

Adrien Grand commented on LUCENE-8015:
--------------------------------------

I think our best option is to specialize some combinations. We should be able
to specialize basic models G, IF, I(n) and I(ne) with after effects B, L and
NoAfterEffect and make them pass the tests. For instance, I tested this
specialization of basic model G and after effect L to make sure it actually
passes the tests:

{code}
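import java.util.Objects;

import org.apache.lucene.search.similarities.BasicStats;
import org.apache.lucene.search.similarities.Normalization;
import org.apache.lucene.search.similarities.SimilarityBase;

/** BasicModel G + AfterEffect L */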
public class DFRSimilarityGL extends SimilarityBase {

  private final Normalization normalization;

  public DFRSimilarityGL(Normalization normalization) {
    this.normalization = Objects.requireNonNull(normalization);
  }

  @Override
  protected double score(BasicStats stats, double freq, double docLen) {
    double tfn = normalization.tfn(stats, freq, docLen);

    // approximation only holds true when F << N, so we use lambda = F / (N + F)
    double F = stats.getTotalTermFreq() + 1;
    double N = stats.getNumberOfDocuments();
    double lambda = F / (N + F);

    // -log(1 / (lambda + 1)) -> log(lambda + 1)
    double A = log2(lambda + 1);
    double B = log2((1 + lambda) / lambda);

    // basic model G uses (A + B * tfn)
    // after effect L takes the result and divides it by (1 + tfn)
    // so in the end we have (A + B * tfn) / (1 + tfn)
    // which we change to B - (B - A) / (1 + tfn) to reduce floating-point accuracy issues
    // (since tfn appears only once, the result is guaranteed to be non-decreasing with tfn)
    return B - (B - A) / (1 + tfn);
  }

  @Override
  public String toString() {
    return "DFR GL" + normalization.toString();
  }

}
{code}
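To sanity-check the algebraic rewrite described in the code comments, here is a small standalone sketch. It is plain Java with no Lucene dependency: the class GLScoreCheck and its methods are hypothetical names chosen for illustration, log2 is reimplemented with Math.log rather than taken from SimilarityBase, and the statistics (1,000 total term occurrences over 100,000 documents) are arbitrary example values. It evaluates the naive form (A + B * tfn) / (1 + tfn) next to the rewritten form B - (B - A) / (1 + tfn) over a sweep of tfn values and reports whether the rewritten form ever decreases.

```java
/**
 * Hypothetical standalone check, not Lucene code: evaluates the G + L score
 * in its naive form and in the rewritten form used by DFRSimilarityGL.
 */
public class GLScoreCheck {

  static double log2(double x) {
    return Math.log(x) / Math.log(2);
  }

  /** lambda = F / (N + F) with F = totalTermFreq + 1, mirroring DFRSimilarityGL. */
  static double lambda(long totalTermFreq, long numberOfDocuments) {
    double F = totalTermFreq + 1;
    double N = numberOfDocuments;
    return F / (N + F);
  }

  /** Naive form: basic model G gives A + B * tfn, after effect L divides by (1 + tfn). */
  static double naive(double lambda, double tfn) {
    double A = log2(lambda + 1);
    double B = log2((1 + lambda) / lambda);
    return (A + B * tfn) / (1 + tfn);
  }

  /** Rewritten form: tfn appears only once, so the score cannot decrease as tfn grows. */
  static double rewritten(double lambda, double tfn) {
    double A = log2(lambda + 1);
    double B = log2((1 + lambda) / lambda);
    return B - (B - A) / (1 + tfn);
  }

  public static void main(String[] args) {
    double lambda = lambda(1_000, 100_000); // arbitrary example statistics
    double maxDiff = 0;
    double prev = Double.NEGATIVE_INFINITY;
    boolean nonDecreasing = true;
    // Sweep tfn from 0 to 1000 in steps of 0.001; both forms should agree up to rounding error.
    for (int i = 0; i <= 1_000_000; i++) {
      double tfn = i * 0.001;
      double r = rewritten(lambda, tfn);
      maxDiff = Math.max(maxDiff, Math.abs(naive(lambda, tfn) - r));
      nonDecreasing &= r >= prev;
      prev = r;
    }
    System.out.println("max |naive - rewritten| = " + maxDiff);
    System.out.println("rewritten non-decreasing: " + nonDecreasing);
  }
}
```

Since lambda = F / (N + F) is always below 1, B - A = log2(1 / lambda) is positive, so the rewritten score rises towards B as tfn grows. Because IEEE 754 addition, division and subtraction round monotonically, keeping tfn in a single place preserves that ordering in floating point, so the second line should print true.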

> TestBasicModelIne.testRandomScoring failure
> -------------------------------------------
>
>                 Key: LUCENE-8015
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8015
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>         Attachments: LUCENE-8015-test.patch, LUCENE-8015_test_fangs.patch
>
>
> reproduce with: ant test -Dtestcase=TestBasicModelIne -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu -Dtests.asserts=true -Dtests.file.encoding=UTF8



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
