[ https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278354#comment-16278354 ]
Adrien Grand commented on LUCENE-8015:
--------------------------------------

I think our best option is to specialize some combinations. We should be able to specialize basic models G, IF, I(n) and I(ne) with after effects B, L and NoAfterEffect and make them pass tests.

For instance, I tested out this specialization of model G and after effect L to make sure it actually passes the tests:

{code}
// Assumes this class lives in org.apache.lucene.search.similarities, so that
// SimilarityBase, BasicStats and Normalization resolve without extra imports.
import java.util.Objects;

/** BasicModel G + AfterEffect L */
public class DFRSimilarityGL extends SimilarityBase {

  private final Normalization normalization;

  public DFRSimilarityGL(Normalization normalization) {
    this.normalization = Objects.requireNonNull(normalization);
  }

  @Override
  protected double score(BasicStats stats, double freq, double docLen) {
    double tfn = normalization.tfn(stats, freq, docLen);

    // approximation only holds true when F << N, so we use lambda = F / (N + F)
    double F = stats.getTotalTermFreq() + 1;
    double N = stats.getNumberOfDocuments();
    double lambda = F / (N + F);

    // -log(1 / (lambda + 1)) -> log(lambda + 1)
    double A = log2(lambda + 1);
    double B = log2((1 + lambda) / lambda);

    // basic model G uses (A + B * tfn)
    // after effect L takes the result and divides it by (1 + tfn)
    // so in the end we have (A + B * tfn) / (1 + tfn)
    // which we change to B - (B - A) / (1 + tfn) to reduce floating-point accuracy issues
    // (since tfn appears only once it is guaranteed to be non decreasing with tfn)
    return B - (B - A) / (1 + tfn);
  }

  @Override
  public String toString() {
    return "DFR GL" + normalization.toString();
  }
}
{code}

> TestBasicModelIne.testRandomScoring failure
> -------------------------------------------
>
>                 Key: LUCENE-8015
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8015
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>         Attachments: LUCENE-8015-test.patch, LUCENE-8015_test_fangs.patch
>
>
> reproduce with: ant test -Dtestcase=TestBasicModelIne
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93
> -Dtests.slow=true -Dtests.locale=vi-VN
> -Dtests.timezone=Pacific/Tongatapu
> -Dtests.asserts=true -Dtests.file.encoding=UTF8

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
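
For anyone who wants to check the rewrite from the comment above without a Lucene checkout, here is a minimal standalone sketch. The class name GLScoreSketch and all of its methods are hypothetical and not part of Lucene, and the statistics in main are made up. It only compares the direct form (A + B * tfn) / (1 + tfn) with the rewritten form B - (B - A) / (1 + tfn), and checks that the rewritten form never decreases as tfn grows.

```java
/** Hypothetical helper, not part of Lucene: basic model G combined with after effect L. */
final class GLScoreSketch {

  /** Base-2 logarithm, written out here so the sketch has no Lucene dependency. */
  static double log2(double x) {
    return Math.log(x) / Math.log(2.0);
  }

  /** Direct form: tfn appears in both numerator and denominator. */
  static double direct(double a, double b, double tfn) {
    return (a + b * tfn) / (1 + tfn);
  }

  /** Rewritten form from the comment: tfn appears only once. */
  static double rewritten(double a, double b, double tfn) {
    return b - (b - a) / (1 + tfn);
  }

  /** Returns true if the rewritten score never decreases over tfn = 0, step, 2 * step, ... */
  static boolean isNonDecreasing(double a, double b, double maxTfn, double step) {
    double previous = Double.NEGATIVE_INFINITY;
    for (double tfn = 0; tfn <= maxTfn; tfn += step) {
      double score = rewritten(a, b, tfn);
      if (score < previous) {
        return false;
      }
      previous = score;
    }
    return true;
  }

  public static void main(String[] args) {
    // Made-up statistics, chosen only for illustration.
    double F = 1_000 + 1;  // total term frequency + 1
    double N = 1_000_000;  // number of documents
    double lambda = F / (N + F);
    double a = log2(lambda + 1);
    double b = log2((1 + lambda) / lambda);

    System.out.println("non-decreasing: " + isNonDecreasing(a, b, 64.0, 0.25));
    System.out.println("direct vs rewritten at tfn=3: "
        + direct(a, b, 3.0) + " vs " + rewritten(a, b, 3.0));
  }
}
```

Since lambda lies strictly between 0 and 1, B is larger than A, so (B - A) / (1 + tfn) shrinks as tfn grows and the score can only go up. Each floating-point operation in the rewritten form is monotonic in its input, which is why having tfn appear only once matters.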