[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-06 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280612#comment-16280612
 ] 

Adrien Grand commented on LUCENE-8015:
--

Done, I combined both patches and beasting didn't find any failures so I 
merged. Thank you!

> TestBasicModelIne.testRandomScoring failure
> ---
>
> Key: LUCENE-8015
> URL: https://issues.apache.org/jira/browse/LUCENE-8015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
> Fix For: master (8.0)
>
> Attachments: LUCENE-8015-test.patch, LUCENE-8015.patch, 
> LUCENE-8015_test_fangs.patch
>
>
> reproduce with: ant test  -Dtestcase=TestBasicModelIne 
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu 
> -Dtests.asserts=true -Dtests.file.encoding=UTF8



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280590#comment-16280590
 ] 

ASF subversion and git services commented on LUCENE-8015:
-

Commit 63b63c573487fe6b054afb6073c057a88a15288f in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=63b63c5 ]

LUCENE-8015: Fixed DFR similarities' scores to not decrease when tfn increases.


> TestBasicModelIne.testRandomScoring failure
> ---
>
> Key: LUCENE-8015
> URL: https://issues.apache.org/jira/browse/LUCENE-8015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
> Attachments: LUCENE-8015-test.patch, LUCENE-8015.patch, 
> LUCENE-8015_test_fangs.patch
>
>
> reproduce with: ant test  -Dtestcase=TestBasicModelIne 
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu 
> -Dtests.asserts=true -Dtests.file.encoding=UTF8






[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280508#comment-16280508
 ] 

Robert Muir commented on LUCENE-8015:
-

Took a glance, I am good with this approach, thank you! I would like to combine 
your patch with my test patch (attached to this issue) though, because it makes 
the test much better for all sims not just this particular case by exercising 
the extremes explicitly.




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-05 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278722#comment-16278722
 ] 

Adrien Grand commented on LUCENE-8015:
--

bq. Adrien it should reproduce every time with the test changes i made on this 
issue?

It doesn't, because we always compute scores as doubles and then cast to a 
float, which hides the issue: even if the score of Math.nextDown(freq) is more 
than the score of freq, the float cast rounds both values to the same float 
almost all the time.
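
A minimal sketch of the masking effect described above, using only JDK 
arithmetic and no Lucene classes. The class and method names are hypothetical 
and only illustrate how two distinct doubles can collapse to one float.

```java
/** Hypothetical demo, not part of Lucene: a float cast can hide a double-precision difference. */
final class FloatCastMasking {

  /** Returns true when two distinct doubles (lower < higher) round to the same float. */
  static boolean maskedByFloatCast(double lower, double higher) {
    return lower < higher && (float) lower == (float) higher;
  }

  public static void main(String[] args) {
    double score = 1.0;
    // One double ulp above 1.0: far smaller than the spacing between floats near 1.0.
    double slightlyHigher = Math.nextUp(score);
    System.out.println("masked by cast: " + maskedByFloatCast(score, slightlyHigher));
  }
}
```

A check that compares only the float results would therefore miss most 
ordering violations that occur at double precision.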

> TestBasicModelIne.testRandomScoring failure
> ---
>
> Key: LUCENE-8015
> URL: https://issues.apache.org/jira/browse/LUCENE-8015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
> Attachments: LUCENE-8015-test.patch, LUCENE-8015_test_fangs.patch
>
>
> reproduce with: ant test  -Dtestcase=TestBasicModelIne 
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu 
> -Dtests.asserts=true -Dtests.file.encoding=UTF8






[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278667#comment-16278667
 ] 

Robert Muir commented on LUCENE-8015:
-

I think I like the proposed solution. Let's drop NoAfterEffect though, I'm not 
sure it's even theoretical: I don't see it in the DFR paper 
(http://theses.gla.ac.uk/1570/1/2003amatiphd.pdf). That would yield 8 solid 
combinations, which seems manageable. There are also some "+1"s that maybe are 
no longer necessary (I don't know if it makes this task easier, just mentioning 
it: LUCENE-8023) 




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278646#comment-16278646
 ] 

Robert Muir commented on LUCENE-8015:
-

Adrien, it should reproduce every time with the test changes I made on this 
issue? It's just a bug in the test that it doesn't explicitly test the extremes 
but instead relies on randomness.
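
One way to picture "explicitly test the extremes" is sketched below. This is 
not the attached test patch: the class, the helper name and the choice of 
edge values are all assumptions made for illustration. The idea is to always 
emit the boundary values first and only then fall back to random ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not taken from the patch: edge values first, random values after. */
final class EdgeFreqs {

  /** Returns candidate frequencies in (0, maxFreq], always including the extremes. */
  static List<Float> candidateFreqs(Random random, float maxFreq, int randomCount) {
    List<Float> freqs = new ArrayList<>();
    freqs.add(Float.MIN_VALUE);        // smallest positive float
    freqs.add(Math.nextDown(maxFreq)); // just below the upper bound
    freqs.add(maxFreq);                // the upper bound itself
    for (int i = 0; i < randomCount; i++) {
      // nextFloat() is in [0, 1), so 1 - nextFloat() is in (0, 1]
      freqs.add((1f - random.nextFloat()) * maxFreq);
    }
    return freqs;
  }

  public static void main(String[] args) {
    System.out.println(candidateFreqs(new Random(42), 1.0e9f, 5));
  }
}
```

With the extremes enumerated on every run, a failure at a boundary no longer 
depends on the random seed happening to land there.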




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-05 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278354#comment-16278354
 ] 

Adrien Grand commented on LUCENE-8015:
--

I think our best option is to specialize some combinations. We should be able 
to specialize basic models G, IF, I(n) and I(ne) with after effects B, L and 
NoAfterEffect and make them pass tests. For instance, I tested out this 
specialization of model G and after effect L to make sure it actually passes 
the tests:

{code}
// Assumes this class lives in org.apache.lucene.search.similarities,
// next to SimilarityBase, Normalization and BasicStats.
import java.util.Objects;

/** BasicModel G + AfterEffect L */
public class DFRSimilarityGL extends SimilarityBase {

  private final Normalization normalization;

  public DFRSimilarityGL(Normalization normalization) {
    this.normalization = Objects.requireNonNull(normalization);
  }

  @Override
  protected double score(BasicStats stats, double freq, double docLen) {
    double tfn = normalization.tfn(stats, freq, docLen);

    // approximation only holds true when F << N, so we use lambda = F / (N + F)
    double F = stats.getTotalTermFreq() + 1;
    double N = stats.getNumberOfDocuments();
    double lambda = F / (N + F);

    // -log(1 / (lambda + 1)) -> log(lambda + 1)
    double A = log2(lambda + 1);
    double B = log2((1 + lambda) / lambda);

    // basic model G uses (A + B * tfn)
    // after effect L takes the result and divides it by (1 + tfn)
    // so in the end we have (A + B * tfn) / (1 + tfn)
    // which we change to B - (B - A) / (1 + tfn) to reduce floating-point
    // accuracy issues (since tfn appears only once, it is guaranteed to be
    // non-decreasing with tfn)
    return B - (B - A) / (1 + tfn);
  }

  @Override
  public String toString() {
    return "DFR GL" + normalization.toString();
  }

}
{code}




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-04 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277651#comment-16277651
 ] 

Adrien Grand commented on LUCENE-8015:
--

I don't think we can fix this with a nextUp/nextDown? One way we could fix it 
for sure would be by implementing the basic model and the after effect in a 
single method. For instance {{(A + B * tfn) * (C / (tfn + 1))}} could be 
rewritten as {{(A - B + B * (1 + tfn)) * C / (tfn + 1) = (A - B) * C / (tfn + 
1) + B * C}}. Since there is only one occurrence of tfn in the latter, it would 
be guaranteed to be non-decreasing when tfn increases (as long as B >= A and 
C > 0). Fixing it in the general case looks challenging however?

Maybe one reasonable way to avoid this issue would be to bound the values that 
tfn may take? This isn't nice but it wouldn't affect the general case, only 
when freq, avgdl, or some other stats have extreme values?
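
The single-occurrence rewrite mentioned above can be sketched as follows. 
This is an illustration with hypothetical constants (a, b, c) and class 
names, not the code that was committed. It compares the naive form with the 
rewritten one while stepping tfn upward one ulp at a time.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: the same formula with tfn appearing twice versus once. */
final class SingleOccurrenceRewrite {

  /** Naive form: the two factors are rounded independently of each other. */
  static double naive(double a, double b, double c, double tfn) {
    return (a + b * tfn) * (c / (tfn + 1));
  }

  /** Rewritten form: every step is monotonic in tfn when b >= a and c > 0. */
  static double rewritten(double a, double b, double c, double tfn) {
    return (a - b) * c / (tfn + 1) + b * c;
  }

  /** Counts how often f decreases while x moves up by one ulp per step. */
  static int countDecreases(DoubleUnaryOperator f, double start, int steps) {
    int decreases = 0;
    double x = start;
    for (int i = 0; i < steps; i++) {
      double next = Math.nextUp(x);
      if (f.applyAsDouble(next) < f.applyAsDouble(x)) {
        decreases++;
      }
      x = next;
    }
    return decreases;
  }

  public static void main(String[] args) {
    double a = 1.0, b = 3.0, c = 0.5; // assumed constants with b > a and c > 0
    double start = 3.0e16;            // large enough that tfn + 1 == tfn
    System.out.println("naive decreases:     "
        + countDecreases(t -> naive(a, b, c, t), start, 10_000));
    System.out.println("rewritten decreases: "
        + countDecreases(t -> rewritten(a, b, c, t), start, 10_000));
  }
}
```

The rewritten form cannot decrease because addition, division and 
multiplication by a constant are each monotonic under IEEE 754 rounding, and 
tfn only enters the computation once.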

> TestBasicModelIne.testRandomScoring failure
> ---
>
> Key: LUCENE-8015
> URL: https://issues.apache.org/jira/browse/LUCENE-8015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
> Attachments: LUCENE-8015_test_fangs.patch
>
>
> reproduce with: ant test  -Dtestcase=TestBasicModelIne 
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu 
> -Dtests.asserts=true -Dtests.file.encoding=UTF8






[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277325#comment-16277325
 ] 

Robert Muir commented on LUCENE-8015:
-

Thanks for the analysis! I still don't even really want to commit the "floor" 
modifications for In and Ine because I don't like it: really a scoring formula 
should be able to return a tiny tiny value for a stopword, that should be ok. 
It shouldn't have to be a number between 1 and 43 or whatever to work with 
Lucene.

For model IF it's justifiable, just like it's justifiable in the BM25 case, 
because the formula is fundamentally broken you know, I mean we don't want 
negative scores for stopwords.

But your analysis suggests maybe we can look at a more surgical fix, like a 
nextUp/nextDown somewhere?




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-12-04 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277214#comment-16277214
 ] 

Adrien Grand commented on LUCENE-8015:
--

I have been looking into the following model G failure.

{noformat}
7.0E-45 = score(DFRSimilarity, doc=0, freq=0.9994), computed from:
  1.4E-45 = boost
  3.09640771E16 = NormalizationH1, computed from:
    0.9994 = tf
    1.61490253E9 = avgFieldLength
    112.0 = len
  9.2892231E16 = BasicModelG, computed from:
    12.0 = numberOfDocuments
    1.0 = totalTermFreq
  4.8443234E-17 = AfterEffectB, computed from:
    3.09640771E16 = tfn
    1.0 = totalTermFreq
    1.0 = docFreq

5.6E-45 = score(DFRSimilarity, doc=0, freq=1.0), computed from:
  1.4E-45 = boost
  3.09640792E16 = NormalizationH1, computed from:
    1.0 = tf
    1.61490253E9 = avgFieldLength
    112.0 = len
  9.289224E16 = BasicModelG, computed from:
    12.0 = numberOfDocuments
    1.0 = totalTermFreq
  4.844323E-17 = AfterEffectB, computed from:
    3.09640792E16 = tfn
    1.0 = totalTermFreq
    1.0 = docFreq

DFR GB1
field="field",maxDoc=46519,docCount=12,sumTotalTermFreq=19378830951,sumDocFreq=19378830951
term="term",docFreq=1,totalTermFreq=1
norm=59 (doc length ~ 112)
freq=1.0
NOTE: reproduce with: ant test  -Dtestcase=TestBasicModelG 
-Dtests.method=testRandomScoring -Dtests.seed=3C22B051C61EEC84 
-Dtests.locale=cs-CZ -Dtests.timezone=Atlantic/Madeira -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
{noformat}

In short, the scoring formula here looks like {{(A + B * tfn) * (C / (tfn + 
1))}} where A, B and C are constants. Mathematically, this function increases 
with tfn whenever B > A, which is always the case. The problem is that tfn is 
so large (ulp(tfn) = 4) that {{tfn + 1}} always returns {{tfn}} and {{A + B * 
tfn}} always returns the same as {{B * tfn}}. So when tfn gets high, the 
formula is effectively {{(B * tfn) * (C / tfn)}}. This is a constant, but since 
we compute the left and right parts independently, their rounding errors are 
unrelated and the result might decrease when tfn increases about half the time.

Even though I triggered it with BasicModelG, I suspect it affects almost all 
DFRSimilarity impls.
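
The absorption described in this comment can be reproduced with plain JDK 
arithmetic. The sketch below uses the tfn value printed in the explain output 
above; the class and method names are hypothetical.

```java
/** Hypothetical demo: at this magnitude, adding 1 to a double is lost to rounding. */
final class AbsorptionDemo {

  /** Returns true when x + 1 rounds back to x. */
  static boolean absorbsOne(double x) {
    return x + 1 == x;
  }

  public static void main(String[] args) {
    double tfn = 3.09640771E16; // value taken from the explain output
    System.out.println("ulp(tfn)       = " + Math.ulp(tfn));
    System.out.println("tfn + 1 == tfn : " + absorbsOne(tfn));
  }
}
```

Since the spacing between adjacent doubles here is 4, an increment of 1 is 
less than half an ulp and is rounded away.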




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-11-02 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235668#comment-16235668
 ] 

Adrien Grand commented on LUCENE-8015:
--

+1 to giving it a try

> TestBasicModelIne.testRandomScoring failure
> ---
>
> Key: LUCENE-8015
> URL: https://issues.apache.org/jira/browse/LUCENE-8015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Major
> Attachments: LUCENE-8015_test_fangs.patch
>
>
> reproduce with: ant test  -Dtestcase=TestBasicModelIne 
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu 
> -Dtests.asserts=true -Dtests.file.encoding=UTF8






[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233648#comment-16233648
 ] 

Robert Muir commented on LUCENE-8015:
-

I dug into the I\(n) and I\(ne) failures here via the new test, their biggest 
problem is in the BasicModel itself.

These idf-like functions have the "log1p" trap due to the formulas in use. Note 
their formula is {{log2((maxDoc + 1) / (x + 0.5))}} where x is docFreq for 
I\(n), expected docFreq for I\(ne), and totalTermFreq for I\(F). So the worst 
case (e.g. term in every doc) gets even worse as collection size increases, 
because we take log of values increasingly closer to 1.

BasicModel I\(F) never fails because we added a floor in its log: we had to, 
since it would otherwise go negative when totalTermFreq exceeds maxDoc, which 
can easily happen. But we should fix the other two in the same way, I think. It 
does not change retrieval quality in my tests.

If I floor them like this to avoid the issue, it fixes all their failures here, 
and they survive a hundred rounds of beasting with my new test:
{noformat}
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java
@@ -33,7 +33,7 @@ public class BasicModelIn extends BasicModel {
   public final double score(BasicStats stats, double tfn) {
     long N = stats.getNumberOfDocuments();
     long n = stats.getDocFreq();
-    return tfn * log2((N + 1) / (n + 0.5));
+    return tfn * log2(1 + (N + 1) / (n + 0.5));
   }
   
   @Override
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIne.java
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIne.java
@@ -34,7 +34,7 @@ public class BasicModelIne extends BasicModel {
     long N = stats.getNumberOfDocuments();
     long F = stats.getTotalTermFreq();
     double ne = N * (1 - Math.pow((N - 1) / (double)N, F));
-    return tfn * log2((N + 1) / (ne + 0.5));
+    return tfn * log2(1 + (N + 1) / (ne + 0.5));
   }
{noformat}
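
To make the effect of the floor concrete, here is a small standalone sketch 
of the idf-like term with and without the added {{1 +}}. It does not use 
Lucene classes; the class name, the local log2 helper and the sample 
collection sizes are assumptions chosen for illustration. It evaluates the 
worst case where the term occurs in every document (n = N).

```java
/** Hypothetical demo of the idf-like term from the diff above, with and without the floor. */
final class IdfFloorDemo {

  static double log2(double x) {
    return Math.log(x) / Math.log(2);
  }

  /** Original term: approaches 0 as n approaches N and N grows. */
  static double unfloored(double N, double n) {
    return log2((N + 1) / (n + 0.5));
  }

  /** Floored term: the argument of the log is always greater than 1 + 1. */
  static double floored(double N, double n) {
    return log2(1 + (N + 1) / (n + 0.5));
  }

  public static void main(String[] args) {
    for (double N : new double[] {1e3, 1e6, 1e9}) {
      // Worst case: the term appears in every document, so n == N.
      System.out.println("N=" + N
          + " unfloored=" + unfloored(N, N)
          + " floored=" + floored(N, N));
    }
  }
}
```

The unfloored value shrinks toward zero as the collection grows, which is the 
"log1p" trap described in this comment, while the floored value stays at or 
above 1.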

Model G failures are separate, I have not looked into it yet.




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233529#comment-16233529
 ] 

Robert Muir commented on LUCENE-8015:
-

I tested your last failure of GL2 (#4) and it's also covered by Adrien's fix.

> TestBasicModelIne.testRandomScoring failure
> ---
>
> Key: LUCENE-8015
> URL: https://issues.apache.org/jira/browse/LUCENE-8015
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Major
>
> reproduce with: ant test  -Dtestcase=TestBasicModelIne 
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu 
> -Dtests.asserts=true -Dtests.file.encoding=UTF8






[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233522#comment-16233522
 ] 

Robert Muir commented on LUCENE-8015:
-

Thanks Steve: I am not sure if the 3 failures represent just one bug, but it's 
very relevant.

Adrien's suggested fix alone will fix #1 and #3 but not #2. #2 is very clearly 
the hazard in AfterEffectB that I described (you can see it from the explain). 
If you combine both of our suggested fixes, all 3 cases will pass.

We should first maybe make the test more efficient.




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-31 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16230656#comment-16230656
 ] 

Steve Rowe commented on LUCENE-8015:


Another reproducing failure, from my Jenkins; it's a different test suite, but 
looks similar enough to me to comment on this issue:

{noformat}
Checking out Revision b44497fdb721fb67c3c8f20dd0a781f6beaaa8a6 
(refs/remotes/origin/master)
[...]
   [junit4] Suite: org.apache.lucene.search.similarities.TestBasicModelG
   [junit4]   1> 5.9448525E9 = score(DFRSimilarity, doc=0, freq=0.9994), 
computed from:
   [junit4]   1>   1.98161741E9 = boost
   [junit4]   1>   1.49336593E16 = NormalizationH1, computed from: 
   [junit4]   1> 0.9994 = tf
   [junit4]   1> 1.05701216E9 = avgFieldLength
   [junit4]   1> 152.0 = len
   [junit4]   1>   4.4800976E16 = BasicModelG, computed from: 
   [junit4]   1> 12.0 = numberOfDocuments
   [junit4]   1> 1.0 = totalTermFreq
   [junit4]   1>   6.6962825E-17 = AfterEffectL, computed from: 
   [junit4]   1> 1.49336593E16 = tfn
   [junit4]   1> 
   [junit4]   1> 5.944852E9 = score(DFRSimilarity, doc=0, freq=1.0), computed 
from:
   [junit4]   1>   1.98161741E9 = boost
   [junit4]   1>   1.49336603E16 = NormalizationH1, computed from: 
   [junit4]   1> 1.0 = tf
   [junit4]   1> 1.05701216E9 = avgFieldLength
   [junit4]   1> 152.0 = len
   [junit4]   1>   4.480098E16 = BasicModelG, computed from: 
   [junit4]   1> 12.0 = numberOfDocuments
   [junit4]   1> 1.0 = totalTermFreq
   [junit4]   1>   6.696282E-17 = AfterEffectL, computed from: 
   [junit4]   1> 1.49336603E16 = tfn
   [junit4]   1> 
   [junit4]   1> DFR GL1
   [junit4]   1> 
field="field",maxDoc=50,docCount=12,sumTotalTermFreq=12684145308,sumDocFreq=12
   [junit4]   1> term="term",docFreq=1,totalTermFreq=1
   [junit4]   1> norm=64 (doc length ~ 152)
   [junit4]   1> freq=1.0
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBasicModelG 
-Dtests.method=testRandomScoring -Dtests.seed=4B5C3926B202A201 
-Dtests.slow=true -Dtests.locale=en-IE -Dtests.timezone=Pacific/Bougainville 
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 7.31s J0 | TestBasicModelG.testRandomScoring <<<
   [junit4]> Throwable #1: java.lang.AssertionError: 
score(0.9994)=5.9448525E9 > score(1.0)=5.944852E9
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([4B5C3926B202A201:C0C36094A875440B]:0)
   [junit4]>at 
org.apache.lucene.search.similarities.BaseSimilarityTestCase.doTestScoring(BaseSimilarityTestCase.java:405)
   [junit4]>at 
org.apache.lucene.search.similarities.BaseSimilarityTestCase.testRandomScoring(BaseSimilarityTestCase.java:357)
   [junit4]>at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): 
{field=PostingsFormat(name=LuceneFixedGap)}, docValues:{}, 
maxPointsInLeafNode=68, maxMBSortInHeap=6.052983739984725, 
sim=RandomSimilarity(queryNorm=false): {field=DFR I(ne)B3(800.0)}, 
locale=en-IE, timezone=Pacific/Bougainville
   [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 
1.8.0_77 (64-bit)/cpus=16,threads=1,free=293394976,total=351797248
{noformat}




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-31 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227811#comment-16227811
 ] 

Steve Rowe commented on LUCENE-8015:


Two reproducing Jenkins failures: first, from 
[https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20789/]:

{noformat}
Checking out Revision 39376cd8b5ef03b3338c2e8fa31dce732749bcd7 
(refs/remotes/origin/master)
[...]
   [junit4] Suite: org.apache.lucene.search.similarities.TestBasicModelIn
   [junit4]   1> 1.27634828E18 = score(DFRSimilarity, doc=0, 
freq=1.18869171E9), computed from:
   [junit4]   1>   2.14748365E9 = boost
   [junit4]   1>   1.18869171E9 = NormalizationZ, computed from: 
   [junit4]   1> 1.18869171E9 = tf
   [junit4]   1> 6.0234362E8 = avgFieldLength
   [junit4]   1> 76.0 = len
   [junit4]   1>   1.18869171E9 = BasicModelIn, computed from: 
   [junit4]   1> 2.0 = numberOfDocuments
   [junit4]   1> 1.0 = docFreq
   [junit4]   1>   0.5006 = AfterEffectB, computed from: 
   [junit4]   1> 1.18869171E9 = tfn
   [junit4]   1> 1.18869184E9 = totalTermFreq
   [junit4]   1> 1.0 = docFreq
   [junit4]   1> 
   [junit4]   1> 1.27634814E18 = score(DFRSimilarity, doc=0, 
freq=1.18869184E9), computed from:
   [junit4]   1>   2.14748365E9 = boost
   [junit4]   1>   1.18869184E9 = NormalizationZ, computed from: 
   [junit4]   1> 1.18869184E9 = tf
   [junit4]   1> 6.0234362E8 = avgFieldLength
   [junit4]   1> 76.0 = len
   [junit4]   1>   1.18869184E9 = BasicModelIn, computed from: 
   [junit4]   1> 2.0 = numberOfDocuments
   [junit4]   1> 1.0 = docFreq
   [junit4]   1>   0.5 = AfterEffectB, computed from: 
   [junit4]   1> 1.18869184E9 = tfn
   [junit4]   1> 1.18869184E9 = totalTermFreq
   [junit4]   1> 1.0 = docFreq
   [junit4]   1> 
   [junit4]   1> DFR I(n)BZ(1.4E-45)
   [junit4]   1> 
field="field",maxDoc=2,docCount=2,sumTotalTermFreq=1204687257,sumDocFreq=2
   [junit4]   1> term="term",docFreq=1,totalTermFreq=1188691903
   [junit4]   1> norm=53 (doc length ~ 76)
   [junit4]   1> freq=1.18869184E9
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBasicModelIn 
-Dtests.method=testRandomScoring -Dtests.seed=4210BC5FDD9E3841 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=pt -Dtests.timezone=AET 
-Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 6.16s J1 | TestBasicModelIn.testRandomScoring <<<
   [junit4]> Throwable #1: java.lang.AssertionError: 
score(1.18869171E9)=1.27634828E18 > score(1.18869184E9)=1.27634814E18
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([4210BC5FDD9E3841:C98FE5EDC7E9DE4B]:0)
   [junit4]>at 
org.apache.lucene.search.similarities.BaseSimilarityTestCase.doTestScoring(BaseSimilarityTestCase.java:405)
   [junit4]>at 
org.apache.lucene.search.similarities.BaseSimilarityTestCase.testRandomScoring(BaseSimilarityTestCase.java:357)
   [junit4]>at java.lang.Thread.run(Thread.java:748)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): 
{field=PostingsFormat(name=LuceneVarGapDocFreqInterval)}, docValues:{}, 
maxPointsInLeafNode=839, maxMBSortInHeap=6.659456353481144, 
sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@19821e9),
 locale=pt, timezone=AET
   [junit4]   2> NOTE: Linux 4.10.0-37-generic i386/Oracle Corporation 
1.8.0_144 (32-bit)/cpus=8,threads=1,free=227959720,total=316669952
{noformat}

And from [https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20744/] 
(output below is from my reproduction, since the job output is no longer 
accessible - git sha is 95d287e):

{noformat}
   [junit4] Suite: org.apache.lucene.search.similarities.TestBasicModelIn
   [junit4]   1> 8.0517238E9 = score(DFRSimilarity, doc=0, freq=1.86950656E9), 
computed from:
   [junit4]   1>   1.6103447E9 = boost
   [junit4]   1>   2.6727952E22 = NormalizationH1, computed from: 
   [junit4]   1> 1.86950656E9 = tf
   [junit4]   1> 1.4181463E9 = avgFieldLength
   [junit4]   1> 213016.0 = len
   [junit4]   1>   1.3363976E23 = BasicModelIn, computed from: 
   [junit4]   1> 79.0 = numberOfDocuments
   [junit4]   1> 2.0 = docFreq
   [junit4]   1>   3.7414016E-23 = AfterEffectL, computed from: 
   [junit4]   1> 2.6727952E22 = tfn
   [junit4]   1> 
   [junit4]   1> 8.0517233E9 = score(DFRSimilarity, doc=0, freq=1.86950669E9), 
computed from:
   [junit4]   1>   1.6103447E9 = boost
   [junit4]   1>   2.6727954E22 = NormalizationH1, computed from: 
   [junit4]   1> 1.86950669E9 = tf
   [junit4]   1> 1.4181463E9 = avgFieldLength
   [junit4]   1> 213016.0 = len
   [junit4]   1>   1.3363977E23 = BasicModelIn, computed from: 
   [junit4]   1> 79.0 = numberOfDocuments
   [junit4]   1> 2.0 = docFreq
   [junit4]   1>   3.7414013E-23 = AfterEffectL, computed from: 
   [junit4]   1> 2.6727954E22 = tfn
   [junit4]   1> 
   [jun

[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223553#comment-16223553
 ] 

Robert Muir commented on LUCENE-8015:
-

Maybe the issue is better fixed in after-effect B? Instead of
{{(F+1)/(n * (tf + 1))}} we can do {{(F+1)/n * 1/(tf+1)}}. Keep in mind that F
and n are presumably large, as they are the term's totalTermFreq and docFreq,
although not in this particular failure.
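
To make the rearrangement concrete, here is a minimal sketch of the two forms side by side. The class and method names are made up for illustration and are not Lucene's API, and the inputs are arbitrary. It only shows that the two expressions are algebraically the same; it makes no claim about which one rounds more favorably.

```java
// Hypothetical illustration only: names are assumptions, not Lucene code.
final class AfterEffectBForms {

  // Form quoted first in the comment: (F+1)/(n * (tf + 1)).
  static double combined(double F, double n, double tf) {
    return (F + 1) / (n * (tf + 1));
  }

  // Proposed rearrangement: (F+1)/n * 1/(tf+1).
  static double split(double F, double n, double tf) {
    double perTerm = (F + 1) / n;      // depends only on the term statistics
    double perDoc = 1 / (tf + 1);      // depends only on tf
    return perTerm * perDoc;
  }

  public static void main(String[] args) {
    double F = 1000, n = 100, tf = 3;  // arbitrary example inputs
    System.out.println("combined = " + combined(F, n, tf));
    System.out.println("split    = " + split(F, n, tf));
  }
}
```

In the split form the factor that depends on the term statistics is computed separately from the factor that depends on tf, so the two roundings can be looked at independently.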




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220498#comment-16220498
 ] 

Robert Muir commented on LUCENE-8015:
-

Thanks, I will look into it. It's hard to debug it specifically without fixing
explains first (we really need to do that, then you can "see" what goes wrong
from test failures like this). Separately, the test is inefficient in that this
only comes out when beasting many iterations. We should improve the test to
more often enumerate edges (e.g. min/max values) that look like this so that
it's more efficient.

At a glance it looks like a small collection with mostly super-huge docs but
then one tiny doc. So it may stress some extremes in computations like
{{dl/avgdl}} type stuff, and expose a hazard in one of the components here. I
have to look more...
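
The idea of enumerating edges more often could be sketched roughly as below. This is not the existing test code: the class name, the one-in-five edge ratio, and the use of plain `java.util.Random` are all assumptions chosen for illustration.

```java
import java.util.Random;

// Hypothetical helper: returns the extremes of a range far more often than
// a uniform draw would. Not part of BaseSimilarityTestCase.
final class EdgeBiasedValues {

  // Returns a value in [min, max]; roughly one draw in five is min or max.
  // Assumes max - min + 1 does not overflow a long.
  static long next(Random random, long min, long max) {
    if (min > max) {
      throw new IllegalArgumentException("min > max");
    }
    switch (random.nextInt(10)) {
      case 0:
        return min;                      // lower edge
      case 1:
        return max;                      // upper edge
      default:
        long span = max - min + 1;
        // Slight modulo bias is acceptable for a sketch.
        return min + Math.floorMod(random.nextLong(), span);
    }
  }

  public static void main(String[] args) {
    Random random = new Random(42);      // arbitrary seed
    for (int i = 0; i < 10; i++) {
      System.out.println(next(random, 1, 1_000_000));
    }
  }
}
```

With something like this feeding document lengths and term frequencies, a collection of mostly huge docs plus one tiny doc would come up without needing many beasting iterations.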




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-26 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220422#comment-16220422
 ] 

Adrien Grand commented on LUCENE-8015:
--

{noformat}
   [junit4]  says 你好! Master seed: 86E85958B1183E93
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(22203@localhost).
   [junit4] Suite: org.apache.lucene.search.similarities.TestBasicModelIne
   [junit4]   1> 7.0E-45 = score(DFRSimilarity, doc=0, freq=0.9994), 
computed from:
   [junit4]   1>   1.4E-45 = boost
   [junit4]   1>   3.46341352E16 = NormalizationH1, computed from: 
   [junit4]   1> 0.9994 = tf
   [junit4]   1> 2.09433728E9 = avgFieldLength
   [junit4]   1> 26.0 = len
   [junit4]   1>   1.03902406E17 = BasicModelIne, computed from: 
   [junit4]   1> 11.0 = numberOfDocuments
   [junit4]   1> 1.0 = totalTermFreq
   [junit4]   1>   4.3309873E-17 = AfterEffectB, computed from: 
   [junit4]   1> 3.46341352E16 = tfn
   [junit4]   1> 1.0 = totalTermFreq
   [junit4]   1> 1.0 = docFreq
   [junit4]   1> 
   [junit4]   1> 5.6E-45 = score(DFRSimilarity, doc=0, freq=1.0), computed from:
   [junit4]   1>   1.4E-45 = boost
   [junit4]   1>   3.46341374E16 = NormalizationH1, computed from: 
   [junit4]   1> 1.0 = tf
   [junit4]   1> 2.09433728E9 = avgFieldLength
   [junit4]   1> 26.0 = len
   [junit4]   1>   1.03902414E17 = BasicModelIne, computed from: 
   [junit4]   1> 11.0 = numberOfDocuments
   [junit4]   1> 1.0 = totalTermFreq
   [junit4]   1>   4.330987E-17 = AfterEffectB, computed from: 
   [junit4]   1> 3.46341374E16 = tfn
   [junit4]   1> 1.0 = totalTermFreq
   [junit4]   1> 1.0 = docFreq
   [junit4]   1> 
   [junit4]   1> DFR I(ne)B1
   [junit4]   1> 
field="field",maxDoc=16,docCount=11,sumTotalTermFreq=23037710092,sumDocFreq=1421016222
   [junit4]   1> term="term",docFreq=1,totalTermFreq=1
   [junit4]   1> norm=26 (doc length ~ 26)
   [junit4]   1> freq=1.0
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBasicModelIne 
-Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 
-Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu 
-Dtests.asserts=true -Dtests.file.encoding=UTF8
   [junit4] FAILURE 3.13s | TestBasicModelIne.testRandomScoring <<<
   [junit4]> Throwable #1: java.lang.AssertionError: 
score(0.9994)=7.0E-45 > score(1.0)=5.6E-45
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([86E85958B1183E93:D7700EAAB6FD899]:0)
   [junit4]>at 
org.apache.lucene.search.similarities.BaseSimilarityTestCase.doTestScoring(BaseSimilarityTestCase.java:405)
   [junit4]>at 
org.apache.lucene.search.similarities.BaseSimilarityTestCase.testRandomScoring(BaseSimilarityTestCase.java:357)
   [junit4]>at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): 
{field=PostingsFormat(name=Memory)}, docValues:{}, maxPointsInLeafNode=285, 
maxMBSortInHeap=6.307483399953041, sim=RandomSimilarity(queryNorm=false): 
{field=IB LL-DZ(0.3)}, locale=vi-VN, timezone=Pacific/Tongatapu
   [junit4]   2> NOTE: Linux 4.4.0-97-generic amd64/Oracle Corporation 
1.8.0_102 (64-bit)/cpus=8,threads=1,free=241459528,total=344457216
   [junit4]   2> NOTE: All tests run in this JVM: [TestBasicModelIne]
   [junit4] Completed [1/1 (1!)] in 3.79s, 1 test, 1 failure <<< FAILURES!
   [junit4] 
   [junit4] 
   [junit4] Tests with failures [seed: 86E85958B1183E93]:
   [junit4]   - 
org.apache.lucene.search.similarities.TestBasicModelIne.testRandomScoring
   [junit4] 
   [junit4] 
   [junit4] JVM J0: 0.40 .. 4.74 = 4.34s
   [junit4] Execution time total: 4.75 sec.
   [junit4] Tests summary: 1 suite, 1 test, 1 failure

{noformat}




[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220375#comment-16220375
 ] 

Robert Muir commented on LUCENE-8015:
-

Let's take a step back first. Which three DFR components are involved? Can you
include the test output?





[jira] [Commented] (LUCENE-8015) TestBasicModelIne.testRandomScoring failure

2017-10-26 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220198#comment-16220198
 ] 

Adrien Grand commented on LUCENE-8015:
--

I looked into it; this similarity ends up doing something like this:

{code}
double tfn = // non-decreasing function of tf
return (tfn * C1) * (C2 / (tfn + 1)); // C1 and C2 are some constants
{code}

The issue is that even if tfn increases, the result might decrease if
{{tfn * C1}} is rounded down and/or {{C2/(tfn + 1)}} is rounded up. One way to
fix it that I can think of is to make the value of tfn more discrete, e.g.:

{code}
diff --git a/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java b/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
index aacd246..554d12f 100644
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
@@ -108,7 +108,7 @@ public class DFRSimilarity extends SimilarityBase {
 
   @Override
   protected double score(BasicStats stats, double freq, double docLen) {
-    double tfn = normalization.tfn(stats, freq, docLen);
+    double tfn = (float) normalization.tfn(stats, freq, docLen); // cast to float on purpose to introduce gaps between consecutive values and prevent double rounding errors to make the score decrease when tfn increases
     return stats.getBoost() *
         basicModel.score(stats, tfn) * afterEffect.score(stats, tfn);
   }

{code}

Opinions?
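
For readers who want to poke at the effect, here is a small standalone sketch of the expression shape above, with and without the float cast. It is not Lucene code: the class name, the constants, and the starting value of tfn are arbitrary assumptions. It steps tfn through adjacent doubles and counts how often the result drops. Within a run of doubles that all round to the same float, the cast version cannot drop because its input does not change; whether drops can still occur when tfn crosses from one float value to the next is not examined here.

```java
// Hypothetical illustration of (tfn * C1) * (C2 / (tfn + 1)); not Lucene code.
final class TfnQuantizationSketch {

  // The expression shape described in the comment, evaluated in double.
  static double scoreLike(double tfn, double c1, double c2) {
    return (tfn * c1) * (c2 / (tfn + 1));
  }

  // Counts steps where the result decreases although tfn grew by one double ulp.
  // If quantize is true, tfn is rounded to float precision before use.
  static int countDecreases(double start, int steps, double c1, double c2, boolean quantize) {
    int decreases = 0;
    double tfn = start;
    double prev = scoreLike(quantize ? (float) tfn : tfn, c1, c2);
    for (int i = 0; i < steps; i++) {
      tfn = Math.nextUp(tfn);            // next representable double
      double cur = scoreLike(quantize ? (float) tfn : tfn, c1, c2);
      if (cur < prev) {
        decreases++;
      }
      prev = cur;
    }
    return decreases;
  }

  public static void main(String[] args) {
    double start = Math.scalb(1.0, 55);  // 2^55, an arbitrary large tfn
    double c1 = 3.0, c2 = 1.5;           // arbitrary constants
    System.out.println("plain:     " + countDecreases(start, 1000, c1, c2, false));
    System.out.println("quantized: " + countDecreases(start, 1000, c1, c2, true));
  }
}
```

The 1000 doubles following 2^55 all round to the same float, so the quantized run sees a constant input over that stretch.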
