[jira] [Commented] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead to "ArithmeticException: / by zero"

Hoss Man (JIRA) Tue, 13 Aug 2013 08:52:33 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738395#comment-13738395
 ]


Hoss Man commented on SOLR-5122:
--------------------------------

The initial jenkins failure i saw was "At revision 1511278"...

https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/343/
https://mail-archives.apache.org/mod_mbox/lucene-dev/201308.mbox/%3Calpine.DEB.2.02.1308070919170.13959@frisbee%3E

{quote}
I can reproduce this -- it's probably related to the MP randomization i 
put in ... looks like it's doing exact numeric comparisons based on term 
stats.  I'll take a look later today...

ant test  -Dtestcase=SpellCheckCollatorTest 
-Dtests.method=testEstimatedHitCounts -Dtests.seed=16B4D8F74E59EE10 
-Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true   -Dtests.locale=nl 
-Dtests.timezone=America/Dawson -Dtests.file.encoding=US-ASCII
{quote}

...regardless of the initial failure though, if you try out the patch i attached 
to try and improve the test coverage, then the "reproduce" line from the 
failure i posted along with that patch still reproduces on trunk (but you do 
have to manually comment out the {{@Ignore}})...

{code}
ant test  -Dtestcase=SpellCheckCollatorTest 
-Dtests.method=testEstimatedHitCounts -Dtests.seed=16B4D8F74E59EE10 
-Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=nl 
-Dtests.timezone=America/Dawson -Dtests.file.encoding=US-ASCII
{code}
                
> spellcheck.collateMaxCollectDocs estimates seem to be meaninless -- can lead 
> to "ArithmeticException: / by zero"
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5122
>                 URL: https://issues.apache.org/jira/browse/SOLR-5122
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.4
>            Reporter: Hoss Man
>            Assignee: James Dyer
>         Attachments: SOLR-5122.patch
>
>
> As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, 
> and this (apparently) led to a failure in testEstimatedHitCounts.
> As far as i can tell: the test assumes that specific values will be returned 
> as the _estimated_ "hits" for a collation, but the change in 
> MergePolicy resulted in different segments with different term stats, 
> causing the estimation code to produce different values than expected.
> I made a quick attempt to improve the test to:
>  * expect explicit exact values only when spellcheck.collateMaxCollectDocs is 
> set such that the "estimate" should actually be exact (ie: 
> collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num 
> docs in the index)
>  * randomize the values used for collateMaxCollectDocs and confirm that the 
> estimates are never more than the num docs in the index
> This led to an odd "ArithmeticException: / by zero" error in the test, which 
> seems to suggest that there is a genuine bug in the code for estimating the 
> hits that only gets tickled in certain 
> mergepolicy/segment/collateMaxCollectDocs combinations.
> *Update:* This appears to be a general problem with collecting docs out of 
> order and the estimation of hits -- i believe even if there is no divide by 
> zero error, the estimates are largely meaningless since the docs are 
> collected out of order.
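
To make the failure mode in the description concrete, here is a minimal Java sketch of how an extrapolated hit count can divide by zero when docs arrive out of order. This is NOT the Solr collator code: the class name, both method names, and the extrapolation formula are assumptions chosen only for illustration. The idea is that an estimate built from "last doc id seen" and "number of docs collected" assumes the collected docs cover a prefix of the index, and that assumption breaks as soon as the last doc id seen is small while the collected count is not.

```java
// Hypothetical sketch; the names and formula are illustrative assumptions,
// not code taken from Solr.
final class HitEstimateSketch {

    // Naive extrapolation using integer division only.
    // Assumes numCollected > 0 and that docs were collected in doc id order.
    static int naiveEstimate(int maxDoc, int lastDocId, int numCollected) {
        // Truncates to 0 whenever lastDocId + 1 < numCollected, which can
        // happen if the last doc collected has a low id (out-of-order collection).
        int docsPerHit = (lastDocId + 1) / numCollected;
        // Throws ArithmeticException ("/ by zero") when docsPerHit == 0.
        return maxDoc / docsPerHit;
    }

    // Defensive variant: never divides by zero and keeps the result
    // between numCollected and maxDoc.
    static int clampedEstimate(int maxDoc, int lastDocId, int numCollected) {
        int docsCovered = lastDocId + 1;
        if (docsCovered <= 0 || docsCovered >= maxDoc) {
            // Nothing to extrapolate from, or the whole index was covered.
            return Math.min(numCollected, maxDoc);
        }
        // Use long arithmetic so numCollected * maxDoc cannot overflow an int.
        long scaled = (long) numCollected * maxDoc / docsCovered;
        return (int) Math.min(maxDoc, Math.max(numCollected, scaled));
    }

    public static void main(String[] args) {
        // In-order case: 5 hits in the first 50 of 100 docs.
        System.out.println(clampedEstimate(100, 49, 5)); // prints 10
        // Out-of-order case: 5 hits collected, but the last doc id seen was 2.
        System.out.println(clampedEstimate(100, 2, 5));  // prints 100 (clamped)
        try {
            naiveEstimate(100, 2, 5);
        } catch (ArithmeticException e) {
            System.out.println("naive: " + e.getMessage()); // prints naive: / by zero
        }
    }
}
```

Note that clamping only enforces the "never more than the num docs in the index" bound from the test changes above; it does not make the estimate meaningful when collection is out of order, which is the point of the *Update*.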
