[
https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-5122:
---------------------------
Description:
As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy,
and this (apparently) led to a failure in testEstimatedHitCounts.
As far as I can tell: the test assumes that specific values would be returned
as the _estimated_ "hits" for a collation, and it appears that the change in
MergePolicy resulted in different segments with different term stats,
causing the estimation code to produce different values than what is expected.
I made a quick attempt to improve the test to:
* expect explicit exact values only when spellcheck.collateMaxCollectDocs is
set such that the "estimate" should actually be exact (ie:
collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num docs
in the index)
* randomize the values used for collateMaxCollectDocs and confirm that the
estimates are never more than the num docs in the index
This led to an odd "ArithmeticException: / by zero" error in the test, which
seems to suggest that there is a genuine bug in the code for estimating the
hits that only gets tickled in certain
mergepolicy/segment/collateMaxCollectDocs combinations.
*Update:* This appears to be a general problem with collecting docs out of
order and the estimation of hits -- I believe even if there is no divide by
zero error, the estimates are largely meaningless since the docs are collected
out of order.
was:
As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy,
and this (apparently) led to a failure in testEstimatedHitCounts.
As far as I can tell: the test assumes that specific values would be returned
as the _estimated_ "hits" for a collation, and it appears that the change in
MergePolicy resulted in different segments with different term stats,
causing the estimation code to produce different values than what is expected.
I made a quick attempt to improve the test to:
* expect explicit exact values only when spellcheck.collateMaxCollectDocs is
set such that the "estimate" should actually be exact (ie:
collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num docs
in the index)
* randomize the values used for collateMaxCollectDocs and confirm that the
estimates are never more than the num docs in the index
This led to an odd "ArithmeticException: / by zero" error in the test, which
seems to suggest that there is a genuine bug in the code that only gets tickled
in certain mergepolicy/segment/collateMaxCollectDocs combinations.
Summary: spellcheck.collateMaxCollectDocs estimates seem to be
meaningless -- can lead to "ArithmeticException: / by zero" (was:
"ArithmeticException: / by zero" using spellcheck.collateMaxCollectDocs)
FYI: I attempted to do a simple revert of r1479645 and the test still fails --
but reviewing the diff I think that's because there doesn't seem to be anything
paying attention to the FORCE_INORDER_COLLECTION flag at collection time, so
it's effectively useless.
I'm at a loss to really understand what the correct fix should be at this point.
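
To make the failure mode described above concrete, here is a minimal sketch of
the kind of extrapolation an early-terminating collector might do. It is NOT
the actual Solr code: the class HitEstimateSketch, both method names, and the
formula itself are assumptions made for illustration only. It shows why an
estimate based on the last doc id seen only makes sense for in-order
collection, and where a "/ by zero" could come from.

```java
// Hypothetical illustration only; this is not Solr's implementation.
final class HitEstimateSketch {

    /**
     * Extrapolates a total hit count from a collection that stopped early.
     * Assumes docs arrived in increasing doc id order, so lastDocId says
     * how far into the index the collector got before stopping.
     */
    static int estimateHits(int collected, int lastDocId, int maxDoc) {
        // Integer division: throws ArithmeticException("/ by zero")
        // when lastDocId == 0.
        return (int) (((long) collected * maxDoc) / lastDocId);
    }

    /**
     * Guarded variant (also an assumption, not a proposed fix): skips the
     * extrapolation when there is nothing to divide by, and never reports
     * more hits than there are docs in the index.
     */
    static int estimateHitsGuarded(int collected, int lastDocId, int maxDoc) {
        if (lastDocId <= 0) {
            return collected; // no basis for extrapolation
        }
        long estimate = ((long) collected * maxDoc) / lastDocId;
        return (int) Math.min(estimate, (long) maxDoc);
    }
}
```

With 100 docs in the index, 10 docs collected, and a last doc id of 18 (in
order collection where every second doc matches), estimateHits returns 55,
which is close to the true 50. If the same 10 docs arrive out of order and the
last id seen happens to be 2, the result is 500, more than the index holds;
if it happens to be 0, the division throws. The guarded variant only hides
those symptoms: the estimate is still not meaningful for out-of-order
collection.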
> spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead
> to "ArithmeticException: / by zero"
> ----------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-5122
> URL: https://issues.apache.org/jira/browse/SOLR-5122
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.4
> Reporter: Hoss Man
> Attachments: SOLR-5122.patch
>
>
> As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy,
> and this (apparently) led to a failure in testEstimatedHitCounts.
> As far as I can tell: the test assumes that specific values would be returned
> as the _estimated_ "hits" for a collation, and it appears that the change in
> MergePolicy resulted in different segments with different term stats,
> causing the estimation code to produce different values than what is expected.
> I made a quick attempt to improve the test to:
> * expect explicit exact values only when spellcheck.collateMaxCollectDocs is
> set such that the "estimate" should actually be exact (ie:
> collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num
> docs in the index)
> * randomize the values used for collateMaxCollectDocs and confirm that the
> estimates are never more than the num docs in the index
> This led to an odd "ArithmeticException: / by zero" error in the test, which
> seems to suggest that there is a genuine bug in the code for estimating the
> hits that only gets tickled in certain
> mergepolicy/segment/collateMaxCollectDocs combinations.
> *Update:* This appears to be a general problem with collecting docs out of
> order and the estimation of hits -- I believe even if there is no divide by
> zero error, the estimates are largely meaningless since the docs are
> collected out of order.