[
https://issues.apache.org/jira/browse/LUCENE-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084104#comment-13084104
]
Robert Muir commented on LUCENE-3357:
-------------------------------------
{quote}
freq: I didn't know about that! Still, I want to provide not "plausible", but
at least "safe" statistics in this case. You didn't touch docFreq and
numberOfDocuments, so I assumed at least these two are filled with the actual
values, is that so?
{quote}
But I don't think we should populate it with arbitrary values. I like 1 because
it is consistent with what you asked for when you omit term frequencies (I think
it's confusing to put something other than 1 here; it's inconsistent with how
omitTF works for Lucene's default scoring).
Right, docFreq is always populated. But if you omitTF, freq will be 1 (for
exact scorers) or <= 1 (for sloppy scorers), as no frequency is available.
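To make the "freq is 1 when you omitTF" point concrete, here is a minimal
sketch. It assumes the classic sqrt(freq) term-frequency factor of Lucene's
default scoring; the class name and the effectiveFreq helper are hypothetical
and are not code from the flexscoring branch. It only covers the exact-scorer
case.

```java
// Illustrative only: mirrors the classic tf = sqrt(freq) factor.
// OmitTfSketch and effectiveFreq are hypothetical names, not Lucene API.
class OmitTfSketch {

    /** Classic term-frequency factor: sqrt(freq). */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Assumed behaviour: the freq an exact scorer reports for a match. */
    static float effectiveFreq(boolean omitTf, float indexedFreq) {
        // With term frequencies omitted no count is stored, so a match reports 1.
        return omitTf ? 1f : indexedFreq;
    }

    public static void main(String[] args) {
        float withTf = tf(effectiveFreq(false, 9f));   // sqrt(9) = 3
        float withoutTf = tf(effectiveFreq(true, 9f)); // sqrt(1) = 1
        System.out.println("tf with frequencies: " + withTf);
        System.out.println("tf with omitTF:      " + withoutTf);
    }
}
```

With freq fixed at 1 the tf factor is neutral, so the score is driven by the
remaining statistics such as docFreq.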
I ran a quick test and got decreases in MAP (probably slight, maybe not even
significant) with PL2 and Dirichlet with the changes. I figure we can first fix
D and then move on to P and such, saving LM for last as it's a major pain :)
> Unit and integration test cases for the new Similarities
> --------------------------------------------------------
>
> Key: LUCENE-3357
> URL: https://issues.apache.org/jira/browse/LUCENE-3357
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/query/scoring
> Affects Versions: flexscoring branch
> Reporter: David Mark Nemeskey
> Assignee: David Mark Nemeskey
> Priority: Minor
> Labels: gsoc, gsoc2011, test
> Fix For: flexscoring branch
>
> Attachments: LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch,
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch,
> LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch, LUCENE-3357.patch,
> LUCENE-3357.patch
>
>
> Write test cases to test the new Similarities added in
> [LUCENE-3220|https://issues.apache.org/jira/browse/LUCENE-3220]. Two types of
> test cases will be created:
> * unit tests, in which mock statistics are provided to the Similarities and
> the score is validated against hand calculations;
> * integration tests, in which a small collection is indexed and then
> searched using the Similarities.
> Performance tests will be performed in a separate issue.
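The unit-test idea described above (mock statistics in, score checked against
a hand calculation) can be sketched roughly as follows. This is not the patch
attached to the issue: MockStats and score are hypothetical names, and the
formula is a toy tf * idf used as a stand-in for the new Similarities.

```java
// Illustrative only: a toy scorer checked against a hand-computed value.
// MockStats and score(...) are hypothetical, not part of LUCENE-3357.patch.
class MockStatsSketch {

    /** Hypothetical container for the statistics a unit test would fake. */
    static final class MockStats {
        final long numberOfDocuments;
        final long docFreq;
        final float freq;

        MockStats(long numberOfDocuments, long docFreq, float freq) {
            this.numberOfDocuments = numberOfDocuments;
            this.docFreq = docFreq;
            this.freq = freq;
        }
    }

    /** Toy score: sqrt(freq) * (1 + ln(numberOfDocuments / (docFreq + 1))). */
    static double score(MockStats s) {
        double tf = Math.sqrt(s.freq);
        double idf = 1.0 + Math.log((double) s.numberOfDocuments / (s.docFreq + 1));
        return tf * idf;
    }

    public static void main(String[] args) {
        // Mock statistics: 100 documents, term occurs in 9 of them, freq = 4.
        MockStats stats = new MockStats(100, 9, 4f);

        // Hand calculation: sqrt(4) * (1 + ln(100 / 10)) = 2 * (1 + ln 10).
        double expected = 6.605170186;
        double actual = score(stats);

        if (Math.abs(actual - expected) > 1e-8) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println("score matches hand calculation: " + actual);
    }
}
```

Keeping the expected value as a literal derived by hand, rather than
recomputing it with the same code, is what lets the check catch a wrong
formula.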