On Tue, Dec 18, 2012 at 4:46 AM, Shai Erera wrote:
> Are you sure that all Codecs return 1 if you indexed with DOCS_ONLY? Do we
> have a test that can trip bad Codecs?
I'm not sure! We should add a test and fix any Codecs that fail it.
> It may be more than just changing the documentation...
Right.
> Why would e.g. TermQuery need to write specialized code for these cases? I
> looked at TermScorer, and its freq() just returns docsEnum.freq().
I meant if we did not adopt this spec ("freq() will lie and return 1
when the field was indexed as DOCS_ONLY"), then e.g. TermQuery would
need specialized code.
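As a rough illustration of that point, here is a minimal sketch using hypothetical stand-in types (`FieldIndexOptions`, `Postings`, `FreqExamples`; none of these are Lucene's API). Without the spec, every consumer would have to branch on how the field was indexed. With it, a scorer can simply delegate to freq(), as TermScorer does.

```java
// Hypothetical stand-ins for Lucene types; NOT the real Lucene API.
enum FieldIndexOptions { DOCS_ONLY, DOCS_AND_FREQS }

interface Postings {
  int freq();
}

final class FreqExamples {
  // Without the spec: each consumer must special-case DOCS_ONLY itself,
  // because the codec's freq() could return anything for such fields.
  static int freqWithoutSpec(FieldIndexOptions options, Postings postings) {
    if (options == FieldIndexOptions.DOCS_ONLY) {
      return 1; // ignore whatever freq() would say and substitute 1
    }
    return postings.freq();
  }

  // With the spec: the codec already returns 1 for DOCS_ONLY fields,
  // so the consumer just delegates.
  static int freqWithSpec(Postings postings) {
    return postings.freq();
  }
}
```

The design point is where the special case lives: once, in each Codec, rather than in every query that reads frequencies.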
> I think that Similarity may be affected? Which raises the question: how do
> Similarity impls know what flags the DE was opened with, and shouldn't they
> be specialized?
> E.g. TFIDFSimilarity.ExactTFIDFDocScorer uses the freq passed to score() as
> an index to an array, so clearly it assumes it is >= 0 and also <
> scoreCache.length.
> So I wonder what will happen to it when someone's Codec returns a
> negative value or MAX_INT in case frequencies aren't needed?
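A minimal sketch of the failure mode described above, assuming a score cache indexed directly by freq. The class `CachedTfScorer` and its sqrt-based tf are hypothetical and only loosely modeled on the idea, not on ExactTFIDFDocScorer's actual code: a negative freq or MAX_INT indexes outside the array.

```java
// Hypothetical sketch of a per-freq score cache; NOT Lucene's ExactTFIDFDocScorer.
final class CachedTfScorer {
  private final float[] scoreCache;

  CachedTfScorer(int cacheSize, float weight) {
    scoreCache = new float[cacheSize];
    // Precompute scores for freq = 0 .. cacheSize-1.
    for (int i = 0; i < cacheSize; i++) {
      scoreCache[i] = tf(i) * weight;
    }
  }

  // Classic TF-IDF style tf: sqrt(freq).
  static float tf(int freq) {
    return (float) Math.sqrt(freq);
  }

  // Assumes 0 <= freq < scoreCache.length; anything else throws
  // ArrayIndexOutOfBoundsException.
  float score(int freq) {
    return scoreCache[freq];
  }
}
```

Guarding the upper bound (computing tf directly when freq is too large) would absorb MAX_INT, but a negative freq would still be meaningless; the simpler rule is never to feed an undefined freq() to the sim at all.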
Well, if you passed FLAG_NONE when you opened the DE then it's your
responsibility to never call freq() ... i.e., don't call freq() and pass
the result to the sim.
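A minimal sketch of that contract, using hypothetical classes (`NoFreqCursor`, `DocCounter`) rather than Lucene's DocsEnum: a consumer that opened its enum without freqs only iterates doc IDs and never calls freq(). In this sketch freq() throws to make misuse visible; under the spec being discussed it is merely undefined.

```java
// Hypothetical stand-in for a DocsEnum opened without freqs; NOT Lucene's API.
final class NoFreqCursor {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs;
  private int upto = -1;

  NoFreqCursor(int... docs) {
    this.docs = docs;
  }

  // Advances to the next doc ID, or NO_MORE_DOCS when exhausted.
  int nextDoc() {
    upto++;
    return upto < docs.length ? docs[upto] : NO_MORE_DOCS;
  }

  // Undefined when freqs were not requested; this sketch throws instead
  // of returning an arbitrary value.
  int freq() {
    throw new IllegalStateException("freqs were not requested");
  }
}

final class DocCounter {
  // Only iterates doc IDs, so it is safe with a cursor opened without freqs.
  static int countDocs(NoFreqCursor cursor) {
    int count = 0;
    while (cursor.nextDoc() != NoFreqCursor.NO_MORE_DOCS) {
      count++;
    }
    return count;
  }
}
```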
> I do realize that you shouldn't call Similarity with missing information,
> and TermWeight obtains a DocsEnum with frequencies, so in that regard it is
> safe.
> And if you do obtain a DocsEnum with FLAG_NONE, you'd better know what
> you're doing and not pass an arbitrary freq() to Similarity.
Right.
> I lean towards documenting the spec from above, and ensuring that all Codecs
> return 1 for DOCS_ONLY.
+1
So freq() is undefined if you had passed FLAG_NONE, and we will lie
and say freq=1 (we need a test verifying this) if the field was indexed
as DOCS_ONLY.
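The test could look roughly like the following sketch. `CodecPostings`, `DocsOnlyPostings`, and `DocsOnlyFreqCheck` are hypothetical stand-ins for a codec's postings enum over a DOCS_ONLY field; a real test would obtain the enum from each Codec under test instead.

```java
// Hypothetical postings abstraction standing in for a codec's DocsEnum.
interface CodecPostings {
  int NO_MORE_DOCS = Integer.MAX_VALUE;
  int nextDoc();
  int freq();
}

// A spec-compliant DOCS_ONLY implementation: no freqs are stored, so freq() "lies" and returns 1.
final class DocsOnlyPostings implements CodecPostings {
  private final int[] docs;
  private int upto = -1;

  DocsOnlyPostings(int... docs) {
    this.docs = docs;
  }

  public int nextDoc() {
    upto++;
    return upto < docs.length ? docs[upto] : NO_MORE_DOCS;
  }

  public int freq() {
    return 1;
  }
}

final class DocsOnlyFreqCheck {
  // Walks all docs and returns the first doc whose freq() != 1, or -1 if every doc complies.
  static int firstViolation(CodecPostings postings) {
    int doc;
    while ((doc = postings.nextDoc()) != CodecPostings.NO_MORE_DOCS) {
      if (postings.freq() != 1) {
        return doc;
      }
    }
    return -1;
  }
}
```

Returning the offending doc ID rather than a boolean makes it easier to report which posting a bad Codec got wrong.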
> If in the future we need to handle the case where someone receives a
> DocsEnum that it needs to consume without knowing which flags were used
> to open it, we can always add a getFlags to DE.
Yeah ...
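If such a getFlags() were added, a consumer could check it before trusting freq(). The sketch below is hypothetical: DocsEnum has no getFlags() method, `FlaggedPostings` and `FlagAwareConsumer` are illustrative names, and the constants are named after DocsEnum's FLAG_NONE and FLAG_FREQS with values assumed here.

```java
// Hypothetical: a postings enum that remembers the flags it was opened with.
// getFlags() does NOT exist on DocsEnum; this only sketches the proposal.
final class FlaggedPostings {
  static final int FLAG_NONE = 0x0;  // assumed value, named after DocsEnum.FLAG_NONE
  static final int FLAG_FREQS = 0x1; // assumed value, named after DocsEnum.FLAG_FREQS

  private final int flags;
  private final int freq;

  FlaggedPostings(int flags, int freq) {
    this.flags = flags;
    this.freq = freq;
  }

  int getFlags() {
    return flags;
  }

  int freq() {
    return freq;
  }
}

final class FlagAwareConsumer {
  // Trusts freq() only when the enum was opened with freqs; otherwise
  // treats each matching doc as a single occurrence.
  static int effectiveFreq(FlaggedPostings p) {
    return (p.getFlags() & FlaggedPostings.FLAG_FREQS) != 0 ? p.freq() : 1;
  }
}
```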
Mike McCandless
http://blog.mikemccandless.com