[
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913646#action_12913646
]
Michael McCandless commented on LUCENE-2649:
--------------------------------------------
bq. I think this is a better option than adding a parameter to Parser since we
can have an easy upgrade path. Parser is an interface, so we can not just add
to it without breaking compatibility. To change things in 4.x, 3.x should have
an upgrade path.
Hmm... I'd rather make an exception to 3.x, ie, allow the addition of
this method to the interface, than confuse the 4.x API, going forward,
with 2 classes?
Creating a custom FieldCache parser is an extremely advanced use
case... very few users do this, and those that do will grok this
method?
bq. However, I don't cache the Bits separately since this is an edge case that
should be avoided, but at least does not fail if you are not consistent.
This makes me nervous since it can now lead to further cases of field
cache insanity, ie, you loaded it once w/o the valid bits, and again
w/ the valid bits, and now your values array is taking up 2X the RAM.
It's already bad enough that FC allows one kind of insanity :)
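
To make the 2X RAM concern concrete, here is a rough sketch. It is a toy cache, not the actual FieldCache code or the patch: ToyFieldCache, getInts and cachedValueSlots are made-up names, and the only assumption is that the "with valid bits" flag ends up as part of the cache key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy cache for illustration only; not Lucene's FieldCache. */
class ToyFieldCache {
    // Assumption: the flag is part of the key, so the same field can be cached twice.
    private final Map<String, int[]> cache = new HashMap<>();
    private final int maxDoc;

    ToyFieldCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Returns the cached values array, allocating one per distinct (field, flag) pair. */
    int[] getInts(String field, boolean withValidBits) {
        String key = field + "/" + withValidBits;
        return cache.computeIfAbsent(key, k -> new int[maxDoc]);
    }

    /** Total number of int slots held across all cache entries. */
    long cachedValueSlots() {
        long total = 0;
        for (int[] values : cache.values()) {
            total += values.length;
        }
        return total;
    }
}
```

Asking for the same field once with the flag off and once with it on leaves two full-size values arrays in the cache, which is the inconsistency described above.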
bq. This does cache a MatchAllBits even when 'cacheValidBits' is false, since
that is small (a small class with one int)
Hmm... but if I pass false here, it shouldn't spend any time
allocating the bit set, building it, checking the bit set for "all
bits set", etc.?
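
Roughly what I have in mind, as a sketch only. ToyLoader, Result and load are hypothetical names, java.util.BitSet stands in for whatever bit set the patch uses, and a null entry in the input stands in for "this doc has no value". When the flag is false, no bit set is allocated or touched at all.

```java
import java.util.BitSet;

/** Hypothetical loader for illustration only; not the code in the patch. */
class ToyLoader {
    static final class Result {
        final int[] values;
        final BitSet validBits; // null when the caller did not ask for valid bits

        Result(int[] values, BitSet validBits) {
            this.values = values;
            this.validBits = validBits;
        }
    }

    /** docValues[doc] == null means the doc has no value for the field. */
    static Result load(Integer[] docValues, boolean cacheValidBits) {
        int[] values = new int[docValues.length];
        // Only allocate the bit set when it was requested.
        BitSet valid = cacheValidBits ? new BitSet(docValues.length) : null;
        for (int doc = 0; doc < docValues.length; doc++) {
            if (docValues[doc] != null) {
                values[doc] = docValues[doc];
                if (valid != null) {
                    valid.set(doc);
                }
            }
        }
        return new Result(values, valid);
    }
}
```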
{quote}
bq. * We don't have to @Deprecate for 4.0 - just remove it, and note this
in MIGRATE.txt. (Though for 3.x we need the deprecation, so maybe do 3.x patch
first, then remove deprecations for 4.0?).
My plan was to apply with deprecations to 4.x, then merge with 3.x. Then
replace the calls in 4.x, then remove the old functions. Does this sound
reasonable?
{quote}
OK that sounds like a good plan!
bq. Right, the ValidBits are only checked for docs that exist (and the FC
values are only set for docs that exist -- this has not changed), and may
contain false positives for deleted docs. I think this is OK since most use
cases (I can think of) deal with deletions anyway. Any ideas how/if we should
change this?
I think this is the right approach -- expecting FC's valid bits to
take deletions into account is too much. We have IR.getDeletedDocs
for this.
But, eg this means classes like FCRF will still have to consult
deleted docs.
Really, "in general" we need a better way for the query execution path
to enforce deleted docs. Eg if the FCRF will be AND'd w/ a query
that's already excluding del docs then it need not be careful about
deletions...
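
A sketch of the idea, under stated assumptions: ToyRangeFilter and matches are made-up names (this is not FCRF's code), java.util.BitSet stands in for both the valid bits and the deleted docs, and a set bit in deletedDocs is assumed to mean "deleted". The deletionsAlreadyEnforced flag is a hypothetical stand-in for the query execution path telling the filter that something else already excludes deleted docs.

```java
import java.util.BitSet;

/** Hypothetical range filter for illustration only; not FieldCacheRangeFilter. */
class ToyRangeFilter {
    /**
     * Returns the docs whose value lies in [min, max].
     * Docs without a valid bit are skipped; deleted docs are skipped unless
     * the caller says deletions are already enforced elsewhere.
     */
    static BitSet matches(int[] values, BitSet validBits, BitSet deletedDocs,
                          int min, int max, boolean deletionsAlreadyEnforced) {
        BitSet result = new BitSet(values.length);
        for (int doc = validBits.nextSetBit(0);
             doc >= 0 && doc < values.length;
             doc = validBits.nextSetBit(doc + 1)) {
            if (!deletionsAlreadyEnforced && deletedDocs.get(doc)) {
                continue; // valid bits may be a false positive for a deleted doc
            }
            if (values[doc] >= min && values[doc] <= max) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

With the flag set to true the filter may return deleted docs, which is only safe if whatever it is AND'd with filters them out.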
bq. (I did not realize that the FC is reused after deletions -- so clever)
Ha! There was a time when it didn't ;)
> FieldCache should include a BitSet for matching docs
> ----------------------------------------------------
>
> Key: LUCENE-2649
> URL: https://issues.apache.org/jira/browse/LUCENE-2649
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: LUCENE-2649-FieldCacheWithBitSet.patch,
> LUCENE-2649-FieldCacheWithBitSet.patch,
> LUCENE-2649-FieldCacheWithBitSet.patch,
> LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.
> However there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a
> BitSet for all valid docs.