[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913659#action_12913659 ]
Uwe Schindler commented on LUCENE-2649:
---------------------------------------

I am also strongly +1 for the additional Bits interface (as Ryan did; it does not always need to be a real OpenBitSet, so when there are no deletions and all bits are set, we can use a dummy one). I have often had use cases where I needed to know whether a document really has a value set, and I don't use Solr much.

{quote}
And being able to distinguish missing values, eg to sort them last, or to do something else, is useful. Once we do this we should also [eventually] move "sort missing last" capability into Lucene's comparators.
{quote}

+1

{quote}
I think this is the right approach - expecting FC's valid bits to take deletions into account is too much. We have IR.getDeletedDocs for this.
{quote}

We don't need to AND them together; maybe simply wrap the OpenBitSet in a custom Bits impl that ANDs in the getter? But as deletions are kept separate in IndexReader, and the cache can be reused even when new deletions are added, I think keeping them separate is fine.

About the whole bit set: do we really need to couple the Bits interface to the type? If you exchange the parser/native type (e.g. parse ints as bytes), the valid docs are still the same; only the native type representation is different. So how about we add a getBits(field) method to FieldCache that returns the valid docs? If the field was not yet retrieved as a native type, it could throw IllegalStateException; otherwise it would return the Bits interface (global per field, not per parser/datatype) created during the last FC population run. We would then also have the possibility to disable the default generation of Bits and do it lazily (which should run faster, as it does not need to parse the values, only enumerate terms and termdocs).

{quote}
Really, "in general" we need a better way for the query execution path to enforce deleted docs.
Eg if the FCRF will be AND'd w/ a query that's already excluding del docs then it need not be careful about deletions...
{quote}

That's another thing, but maybe we remove deleted docs completely from query processing and simply apply them like a filter before the collector. Not sure about the implications and performance.

> FieldCache should include a BitSet for matching docs
> ----------------------------------------------------
>
>                 Key: LUCENE-2649
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2649
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2649-FieldCacheWithBitSet.patch,
> LUCENE-2649-FieldCacheWithBitSet.patch,
> LUCENE-2649-FieldCacheWithBitSet.patch,
> LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.
> However there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a
> BitSet for all valid docs.
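
For readers following the comment above, here is a minimal sketch of three ideas it raises: a dummy "all set" Bits, a wrapper that combines valid docs with deletions in the getter instead of AND-ing bit sets up front, and a per-field getBits lookup that throws IllegalStateException when the field has not been loaded yet. It is not taken from the attached patch. The `Bits` interface below is a local stand-in declared only for this example, `java.util.BitSet` stands in for OpenBitSet, and the names `MatchAllBits`, `BitSetBits`, `ValidAndLiveBits`, `ValidDocsCache` and `recordPopulation` are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Local stand-in for a Bits-style interface (assumption: only get/length are needed).
interface Bits {
    boolean get(int index);
    int length();
}

// Dummy implementation for the case where every doc has a value: no bit set is allocated.
final class MatchAllBits implements Bits {
    private final int length;
    MatchAllBits(int length) { this.length = length; }
    @Override public boolean get(int index) { return index >= 0 && index < length; }
    @Override public int length() { return length; }
}

// Adapts a java.util.BitSet (standing in for OpenBitSet here) to Bits.
final class BitSetBits implements Bits {
    private final BitSet bits;
    private final int length;
    BitSetBits(BitSet bits, int length) { this.bits = bits; this.length = length; }
    @Override public boolean get(int index) { return bits.get(index); }
    @Override public int length() { return length; }
}

// Combines valid docs and deletions lazily in the getter.
// Neither input is copied or modified, so the cached valid bits stay
// reusable when new deletions arrive.
final class ValidAndLiveBits implements Bits {
    private final Bits validDocs;
    private final Bits deletedDocs; // null means "no deletions"
    ValidAndLiveBits(Bits validDocs, Bits deletedDocs) {
        this.validDocs = validDocs;
        this.deletedDocs = deletedDocs;
    }
    @Override public boolean get(int index) {
        return validDocs.get(index) && (deletedDocs == null || !deletedDocs.get(index));
    }
    @Override public int length() { return validDocs.length(); }
}

// Hypothetical per-field registry: one Bits per field, independent of parser/datatype.
final class ValidDocsCache {
    private final Map<String, Bits> validDocsByField = new HashMap<>();

    // Called at the end of a (hypothetical) cache population run for a field.
    void recordPopulation(String field, Bits validDocs) {
        validDocsByField.put(field, validDocs);
    }

    // Returns the valid docs for a field, or fails if the field was never loaded.
    Bits getBits(String field) {
        Bits bits = validDocsByField.get(field);
        if (bits == null) {
            throw new IllegalStateException("field not yet loaded: " + field);
        }
        return bits;
    }
}

final class ValidDocsDemo {
    public static void main(String[] args) {
        BitSet valid = new BitSet(4);
        valid.set(0); valid.set(2); valid.set(3); // doc 1 has no value
        BitSet deleted = new BitSet(4);
        deleted.set(2);                            // doc 2 is deleted

        ValidDocsCache cache = new ValidDocsCache();
        cache.recordPopulation("price", new BitSetBits(valid, 4));

        Bits usable = new ValidAndLiveBits(cache.getBits("price"), new BitSetBits(deleted, 4));
        for (int doc = 0; doc < usable.length(); doc++) {
            System.out.println("doc " + doc + " usable: " + usable.get(doc));
        }
    }
}
```

Keeping the two inputs separate costs one extra lookup per get call, but the valid-docs Bits never has to be rebuilt when deletions change, which is the trade-off the comment argues for.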