[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

Uwe Schindler (JIRA) Wed, 22 Sep 2010 09:49:58 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913659#action_12913659
 ]


Uwe Schindler commented on LUCENE-2649:
---------------------------------------

I am also strongly +1 for the additional Bits interface (as Ryan did; it does 
not always need to be a real OpenBitSet, so when there are no deletions and 
all bits are set, we can use a dummy one).
I have often had use cases where I needed to know whether a document really 
has a value set or not, and I don't use Solr much.

{quote}
And being able to distinguish missing values, eg to sort them last, or
to do something else, is useful. Once we do this we should also
[eventually] move "sort missing last" capability into Lucene's
comparators.
{quote}

+1

{quote}
I think this is the right approach - expecting FC's valid bits to
take deletions into account is too much. We have IR.getDeletedDocs
for this.
{quote}

We don't need to AND them together; maybe simply wrap the OpenBitSet in a 
custom Bits impl that ANDs in the getter? But as deletions are kept separately 
in IndexReader, and the cache can reuse its entries even when new deletions 
are added, I think keeping them separate is fine.
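
A rough sketch of such a wrapper, to make the idea concrete. Everything here 
is hypothetical: the Bits interface below is only a stand-in with the 
get/length shape, ArrayBits replaces a real OpenBitSet, and I assume the 
deleted-docs bits have a set bit for each deleted doc and are null when there 
are no deletions.

```java
// Stand-in for the Bits interface (assumed shape: get + length).
interface Bits {
    boolean get(int index);
    int length();
}

// Hypothetical array-backed Bits, used here instead of a real OpenBitSet.
final class ArrayBits implements Bits {
    private final boolean[] bits;

    ArrayBits(boolean[] bits) {
        this.bits = bits;
    }

    public boolean get(int index) {
        return bits[index];
    }

    public int length() {
        return bits.length;
    }
}

// Hypothetical wrapper: the AND happens lazily in the getter, so the cached
// "has value" bits are never modified and stay reusable across new deletions.
final class ValidAndLiveBits implements Bits {
    private final Bits hasValue; // from the cache entry
    private final Bits deleted;  // from the reader; null means no deletions (assumption)

    ValidAndLiveBits(Bits hasValue, Bits deleted) {
        this.hasValue = hasValue;
        this.deleted = deleted;
    }

    public boolean get(int index) {
        return hasValue.get(index) && (deleted == null || !deleted.get(index));
    }

    public int length() {
        return hasValue.length();
    }
}
```

When the reader gets new deletions, only the cheap wrapper is recreated; the 
cached bits stay as they are.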

About the whole bit set: Do we really need to couple the Bits interface to the 
type? If you exchange the parser/native type (e.g. parse ints as bytes), 
the valid docs are still the same; only the native type representation is 
different. So how about we add a getBits(field) method to FieldCache that 
returns the valid docs? If the field has not yet been retrieved as a native 
type, it could throw IllegalStateException; otherwise it would return the Bits 
instance (global per field, not per parser/datatype) created during the last 
FC population run. We would then also have the possibility to disable the 
default generation of Bits and do it lazily (which should run faster, as it 
does not need to parse the values, only enumerate terms and termdocs).
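
Roughly what I have in mind, as a minimal sketch only. ValidDocsRegistry and 
recordPopulation are made-up names, not existing FieldCache API, and the 
per-reader keying of the real cache is left out to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the Bits interface (assumed shape: get + length).
interface Bits {
    boolean get(int index);
    int length();
}

// Hypothetical holder for the valid-docs bits; not the real FieldCache.
final class ValidDocsRegistry {
    // Keyed by field name only: the parser/native type does not change
    // which documents have a value.
    private final Map<String, Bits> validDocsByField = new HashMap<>();

    // Called at the end of a population run, whatever the native type was.
    // A later run for the same field replaces the earlier bits.
    void recordPopulation(String field, Bits validDocs) {
        validDocsByField.put(field, validDocs);
    }

    // Sketch of the proposed getBits(field).
    Bits getBits(String field) {
        Bits bits = validDocsByField.get(field);
        if (bits == null) {
            throw new IllegalStateException(
                "field '" + field + "' was not yet retrieved as a native type");
        }
        return bits;
    }
}
```

A lazy variant would fill the map on the first getBits call instead of 
throwing.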

{quote}
Really, "in general" we need a better way for the query execution path
to enforce deleted docs. Eg if the FCRF will be AND'd w/ a query
that's already excluding del docs then it need not be careful about
deletions...
{quote}

That's another thing, but maybe we could remove deleted docs completely from 
query processing and simply apply them like a filter before the collector. I 
am not sure about the implications and performance.
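
Just to illustrate the idea, a sketch with made-up types: DocCollector is a 
one-method stand-in and not Lucene's Collector, and I again assume that a set 
bit means "deleted" and that null means no deletions.

```java
// Stand-in for the Bits interface (assumed shape: get + length).
interface Bits {
    boolean get(int index);
    int length();
}

// Hypothetical, simplified collector; the real Collector API has more methods.
interface DocCollector {
    void collect(int doc);
}

// Hypothetical wrapper: the query side ignores deletions completely, and
// deleted docs are dropped in one place, right before the real collector.
final class DeletionFilteringCollector implements DocCollector {
    private final DocCollector delegate;
    private final Bits deleted; // null means no deletions (assumption)

    DeletionFilteringCollector(DocCollector delegate, Bits deleted) {
        this.delegate = delegate;
        this.deleted = deleted;
    }

    public void collect(int doc) {
        if (deleted == null || !deleted.get(doc)) {
            delegate.collect(doc);
        }
    }
}
```

The cost is that deleted docs are still scored before they are thrown away, 
which is the performance question above.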

> FieldCache should include a BitSet for matching docs
> ----------------------------------------------------
>
>                 Key: LUCENE-2649
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2649
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.  
> However there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a 
> BitSet for all valid docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
