[ 
https://issues.apache.org/jira/browse/LUCENE-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129957#comment-14129957
 ] 

Adrien Grand commented on LUCENE-5938:
--------------------------------------

bq. Do you need to store words with all bits set?

That is a good question. For example WAH8DocIdSet detects dense sections of the 
set and has the ability to temporarily invert the encoding in such cases in 
order to save space (this is why you see memory usage for WAH8 decrease on 
dense bitsets on the benchmark). I like the idea but it doesn't look 
straightforward to me, so let's maybe pull this in and then work on improving 
efficiency on dense sets?

bq. Perhaps we can just replace all of that with this bitset?

Sounds good. I'll update the patch to use this new doc id set in relevant 
queries/filters.

> New DocIdSet implementation with random write access
> ----------------------------------------------------
>
>                 Key: LUCENE-5938
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5938
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-5938.patch
>
>
> We have a great cost API that is supposed to help make decisions about how to 
> best execute queries. However, due to the fact that several of our filter 
> implementations (eg. TermsFilter and BooleanFilter) return FixedBitSets, 
> either we use the cost API and make bad decisions, or need to fall back to 
> heuristics which are not as good such as 
> RandomAccessFilterStrategy.useRandomAccess which decides that random access 
> should be used if the first doc in the set is less than 100.
> On the other hand, we also have some nice compressed and cacheable DocIdSet 
> implementation but we cannot make use of them because TermsFilter requires a 
> DocIdSet that has random write access, and FixedBitSet is the only DocIdSet 
> that we have that supports random access.
> I think it would be nice to replace FixedBitSet in those filters with another 
> DocIdSet that would also support random write access but would have a better 
> cost?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to