[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515434
 ] 

Paul Elschot commented on LUCENE-584:
-------------------------------------

Have a look at BitSetMatcher in the -default patch. It is constructed from a 
BitSet, and it has a method getMatcher() that returns a Matcher that acts as a 
searching iterator over the BitSet.

So that covers 1) to 4), at least potentially. A clone() method is not 
currently implemented, IIRC, but each call to getMatcher() returns a new 
iterator over the underlying BitSet. And when guaranteed non-modifiability is 
needed, a constructor can take a copy of the given document set, in whatever form.
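
To make the idea concrete, here is a minimal sketch of a BitSet-backed factory 
that hands out a fresh iterator on every getMatcher() call and can optionally 
take a defensive copy. The names DocMatcher and BitSetMatcherSketch, the 
copy flag, and the next()/skipTo()/doc() signatures are assumptions made for 
illustration; the actual Matcher and BitSetMatcher in the patch may differ.

```java
import java.util.BitSet;

/** Hypothetical iterator interface; not necessarily the Matcher API of the patch. */
interface DocMatcher {
    boolean next();             // advance to the next matching doc, false when exhausted
    boolean skipTo(int target); // advance to the first match >= target and after the current doc
    int doc();                  // current doc, only valid after next()/skipTo() returned true
}

/** Hypothetical stand-in for BitSetMatcher: a factory of iterators over one BitSet. */
final class BitSetMatcherSketch {
    private final BitSet bits;

    /** With copy == true the given set is cloned, so later changes to it are not seen. */
    BitSetMatcherSketch(BitSet bits, boolean copy) {
        this.bits = copy ? (BitSet) bits.clone() : bits;
    }

    /** Each call returns a new, independent iterator over the same underlying bits. */
    DocMatcher getMatcher() {
        return new DocMatcher() {
            private int doc = -1;
            private boolean exhausted = false;

            public boolean next() {
                return advance(doc + 1);
            }

            public boolean skipTo(int target) {
                return advance(Math.max(target, doc + 1));
            }

            public int doc() {
                return doc;
            }

            private boolean advance(int from) {
                if (exhausted) {
                    return false;
                }
                doc = bits.nextSetBit(from); // -1 when no set bit remains
                exhausted = doc < 0;
                return !exhausted;
            }
        };
    }
}
```

Two callers can then iterate the same document set independently, without 
needing clone() on the iterator itself.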

The point of Matcher is that it allows implementations other than BitSet, such 
as OpenBitSet and SortedVIntList. Both have the properties that you are looking 
for. SortedVIntList can save a lot of memory compared to (Open)BitSet, and 
OpenBitSet is somewhat faster than BitSet.
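
As a rough illustration of where the memory saving comes from, the sketch 
below stores a sorted list of document numbers as variable-length deltas 
(7 payload bits per byte, high bit set when more bytes follow). This is 
only an assumption about the general technique; the class VIntDeltaSketch 
and its methods are hypothetical and are not the SortedVIntList code from 
the patch.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of delta + variable-length int encoding of sorted doc ids. */
final class VIntDeltaSketch {

    /** Encodes strictly increasing, non-negative doc ids as VInt deltas. */
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int delta = doc - prev;
            // Emit 7 bits at a time, low bits first; the high bit marks "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = doc;
        }
        return out.toByteArray();
    }

    /** Decodes the first count doc ids from the bytes produced by encode(). */
    static int[] decode(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            docs[i] = prev;
        }
        return docs;
    }
}
```

For the four documents {5, 130, 131, 100000} this encoding takes 6 bytes, 
whereas a bit set has to cover all 100001 bit positions. The trade-off is 
that the list can only be read sequentially.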

I'd like to have a skip list version of SortedVIntList, too. This would be 
slightly larger than SortedVIntList, but more efficient on skipTo().
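
One way such a variant could look, purely as a sketch: keep the VInt delta 
bytes, and additionally record the doc id and byte offset of every N-th 
entry, so that a skip can binary-search the small index and then decode at 
most N entries. The class SkipVIntListSketch, its interval parameter and the 
firstAtOrAfter() method are hypothetical names, and a single-level index is 
assumed here rather than a real multi-level skip list.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: VInt delta list plus a single-level skip index. */
final class SkipVIntListSketch {
    private final byte[] bytes;   // delta-encoded VInts of all doc ids
    private final int[] skipDoc;  // doc id of every interval-th entry
    private final int[] skipPos;  // byte offset just after that entry
    int decoded;                  // number of VInts decoded by the last lookup

    /** sortedDocs must be strictly increasing and non-negative; interval must be >= 1. */
    SkipVIntListSketch(int[] sortedDocs, int interval) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int skips = sortedDocs.length / interval;
        skipDoc = new int[skips];
        skipPos = new int[skips];
        int prev = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            int delta = sortedDocs[i] - prev;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = sortedDocs[i];
            if ((i + 1) % interval == 0) {
                int s = (i + 1) / interval - 1;
                skipDoc[s] = prev;
                skipPos[s] = out.size();
            }
        }
        bytes = out.toByteArray();
    }

    /** Returns the first doc id >= target, or -1 if there is none. */
    int firstAtOrAfter(int target) {
        // Binary search for the last skip entry whose doc id is below the target.
        int lo = 0;
        int hi = skipDoc.length - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (skipDoc[mid] < target) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        int pos = best < 0 ? 0 : skipPos[best];
        int doc = best < 0 ? 0 : skipDoc[best];
        decoded = 0;
        // Decode sequentially from the chosen skip point.
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            doc += delta;
            decoded++;
            if (doc >= target) {
                return doc;
            }
        }
        return -1;
    }
}
```

The extra cost is two ints per interval, which matches the "slightly 
larger" expectation, and the interval controls the trade-off between index 
size and the amount of sequential decoding per skip.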

But the first thing that is necessary is to make Filter independent of BitSet.

The real pain with that is going to be the code that currently implements 
Filters outside the Lucene code base. A default implementation of a Matcher 
should help there, just as it does in the -core patch now.

The default implementation will probably need to be improved from its current
state, but that can be done later. For example, one could also use OpenBitSet
in all cases, and even collect the filtered documents directly in that.

Cheers,
Paul Elschot

> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt, 
> Matcher-core20070725.patch, Matcher-default20070725.patch, 
> Matcher-ground20070725.patch, Some Matchers.zip
>
>
> {code}
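> package org.apache.lucene.search;
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
> // Proposed: bits() returns an abstract type instead of java.util.BitSet.
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> // Would live in its own source file, AbstractBitSet.java.
> public interface AbstractBitSet
> {
>   // true if the document with this number passes the filter
>   public boolean get(int index);
> }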
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.

