[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518642
 ] 

Mark Harwood commented on LUCENE-584:
-------------------------------------

Some further thought on the roles/responsibilities of the various components:

Given a blank sheet of paper (a luxury we may not have), the minimum 
requirements I would have could be met with the following.
(Note that the words "Matcher", "Filter" etc. have been dropped 
because sets of doc IDs have applications outside of filtering/querying, e.g. 
category counts.)

interface DocIdSetFactory
{
    DocIdSet getDocIdSet(IndexReader reader);
}
This is more or less equivalent to the purpose of the existing "Filter": 
different implementations define their own selection criteria and produce a set 
of matching doc IDs, e.g. the equivalent of RangeFilter. Each implementation must 
implement "hashCode" and "equals" methods based on its criteria so the factory 
can be cached and reused (in the same way Query objects are expected to be). The 
existing CachedFilterBuilder in the XMLQueryParser provides one example of a 
strategy for caching Filters using this facility. 
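
To make the hashCode/equals requirement concrete, here is a minimal sketch. 
The names TermRangeFactory and FactoryCache are hypothetical, and IndexReader 
and DocIdSet are empty stand-ins so the example compiles without Lucene; this 
is not the CachedFilterBuilder code, only an illustration of why 
criteria-based equality makes a factory usable as a cache key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Empty stand-ins so the sketch compiles without Lucene on the classpath.
interface IndexReader { }
interface DocIdSet { }

interface DocIdSetFactory {
    DocIdSet getDocIdSet(IndexReader reader);
}

// Hypothetical range-style factory; its identity is defined by its criteria.
final class TermRangeFactory implements DocIdSetFactory {
    private final String field;
    private final String lower;
    private final String upper;

    TermRangeFactory(String field, String lower, String upper) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) {
        // Real selection logic would consult the reader; omitted in this sketch.
        return new DocIdSet() { };
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TermRangeFactory)) return false;
        TermRangeFactory other = (TermRangeFactory) o;
        return Objects.equals(field, other.field)
            && Objects.equals(lower, other.lower)
            && Objects.equals(upper, other.upper);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, lower, upper);
    }
}

// Hypothetical cache that hands back one shared instance per set of criteria.
final class FactoryCache {
    private final Map<DocIdSetFactory, DocIdSetFactory> cache = new HashMap<>();

    synchronized DocIdSetFactory intern(DocIdSetFactory factory) {
        DocIdSetFactory existing = cache.putIfAbsent(factory, factory);
        return existing != null ? existing : factory;
    }
}
```

Two factories built from the same field and bounds are equal, so the second 
call to intern() returns the first instance and any per-factory cached state 
can be shared.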


interface DocIdSet
{
    DocIdSetIterator getIterator();
}
This interface defines an immutable, thread-safe (and therefore cacheable) 
collection of doc IDs. Different implementations provide space-efficient 
alternatives for sparse or heavily populated sets, e.g. BitSet, OpenBitSet, 
SortedVIntList. As an example caching strategy, the existing 
CachingWrapperFilter would cache these objects in a WeakHashMap keyed on 
IndexReader.
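
A rough sketch of that caching strategy follows. CachingDocIdSetFactory is a 
hypothetical name (it is not the existing CachingWrapperFilter), and 
IndexReader and DocIdSet are again empty stand-ins rather than the real 
Lucene types.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Empty stand-ins so the sketch compiles without Lucene on the classpath.
interface IndexReader { }
interface DocIdSet { }

interface DocIdSetFactory {
    DocIdSet getDocIdSet(IndexReader reader);
}

// Hypothetical wrapper: computes the set once per reader, then reuses it.
final class CachingDocIdSetFactory implements DocIdSetFactory {
    private final DocIdSetFactory delegate;

    // Weak keys let an entry be dropped once its reader is no longer referenced.
    private final Map<IndexReader, DocIdSet> cache =
        Collections.synchronizedMap(new WeakHashMap<>());

    CachingDocIdSetFactory(DocIdSetFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) {
        // Safe to share across threads only because DocIdSet is immutable.
        return cache.computeIfAbsent(reader, delegate::getDocIdSet);
    }
}
```

One caveat with this approach: a cached DocIdSet must not hold a strong 
reference to its IndexReader, otherwise the weak key can never be collected.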

interface DocIdSetIterator
{
    boolean next();
    int getDoc();
    // ...etc.
}
A thread-unsafe, single-use object (probably with only one implementation) 
that is used to iterate across any DocIdSet. Not cacheable; used by Scorers.
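
The split between a shareable set and a throwaway iterator could look 
something like the sketch below. SortedIntDocIdSet is a hypothetical, 
array-backed implementation chosen only for brevity, and only the two 
iterator methods named above are modelled.

```java
interface DocIdSetIterator {
    boolean next();   // advance; false once the set is exhausted
    int getDoc();     // current doc ID, valid only after next() returned true
}

interface DocIdSet {
    DocIdSetIterator getIterator();
}

// Hypothetical immutable set backed by a sorted int array.
final class SortedIntDocIdSet implements DocIdSet {
    private final int[] docs;

    SortedIntDocIdSet(int... sortedDocs) {
        // Defensive copy keeps the set immutable, so it can be cached and shared.
        this.docs = sortedDocs.clone();
    }

    @Override
    public DocIdSetIterator getIterator() {
        // Each call returns a fresh iterator holding its own mutable position.
        return new DocIdSetIterator() {
            private int pos = -1;

            @Override
            public boolean next() {
                if (pos + 1 < docs.length) {
                    pos++;
                    return true;
                }
                pos = docs.length; // mark as exhausted
                return false;
            }

            @Override
            public int getDoc() {
                if (pos < 0 || pos >= docs.length) {
                    throw new IllegalStateException("no current document");
                }
                return docs[pos];
            }
        };
    }
}
```

All mutable state lives in the iterator, so the same DocIdSet instance can be 
handed to many threads, each of which asks for its own iterator.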

In the existing proposal it feels like DocIdSet and DocIdSetIterator are rolled 
into one in the form of the Matcher, which complicates or prevents caching 
strategies.

Cheers
Mark




> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt, 
> Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
> Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
> Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
> Some Matchers.zip
>
>
> {code}
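> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> public abstract class Filter implements java.io.Serializable 
> {
>   // Returns an abstract view of the matching docs instead of java.util.BitSet.
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   // True if the document with the given index passes the filter.
>   public boolean get(int index);
> }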
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.

