[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518845
 ] 

Mark Harwood commented on LUCENE-584:
-------------------------------------

Hi Paul,

Not sure we've reached a common understanding here yet.

You said "That was a mistake. BitSetMatcher is a Matcher constructed from a 
BitSet, and SortedVIntList has a getMatcher() method, and I confused the two. "
Ok, thanks for the clarification. I still feel uncomfortable because the method 
getMatcher() is not abstracted to a common interface. This was the thinking 
behind my "getIterator" method on DocIdSet.

I too made a mistake in my earlier comments. DocIdSetIterator does NOT have 
"probably one implementation". There would be an implementation for each 
different type of DocIdSet (BitSet/OpenBitSet/SortedVIntList).

You said "some Filters do not need a cache. For example: TermFilter". I'm not 
sure why that has been singled out as not worth caching. I have certain 
terms (e.g. gender:male) where the TermDocs is very large (50% of all docs in 
the index!), so making multiple calls to TermDocs for the term "gender:male" 
(if that is what you are suggesting) is highly undesirable. These are typically 
handled in the XMLQueryParser using syntax like this:
  <CachedFilter>
        <TermsFilter fieldName="gender">male</TermsFilter>
  </CachedFilter>

You said: "CachingWrapperFilter could then become a cache for BitSetFilter."
This means that the only caching strategy is one based on bitsets. Does this 
not lose perhaps the main benefit of your whole proposal: the ability to have 
alternative, space-efficient storage of sets of document ids, e.g. 
SortedVIntList?

If this is undesirable (my guess is "yes") then the proposal in my previous 
comment is a solution which allows for caching of any/all types of the new sets 
(OpenBitSet, BitSet, SortedVIntList etc). Regardless of my choice of class 
names or decisions over interfaces vs abstract classes, do you not at least 
agree on the need for 3 types of functionality:

1) A factory for instantiating sets of document ids matching a particular set 
of criteria (which can be costly to call). While the factory is not expected to 
implement a caching strategy, it is expected to implement hashCode/equals 
simply to aid any caching services which would need this help to identify 
previously instantiated sets which share the same criteria as any new requests 
(this service I identified as my "DocIdSetFactory", and TermsFilter/RangeFilter 
would be example implementations).
2) An object representing an instantiated set of document ids which can be 
cached and can create iterators for use in separate threads (identified as my 
DocIdSet - example implementations being called something like BitSetDocSet, 
SortedVIntList)
3) An iterator for a set of document ids (my DocIdSetIterator - example impls 
being called something like BitSetDocSetIterator SortedVIntListIterator)

Each type of functionality can have different implementations, so the 
functionality must be defined using an interface or abstract class. 
If we can agree on this much as a set of responsibilities then we can begin to 
map these services onto something more concrete.
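
To make the three responsibilities listed above easier to discuss, here is a 
minimal Java sketch of how they might fit together. This is an illustration 
under assumptions, not a proposed patch: every type and method name in it 
(DocIdSetFactory, DocIdSet, DocIdSetIterator, BitSetDocSet, DocIdSetCache, 
EveryNthDocFactory) is hypothetical rather than existing Lucene API, and a real 
factory would presumably take an IndexReader, which is left out here so the 
example stays self-contained.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// All names below are hypothetical; none of them are existing Lucene API.

/** 3) Iterates over the doc ids of a set in increasing order. */
interface DocIdSetIterator {
    /** Moves to the next doc id; returns false once the set is exhausted. */
    boolean next();
    /** Current doc id; only meaningful after next() has returned true. */
    int doc();
}

/** 2) An instantiated set of doc ids that can be cached and shared. */
interface DocIdSet {
    /** Returns a fresh iterator, so each thread can have its own. */
    DocIdSetIterator iterator();
}

/** 1) Builds a DocIdSet for some criteria; must implement equals/hashCode. */
interface DocIdSetFactory {
    DocIdSet createDocIdSet();
}

/** A DocIdSet backed by java.util.BitSet (treated as read-only here). */
final class BitSetDocSet implements DocIdSet {
    private final BitSet bits;
    BitSetDocSet(BitSet bits) { this.bits = bits; }

    public DocIdSetIterator iterator() {
        return new DocIdSetIterator() {
            private int doc = -1;
            private boolean done = false;
            public boolean next() {
                if (done) return false;
                doc = bits.nextSetBit(doc + 1);
                done = doc < 0;
                return !done;
            }
            public int doc() { return doc; }
        };
    }
}

/** Toy factory standing in for TermsFilter/RangeFilter: matches every n-th doc. */
final class EveryNthDocFactory implements DocIdSetFactory {
    static int builds = 0; // counts how often the (costly) build really runs
    private final int n;
    private final int maxDoc;
    EveryNthDocFactory(int n, int maxDoc) { this.n = n; this.maxDoc = maxDoc; }

    public DocIdSet createDocIdSet() {
        builds++;
        BitSet bits = new BitSet(maxDoc);
        for (int d = 0; d < maxDoc; d += n) bits.set(d);
        return new BitSetDocSet(bits);
    }
    // equals/hashCode describe the criteria, so a cache can spot repeat requests.
    @Override public boolean equals(Object o) {
        if (!(o instanceof EveryNthDocFactory)) return false;
        EveryNthDocFactory other = (EveryNthDocFactory) o;
        return n == other.n && maxDoc == other.maxDoc;
    }
    @Override public int hashCode() { return 31 * n + maxDoc; }
}

/** Caching service kept separate from the factories; works for any DocIdSet type. */
final class DocIdSetCache {
    private final Map<DocIdSetFactory, DocIdSet> cache = new HashMap<>();
    synchronized DocIdSet get(DocIdSetFactory factory) {
        return cache.computeIfAbsent(factory, DocIdSetFactory::createDocIdSet);
    }
}

final class DocIdSetDemo {
    public static void main(String[] args) {
        DocIdSetCache cache = new DocIdSetCache();
        DocIdSet first = cache.get(new EveryNthDocFactory(2, 10));
        DocIdSet second = cache.get(new EveryNthDocFactory(2, 10)); // cache hit
        System.out.println("same instance: " + (first == second));
        for (DocIdSetIterator it = first.iterator(); it.next(); ) {
            System.out.print(it.doc() + " ");
        }
        System.out.println();
    }
}
```

The point of the sketch is that the cache only depends on the factory's 
equals/hashCode and on the DocIdSet interface, so a SortedVIntList-style 
implementation could be cached in exactly the same way as the BitSet-backed one.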


Cheers
Mark






> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt, 
> Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
> Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
> Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
> Some Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.


