[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518845 ]
Mark Harwood commented on LUCENE-584:
-------------------------------------

Hi Paul,
Not sure we've reached a common understanding here yet.

You said "That was a mistake. BitSetMatcher is a Matcher constructed from a BitSet, and SortedVIntList has a getMatcher() method, and I confused the two."

Ok, thanks for the clarification. I still feel uncomfortable because the method getMatcher() is not abstracted to a common interface. This was the thinking behind my "getIterator" method on DocIdSet.

I too made a mistake in my earlier comments. DocIdSetIterator does NOT have "probably one implementation". There would be an implementation for each different type of DocIdSet (BitSet/OpenBitSet/VIntList).

You said "some Filters do not need a cache. For example: TermFilter". I'm not sure why that has been singled out as not worthy of caching. I have certain terms (e.g. gender:male) where the TermDocs is very large (50% of all docs in the index!) so multiple calls to TermDocs for term "gender:male" (if that is what you are suggesting) are highly undesirable. These are typically handled in the XMLQueryParser using syntax like this:

  <CachedFilter>
    <TermsFilter fieldName="gender">male</TermsFilter>
  </CachedFilter>

You said: "CachingWrapperFilter could then become a cache for BitSetFilter."

This means that the only caching strategy is one based on bitsets. Does this not lose perhaps the main benefit of your whole proposal: the ability to have alternative, space-efficient storage of sets of document ids, e.g. SortedVIntList?
If this is undesirable (my guess is "yes") then the proposal in my previous comment is a solution which allows for caching of any/all types of the new sets (OpenBitSet, BitSet, SortedVIntList etc.).

Regardless of my choice of class names or decisions over interfaces vs abstract classes, do you not at least agree on the need for 3 types of functionality:

1) A factory for instantiating sets of document ids matching a particular set of criteria (which can be costly to call). While the factory is not expected to implement a caching strategy, it is expected to implement hashCode/equals simply to aid any caching services, which would need this help to identify previously instantiated sets which share the same criteria as any new requests. (This service I identified as my "DocIdSetFactory", and TermsFilter/RangeFilter would be example implementations.)

2) An object representing an instantiated set of document ids which can be cached and can create iterators for use in separate threads (identified as my DocIdSet, example implementations being called something like BitSetDocSet, SortedVIntList).

3) An iterator for a set of document ids (my DocIdSetIterator, example impls being called something like BitSetDocSetIterator, SortedVIntListIterator).

Each type of functionality can have different implementations, so the functionality must be defined using an interface or abstract class.
If we can agree this much as a set of responsibilities then we can begin to map these services onto something more concrete.
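
To make the three responsibilities above a little more tangible, here is a minimal Java sketch. It is only an illustration under stated assumptions, not the API of any attached patch: ToyIndex is a hypothetical stand-in for IndexReader, the method names (create, iterator, next), the NO_MORE_DOCS sentinel and the CachingDocIdSetSource class are invented for this example, and the cache is keyed by factory only, whereas a real cache would also need to be keyed per reader.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IndexReader: maps "field:text" to sorted doc ids.
final class ToyIndex {
    private final Map<String, int[]> postings = new HashMap<>();
    int lookups = 0; // counts the (potentially costly) postings reads
    void add(String term, int... docs) { postings.put(term, docs); }
    int[] docs(String term) { lookups++; return postings.getOrDefault(term, new int[0]); }
}

// 3) Iterator over a set of document ids; each thread asks for its own.
interface DocIdSetIterator {
    int NO_MORE_DOCS = -1; // assumed sentinel for this sketch
    int next();            // next doc id, or NO_MORE_DOCS when exhausted
}

// 2) An instantiated, cacheable set of document ids.
interface DocIdSet {
    DocIdSetIterator iterator();
}

// 1) Factory for sets; implementations must define equals/hashCode so that
//    a cache can recognise requests sharing the same criteria.
interface DocIdSetFactory {
    DocIdSet create(ToyIndex index);
}

// One possible storage; a SortedVIntList-style set would implement DocIdSet too.
final class BitSetDocSet implements DocIdSet {
    private final BitSet bits;
    BitSetDocSet(BitSet bits) { this.bits = bits; }
    public DocIdSetIterator iterator() {
        return new DocIdSetIterator() {
            private int current = bits.nextSetBit(0);
            public int next() {
                if (current < 0) return NO_MORE_DOCS;
                int doc = current;
                current = bits.nextSetBit(doc + 1);
                return doc;
            }
        };
    }
}

// Example factory: all documents containing one term.
final class TermDocIdSetFactory implements DocIdSetFactory {
    private final String term;
    TermDocIdSetFactory(String term) { this.term = term; }
    public DocIdSet create(ToyIndex index) {
        BitSet bits = new BitSet();
        for (int doc : index.docs(term)) bits.set(doc);
        return new BitSetDocSet(bits);
    }
    @Override public boolean equals(Object o) {
        return o instanceof TermDocIdSetFactory && term.equals(((TermDocIdSetFactory) o).term);
    }
    @Override public int hashCode() { return term.hashCode(); }
}

// Caching service that knows nothing about how a DocIdSet is stored.
// Simplification: keyed by factory only, not by reader.
final class CachingDocIdSetSource {
    private final Map<DocIdSetFactory, DocIdSet> cache = new HashMap<>();
    synchronized DocIdSet get(DocIdSetFactory factory, ToyIndex index) {
        return cache.computeIfAbsent(factory, f -> f.create(index));
    }
}

final class DocIdSetSketch {
    // Drains a fresh iterator into a list, for demonstration only.
    static List<Integer> collect(DocIdSet set) {
        List<Integer> out = new ArrayList<>();
        DocIdSetIterator it = set.iterator();
        for (int doc = it.next(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.next()) out.add(doc);
        return out;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("gender:male", 1, 3, 4);
        CachingDocIdSetSource cache = new CachingDocIdSetSource();
        DocIdSet first = cache.get(new TermDocIdSetFactory("gender:male"), index);
        DocIdSet second = cache.get(new TermDocIdSetFactory("gender:male"), index);
        System.out.println(collect(first) + " sameInstance=" + (first == second)
                + " lookups=" + index.lookups);
    }
}
```

The point of the sketch is that the cache only depends on the factory's equals/hashCode and on the DocIdSet type, so it would work unchanged whichever storage the factory chooses to return.
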

Cheers
Mark

> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt,
> Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch,
> Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch,
> Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch,
> Some Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of
> memory. It would be desirable to have an alternative BitSet implementation
> with smaller memory footprint.
> Though it _is_ possibly to derive classes from =java.util.BitSet=, it was
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation
> could still delegate to =java.util.BitSet=.