[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518845 ]
Mark Harwood commented on LUCENE-584:
-------------------------------------
Hi Paul,
Not sure we've reached a common understanding here yet.
You said "That was a mistake. BitSetMatcher is a Matcher constructed from a
BitSet, and SortedVIntList has a getMatcher() method, and I confused the two. "
OK, thanks for the clarification. I'm still uncomfortable that getMatcher() is
not abstracted into a common interface; that was the thinking behind the
"getIterator" method on my DocIdSet.
I too made a mistake in my earlier comments. DocIdSetIterator does NOT have
"probably one implementation". There would be one implementation for each
type of DocIdSet (BitSet/OpenBitSet/VIntList).
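A minimal sketch of that split, assuming a DocIdSet whose getIterator() hands out a fresh iterator on every call, with one iterator implementation per set type. DocIdSet, DocIdSetIterator, getIterator() and BitSetDocSet are names from this thread; the nextDoc()/NO_MORE_DOCS protocol, SortedIntArrayDocSet and DocIdSets.count() are hypothetical, chosen only for illustration.

```java
import java.util.BitSet;

// Hypothetical iterator protocol (an assumption, not a settled API):
// nextDoc() returns doc ids in increasing order, then NO_MORE_DOCS forever.
interface DocIdSetIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
}

// An instantiated, cacheable set of doc ids. Each getIterator() call returns
// a fresh, independent iterator, so one cached set can serve many threads.
interface DocIdSet {
    DocIdSetIterator getIterator();
}

// Dense representation backed by java.util.BitSet.
final class BitSetDocSet implements DocIdSet {
    private final BitSet bits;
    BitSetDocSet(BitSet bits) { this.bits = bits; }

    public DocIdSetIterator getIterator() {
        return new DocIdSetIterator() {
            private int from = 0; // next bit index to search from; -1 once exhausted
            public int nextDoc() {
                int doc = from < 0 ? -1 : bits.nextSetBit(from);
                if (doc < 0) { from = -1; return NO_MORE_DOCS; }
                from = doc + 1;
                return doc;
            }
        };
    }
}

// Sparse representation (hypothetical): a strictly increasing array of doc ids.
final class SortedIntArrayDocSet implements DocIdSet {
    private final int[] docs;
    SortedIntArrayDocSet(int... docs) { this.docs = docs.clone(); } // caller passes sorted ids

    public DocIdSetIterator getIterator() {
        return new DocIdSetIterator() {
            private int i = 0;
            public int nextDoc() { return i < docs.length ? docs[i++] : NO_MORE_DOCS; }
        };
    }
}

// Consumer code depends only on the interfaces, never on the representation.
final class DocIdSets {
    static int count(DocIdSet set) {
        int n = 0;
        DocIdSetIterator it = set.getIterator();
        while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) n++;
        return n;
    }
}
```

Callers such as count() see only the interfaces, so adding another representation (an OpenBitSet-backed set, say) would mean adding one DocIdSet and its iterator without touching callers.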
You said "some Filters do not need a cache. For example: TermFilter". I'm not
sure why that has been singled out as not worth caching. I have certain terms
(e.g. gender:male) whose TermDocs are very large (50% of all docs in the
index!), so repeated calls to TermDocs for the term "gender:male" (if that is
what you are suggesting) are highly undesirable. These are typically handled in
the XMLQueryParser using syntax like this:
<CachedFilter>
<TermsFilter fieldName="gender">male</TermsFilter>
</CachedFilter>
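A sketch of the kind of caching that pays off for an expensive term such as gender:male, under stated assumptions: IndexReaderKey stands in for a Lucene IndexReader, and DocIdSet, DocIdSetFactory and CachingDocIdSetFactory are hypothetical names, not existing Lucene classes. The cache stores whatever DocIdSet the wrapped factory returns, so it is not limited to bitsets.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for an index reader; only its identity matters here.
final class IndexReaderKey {}

// Deliberately minimal set abstraction for this sketch.
interface DocIdSet {
    boolean contains(int doc);
}

// Hypothetical: builds the set of docs matching some criteria (e.g. walking
// TermDocs for gender:male). Potentially expensive.
interface DocIdSetFactory {
    DocIdSet getDocIdSet(IndexReaderKey reader);
}

// Caches whatever DocIdSet the delegate returns, one entry per reader.
// Weak keys let an entry be dropped once its reader is no longer referenced.
final class CachingDocIdSetFactory implements DocIdSetFactory {
    private final DocIdSetFactory delegate;
    private final Map<IndexReaderKey, DocIdSet> cache = new WeakHashMap<>();

    CachingDocIdSetFactory(DocIdSetFactory delegate) { this.delegate = delegate; }

    @Override
    public synchronized DocIdSet getDocIdSet(IndexReaderKey reader) {
        // Build only on the first request for this reader; afterwards reuse.
        return cache.computeIfAbsent(reader, delegate::getDocIdSet);
    }
}
```

Wrapping a factory this way corresponds roughly to the <CachedFilter> element above: the expensive build runs once per reader, and later requests get the same instance.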
You said: "CachingWrapperFilter could then become a cache for BitSetFilter."
This means that the only caching strategy would be one based on bitsets. Doesn't
that lose what is perhaps the main benefit of your whole proposal: the ability to
use alternative, space-efficient storage for sets of document ids, e.g. SortedVIntList?
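SortedVIntList's exact layout isn't described in this thread, so the following is only a sketch of the general idea under that assumption: store the gaps between consecutive sorted doc ids as variable-length ints (7 bits per byte, low-order group first, high bit set while more bytes follow, in the style of Lucene's VInt). DeltaVIntDocSet is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical delta + VInt encoded set of doc ids.
final class DeltaVIntDocSet {
    private final byte[] bytes;
    private final int size;

    // docs must be non-negative and strictly increasing.
    DeltaVIntDocSet(int... docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int i = 0; i < docs.length; i++) {
            int doc = docs[i];
            if (doc < 0 || (i > 0 && doc <= prev)) {
                throw new IllegalArgumentException("doc ids must be non-negative and strictly increasing");
            }
            int delta = doc - prev; // the first "gap" is the doc id itself
            prev = doc;
            // 7 bits per byte, low-order group first; high bit = more bytes follow
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        this.bytes = out.toByteArray();
        this.size = docs.length;
    }

    int size() { return size; }

    int byteSize() { return bytes.length; }

    // Decodes the gaps sequentially and rebuilds the original doc ids.
    int[] toArray() {
        int[] docs = new int[size];
        int pos = 0, prev = 0;
        for (int i = 0; i < size; i++) {
            int delta = 0, shift = 0, b;
            do {
                b = bytes[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            docs[i] = prev;
        }
        return docs;
    }
}
```

For the ids 3, 10, 130 and 100000 this uses 6 bytes, against 16 for a plain int[] and roughly 12.5 KB for a java.util.BitSet with bit 100000 set. The price is that ids must be decoded sequentially rather than looked up directly.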
If losing that benefit is undesirable (my guess is that it is), then the proposal
in my previous comment is a solution that allows caching of any or all of the new
set types (OpenBitSet, BitSet, SortedVIntList, etc.). Regardless of my choice of
class names, or of interfaces vs. abstract classes, don't you at least agree on
the need for three types of functionality:
1) A factory for instantiating sets of document ids that match a particular set
of criteria (which can be costly to call). While the factory is not expected to
implement a caching strategy, it is expected to implement hashCode/equals,
simply to help any caching service identify previously instantiated sets that
share the same criteria as a new request. (This is the service I identified as
my "DocIdSetFactory"; TermsFilter/RangeFilter would be example implementations.)
2) An object representing an instantiated set of document ids, which can be
cached and can create iterators for use in separate threads (identified as my
DocIdSet; example implementations would be called something like BitSetDocSet
and SortedVIntList).
3) An iterator over a set of document ids (my DocIdSetIterator; example
implementations would be called something like BitSetDocSetIterator and
SortedVIntListIterator).
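A sketch of responsibility 1 and the caching service it is meant to help, assuming a factory whose equals/hashCode capture its criteria. TermsFactory, DocIdSetCache and the no-argument getDocIdSet() are hypothetical; a real factory would take an IndexReader, and the cache key would then also need to include the reader.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Placeholder for responsibility 2 (an instantiated set of doc ids).
interface DocIdSet {}

// Responsibility 1: a factory whose equals/hashCode describe its criteria.
// Re-declaring them abstract forces every subclass to implement them.
abstract class DocIdSetFactory {
    // A real factory would take an index reader; omitted in this sketch.
    abstract DocIdSet getDocIdSet();
    @Override public abstract boolean equals(Object o);
    @Override public abstract int hashCode();
}

// Hypothetical TermsFilter-like factory: docs containing any of the terms in a field.
final class TermsFactory extends DocIdSetFactory {
    static int builds = 0; // counts expensive builds, for demonstration only

    private final String field;
    private final Set<String> terms; // sorted set, so term order does not affect equality

    TermsFactory(String field, String... terms) {
        this.field = field;
        this.terms = new TreeSet<>(Arrays.asList(terms));
    }

    @Override DocIdSet getDocIdSet() {
        builds++; // stands in for walking TermDocs for each term
        return new DocIdSet() {};
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof TermsFactory)) return false;
        TermsFactory other = (TermsFactory) o;
        return field.equals(other.field) && terms.equals(other.terms);
    }

    @Override public int hashCode() { return Objects.hash(field, terms); }
}

// A caching service that knows nothing about how sets are built: it relies
// only on factory equality to reuse a previously instantiated set.
final class DocIdSetCache {
    private final Map<DocIdSetFactory, DocIdSet> cache = new ConcurrentHashMap<>();

    DocIdSet get(DocIdSetFactory factory) {
        return cache.computeIfAbsent(factory, DocIdSetFactory::getDocIdSet);
    }
}
```

An abstract class is used here only so that equals/hashCode can be re-declared abstract; an interface cannot force implementors to override them, because every class already inherits them from Object.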
Each type of functionality can have different implementations, so each must be
defined using an interface or abstract class.
If we can agree this much as a set of responsibilities then we can begin to map
these services onto something more concrete.
Cheers
Mark
> Decouple Filter from BitSet
> ---------------------------
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.0.1
> Reporter: Peter Schäfer
> Priority: Minor
> Attachments: bench-diff.txt, bench-diff.txt,
> Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch,
> Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch,
> Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch,
> Some Matchers.zip
>
>
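> {code}
> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> // Proposed change: bits() returns an interface instead of java.util.BitSet.
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> // Read-only view of a set of document ids; as a public type it
> // belongs in its own source file, AbstractBitSet.java.
> public interface AbstractBitSet
> {
>   boolean get(int index);
> }
> {code}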
> It would be useful if the method =Filter.bits()= returned an abstract
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of
> memory. It would be desirable to have an alternative BitSet implementation
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation
> could still delegate to =java.util.BitSet=.
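A sketch of the interface proposed in the issue, with the default implementation delegating to java.util.BitSet as the description suggests, plus one sparse alternative for the restricted-visibility use case. JavaUtilBitSetAdapter and SortedIntArrayBitSet are hypothetical names.

```java
import java.util.Arrays;
import java.util.BitSet;

// The interface proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Default implementation: delegates to java.util.BitSet (suits dense sets).
final class JavaUtilBitSetAdapter implements AbstractBitSet {
    private final BitSet bits;
    JavaUtilBitSetAdapter(BitSet bits) { this.bits = bits; }

    public boolean get(int index) { return bits.get(index); }
}

// Hypothetical sparse alternative: sorted doc ids, looked up by binary search.
final class SortedIntArrayBitSet implements AbstractBitSet {
    private final int[] docs;

    SortedIntArrayBitSet(int... docs) {
        this.docs = docs.clone();
        Arrays.sort(this.docs); // binarySearch requires sorted input
    }

    public boolean get(int index) { return Arrays.binarySearch(docs, index) >= 0; }
}
```

The sparse version uses about 4 bytes per visible document regardless of index size, at the cost of an O(log n) binary search per get() instead of the BitSet's constant-time lookup.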
--
This message is automatically generated by JIRA.