[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518825 ]
Paul Elschot commented on LUCENE-584:
-------------------------------------

Mark,

I said: "there is never a threadsafety problem. (See BitSetMatcher.getMatcher() which uses a local class for the resulting Matcher.)"

That was a mistake. BitSetMatcher is a Matcher constructed from a BitSet, whereas SortedVIntList has a getMatcher() method, and I confused the two. A Matcher is intended to be used in a single thread, so I don't expect thread safety problems.

The problem for the XML parser is that with this patch, the implementing data structure of a Filter becomes inaccessible from the Filter class, so it cannot be cached from there. That means that some cached data structure will have to be chosen, and one way to do that is to use class BitSetFilter from the patch. It has a bits() method just like the current Filter class. CachingWrapperFilter could then become a cache for BitSetFilter.

There is indeed no caching of filters in this patch. The reason is that some Filters do not need a cache. For example:

  class TermFilter extends Filter {
    private final Term term;
    TermFilter(Term t) { this.term = t; }
    Matcher getMatcher(IndexReader reader) throws IOException {
      return new TermMatcher(reader.termDocs(this.term));
    }
  }

TermMatcher does not exist (yet), but it could easily be introduced by leaving all the scoring out of the current TermScorer.

As for DocIdSet, as long as it provides a Matcher as an iterator, it can be used to implement a (caching) filter.

I don't think this patch complicates the implementation of caching strategies. For example, one could define:

  class CachableFilter extends Filter {
    ... some methods to access the underlying data structure to be cached. ...
  }

or write a similar adapter for some subclass of Filter, and then write a FilterCache that caches these.

I did consider defining Matcher as an interface, but I preferred not to do that because of the default explain() method in the Matcher class of the patch.
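To make the caching idea above more concrete, here is a minimal, self-contained sketch. It is not code from the patch and does not use Lucene: the class names Matcher, Filter and BitSetFilter only echo the ones mentioned in the comment, their signatures here are assumptions, CachingBitSetFilter is a hypothetical name, and a plain Object stands in for IndexReader. It shows a filter whose underlying BitSet stays accessible (so it can be cached per reader) while each getMatcher() call returns a fresh Matcher for use by a single thread.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in: iterates over matching document numbers.
abstract class Matcher {
    abstract boolean next(); // advance; false once there are no more matches
    abstract int doc();      // current document number, valid after next() returned true
}

// Hypothetical stand-in: a Filter only promises a Matcher, not a data structure.
abstract class Filter {
    abstract Matcher getMatcher(Object reader); // Object stands in for IndexReader
}

// A Filter that still exposes its underlying BitSet, so that it can be cached.
abstract class BitSetFilter extends Filter {
    abstract BitSet bits(Object reader);

    @Override
    Matcher getMatcher(Object reader) {
        final BitSet bits = bits(reader);
        // New Matcher per call: the iteration state is never shared between callers.
        return new Matcher() {
            private int doc = -1;
            private boolean exhausted = false;

            @Override
            boolean next() {
                if (exhausted) return false;
                int candidate = bits.nextSetBit(doc + 1);
                if (candidate < 0) { exhausted = true; return false; }
                doc = candidate;
                return true;
            }

            @Override
            int doc() { return doc; }
        };
    }
}

// Assumed caching wrapper: computes the BitSet once per reader and reuses it.
class CachingBitSetFilter extends BitSetFilter {
    private final BitSetFilter wrapped;
    private final Map<Object, BitSet> cache = new WeakHashMap<>();

    CachingBitSetFilter(BitSetFilter wrapped) { this.wrapped = wrapped; }

    @Override
    synchronized BitSet bits(Object reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = wrapped.bits(reader);
            cache.put(reader, cached);
        }
        return cached;
    }
}

class MatcherCachingSketch {
    public static void main(String[] args) {
        BitSetFilter base = new BitSetFilter() {
            @Override
            BitSet bits(Object reader) {
                BitSet b = new BitSet();
                b.set(1);
                b.set(4);
                return b;
            }
        };
        Filter filter = new CachingBitSetFilter(base);
        Matcher m = filter.getMatcher(new Object());
        while (m.next()) {
            System.out.println(m.doc()); // prints each matching document number
        }
    }
}
```

In this sketch the cached BitSet is treated as read-only after it is built, which is what lets several Matchers iterate over it independently. A filter like the TermFilter example would simply extend Filter directly and skip the cache.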
> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt,
> Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch,
> Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch,
> Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch,
> Some Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of
> memory. It would be desirable to have an alternative BitSet implementation
> with a smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation
> could still delegate to =java.util.BitSet=.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.