[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518825
 ] 

Paul Elschot commented on LUCENE-584:
-------------------------------------

Mark,

I said: "there is never a threadsafety problem. (See BitSetMatcher.getMatcher() 
which uses a local class for the resulting Matcher.)"
That was a mistake. BitSetMatcher is a Matcher constructed from a BitSet, and 
SortedVIntList has a getMatcher() method, and I confused the two.

A Matcher is intended to be used in a single thread, so I don't expect thread 
safety problems.

The problem for the XML parser is that with this patch, the implementing data 
structure of a Filter becomes inaccessible from the Filter class, so it cannot 
be cached from there.
That means that some cached data structure will have to be chosen, and one way 
to do that is by using class BitSetFilter from the patch. This has a bits() 
method just like the current Filter class.
CachingWrapperFilter could then become a cache for BitSetFilter.
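
To make the idea of a cache for BitSetFilter concrete, here is a minimal sketch. It is not code from the patch: SketchReader is a stand-in for IndexReader, the shape of BitSetFilter is assumed from the description above, and CachingBitSetFilter is a hypothetical name.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Stand-in for IndexReader; only maxDoc() is needed for this sketch.
interface SketchReader {
    int maxDoc();
}

// Assumed shape of BitSetFilter: a filter that still exposes bits().
abstract class BitSetFilter {
    abstract BitSet bits(SketchReader reader);
}

// Hypothetical caching wrapper: computes bits() once per reader.
class CachingBitSetFilter extends BitSetFilter {
    private final BitSetFilter delegate;
    // Weak keys let an entry disappear once its reader is no longer referenced.
    private final Map<SketchReader, BitSet> cache =
            new WeakHashMap<SketchReader, BitSet>();

    CachingBitSetFilter(BitSetFilter delegate) {
        this.delegate = delegate;
    }

    @Override
    synchronized BitSet bits(SketchReader reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = delegate.bits(reader);
            cache.put(reader, cached);
        }
        return cached;
    }
}
```

The cache lives in the wrapper rather than in Filter itself, so filters that do not need caching pay nothing for it.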

There is indeed no caching of filters in this patch.
The reason for that is that some Filters do not need a cache. For example:
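class TermFilter {
  private final Term term;
  TermFilter(Term t) { this.term = t; }
  // No cache needed: every call opens a fresh TermDocs for the term.
  Matcher getMatcher(IndexReader reader) throws IOException {
    return new TermMatcher(reader.termDocs(this.term));
  }
}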
TermMatcher does not exist (yet), but it could be easily introduced by leaving 
all the scoring out of the current TermScorer.
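
A rough sketch of what such a TermMatcher might look like follows. The names and shapes here are assumptions: SimpleTermDocs is a stand-in reduced to the next() and doc() calls of TermDocs, the Matcher class is a guess at the minimal iteration methods, and ArrayTermDocs exists only to exercise the sketch.

```java
// Stand-in for Lucene's TermDocs, reduced to the two calls used here.
interface SimpleTermDocs {
    boolean next();
    int doc();
}

// Assumed minimal shape of a Matcher: iterate doc ids, no scores.
abstract class Matcher {
    abstract boolean next();
    abstract int doc();
}

// Hypothetical TermMatcher: the iteration of a term scorer without scoring.
class TermMatcher extends Matcher {
    private final SimpleTermDocs termDocs;

    TermMatcher(SimpleTermDocs termDocs) {
        this.termDocs = termDocs;
    }

    @Override
    boolean next() {
        return termDocs.next();
    }

    @Override
    int doc() {
        return termDocs.doc();
    }
}

// In-memory postings used only to try out the sketch.
class ArrayTermDocs implements SimpleTermDocs {
    private final int[] docs;
    private int pos = -1;

    ArrayTermDocs(int... docs) {
        this.docs = docs;
    }

    public boolean next() {
        return ++pos < docs.length;
    }

    public int doc() {
        return docs[pos];
    }
}
```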

As for DocIdSet, as long as this provides a Matcher as an iterator, it can be 
used to implement a (caching) filter.
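
As an illustration of a DocIdSet that provides a Matcher as an iterator, consider the sketch below. The DocIdSet interface, the SortedIntDocIdSet class and the minimal Matcher shape are all assumptions made for this example, not definitions taken from the patch.

```java
// Assumed minimal Matcher shape: a forward-only iterator over doc ids.
abstract class Matcher {
    abstract boolean next();
    abstract int doc();
}

// Hypothetical DocIdSet: an immutable set that hands out Matchers.
interface DocIdSet {
    Matcher getMatcher();
}

// Compact set for sparse filters: a sorted array of doc ids.
class SortedIntDocIdSet implements DocIdSet {
    private final int[] sortedDocs;

    SortedIntDocIdSet(int... sortedDocs) {
        this.sortedDocs = sortedDocs.clone();
    }

    // Each call returns a fresh Matcher with its own position, so the set
    // can be cached and shared while every Matcher stays in one thread.
    public Matcher getMatcher() {
        return new Matcher() {
            private int pos = -1;

            boolean next() {
                return ++pos < sortedDocs.length;
            }

            int doc() {
                return sortedDocs[pos];
            }
        };
    }
}
```

Because the iteration state lives in the Matcher and not in the set, a cached set can be shared while each Matcher is used by one thread only.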

I don't think this patch complicates the implementation of caching strategies.
For example one could define:
class CachableFilter extends Filter {
  ... some methods to access the underlying data structure to be cached. ...
}
or write a similar adapter for some subclass of Filter and then write a 
FilterCache that caches these.

I did consider defining Matcher as an interface, but I preferred not to do that 
because of the default explain() method in the Matcher class of the patch.
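
The trade-off can be sketched as follows: an abstract class can carry a default explain() that every subclass inherits, which an interface in the Java versions current at the time could not do. The explain() body, its String return type and the RangeMatcher class are assumptions for illustration, and the patch's actual explain() may work differently.

```java
// Assumed shape of the abstract class; the real patch may differ.
abstract class Matcher {
    abstract boolean next();
    abstract int doc();

    // Default explanation inherited by all subclasses. It advances this
    // matcher until it reaches or passes the target doc, so it is meant
    // to be called on a fresh matcher. Returns a String for simplicity.
    String explain(int target) {
        boolean more = next();
        while (more && doc() < target) {
            more = next();
        }
        boolean matches = more && doc() == target;
        return getClass().getSimpleName()
                + (matches ? " matches doc " : " does not match doc ")
                + target;
    }
}

// Hypothetical subclass that only implements iteration over [start, end).
class RangeMatcher extends Matcher {
    private final int end;
    private int current;

    RangeMatcher(int start, int end) {
        this.current = start - 1;
        this.end = end;
    }

    @Override
    boolean next() {
        return ++current < end;
    }

    @Override
    int doc() {
        return current;
    }
}
```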


> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt, 
> Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
> Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
> Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
> Some Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.
