On Thursday 26 January 2006 20:08, Chris Hostetter wrote:
>
> The subject of revamping the Filter API to support more compact filter
> representations has come up in the past ... At least one patch comes to
> mind that helps with the issue...
>
> https://issues.apache.org/jira/browse/LUCENE-328
>
> ...i'm not intimately familiar with that code, but if i recall correctly
> from the last time i read it, it doesn't propose any actual API changes
> just some utilities to reduce memory usage.
>
> Reading your post has me thinking about this whole issue again,
> particularly the subject of Filters that are straightforward enough they
> could be implemented as simple iterators with very little state and what
> API changes could be made to support the interface you describe and still
> be backwards compatible.
>
> One thing that comes to mind (that i don't remember suggesting before, but
> perhaps someone else has suggested it before) is that since Filter is an
> abstract class which people are currently required to subclass, we could
> follow a migration path something like this...
>
> 1) add a SearchFilter interface like the one you describe to the core
> code base
> 2) add the following method declaration to the Filter class...
> public SearchFilter getSearchFilter(IndexReader) throws IOException
> ...implement this method by calling bits, and returning an instance
> of a thin inner class that wraps the BitSet
This is done in the FilteredQuery in the issue referred to above.
The wrapper might take a small performance hit.
> 3) indicate that Filter.bits() is deprecated.
> 4) change all existing calls to Filter.bits() in the core lucene code
> base to call Filter.getSearchFilter and do whatever iterating is
> necessary.
> 5) gradually reimplement all of the concrete instances of Filter in
> the core lucene code base so they override the getSearchFilter method
> with something that returns a more "iterator" style SearchFilter,
> and implement their bits() method to use the SearchFilter to build up
> the bit set if clients call it directly.
> 6) wait a suitable amount of time.
> 7) remove Filter.bits() and all of the concrete implementations from the
> lucene core.
Sounds feasible to me, provided the performance hit is small enough.
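A minimal sketch of steps 1 and 2, using stand-in types rather than the
real Lucene classes: IndexReaderStub replaces IndexReader, and Filter here
is a simplified copy of the abstract class. It uses -1 instead of 0 as the
end marker, because 0 is itself a valid document number; that choice is an
assumption of this sketch, not part of the proposal.

```java
import java.io.IOException;
import java.util.BitSet;

// Stand-in for Lucene's IndexReader so the sketch compiles on its own.
final class IndexReaderStub {}

// Hypothetical iterator-style interface, modelled on the proposal.
interface SearchFilter {
    // Returns passing doc ids in increasing order, then -1 (assumed end marker).
    int getNextFilteredDoc();
}

// Simplified copy of the abstract Filter class, not the real one.
abstract class Filter {
    // Existing BitSet-based method; step 3 would deprecate it.
    abstract BitSet bits(IndexReaderStub reader) throws IOException;

    // Step 2: default implementation wraps the BitSet in a thin inner class.
    SearchFilter getSearchFilter(IndexReaderStub reader) throws IOException {
        final BitSet bits = bits(reader);
        return new SearchFilter() {
            // Next doc to hand out; -1 once the bit set is exhausted.
            private int next = bits.nextSetBit(0);

            public int getNextFilteredDoc() {
                int doc = next;
                if (doc >= 0) {
                    next = bits.nextSetBit(doc + 1);
                }
                return doc;
            }
        };
    }
}
```

Existing subclasses that only implement bits() keep working unchanged; the
extra cost is one nextSetBit() call per document handed out.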
> ...i think that would be a fairly straightforward and practical way to
> execute such a change. The big question in my mind is what the
> "SearchFilter" interface should look like. what you propose is along the
> usage lines of "iterate over your ScoreDocs, and foreach one test it
> against the filter" ... but i'm not convinced that it wouldn't make more
> sense to say "ask the filter what the next viable doc is, now score it",
> ala...
>
> public interface SearchFilter {
>   /** returns doc ids that pass the filter, in increasing order.
>    * returns 0 once there are no more docs.
>    */
>   int getNextFilteredDoc();
> }
>
>
> thoughts?
For search speed one needs to know the next filtered document, much
like BitSet.nextSetBit(). See DocNrSkipper in the issue referred to above.
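To illustrate the skipping idea, here is a rough sketch; it is not the
DocNrSkipper code from the issue. DocSkipper, BitSetSkipper, RangeSkipper
and FilteredIteration are hypothetical names, and the sorted int[] stands
in for the documents a scorer would produce.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical skipping interface, analogous to BitSet.nextSetBit(int).
interface DocSkipper {
    // Smallest passing doc id >= target, or -1 if there is none.
    int nextDoc(int target);
}

// Backed by a BitSet, for filters that still compute one.
final class BitSetSkipper implements DocSkipper {
    private final BitSet bits;

    BitSetSkipper(BitSet bits) { this.bits = bits; }

    public int nextDoc(int target) { return bits.nextSetBit(target); }
}

// A filter with very little state: passes doc ids in [first, last].
final class RangeSkipper implements DocSkipper {
    private final int first;
    private final int last;

    RangeSkipper(int first, int last) { this.first = first; this.last = last; }

    public int nextDoc(int target) {
        if (target > last) return -1;
        return Math.max(target, first);
    }
}

final class FilteredIteration {
    // candidates must be sorted in increasing order.
    static List<Integer> intersect(int[] candidates, DocSkipper filter) {
        List<Integer> hits = new ArrayList<Integer>();
        int i = 0;
        while (i < candidates.length) {
            int allowed = filter.nextDoc(candidates[i]);
            if (allowed < 0) break;            // filter exhausted
            if (allowed == candidates[i]) {    // candidate passes
                hits.add(candidates[i]);
                i++;
            } else {                           // skip candidates below 'allowed'
                while (i < candidates.length && candidates[i] < allowed) i++;
            }
        }
        return hits;
    }
}
```

With a skip target the filter never has to visit documents the query
cannot match, and a range filter needs no bit set at all.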
Regards,
Paul Elschot