[ 
https://issues.apache.org/jira/browse/LUCENE-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578656#action_12578656
 ] 

Eks Dev commented on LUCENE-1187:
---------------------------------

Michael,
I do not think we need to add a Factory for this particular reason. The DocIdSet
type should not be assumed, because we could come up with smart ways to select
the optimal Filter representation depending on the doc-id distribution, size,
etc.
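A minimal sketch of such a selection, assuming memory footprint is the only criterion. It compares the bytes of a plain bit set over maxDoc documents with the bytes of the sorted doc ids stored as VInt-style gaps (7 payload bits per byte), then picks the smaller. RepresentationChooser and its methods are hypothetical and are not Lucene API.

```java
// Hypothetical sketch (not Lucene API): choose a doc-id set representation
// purely by estimated memory footprint.
final class RepresentationChooser {
    enum Kind { BIT_SET, SORTED_VINT_LIST }

    // Bytes for a plain bit set with one bit per document in [0, maxDoc).
    static long bitSetBytes(int maxDoc) {
        return (maxDoc + 7L) / 8;
    }

    // Bytes for sorted doc ids stored as gaps, each gap VInt-encoded
    // (7 payload bits per byte, so small gaps take a single byte).
    static long vIntListBytes(int[] sortedDocs) {
        long bytes = 0;
        int prev = 0;
        for (int doc : sortedDocs) {
            int gap = doc - prev;
            prev = doc;
            do {
                bytes++;
                gap >>>= 7;
            } while (gap != 0);
        }
        return bytes;
    }

    // Sparse sets favour the gap list, dense sets favour the bit set.
    static Kind choose(int[] sortedDocs, int maxDoc) {
        return vIntListBytes(sortedDocs) < bitSetBytes(maxDoc)
                ? Kind.SORTED_VINT_LIST
                : Kind.BIT_SET;
    }
}
```

For example, the matches {5, 1000, 50000} with maxDoc = 100,000 need 6 bytes as gaps versus 12,500 bytes as a bit set. Matching all 1,000 documents of a 1,000-document index needs 1,000 bytes as gaps versus 125 bytes as a bit set. A real selector would probably also weigh next()/skipTo() speed, not just size.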

The only problem we have is that the contrib classes ChainedFilter and
BooleanFilter assume BitSet.
The solution for this would be to add just a few methods to DocIdSet
that can do AND/OR/NOT on a DocIdSet[] using DocIdSetIterator,
e.g.
DocIdSet or(DocIdSet[], int minimumShouldMatch);
DocIdSet or(DocIdSet[]);
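The or(DocIdSet[], int minimumShouldMatch) operation can be sketched as a merge over the sub-iterators, in the style of a disjunction scorer with the scoring removed. The sub-iterators sit in a priority queue ordered by current doc. All iterators positioned on the smallest doc are popped, and that doc is emitted if at least minimumShouldMatch of them were on it. DocIter is a hypothetical, simplified stand-in for DocIdSetIterator (doc(), next(), skipTo(int), no IOException). ArrayDocIter is only a test helper. Neither is Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical simplified stand-in for DocIdSetIterator.
interface DocIter {
    int doc();                  // current doc id, valid after next()/skipTo() returned true
    boolean next();             // advance to the next doc id; false when exhausted
    boolean skipTo(int target); // assumed contract: advance at least once, then until doc() >= target
}

// Test helper: iterates a sorted int[] of doc ids.
class ArrayDocIter implements DocIter {
    private final int[] docs;
    private int pos = -1;
    ArrayDocIter(int... docs) { this.docs = docs; }
    public int doc() { return docs[pos]; }
    public boolean next() { return ++pos < docs.length; }
    public boolean skipTo(int target) {
        do { if (!next()) return false; } while (doc() < target);
        return true;
    }
}

// OR of sub-iterators: matches docs present in at least minShouldMatch of them.
class DisjunctionDocIter implements DocIter {
    private final PriorityQueue<DocIter> queue =
            new PriorityQueue<DocIter>((a, b) -> Integer.compare(a.doc(), b.doc()));
    private final int minShouldMatch;
    private int current = -1;

    DisjunctionDocIter(List<? extends DocIter> subs, int minShouldMatch) {
        if (minShouldMatch < 1) throw new IllegalArgumentException("minShouldMatch must be >= 1");
        this.minShouldMatch = minShouldMatch;
        for (DocIter sub : subs) {
            if (sub.next()) queue.add(sub); // only non-empty iterators enter the queue
        }
    }

    public int doc() { return current; }

    public boolean next() {
        // Fewer live sub-iterators than minShouldMatch means no further match is possible.
        while (queue.size() >= minShouldMatch) {
            int candidate = queue.peek().doc();
            int count = 0;
            List<DocIter> advanced = new ArrayList<>();
            // Pop every sub-iterator on candidate and advance it outside the queue,
            // because changing doc() while queued would corrupt the heap order.
            while (!queue.isEmpty() && queue.peek().doc() == candidate) {
                DocIter sub = queue.poll();
                count++;
                if (sub.next()) advanced.add(sub);
            }
            queue.addAll(advanced);
            if (count >= minShouldMatch) {
                current = candidate;
                return true;
            }
        }
        return false;
    }

    public boolean skipTo(int target) {
        // Move every sub-iterator that is behind target, then look for the next match.
        List<DocIter> moved = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().doc() < target) {
            DocIter sub = queue.poll();
            if (sub.skipTo(target)) moved.add(sub);
        }
        queue.addAll(moved);
        return next();
    }
}
```

With minShouldMatch = 1 this is a plain union. Higher values give the "at least n of m" behaviour that BooleanFilter-style should clauses would need.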


Optimized code for these basic operations *already exists*; it can be copied
from the Conjunction/Disjunction/ReqOpt/ReqExcl Scorer classes by simply
stripping off the scoring part.
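As an illustration of what stripping off the scoring part leaves behind, here is an AND in the spirit of ConjunctionScorer reduced to doc ids only. Each sub-iterator is skipped to the highest doc id seen so far ("leapfrogging") until all of them agree. This is a sketch, not the actual ConjunctionScorer code. DocIter is a hypothetical, simplified stand-in for DocIdSetIterator, and ArrayDocIter is only a test helper.

```java
// Hypothetical simplified stand-in for DocIdSetIterator.
interface DocIter {
    int doc();                  // current doc id, valid after next()/skipTo() returned true
    boolean next();             // advance to the next doc id; false when exhausted
    boolean skipTo(int target); // assumed contract: advance at least once, then until doc() >= target
}

// Test helper: iterates a sorted int[] of doc ids.
class ArrayDocIter implements DocIter {
    private final int[] docs;
    private int pos = -1;
    ArrayDocIter(int... docs) { this.docs = docs; }
    public int doc() { return docs[pos]; }
    public boolean next() { return ++pos < docs.length; }
    public boolean skipTo(int target) {
        do { if (!next()) return false; } while (doc() < target);
        return true;
    }
}

// AND of sub-iterators using leapfrog skipping, with no scoring at all.
class ConjunctionDocIter implements DocIter {
    private final DocIter[] subs;
    private final boolean[] positioned; // has subs[i] been advanced at least once?
    private int current = -1;
    private boolean exhausted = false;

    ConjunctionDocIter(DocIter... subs) {
        if (subs.length == 0) throw new IllegalArgumentException("need at least one sub-iterator");
        this.subs = subs;
        this.positioned = new boolean[subs.length];
    }

    public int doc() { return current; }

    public boolean next() { return advanceTo(current + 1); }

    public boolean skipTo(int target) { return advanceTo(Math.max(target, current + 1)); }

    // Find the smallest doc >= target on which every sub-iterator agrees.
    private boolean advanceTo(int target) {
        if (exhausted) return false;
        int candidate = target;
        int agreeing = 0; // sub-iterators visited in a row that sit on candidate
        for (int i = 0; agreeing < subs.length; i = (i + 1) % subs.length) {
            DocIter sub = subs[i];
            if (!positioned[i] || sub.doc() < candidate) {
                positioned[i] = true;
                if (!sub.skipTo(candidate)) {
                    exhausted = true; // one exhausted sub-iterator ends the whole AND
                    return false;
                }
            }
            if (sub.doc() == candidate) {
                agreeing++;
            } else {
                candidate = sub.doc(); // this sub overshot, so it sets the new candidate
                agreeing = 1;
            }
        }
        current = candidate;
        return true;
    }
}
```

Because the loop only ever calls skipTo() on the sub-iterators, it benefits directly from any sub-iterator with fast skipping, for example one backed by a bit set.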

With these utility methods in DocIdSet, rewriting ChainedFilter/BooleanFilter
to work with DocIdSet (so that it works with all implementations of
Filter/DocIdSet) is a 10-minute job. Then, if needed, this implementation can
be optimized to cover type-specific cases. IMO, BooleanFilter is the better bet;
we do not need both of them.
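For the NOT part, a BooleanFilter-style "must not" clause can be sketched in the manner of ReqExclScorer without scores. The sketch iterates the required set and steps over any doc that the excluded iterator also contains, advancing the excluded iterator lazily with skipTo(). This is not the actual ReqExclScorer code. DocIter is a hypothetical, simplified stand-in for DocIdSetIterator, and ArrayDocIter is only a test helper.

```java
// Hypothetical simplified stand-in for DocIdSetIterator.
interface DocIter {
    int doc();                  // current doc id, valid after next()/skipTo() returned true
    boolean next();             // advance to the next doc id; false when exhausted
    boolean skipTo(int target); // assumed contract: advance at least once, then until doc() >= target
}

// Test helper: iterates a sorted int[] of doc ids.
class ArrayDocIter implements DocIter {
    private final int[] docs;
    private int pos = -1;
    ArrayDocIter(int... docs) { this.docs = docs; }
    public int doc() { return docs[pos]; }
    public boolean next() { return ++pos < docs.length; }
    public boolean skipTo(int target) {
        do { if (!next()) return false; } while (doc() < target);
        return true;
    }
}

// "Required AND NOT excluded", in the style of ReqExclScorer but without scores.
class ReqExclDocIter implements DocIter {
    private final DocIter req;
    private DocIter excl;               // set to null once the excluded iterator is exhausted
    private boolean exclStarted = false;

    ReqExclDocIter(DocIter req, DocIter excl) {
        this.req = req;
        this.excl = excl;
    }

    public int doc() { return req.doc(); }

    public boolean next() {
        return req.next() && skipExcluded();
    }

    public boolean skipTo(int target) {
        return req.skipTo(target) && skipExcluded();
    }

    // Starting from req's current doc, step past every doc that excl also contains.
    private boolean skipExcluded() {
        while (true) {
            int d = req.doc();
            if (excl != null && (!exclStarted || excl.doc() < d)) {
                exclStarted = true;
                if (!excl.skipTo(d)) excl = null; // nothing left to exclude
            }
            if (excl == null || excl.doc() != d) return true;
            if (!req.next()) return false;
        }
    }
}
```

The excluded iterator never has to be fully materialized. It is only skipped up to the required iterator's current doc, which matters when the excluded set is large.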

Unfortunately I do not have time to play with it for the next 3-4 weeks, but it
should be no more than 2 days of work (remember, the difficult part is already
done in the Scorers). Having so much code duplication is not really good, but we
can later "merge" these somehow.


> Things to be done now that Filter is independent from BitSet
> ------------------------------------------------------------
>
>                 Key: LUCENE-1187
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1187
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: ChainedFilterAndCachingFilterTest.patch, 
> javadocsZero2Match.patch
>
>
> (Aside: where is the documentation on how to mark up text in jira comments?)
> The following things are left over after LUCENE-584:
> For Lucene 3.0, Filter.bits() will have to be removed.
> There is a CHECKME in IndexSearcher about using ConjunctionScorer to have the 
> boolean behaviour of a Filter.
> I have not looked into Filter caching yet, but I suppose there will be some 
> room for improvement there.
> IIRC the current core has moved to use OpenBitSetFilter, and that is probably
> what is being cached.
> In some cases it might be better to cache a SortedVIntList instead.
> Boolean logic on DocIdSetIterator is already available for Scorers (that 
> inherit from DocIdSetIterator) in the search package. This is currently 
> implemented by ConjunctionScorer, DisjunctionSumScorer,
> ReqOptSumScorer and ReqExclScorer.
> Boolean logic on BitSets is available in contrib/misc and contrib/queries
> DisjunctionSumScorer calls score() on its subscorers before the score value is
> actually needed.
> This could be a reason to introduce a DisjunctionDocIdSetIterator, perhaps as 
> a superclass of DisjunctionSumScorer.
> To fully implement non-scoring queries, a TermDocIdSetIterator will be needed,
> perhaps as a superclass of TermScorer.
> The javadocs in org.apache.lucene.search using matching vs non-zero score:
> I'll investigate this soon, and provide a patch when necessary.
> An early version of the patches of LUCENE-584 contained a class Matcher,
> that differs from the current DocIdSet in that Matcher has an explain() 
> method.
> It remains to be seen whether such a Matcher could be useful between
> DocIdSet and Scorer.
> The semantics of scorer.skipTo(scorer.doc()) were discussed briefly.
> This was also discussed in another issue recently, so perhaps it is worthwhile
> to open a separate issue for this.
> Skipping on a SortedVIntList is done using linear search, this could be 
> improved by adding multilevel skiplist info much like in the Lucene index for 
> documents containing a term.
> One comment by me of 3 Dec 2008:
> A few complete (test) classes are deprecated, it might be good to add the 
> target release for removal there.
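The points in the quoted description about caching a SortedVIntList and about its linear skipping can be illustrated together. The following is a minimal sketch, assuming strictly increasing, non-negative doc ids. Doc ids are stored as VInt-encoded gaps, so decoding is inherently sequential. A single-level skip table (one entry every skipInterval docs, recording the preceding doc id and the byte offset to resume from) lets skipTo() jump ahead before falling back to a linear scan. This is a simplified stand-in for the multilevel skip lists mentioned, not Lucene's SortedVIntList. GapEncodedDocSet, DocIterator and skipInterval are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: sorted doc ids as VInt-encoded gaps plus a single-level skip table.
final class GapEncodedDocSet {
    private final byte[] bytes;
    private final int size;
    private final int skipInterval;
    private final int[] skipLastDoc; // doc id of entry (k + 1) * skipInterval - 1
    private final int[] skipOffset;  // byte offset of entry (k + 1) * skipInterval

    GapEncodedDocSet(int[] sortedDocs, int skipInterval) {
        if (skipInterval < 1) throw new IllegalArgumentException("skipInterval must be >= 1");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<int[]> skips = new ArrayList<>();
        int prev = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            if (i > 0 && i % skipInterval == 0) {
                skips.add(new int[] {prev, out.size()}); // resume point for entry i
            }
            int gap = sortedDocs[i] - prev;
            while ((gap & ~0x7F) != 0) { // VInt: 7 bits per byte, high bit means "more follows"
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
            prev = sortedDocs[i];
        }
        this.bytes = out.toByteArray();
        this.size = sortedDocs.length;
        this.skipInterval = skipInterval;
        this.skipLastDoc = new int[skips.size()];
        this.skipOffset = new int[skips.size()];
        for (int k = 0; k < skips.size(); k++) {
            skipLastDoc[k] = skips.get(k)[0];
            skipOffset[k] = skips.get(k)[1];
        }
    }

    DocIterator iterator() { return new DocIterator(); }

    final class DocIterator {
        private int pos = 0;    // index of the next entry to decode
        private int offset = 0; // byte offset of the next entry
        private int doc = 0;    // last decoded doc id (0 before the first entry)

        int doc() { return doc; }

        boolean next() {
            if (pos >= size) return false;
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[offset++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            doc += gap;
            pos++;
            return true;
        }

        // Advances at least once, then until doc() >= target.
        boolean skipTo(int target) {
            // Largest k with skipLastDoc[k] < target (skipLastDoc is strictly increasing).
            int k = Arrays.binarySearch(skipLastDoc, target);
            k = (k >= 0 ? k : -k - 1) - 1;
            int entry = (k + 1) * skipInterval;
            if (k >= 0 && entry > pos) { // only ever jump forward
                pos = entry;
                offset = skipOffset[k];
                doc = skipLastDoc[k];
            }
            do {
                if (!next()) return false;
            } while (doc < target);
            return true;
        }
    }
}
```

Without the skip table, skipTo() must decode every gap up to the target. With it, the linear part is bounded by skipInterval entries. A multilevel variant would stack coarser skip tables on top of this one.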

-- 
This message is automatically generated by JIRA.