It looks like Lucene does not use any of the BitSet boolean logic operators (and, or, etc.); it seems to use only the "get" method to test set membership for individual docs.
If this is true, the DocIdSet interface could look like this:
public interface DocIdSet
{
    public abstract boolean contains(int docId);
}
And Filter would become:
public interface Filter
{
    public abstract DocIdSet getDocIdSet(IndexReader reader) throws IOException;
}
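
To illustrate why a contains-only interface is attractive, here is a minimal sketch of two possible implementations: a dense one that delegates to java.util.BitSet.get, and a sparse one backed by a sorted int array. The class names BitSetDocIdSet and SortedIntDocIdSet are hypothetical, made up for this example, and are not existing Lucene classes.

```java
import java.util.Arrays;
import java.util.BitSet;

// The single-method interface proposed in this message.
interface DocIdSet {
    boolean contains(int docId);
}

// Dense representation (hypothetical name): delegates to BitSet.get.
final class BitSetDocIdSet implements DocIdSet {
    private final BitSet bits;

    BitSetDocIdSet(BitSet bits) {
        this.bits = (BitSet) bits.clone();
    }

    @Override
    public boolean contains(int docId) {
        // BitSet.get throws on negative indexes, so guard against them.
        return docId >= 0 && bits.get(docId);
    }
}

// Sparse representation (hypothetical name): sorted doc ids plus binary search.
final class SortedIntDocIdSet implements DocIdSet {
    private final int[] sortedDocIds;

    SortedIntDocIdSet(int[] docIds) {
        this.sortedDocIds = docIds.clone();
        Arrays.sort(this.sortedDocIds);
    }

    @Override
    public boolean contains(int docId) {
        return Arrays.binarySearch(sortedDocIds, docId) >= 0;
    }
}
```

Callers that only test membership cannot tell the two apart, so a filter matching a handful of docs need not allocate one bit per document in the index.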


As you suggest, the DocIdSet would be cached, and the policy for evicting DocIdSets from the cache would have to balance these factors for each DocIdSet:
1) Cache "hit rate" on the set
2) Cost of recreating the set (i.e. computational cost / disk access)
3) Memory used by the set
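
One way to combine the three factors, purely as a sketch: give each cached set a retention score that grows with its hit count and rebuild cost and shrinks with its memory footprint, then evict the lowest-scoring entry. The formula, the CacheEntryStats class and the EvictionPolicy class are all assumptions made up for this example; nothing here is an existing Lucene API, and the weighting would need tuning against real workloads.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical per-entry bookkeeping for a cached DocIdSet.
final class CacheEntryStats {
    final String key;
    final long hits;             // factor 1: how often the set was reused
    final long rebuildCostNanos; // factor 2: measured time to recreate the set
    final long sizeInBytes;      // factor 3: memory held by the set

    CacheEntryStats(String key, long hits, long rebuildCostNanos, long sizeInBytes) {
        this.key = key;
        this.hits = hits;
        this.rebuildCostNanos = rebuildCostNanos;
        this.sizeInBytes = sizeInBytes;
    }

    // Assumed heuristic: work saved per byte of memory retained.
    double retentionScore() {
        return (double) hits * rebuildCostNanos / Math.max(1L, sizeInBytes);
    }
}

final class EvictionPolicy {
    // Returns the entry with the lowest retention score, or null if there are none.
    static CacheEntryStats pickVictim(List<CacheEntryStats> entries) {
        return entries.stream()
                .min(Comparator.comparingDouble(CacheEntryStats::retentionScore))
                .orElse(null);
    }
}
```

With this scoring, a large set that is cheap to rebuild and rarely hit is evicted before a small, expensive, frequently used one.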


We can compute #1 easily enough. #2 may prove hard to quantify, but we could ensure we have #3 by insisting that DocIdSet include this method:
    public abstract int getCachedSizeInBytes();
We could also consider allowing DocIdSets to implement "Serializable", in which case the cache manager would be able to serialize DocIdSets to temporary storage.
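
As a rough sketch of both ideas together, a BitSet-backed set could report its approximate size and be round-tripped through standard Java serialization. The names SizedDocIdSet, SpillableBitSetDocIdSet and DocIdSetSpiller are hypothetical, and a byte array stands in for the temporary storage; a real cache manager would presumably write to a file instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical variant of the proposed interface with the size method added.
interface SizedDocIdSet {
    boolean contains(int docId);
    int getCachedSizeInBytes();
}

final class SpillableBitSetDocIdSet implements SizedDocIdSet, Serializable {
    private static final long serialVersionUID = 1L;
    private final BitSet bits; // java.util.BitSet is itself Serializable

    SpillableBitSetDocIdSet(BitSet bits) {
        this.bits = (BitSet) bits.clone();
    }

    @Override
    public boolean contains(int docId) {
        return docId >= 0 && bits.get(docId);
    }

    @Override
    public int getCachedSizeInBytes() {
        // Approximation: bits of backing storage in use, ignoring object overhead.
        return bits.size() / 8;
    }
}

// Hypothetical helper a cache manager might use to spill and restore sets.
final class DocIdSetSpiller {
    static byte[] spill(Serializable set) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(set);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object restore(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The reported size could drive the eviction decision, and spilling gives the cache a cheaper alternative to discarding a set that was expensive to build.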


I'm not sure how you would want to handle the versioning issues around a change to the Filter interface, though.


