[jira] Commented: (LUCENE-1461) Cached filter for a single term field

Paul Elschot (JIRA) Thu, 20 Nov 2008 00:07:07 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649298#action_12649298
 ]


Paul Elschot commented on LUCENE-1461:
--------------------------------------

For fields that have no more distinct values than fit into a short (at most 
2^16 = 65536), using a short[] would make sense, I think. As the number of 
distinct field values can simply be counted in this context, the int[] could 
simply be replaced by a short[] in that case. But it would only help to reduce 
space, and only by a factor of two.
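
A minimal sketch of that replacement, assuming term numbers are dense ordinals 
starting at 0. The class and method names here are hypothetical and are not 
taken from the attached filters. Since Java's short is signed, the value has to 
be masked on the way out to recover ordinals above 32767:

```java
/** Hypothetical helper: stores per-document term ordinals in a short[]. */
final class ShortOrdinals {
    /** Number of distinct values that fit in 16 bits. */
    static final int MAX_DISTINCT = 1 << 16; // 65536

    /** Packs ordinals in [0, 65535] into a short[]; rejects anything else. */
    static short[] pack(int[] ordinals) {
        short[] packed = new short[ordinals.length];
        for (int doc = 0; doc < ordinals.length; doc++) {
            int ord = ordinals[doc];
            if (ord < 0 || ord >= MAX_DISTINCT) {
                throw new IllegalArgumentException("ordinal out of range: " + ord);
            }
            packed[doc] = (short) ord; // narrowing keeps the low 16 bits
        }
        return packed;
    }

    /** Reads the ordinal back as an unsigned 16-bit value. */
    static int ordinal(short[] packed, int doc) {
        return packed[doc] & 0xFFFF;
    }
}
```

The range comparison in the iterator would then run on ordinal(packed, doc) 
instead of on the int[] entry directly.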

For a set based query, the problem boils down to doing integer set membership 
in the iterator. For small sets, binary search should be fine. For larger ones 
an OpenBitSet would be preferable, but in this context that would only be 
feasible when the number of different terms is a lot smaller than the number of 
documents in the index.
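
Roughly, the two membership strategies could look like the sketch below. It 
uses java.util.BitSet as a stand-in for OpenBitSet so that it runs without 
Lucene on the classpath; the names and the cut-over threshold are assumptions 
made for illustration only:

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical membership test over a set of term ordinals. */
final class OrdinalSetMembership {
    interface OrdinalSet {
        boolean contains(int ord);
    }

    /** Arbitrary cut-over point between the two strategies (an assumption). */
    static final int SMALL_SET_LIMIT = 16;

    /** Builds a membership test for the given non-negative ordinals. */
    static OrdinalSet of(int[] ordinals) {
        if (ordinals.length <= SMALL_SET_LIMIT) {
            // Small set: sorted copy plus binary search.
            final int[] sorted = ordinals.clone();
            Arrays.sort(sorted);
            return ord -> Arrays.binarySearch(sorted, ord) >= 0;
        }
        // Larger set: one bit per term ordinal.
        final BitSet bits = new BitSet();
        for (int ord : ordinals) {
            bits.set(ord);
        }
        return ord -> bits.get(ord);
    }
}
```

The iterator would call contains() with each document's term number. The bit 
set is sized by the number of distinct terms, which is why it only pays off 
when that number is much smaller than the number of documents.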

For location grid-blocks one needs to deal with more than one dimension. In 
such cases my first thought is to use indexed hierarchical prefixes in each 
dimension, because this allows skipTo() to be used on the documents for the 
intersection between the dimensions. (But there may be better ways; it has 
been a long time since I looked at the literature on this.)
Do you need to index separate lower bounds and upper bounds on the data? That 
would complicate things.
Without indexed bounds (i.e. point data only) for each dimension it could make 
sense to use this multi range filter.
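
To make the prefix idea a bit more concrete, here is one possible reading of 
it, for point data only: each coordinate is zero-padded to a fixed width and 
every leading prefix is indexed as its own term. The term format, the decimal 
encoding and all names below are assumptions for illustration, not something 
taken from the attached code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical generator of hierarchical prefix terms for one dimension. */
final class GridPrefixes {
    /**
     * Returns the prefix terms for a coordinate, coarsest first.
     * The coordinate is zero-padded to the given number of decimal digits.
     */
    static List<String> prefixTerms(String dimension, int coordinate, int digits) {
        if (coordinate < 0) {
            throw new IllegalArgumentException("negative coordinate: " + coordinate);
        }
        String padded = String.format(Locale.ROOT, "%0" + digits + "d", coordinate);
        if (padded.length() > digits) {
            throw new IllegalArgumentException("coordinate needs more than " + digits + " digits");
        }
        List<String> terms = new ArrayList<>();
        for (int len = 1; len <= padded.length(); len++) {
            terms.add(dimension + ":" + padded.substring(0, len));
        }
        return terms;
    }
}
```

With this encoding, prefixTerms("x", 37, 4) gives x:0, x:00, x:003 and x:0037. 
A block of coordinates in one dimension then maps to a small number of prefix 
terms, and the per-dimension document lists can be intersected.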



> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>         Attachments: DisjointMultiFilter.java, RangeMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer comparison based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


