I would love to see this find it's way into Lucene!

It puts a logarithmic bound (I think?) on the number of terms whose
docs must be visited to implement the range query, at the expense
of an increase in index size for those fields that need range search.

For large indexes this should make an incredible difference on
RangeQuery performance.

Mike

Uwe Schindler wrote:

I do not want to attach this to the issue report, just for completeness:

I implemented (based on RangeFilter) another approach for faster
RangeQueries, based on longs stored in index in a special format.

The idea behind this is to store the longs in different precision in index and partition the query range in such a way, that the outer boundaries are search using terms from the highest precision, but the center of the search
Range with lower precision. The implementation stores the longs in 8
different precisions (using a class called TrieUtils). It also has support for Doubles, using the IEEE 754 floating-point "double format" bit layout with some bit mappings to make them binary sortable. The approach is used in
rather big indexes, query times are even on low performance desktop
computers <<100 ms (!) for very big ranges on indexes with 500000 docs.

I called this RangeQuery variant and format "TrieRangeRange" query because the idea looks like the well-known Trie structures (but it is not identical
to real tries, but algorithms are related to it).

I was thinking about supplying this to Lucene contrib, any thoughts about
it?

More infos from the JavaDoc of the project:
http://www.panfmp.org/apidocs/de/pangaea/metadataportal/search/TrieRangeQuer
y.html
http://www.panfmp.org/apidocs/de/pangaea/metadataportal/utils/TrieUtils.html


Source:
http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/me
tadataportal/search/TrieRangeQuery.java?view=markup
http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/me
tadataportal/utils/TrieUtils.java?view=markup


Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]

From: Michael McCandless (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 12:14 PM
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1461) Cached filter for a single term
field


   [ https://issues.apache.org/jira/browse/LUCENE-
1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
tabpanel&focusedCommentId=12650530#action_12650530 ]

Michael McCandless commented on LUCENE-1461:
--------------------------------------------

Should we absorb RangeMultiFilter into RangeFilter, and add a "setMethod"
to RangeFilter?

Someday, I think RangeFilter and RangeQuery should be implemented using
hierarchical ranges (there was a reference to a page in the wiki,
specifically about date range searching, recently), which would be another
method.

Cached filter for a single term field
-------------------------------------

               Key: LUCENE-1461
URL: https://issues.apache.org/jira/browse/ LUCENE-1461
           Project: Lucene - Java
        Issue Type: New Feature
          Reporter: Tim Sturge
       Attachments: DisjointMultiFilter.java, LUCENE-1461a.patch,
RangeMultiFilter.java, RangeMultiFilter.java, TermMultiFilter.java


These classes implement inexpensive range filtering over a field
containing a single term. They do this by building an integer array of
term numbers (storing the term->number mapping in a TreeMap) and then
implementing a fast integer comparison based DocSetIdIterator.
This code is currently being used to do age range filtering, but could
also be used to do other date filtering or in any application where there need to be multiple filters based on the same single term field. I have an untested implementation of single term filtering and have considered but
not yet implemented term set filtering (useful for location based
searches) as well.
The code here is fairly rough; it works but lacks javadocs and
toString() and hashCode() methods etc. I'm posting it here to discover if there is other interest in this feature; I don't mind fixing it up but would hate to go to the effort if it's not going to make it into Lucene.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to