I would love to see this find it's way into Lucene!
It puts a logarithmic bound (I think?) on the number of terms whose
docs must be visited to implement the range query, at the expense
of an increase in index size for those fields that need range search.
For large indexes this should make an incredible difference on
RangeQuery performance.
Mike
Uwe Schindler wrote:
I do not want to attach this to the issue report, just for
completeness:
I implemented (based on RangeFilter) another approach for faster
RangeQueries, based on longs stored in index in a special format.
The idea behind this is to store the longs in different precision in
index
and partition the query range in such a way, that the outer
boundaries are
search using terms from the highest precision, but the center of the
search
Range with lower precision. The implementation stores the longs in 8
different precisions (using a class called TrieUtils). It also has
support
for Doubles, using the IEEE 754 floating-point "double format" bit
layout
with some bit mappings to make them binary sortable. The approach is
used in
rather big indexes, query times are even on low performance desktop
computers <<100 ms (!) for very big ranges on indexes with 500000
docs.
I called this RangeQuery variant and format "TrieRangeRange" query
because
the idea looks like the well-known Trie structures (but it is not
identical
to real tries, but algorithms are related to it).
I was thinking about supplying this to Lucene contrib, any thoughts
about
it?
More infos from the JavaDoc of the project:
http://www.panfmp.org/apidocs/de/pangaea/metadataportal/search/TrieRangeQuer
y.html
http://www.panfmp.org/apidocs/de/pangaea/metadataportal/utils/TrieUtils.html
Source:
http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/me
tadataportal/search/TrieRangeQuery.java?view=markup
http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/me
tadataportal/utils/TrieUtils.java?view=markup
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]
From: Michael McCandless (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 12:14 PM
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1461) Cached filter for a single
term
field
[ https://issues.apache.org/jira/browse/LUCENE-
1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
tabpanel&focusedCommentId=12650530#action_12650530 ]
Michael McCandless commented on LUCENE-1461:
--------------------------------------------
Should we absorb RangeMultiFilter into RangeFilter, and add a
"setMethod"
to RangeFilter?
Someday, I think RangeFilter and RangeQuery should be implemented
using
hierarchical ranges (there was a reference to a page in the wiki,
specifically about date range searching, recently), which would be
another
method.
Cached filter for a single term field
-------------------------------------
Key: LUCENE-1461
URL: https://issues.apache.org/jira/browse/
LUCENE-1461
Project: Lucene - Java
Issue Type: New Feature
Reporter: Tim Sturge
Attachments: DisjointMultiFilter.java, LUCENE-1461a.patch,
RangeMultiFilter.java, RangeMultiFilter.java, TermMultiFilter.java
These classes implement inexpensive range filtering over a field
containing a single term. They do this by building an integer array
of
term numbers (storing the term->number mapping in a TreeMap) and then
implementing a fast integer comparison based DocSetIdIterator.
This code is currently being used to do age range filtering, but
could
also be used to do other date filtering or in any application where
there
need to be multiple filters based on the same single term field. I
have an
untested implementation of single term filtering and have
considered but
not yet implemented term set filtering (useful for location based
searches) as well.
The code here is fairly rough; it works but lacks javadocs and
toString() and hashCode() methods etc. I'm posting it here to
discover if
there is other interest in this feature; I don't mind fixing it up
but
would hate to go to the effort if it's not going to make it into
Lucene.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]