[jira] Updated: (LUCENE-1470) Add TrieRangeQuery to contrib

Uwe Schindler (JIRA) Thu, 27 Nov 2008 02:22:50 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated LUCENE-1470:
----------------------------------

    Attachment: LUCENE-1470.patch

Just an update of the patch before I implement Paul's suggestions on the 
trie factor.

The current patch has some performance improvements from avoiding seeking back 
and forth in IndexReader's TermEnum. IndexReader's TermEnum may also make 
better use of caching this way. The parts of the range that use terms coming 
earlier in the TermEnum are now processed first. The order is now:
- Highest precision (because the field name of the highest precision is not 
suffixed, and so its terms come earlier, "fieldname"<"fieldname#trie")
- Lower precisions, starting with the lowest one. The prefix of the lowest 
precision is the smallest (0x20), and the prefixes go up to 0x27.
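
To illustrate the ordering above, here is a minimal standalone sketch. It is 
not code from the patch and does not use Lucene classes: the class 
TermOrderSketch, the field name "date" and the term texts are hypothetical, 
and it assumes terms are ordered by field name first and then by term text. 
Only the "#trie" suffix and the prefix characters 0x20 and 0x27 are taken from 
the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TermOrderSketch {
    /** Minimal stand-in for a (field, text) term; NOT Lucene's Term class. */
    static final class T implements Comparable<T> {
        final String field;
        final String text;

        T(String field, String text) {
            this.field = field;
            this.text = text;
        }

        // Assumption: field name is compared first, then the term text.
        public int compareTo(T other) {
            int c = field.compareTo(other.field);
            return c != 0 ? c : text.compareTo(other.text);
        }

        // Shows the field and the first character of the term text in hex.
        public String toString() {
            return field + ":0x" + Integer.toHexString(text.charAt(0));
        }
    }

    /** Returns the hypothetical terms in the order an enumeration would visit them. */
    static List<String> enumerationOrder() {
        List<T> terms = new ArrayList<>();
        terms.add(new T("date#trie", (char) 0x27 + "xx")); // a lower precision, prefix 0x27
        terms.add(new T("date", "abcdefgh"));              // highest precision, unsuffixed field
        terms.add(new T("date#trie", (char) 0x20 + "x"));  // lowest precision, prefix 0x20
        Collections.sort(terms);

        List<String> labels = new ArrayList<>();
        for (T t : terms) {
            labels.add(t.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(enumerationOrder());
    }
}
```

Under these assumptions the unsuffixed field sorts first, followed by the 
"#trie" terms in ascending prefix order, which is why processing the parts of 
the range in that order only moves forward through the terms.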

When looking into SegmentTermEnum's code, I realized that 
IndexReader.terms(Term t) is faster than getting the complete TermEnum and 
then calling seekTo(Term). Why this difference? I first wanted to change my 
code to use only one instance of the TermEnum (as with the TermDocs) through 
the whole range split process and seek the enum for each range, but I dropped 
that change.

Further improvements in this patch are more comments and cleaner (and more 
elegant) code in TrieUtils (without using a StringBuffer).

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch
>
>
> According to the thread in java-dev 
> (http://www.gossamer-threads.com/lists/lucene/java-dev/67807 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to 
> include my fast numerical range query implementation into lucene 
> contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in index in a special format.
> The idea behind this is to store the longs in different precisions in the
> index and partition the query range in such a way that the outer boundaries
> are searched using terms from the highest precision, but the center of the
> search range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for Doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes; even on low-performance desktop computers, query times
> are <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeQuery" because
> the idea looks like the well-known Trie structures (it is not identical
> to real tries, but the algorithms are related).
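
The partitioning idea in the quoted description can be sketched as follows. 
This is not the code from the attached patch; it is a simplified illustration 
under stated assumptions: values are non-negative longs, each lower precision 
drops 8 more low-order bits (64 / 8 = 8 precisions), and the names 
TrieSplitSketch, SubRange and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TrieSplitSketch {
    static final int STEP = 8; // assumption: bits dropped per precision level

    /** Covers the values lo..hi (inclusive) with terms that ignore the lowest `shift` bits. */
    static final class SubRange {
        final int shift;
        final long lo;
        final long hi;

        SubRange(int shift, long lo, long hi) {
            this.shift = shift;
            this.lo = lo;
            this.hi = hi;
        }

        /** Number of distinct terms of this precision needed for the sub-range. */
        long termCount() {
            return ((hi - lo) >>> shift) + 1;
        }
    }

    /** Splits [min, max] into edge pieces at high precision and a center at lower precision. */
    static List<SubRange> split(long min, long max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("need 0 <= min <= max");
        }
        List<SubRange> out = new ArrayList<>();
        for (int shift = 0; ; shift += STEP) {
            long low = (1L << shift) - 1;            // bits ignored at this precision
            long mask = ((1L << STEP) - 1) << shift; // bits that distinguish terms at this level
            long diff = 1L << (shift + STEP);
            boolean hasLower = (min & mask) != 0;    // lower edge not aligned to the next level
            boolean hasUpper = (max & mask) != mask; // upper edge not aligned to the next level
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            // Stop when no coarser level fits inside the remaining range.
            if (shift + STEP >= 64 || nextMin > nextMax || nextMin < min || nextMax > max) {
                out.add(new SubRange(shift, min, max | low));
                break;
            }
            if (hasLower) out.add(new SubRange(shift, min, min | mask | low));
            if (hasUpper) out.add(new SubRange(shift, max & ~mask, max | low));
            min = nextMin;
            max = nextMax;
        }
        return out;
    }

    public static void main(String[] args) {
        for (SubRange r : split(0x0105L, 0x0403L)) {
            System.out.println("shift=" + r.shift + " lo=0x" + Long.toHexString(r.lo)
                + " hi=0x" + Long.toHexString(r.hi) + " terms=" + r.termCount());
        }
    }
}
```

For the range 0x0105..0x0403 this sketch yields three sub-ranges: two edge 
pieces at full precision and one center piece with the lowest 8 bits ignored. 
Together they need 257 terms, compared with 767 distinct values in the range.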

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


