[jira] Updated: (LUCENE-1470) Add TrieRangeQuery to contrib

Uwe Schindler (JIRA) Wed, 03 Dec 2008 10:52:47 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Uwe Schindler updated LUCENE-1470:
----------------------------------

    Attachment: LUCENE-1470.patch

New patch; I think this one is really ready to commit. It includes Mike's 
suggestion to compare the result count of RangeQuery with that of 
TrieRangeQuery for random values. It also contains a further test on the 
result count of ranges against an index containing values with distance=1. 
This detects errors where the code generating the split range fails to attach 
the range parts to each other correctly.
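
To illustrate what such a test checks, here is a minimal sketch. It is not the 
code from the patch: the class name, the 8-bit step, and the restriction to 
values in [0, 2^32) are assumptions made for the example. It splits a range 
into parts at decreasing precision and verifies, against a dense index 
(distance=1), that the parts add up to exactly the count of the plain range, 
so any gap or overlap between adjacent parts shows up as a count mismatch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the TrieRangeQuery implementation from the patch.
class TrieSplitSketch {
    static final int STEP = 8;   // bits dropped per precision level (assumption)
    static final int BITS = 32;  // this sketch only handles values in [0, 2^32)

    /**
     * Splits the inclusive range [min, max] (min <= max) into parts {shift, lo, hi}.
     * A part with a given shift covers whole terms of the precision that ignores
     * the lowest `shift` bits.
     */
    static List<long[]> split(long min, long max) {
        List<long[]> parts = new ArrayList<>();
        for (int shift = 0; ; shift += STEP) {
            long low = (1L << shift) - 1;              // bits already dropped
            long mask = ((1L << STEP) - 1) << shift;   // bits handled at this level
            long diff = 1L << (shift + STEP);
            boolean hasLower = (min & mask) != 0;
            boolean hasUpper = (max & mask) != mask;
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + STEP >= BITS || nextMin > nextMax) {
                // No coarser level is usable: emit the remaining range and stop.
                parts.add(new long[] {shift, min, max | low});
                break;
            }
            if (hasLower) parts.add(new long[] {shift, min, min | mask | low});
            if (hasUpper) parts.add(new long[] {shift, max & ~mask, max | low});
            min = nextMin;
            max = nextMax;
        }
        return parts;
    }

    /** Counts the values of the dense index {0, 1, ..., n-1} that lie in [lo, hi]. */
    static long countDense(long n, long lo, long hi) {
        long from = Math.max(lo, 0), to = Math.min(hi, n - 1);
        return to >= from ? to - from + 1 : 0;
    }

    /** Returns true if, for random ranges, the parts are aligned and add up exactly. */
    static boolean check(long seed, int trials, long n) {
        Random rnd = new Random(seed);
        for (int i = 0; i < trials; i++) {
            long a = (long) (rnd.nextDouble() * n), b = (long) (rnd.nextDouble() * n);
            long min = Math.min(a, b), max = Math.max(a, b);
            long sum = 0;
            for (long[] p : split(min, max)) {
                long low = (1L << (int) p[0]) - 1;
                // Each part must start and end on a term boundary of its precision.
                if ((p[1] & low) != 0 || (p[2] & low) != low) return false;
                sum += countDense(n, p[1], p[2]);
            }
            if (sum != countDense(n, min, max)) return false;
        }
        return true;
    }
}
```

For example, TrieSplitSketch.split(5, 1000) yields the parts {0, 5, 255}, 
{0, 768, 1000} and {8, 256, 767}: the two edges at full precision and the 
center at the next lower precision.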

While changing my own project panFMP to use the new contrib package, I needed 
a function to read stored, trie-encoded values for re-indexing with another 
trie variant. The new static methods in TrieUtils choose the variant for 
decoding based on the encoded string length. They allow easy decoding of 
stored fields that use the trie encoding without knowing the variant in 
advance. For range queries, the encoding cannot be autodetected (which is 
clear).
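
The idea of picking the decoder from the string length can be sketched as 
follows. This is only an illustration under assumptions: the encoding shown 
here (8, 4 or 2 bits per character, giving 8, 16 or 32 characters for a long) 
and all names are hypothetical and are not the actual TrieUtils format or API.

```java
// Hypothetical sketch of length-based variant detection; not the TrieUtils API.
class TrieLengthDetectSketch {

    /** Encodes a long using bitsPerChar (8, 4 or 2) bits per character, high bits first. */
    static String encode(long value, int bitsPerChar) {
        int chars = 64 / bitsPerChar;
        long mask = (1L << bitsPerChar) - 1;
        StringBuilder sb = new StringBuilder(chars);
        for (int i = chars - 1; i >= 0; i--) {
            sb.append((char) ('0' + ((value >>> (i * bitsPerChar)) & mask)));
        }
        return sb.toString();
    }

    /** Decodes a string produced by encode(), inferring the variant from its length. */
    static long decode(String s) {
        int bitsPerChar;
        switch (s.length()) {
            case 8:  bitsPerChar = 8; break;
            case 16: bitsPerChar = 4; break;
            case 32: bitsPerChar = 2; break;
            default:
                throw new IllegalArgumentException("unknown encoded length: " + s.length());
        }
        long value = 0;
        for (int i = 0; i < s.length(); i++) {
            value = (value << bitsPerChar) | (s.charAt(i) - '0');
        }
        return value;
    }
}
```

This works only because each variant produces a distinct, fixed length. A 
range query has no stored string to inspect, so there the variant has to be 
given explicitly.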

Further work may be an optimized sort algorithm for the trie-encoded fields. 
The current FieldCache cannot handle them as longs, only as Strings, which is 
memory intensive. A FieldCache implementation for longs that supports this 
encoding would be good. I do not want to do this with the current inflexible 
FieldCache implementation, so I will wait for LUCENE-831. Does anybody have an 
idea how to plug in the correct FieldCache impl for this encoding in current 
Lucene, respecting the TrieUtils variant?

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, 
> LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch
>
>
> According to the thread in java-dev 
> (http://www.gossamer-threads.com/lists/lucene/java-dev/67807 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to 
> include my fast numerical range query implementation in Lucene 
> contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in the index in a special format.
> The idea behind this is to store the longs in different precisions in the
> index and to partition the query range in such a way that the outer
> boundaries are searched using terms from the highest precision, but the
> center of the search range with lower precision. The implementation stores
> the longs in 8 different precisions (using a class called TrieUtils). It also
> has support for Doubles, using the IEEE 754 floating-point "double format"
> bit layout with some bit mappings to make them binary sortable. The approach
> is used in rather big indexes; query times are, even on low-performance
> desktop computers, <<100 ms (!) for very big ranges on indexes with 500000
> docs.
> I called this RangeQuery variant and format "TrieRangeQuery" because
> the idea resembles the well-known trie structures (it is not identical
> to real tries, but the algorithms are related).
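
One common way to make IEEE 754 doubles sortable as longs is sketched below. 
The description above only says "some bit mappings", so whether TrieUtils uses 
exactly this mapping is an assumption; the class and method names are 
hypothetical. Non-negative doubles already sort correctly by their raw bits, 
and for negative doubles the 63 non-sign bits are flipped so that a larger 
magnitude maps to a smaller long.

```java
// Hypothetical sketch; the exact mapping used by TrieUtils may differ.
class SortableDoubleSketch {

    /** Maps a double to a long whose signed order matches the numeric order of the double. */
    static long toSortableLong(double d) {
        long bits = Double.doubleToLongBits(d);
        // Negative doubles have the sign bit set; flip all other bits to reverse their order.
        return bits < 0 ? bits ^ 0x7fffffffffffffffL : bits;
    }

    /** Inverse of toSortableLong(). */
    static double fromSortableLong(long sortable) {
        // The sign bit is untouched by the mapping, so the same test and flip undo it.
        return Double.longBitsToDouble(sortable < 0 ? sortable ^ 0x7fffffffffffffffL : sortable);
    }
}
```

The resulting long can then be indexed like any other long value, so doubles 
need no separate range-splitting logic.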

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

