[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

Uwe Schindler (JIRA) Tue, 02 Dec 2008 04:17:18 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652348#action_12652348
 ]


Uwe Schindler commented on LUCENE-1470:
---------------------------------------

It is almost complete. In my opinion the only remaining change is the setting of 
defaults, which I wanted to move directly into TrieUtils. Let me iterate one 
more patch and then we can commit. As I currently have no contributor status, 
it is simpler to iterate the patch sufficiently before committing. I was 
waiting for any additional comments before releasing a new patch version.

I have not had time to do any benchmarks, so testing the three trie variants 
for speed, disk I/O and indexing speed is not yet done. My current benchmarks 
cover only the performance of my "old" trie code, not the new one that uses the 
more binary encoding. I asked for a good "benchmarking framework": is 
contrib/benchmark useful for that? Benchmarking requires rather big indexes, 
perhaps containing random numeric values, so the benchmark may also need a lot 
of disk space (1 GB?). Stored fields and additional full-text fields can be 
left out, but the benchmark should at least include a normal tokenized field, 
to measure performance when combining trie queries with ordinary term queries 
(as in the paper given by Nadav).

Do you think it is a good idea to include it directly in contrib-search? In my 
opinion it is, but others may think differently.

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, 
> LUCENE-1470.patch
>
>
> According to the thread in java-dev 
> (http://www.gossamer-threads.com/lists/lucene/java-dev/67807 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to 
> include my fast numerical range query implementation into lucene 
> contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in the index in a special format.
> The idea behind this is to store the longs at different precisions in the
> index and to partition the query range so that the outer boundaries are
> searched using terms of the highest precision, but the center of the search
> range using lower precision. The implementation stores the longs at 8
> different precisions (using a class called TrieUtils). It also supports
> doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes; even on low-performance desktop computers, query times
> are <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeRange" query because
> the idea resembles the well-known trie structures (it is not identical
> to real tries, but the algorithms are related).
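To make the description above more concrete, here is a minimal sketch of the idea, assuming a precision step of 8 bits (64 / 8 = 8 precisions) and representing each "term" simply as a long prefix (value >> shift) rather than an encoded string. It is a hypothetical reconstruction, not the TrieUtils/TrieRangeQuery code in the attached patches; the names TrieRangeSketch, SubRange, trieTerms, splitRange and doubleToSortableLong are invented for illustration. splitRange uses full precision only at the two edges of the range and coarser prefixes toward the center, and doubleToSortableLong is one common bit mapping that makes IEEE 754 doubles sort correctly as signed longs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of trie-encoded range splitting (not the patch's code).
final class TrieRangeSketch {
    static final int PRECISION_STEP = 8; // assumed: 8 bits dropped per level
    static final int VALUE_BITS = 64;    // 64 / 8 = 8 precisions

    /** All terms at precision `shift` whose prefix lies in [lowPrefix, highPrefix]. */
    static final class SubRange {
        final int shift;
        final long lowPrefix, highPrefix;
        SubRange(int shift, long lowPrefix, long highPrefix) {
            this.shift = shift; this.lowPrefix = lowPrefix; this.highPrefix = highPrefix;
        }
        /** True if value v, indexed at this precision, falls in this sub-range. */
        boolean matches(long v) {
            long p = v >> shift; // arithmetic shift preserves signed order
            return p >= lowPrefix && p <= highPrefix;
        }
        @Override public String toString() {
            return "shift=" + shift + " prefixes=[" + lowPrefix + ".." + highPrefix + "]";
        }
    }

    /** Maps a double (not NaN) to a long whose signed order matches the double order. */
    static long doubleToSortableLong(double d) {
        long bits = Double.doubleToLongBits(d);
        // Negative doubles: flip all non-sign bits so larger magnitudes sort lower.
        if (bits < 0) bits ^= 0x7fffffffffffffffL;
        return bits;
    }

    /** The prefixes a single value would be indexed under, one per precision. */
    static long[] trieTerms(long value) {
        long[] terms = new long[VALUE_BITS / PRECISION_STEP];
        for (int level = 0; level < terms.length; level++) {
            terms[level] = value >> (level * PRECISION_STEP);
        }
        return terms;
    }

    /** Splits [min, max] into disjoint sub-ranges, coarsest in the center. */
    static List<SubRange> splitRange(long min, long max) {
        List<SubRange> out = new ArrayList<>();
        if (min > max) return out;
        for (int shift = 0; ; shift += PRECISION_STEP) {
            long diff = 1L << (shift + PRECISION_STEP); // size of one bucket at the next level
            long mask = ((1L << PRECISION_STEP) - 1L) << shift; // bits of the current digit
            boolean hasLower = (min & mask) != 0L;   // min is not at a bucket start
            boolean hasUpper = (max & mask) != mask; // max is not at a bucket end
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max; // overflow at the long limits
            if (shift + PRECISION_STEP >= VALUE_BITS || nextMin > nextMax || wrapped) {
                // Lowest precision reached, or nothing left for a coarser level.
                out.add(new SubRange(shift, min >> shift, max >> shift));
                return out;
            }
            // Edge pieces stay at the current (finer) precision.
            if (hasLower) out.add(new SubRange(shift, min >> shift, (min | mask) >> shift));
            if (hasUpper) out.add(new SubRange(shift, (max & ~mask) >> shift, max >> shift));
            min = nextMin;
            max = nextMax;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trieTerms(1000))); // [1000, 3, 0, 0, 0, 0, 0, 0]
        for (SubRange r : splitRange(5, 1000)) System.out.println(r);
        // shift=0 prefixes=[5..255]
        // shift=0 prefixes=[768..1000]
        // shift=8 prefixes=[1..2]
        System.out.println(doubleToSortableLong(-2.5) < doubleToSortableLong(-1.0)); // true
    }
}
```

For the range 5..1000, only the edge pieces 5..255 and 768..1000 need full-precision terms; the center 256..767 is covered by just two coarser terms (prefixes 1 and 2 at shift 8). With 8 levels, the number of terms visited therefore stays bounded by the number of precisions rather than by the number of distinct values in the range.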


