[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Uwe Schindler (JIRA) Sun, 22 Feb 2009 11:19:24 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675695#action_12675695
 ]


Uwe Schindler commented on LUCENE-1541:
---------------------------------------

After thinking one night longer about the whole issue, I suspect, that 
non-equidistant precision steps are really needed:

Ning's comment is correct about the number of terms. But if you only index long 
values from, e.g., 0L to 10000L, you create a lot of terms for shift values 0 
to 14 (because the terms are in this range). For shift values 15 to 63, the 
term is always the same constant term. The index' TermEnum so contains not many 
additional values (because its only *one* term for *all* documents), only 
additional TermDocs entries are created. It's the same like adding one "flag" 
term to all documents. This does not use much additional space in index. When 
you query a range, these terms are never used, but they do not hurt.

The additional space for the trie terms is generated by higher precision (lower 
shift) values. If you index with precision step 4 or 2 instead of a precision 
step of 8, you create a lot of *different* terms for the lower shift values. 
The constant terms in the higher shifts are still always the same and does not 
consume much space.

I will create a small comparison on index size for long values without higher 
bits, but I suspect, that index size without lower precision terms reduces 
space significant. If this is the case, I do not think the additional 
complexity of the API is needed for this low impact. If somebody really wants 
to optimize index size so much, he can create a optimized fork of TrieRange in 
his project that indexes with non-equidistant precision steps. On the other 
hand, I would suggest to use ints/floats instead of longs/doubles, if only 
lower precision is needed. In this case, less terms will be created. For floats 
in comparision to doubles, the effect will be bigger (because even small 
doubles use a lot of hgher precision bits), a non-equidistant precision step 
will not help very much here.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is 
> specified. With a large precision step (say 8), a value is indexed in fewer 
> terms (8) but the number of terms for a range can be large. With a small 
> precision step (say 2), the number of terms for a range is smaller but a 
> value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for 
> different precisions. An expert can use this option to keep the number of 
> terms for a range small and at the same time index a value in a small number 
> of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Reply via email to