[jira] Issue Comment Edited: (LUCENE-1701) Add NumericField and NumericSortField, make plain text numeric parsers public in FieldCache, move trie parsers to FieldCache

Uwe Schindler (JIRA) Mon, 22 Jun 2009 12:40:31 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722774#action_12722774
 ]


Uwe Schindler edited comment on LUCENE-1701 at 6/22/09 12:38 PM:
-----------------------------------------------------------------

bq. Using 4 for int, 6 for long. Dates-as-longs look a bit sad on 8.

I think 4 for ints is a good start, and better than 4 for longs (which produces
16 different precision terms and up to 31 term enums [= precision changes] per
range). 6 is a good idea: it brings a little more than 8 but does not produce
too many precision changes. I also tested that with my 2 M numeric-only index
here.
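
To make the arithmetic behind these numbers concrete, here is a small sketch.
It assumes one indexed term per precision level (so ceil(bits / precisionStep)
terms per value) and at most two sub-ranges per level except the last one. Both
formulas are inferred from the 16 and 31 quoted above, not taken from the
Lucene source, and the class and method names are made up for illustration.

```java
/** Hypothetical helper, not part of Lucene: back-of-the-envelope precision step math. */
final class PrecisionStepMath {

    /** Assumed number of indexed terms per value: one per precision level. */
    static int termsPerValue(int valueBits, int precisionStep) {
        if (valueBits < 1 || precisionStep < 1) {
            throw new IllegalArgumentException("valueBits and precisionStep must be >= 1");
        }
        // Integer ceiling of valueBits / precisionStep.
        return (valueBits + precisionStep - 1) / precisionStep;
    }

    /** Assumed upper bound on sub-ranges (term enums) per range query. */
    static int maxSubRanges(int valueBits, int precisionStep) {
        // Two sub-ranges on every level except the last, which contributes one.
        return 2 * termsPerValue(valueBits, precisionStep) - 1;
    }

    public static void main(String[] args) {
        // {valueBits, precisionStep} pairs mentioned in the discussion.
        int[][] cases = { {32, 4}, {64, 4}, {64, 6}, {64, 8} };
        for (int[] c : cases) {
            System.out.printf("bits=%d step=%d terms=%d maxSubRanges=%d%n",
                    c[0], c[1], termsPerValue(c[0], c[1]), maxSubRanges(c[0], c[1]));
        }
    }
}
```

Under these assumptions a long with step 4 gives 16 terms and at most 31
sub-ranges, step 6 gives 11 and 21, and step 8 gives 8 and 15, which is the
trade-off between index size and enums per query discussed here.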

Mike: As you see, the precision step is a good config approach, so a default
should be chosen carefully.
It may even be different for the same data type, e.g. when you have longs but
all longs in your index fall within a very limited range (OK, you could use an
int, too). But if, e.g., you index dates as longs and your dates only span two
years or something like that, 4 may still be good. This is because on a smaller
range, the algorithm does not need to go up to the lowest precision.
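
The effect of a narrow range can be illustrated with a simplified range
splitter. This is only a sketch in the spirit of the trie approach, restricted
to non-negative longs (no sign-bit handling); the class TrieSplitSketch and its
methods are hypothetical names and not the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of trie-style range splitting for non-negative longs. */
final class TrieSplitSketch {

    /** One sub-range: inclusive bounds plus the shift (precision level) it is matched at. */
    static final class SubRange {
        final long min, max;
        final int shift;

        SubRange(long min, long max, int shift) {
            this.min = min;
            // Fill the bits below the shift so the bounds describe the covered values.
            this.max = max | ((1L << shift) - 1L);
            this.shift = shift;
        }
    }

    /** Splits [minBound, maxBound] into sub-ranges of decreasing precision. */
    static List<SubRange> split(int precisionStep, long minBound, long maxBound) {
        if (precisionStep < 1 || minBound < 0 || minBound > maxBound) {
            throw new IllegalArgumentException("need precisionStep >= 1 and 0 <= min <= max");
        }
        final int valSize = 64;
        List<SubRange> out = new ArrayList<>();
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            // Does an edge stick out of the next coarser block on either side?
            boolean hasLower = (minBound & mask) != 0L;
            boolean hasUpper = (maxBound & mask) != mask;
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            boolean wrapped = nextMin < minBound || nextMax > maxBound;
            if (shift + precisionStep >= valSize || nextMin > nextMax || wrapped) {
                // Nothing left for a coarser level: emit the remainder and stop.
                out.add(new SubRange(minBound, maxBound, shift));
                break;
            }
            if (hasLower) out.add(new SubRange(minBound, minBound | mask, shift));
            if (hasUpper) out.add(new SubRange(maxBound & ~mask, maxBound, shift));
            minBound = nextMin;
            maxBound = nextMax;
        }
        return out;
    }

    /** Highest shift (coarsest precision level) used by any sub-range. */
    static int maxShift(List<SubRange> ranges) {
        int m = 0;
        for (SubRange r : ranges) m = Math.max(m, r.shift);
        return m;
    }

    /** Total number of values covered by the sub-ranges. */
    static long covered(List<SubRange> ranges) {
        long n = 0;
        for (SubRange r : ranges) n += r.max - r.min + 1;
        return n;
    }

    public static void main(String[] args) {
        List<SubRange> narrow = split(4, 3, 40);
        List<SubRange> wide = split(4, 3, 1L << 40);
        System.out.println("narrow: " + narrow.size() + " sub-ranges, max shift " + maxShift(narrow));
        System.out.println("wide:   " + wide.size() + " sub-ranges, max shift " + maxShift(wide));
    }
}
```

In this sketch the narrow range stops at a much smaller shift than the wide
one, so fewer precision levels are visited even with a small step such as 4.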

bq. Though, if you want really fast dates, chosing hour/day/month/year as 
precision steps is vastly superior, plus it also clicks well with user-selected 
ranges. Still, I dumped this approach for uniformity and clarity.

That is clear, because these precisions exactly fit users' queries in the case
of dates (users often pick full days when selecting the range).

Nice to hear that you use TrieRange. What is your index spec, and what query
speeds have you measured (if that does not go too far into company internals)?

> Add NumericField and NumericSortField, make plain text numeric parsers public 
> in FieldCache, move trie parsers to FieldCache
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1701
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1701
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index, Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1701-test-tag-special.patch, LUCENE-1701.patch, 
> LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, LUCENE-1701.patch, 
> LUCENE-1701.patch, NumericField.java
>
>
> In discussions about LUCENE-1673, Mike and I wanted to add a new NumericField
> to o.a.l.document specifically for easy indexing. An alternative would be to
> add a NumericUtils.newXxxField() factory that creates a preconfigured Field
> instance with norms and tf off, optionally a stored text (LUCENE-1699), and
> the TokenStream already initialized. On the other hand,
> NumericUtils.newXxxSortField could be moved to NumericSortField.
> Yonik and I tend to use the factory for both; Mike tends to create the new
> classes.
> Also, the parsers for string-formatted numerics are not public in FieldCache.
> As the new SortField API (LUCENE-1478) makes it possible to specify a parser
> in SortField instantiation, it would be good to have the static parsers in
> FieldCache publicly available. SortField would init its member variable to
> them (instead of NULL), making the code a lot simpler (FieldComparator has
> these ugly null checks when retrieving values from the cache).
> Moving the Trie parsers into FieldCache as static instances as well would make
> the code cleaner, and we would be able to hide the "hack"
> StopFillCacheException by making it private to FieldCache (currently it's
> public because NumericUtils is in o.a.l.util).
