[jira] Commented: (LUCENE-1673) Move TrieRange to core

Uwe Schindler (JIRA) Tue, 02 Jun 2009 06:41:36 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715529#action_12715529
 ]


Uwe Schindler commented on LUCENE-1673:
---------------------------------------

{quote}
(Aside: I just noticed the code fragment in the javadocs for
LongTrieTokenStream won't compile, because the setValue method is not
available for TokenStream; the stream should be defined as
LongTrieTokenStream, I think?; same with IntTrieTokenStream)
{quote}

I fixed this :-) Thanks!

{quote}
bq. If we rename the classes, should Solr stay with Trie (because there are 
different impls)?

Well, Solr should decide 

But: why are there different impls for Solr?
{quote}

I only mentioned this here so that everyone knows Solr has already started to
implement this. In Solr there are three different impls:
- Trie (of course)
- Text-only numbers (do not work with range queries, but can be used for 
sorting etc.)
- A binary encoding (also used by LocalLucene at the moment) that is sortable. 
This can be used for RangeQueries, but sorting is slow (because it has no 
parser, and at the time it was implemented, SortField had no parser support)

The problem: because of backwards compatibility, all of them need to be 
preserved (so that old indexes can still be read).

bq. I think separate classes for int, long, float, double is better.

Two more... The problem: all these classes have exactly the same impl 
internally, which means code duplication that is hard to maintain. Maybe we 
could use static factories instead of overloaded ctors. That way the name 
differs per type, but each factory just creates a correctly configured 
instance of one and the same class: 
NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and so on. 
Same for the TokenStreams (and the Field?)
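
To make the static factory idea concrete, here is a rough sketch. It is not 
the Lucene API: the class name NumericRangeSketch, the Type enum, and the 
exact parameter order are assumptions made up for illustration only. It shows 
one shared class with a private ctor and one named factory per numeric type, 
with null standing for an open bound.

```java
// Hypothetical sketch only: none of these names are actual Lucene classes.
final class NumericRangeSketch {

    /** The numeric type a range was created for. */
    enum Type { INT, LONG, FLOAT, DOUBLE }

    final Type type;
    final Number min;        // null means the lower bound is open
    final Number max;        // null means the upper bound is open
    final int precisionStep;

    // Private ctor: callers must go through one of the named factories,
    // so the numeric type is always stated explicitly in the method name.
    private NumericRangeSketch(Type type, Number min, Number max, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        this.type = type;
        this.min = min;
        this.max = max;
        this.precisionStep = precisionStep;
    }

    static NumericRangeSketch newIntRange(Integer min, Integer max, int precisionStep) {
        return new NumericRangeSketch(Type.INT, min, max, precisionStep);
    }

    static NumericRangeSketch newLongRange(Long min, Long max, int precisionStep) {
        return new NumericRangeSketch(Type.LONG, min, max, precisionStep);
    }

    static NumericRangeSketch newFloatRange(Float min, Float max, int precisionStep) {
        return new NumericRangeSketch(Type.FLOAT, min, max, precisionStep);
    }

    static NumericRangeSketch newDoubleRange(Double min, Double max, int precisionStep) {
        return new NumericRangeSketch(Type.DOUBLE, min, max, precisionStep);
    }

    /** True if at least one bound is open (null). */
    boolean isHalfOpen() {
        return min == null || max == null;
    }

    @Override
    public String toString() {
        return type + "[" + (min == null ? "*" : min) + " TO "
            + (max == null ? "*" : max) + "] step=" + precisionStep;
    }
}
```

With boxed parameters like these, a call such as 
NumericRangeSketch.newLongRange(1, 100, 8) does not compile, because Java 
will not convert an int literal to Long. A forgotten L then becomes a compile 
error rather than a silently created int range.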

{quote}
Ideally, one would simply use, say, LongNumericField (subclass of
AbstractField) at indexing time, Lucene would "remember" this
in the index (this is obviously missing today), and then when you
sort, retrieve value, and create queries from QueryParser, all these
places would "know" that this is a trie field and simply do the right
thing, by default.
{quote}

For that we need the type information in the index, and for that the new 
Field/Document classes. Hopefully Michael will get this working soon.

{quote}
When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
to your SortField 
{quote}

Or add new SortField types.

The problem with all this: for old indexes, we need some backwards 
compatibility. Ideally we would just create numeric fields in the new way and 
reuse e.g. SortField.INT for them, or even replace the FieldCache parsers 
with the trie ones. But neither can be done at the moment.

{quote}
I'd also like to rename RangeQuery to something else, with this
change. EG TermRangeQuery... to emphasize that you use it for
non-numbers. The javadocs of TermRangeQuery should point to
Int/LongRangeQuery as strongly preferred for numeric ranges.
{quote}

Cool. For the others, too (FieldCacheRangeQuery).

There is a lot more to decide. I will keep this issue open for a little while 
to collect ideas before starting to work on it!

> Move TrieRange to core
> ----------------------
>
>                 Key: LUCENE-1673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1673
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> TrieRange was iterated many times and seems stable now (LUCENE-1470, 
> LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to 
> its default FieldTypes (SOLR-940) and if possible I want to move it to core 
> before release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how 
> should they be called in core? I would suggest to leave it as it is. On the 
> other hand, if this remains our only numeric query implementation, we could 
> call it LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below, there 
> are problems with this). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be moved into 
> one class: NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The 
> problem here: ctors must be able to pass int, long, double, float as range 
> parameters. For the end user, mixing these 4 types in one class is hard to 
> handle. If somebody forgets to add an L to a long, it suddenly instantiates 
> the int version of the range query, hitting no results and so on. Same with other 
> types. Maybe accept java.lang.Number as parameter (because nullable for 
> half-open bounds) and one enum for the type.
> # TrieUtils move into o.a.l.util? or document or?
> # Move TokenStreams into o.a.l.analysis, ShiftAttribute into 
> o.a.l.analysis.tokenattributes? Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are 
> different impls)?
> # Maybe add a subclass of AbstractField, that automatically creates these 
> TokenStreams and omits norms/tf per default for easier addition to Document 
> instances?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


