[jira] Commented: (LUCENE-1673) Move TrieRange to core

Michael McCandless (JIRA) Tue, 09 Jun 2009 10:22:17 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717754#action_12717754
 ]


Michael McCandless commented on LUCENE-1673:
--------------------------------------------

{quote}
In Solr there are three different impls:

Trie (of course)
Text-only numbers (do not work with range queries, but can be used for sorting 
etc.)
A binary encoding (also used by LocalLucene at the moment), that is sortable. 
This can be used for RangeQueries, but sorting is slow (because they have no 
parser, and at the time it was implemented, SortField had no parser support)
{quote}

Ahh OK, this is just Solr's pre-existing numeric field support.  (I
had thought you meant Solr had a different impl for Trie).

bq. The problem, because of backwards compatibility they need to be preserved 
(possibility to read old indexes).

This is indeed quite a challenge.  Actually, is there anything in Trie
that encodes which version of the format is indexed in a given
segment?  (So that if we ever change the indexed format, we can
bump a version somewhere to keep back compat.)

bq. Maybe we use a static factory instead of same Ctor. By this the name is 
different, but it just creates the correct instance of always the same class: 
NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and so on. 
Same for the TokenStreams (and the Field?)

That sounds like a good approach?
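
A minimal sketch of the static-factory idea, to make the shape concrete.
Everything here is an assumption for illustration: the class name
NumericRangeSketch, the ValueType enum, the field-name parameter and
the argument order are hypothetical, and this is not the class that
would land in core.  It only shows one class, a private ctor, and one
type-named factory per numeric type, with null meaning an open bound.

```java
// Hypothetical sketch of the static-factory proposal; NOT a Lucene class.
final class NumericRangeSketch {
    /** Which primitive type the range was created for (assumed enum). */
    enum ValueType { INT, LONG, FLOAT, DOUBLE }

    private final String field;
    private final ValueType type;
    private final Number min;        // null = open lower bound
    private final Number max;        // null = open upper bound
    private final int precisionStep;

    // Private: callers must go through one of the type-named factories.
    private NumericRangeSketch(String field, ValueType type,
                               Number min, Number max, int precisionStep) {
        this.field = field;
        this.type = type;
        this.min = min;
        this.max = max;
        this.precisionStep = precisionStep;
    }

    static NumericRangeSketch newIntRange(String field, Integer min, Integer max, int precisionStep) {
        return new NumericRangeSketch(field, ValueType.INT, min, max, precisionStep);
    }

    static NumericRangeSketch newLongRange(String field, Long min, Long max, int precisionStep) {
        return new NumericRangeSketch(field, ValueType.LONG, min, max, precisionStep);
    }

    static NumericRangeSketch newFloatRange(String field, Float min, Float max, int precisionStep) {
        return new NumericRangeSketch(field, ValueType.FLOAT, min, max, precisionStep);
    }

    static NumericRangeSketch newDoubleRange(String field, Double min, Double max, int precisionStep) {
        return new NumericRangeSketch(field, ValueType.DOUBLE, min, max, precisionStep);
    }

    ValueType type() { return type; }

    int precisionStep() { return precisionStep; }

    @Override
    public String toString() {
        return field + ":" + type + "[" + (min == null ? "*" : min)
            + " TO " + (max == null ? "*" : max) + "]";
    }

    public static void main(String[] args) {
        // Half-open long range: lower bound 5, no upper bound.
        System.out.println(newLongRange("size", 5L, null, 4));
        System.out.println(newFloatRange("price", 1.5f, 9.5f, 4));
    }
}
```

One side effect of the boxed parameters: a call like
newLongRange("size", 5, null, 4) does not compile, because an int
literal is not converted to Long.  So forgetting the L becomes a
compile error rather than silently selecting the int variant.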

{quote}
> When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
> to your SortField

Or add new SortField types.

The problem with all this: For old indexes, we need some backwards 
compatibility. Ideally we would just create numeric fields in the new way and 
reuse e.g. SortField.INT for this. But this cannot be done. Or even, replace 
the FieldCache parsers by the trie ones. But this cannot be done at the moment.
{quote}

I wonder if we could handle this by adding a setting in FieldInfo?
Ie, to record that "this numeric field was indexed as a trie".  Then,
when we need to get the parser for SortField.INT, we'd check the
FieldInfo to see which parser to use.  This could also handle
back-compat, ie if we change the trie format being written we'd change
the setting and segment merging would gradually upgrade previously
indexed fields.
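
Roughly what I have in mind, as a sketch only.  All names below
(SortParserSketch, FieldInfoSketch, NumericEncoding, IntParser) are
hypothetical stand-ins, not existing Lucene types, and the "sortable"
encoding is a simple hex placeholder, not the real trie format.  The
point is just that the per-field setting, not the caller, picks the
parser.

```java
// Hypothetical sketch: none of these types are Lucene classes.
final class SortParserSketch {
    /** How a numeric field's terms were written (assumed per-field flag). */
    enum NumericEncoding { PLAIN_TEXT, TRIE }

    /** Stand-in for FieldInfo, carrying the assumed extra setting. */
    static final class FieldInfoSketch {
        final String name;
        final NumericEncoding encoding;

        FieldInfoSketch(String name, NumericEncoding encoding) {
            this.name = name;
            this.encoding = encoding;
        }
    }

    /** Plays the role of a FieldCache int parser. */
    interface IntParser {
        int parseInt(String term);
    }

    /** Legacy fields: terms are plain decimal text. */
    static final IntParser PLAIN_TEXT_PARSER = term -> Integer.parseInt(term);

    /** Placeholder for the trie parser: decodes the stand-in encoding below. */
    static final IntParser SORTABLE_PARSER =
        term -> ((int) Long.parseLong(term, 16)) ^ 0x80000000;

    /**
     * Placeholder encoding (NOT the trie format): flip the sign bit and
     * write 8 hex digits, so string order matches numeric order.
     */
    static String encodeSortable(int value) {
        return String.format("%08x", value ^ 0x80000000);
    }

    /** The lookup: the recorded setting decides which parser sorting uses. */
    static IntParser parserFor(FieldInfoSketch info) {
        switch (info.encoding) {
            case TRIE:
                return SORTABLE_PARSER;
            default:
                return PLAIN_TEXT_PARSER;
        }
    }

    public static void main(String[] args) {
        FieldInfoSketch legacy = new FieldInfoSketch("price", NumericEncoding.PLAIN_TEXT);
        FieldInfoSketch trie = new FieldInfoSketch("price", NumericEncoding.TRIE);
        // The caller asks for "the int parser" in both cases.
        System.out.println(parserFor(legacy).parseInt("42"));
        System.out.println(parserFor(trie).parseInt(encodeSortable(42)));
    }
}
```

With something like this the application keeps using SortField.INT
and never names a parser itself, which is what would let old and new
segments coexist.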

{quote}
> I'd also like to rename RangeQuery to something else, with this
> change. EG TermRangeQuery... to emphasize that you use it for
> non-numbers. The javadocs of TermRangeQuery should point to
> Int/LongRangeQuery as strongly preferred for numeric ranges.

Cool. For the others, too (FieldCacheRangeQuery).
{quote}

Yes.


> Move TrieRange to core
> ----------------------
>
>                 Key: LUCENE-1673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1673
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> TrieRange was iterated many times and seems stable now (LUCENE-1470, 
> LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to 
> its default FieldTypes (SOLR-940) and if possible I want to move it to core 
> before release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how 
> should they be called in core? I would suggest to leave it as it is. On the 
> other hand, if this keeps our only numeric query implementation, we could 
> call it LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below, here 
> are problems). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be moved into 
> one class: NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The 
> problem here: ctors must be able to pass int, long, double, float as range 
> parameters. For the end user, mixing these 4 types in one class is hard to 
> handle. If somebody forgets to add a L to a long, it suddenly instantiates a 
> int version of range query, hitting no results and so on. Same with other 
> types. Maybe accept java.lang.Number as parameter (because nullable for 
> half-open bounds) and one enum for the type.
> # TrieUtils move into o.a.l.util? or document or?
> # Move TokenStreams into o.a.l.analysis, ShiftAttribute into 
> o.a.l.analysis.tokenattributes? Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are 
> different impls)?
> # Maybe add a subclass of AbstractField, that automatically creates these 
> TokenStreams and omits norms/tf per default for easier addition to Document 
> instances?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


