[jira] Commented: (LUCENE-1673) Move TrieRange to core

Michael McCandless (JIRA) Tue, 02 Jun 2009 06:00:37 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715512#action_12715512 ]


Michael McCandless commented on LUCENE-1673:
--------------------------------------------

bq. I want to move it to core before release of 2.9

+1!

bq. There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how 
should they be called in core?

I prefer to not use "trie" in the names (package and classes)... that
term very much describes what's under-the-hood in these classes (how
they are implemented), whereas I think [generally] names should
describe how the class is intended to be used.  So I prefer
"Long[Numeric]RangeQuery" over "LongTrieRangeQuery".

I'd also like to rename RangeQuery to something else as part of this
change, e.g. TermRangeQuery, to emphasize that you use it for
non-numbers.  The javadocs of TermRangeQuery should point to
Int/LongRangeQuery as strongly preferred for numeric ranges.

bq. Maybe the pairs of classes for indexing and searching should be moved into 
one class

I think separate classes for int, long, float, double is better.

bq. TrieUtils move into o.a.l.util? or document or?

Maybe document?

bq. Move TokenStreams into o.a.l.analysis, ShiftAttribute into 
o.a.l.analysis.tokenattributes?

That sounds good?

bq. If we rename the classes, should Solr stay with Trie (because there are 
different impls)?

Well, Solr should decide ;)

But: why are there different impls for Solr?

bq. Maybe add a subclass of AbstractField, that automatically creates these 
TokenStreams and omits norms/tf per default for easier addition to Document 
instances?

+1

For a numeric field where one will sort or do range filtering, Trie*
ought to be the default.  But, unfortunately, the steps needed to make
use of Trie* are numerous:

  * Add your field to your doc with the LongTrieTokenStream

  * When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
    to your SortField

  * When you want to filter by range, instantiate
    LongTrieRangeFilter.  You'll have to subclass QueryParser to do
    this for the right fields.

  * When you want to display values, you must also pass the trie parser
    when populating the FieldCache

Ideally, one would simply use, say, LongNumericField (subclass of
AbstractField) at indexing time, Lucene would "remember" this
in the index (this is obviously missing today), and then when you
sort, retrieve values, and create queries from QueryParser, all these
places would "know" that this is a trie field and simply do the right
thing, by default.
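
As a rough illustration of the "remember it in the index" idea above, here is a
minimal sketch in plain Java. Everything in it is hypothetical: the
FieldTypeRegistrySketch class, the Kind enum and the returned query names are
stand-ins for this discussion, not existing Lucene APIs, and a real
implementation would persist this information in the index rather than in an
in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not a Lucene class. It only shows the lookup pattern.
final class FieldTypeRegistrySketch {
    enum Kind { TERM, INT_NUMERIC, LONG_NUMERIC }

    private final Map<String, Kind> kinds = new HashMap<>();

    // Index time: record how a field was indexed.
    void register(String field, Kind kind) {
        kinds.put(field, kind);
    }

    // Unknown fields fall back to plain term handling.
    Kind kindOf(String field) {
        return kinds.getOrDefault(field, Kind.TERM);
    }

    // Search time: choose the range query flavor from the recorded kind.
    // The returned names are only the names proposed in this thread.
    String rangeQueryFor(String field) {
        switch (kindOf(field)) {
            case INT_NUMERIC:
                return "IntNumericRangeQuery";
            case LONG_NUMERIC:
                return "LongNumericRangeQuery";
            default:
                return "TermRangeQuery";
        }
    }
}
```

With such a lookup, QueryParser, sorting and value retrieval could all consult
the same recorded kind instead of requiring the caller to pass the right
parser or filter class each time.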

(Aside: I just noticed the code fragment in the javadocs for
LongTrieTokenStream won't compile, because the setValue method is not
available on TokenStream; I think the stream should be declared as
LongTrieTokenStream.  The same applies to IntTrieTokenStream.)
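
The compile problem in the aside above comes down to the declared type of the
variable. A minimal sketch with stand-in classes (TokenStreamSketch and
LongTrieTokenStreamSketch are hypothetical and only mimic the shape of the
real classes; the actual signatures may differ):

```java
// Stand-ins for illustration only; these are not the Lucene classes.
abstract class TokenStreamSketch {
    void close() { /* nothing to release in this sketch */ }
}

final class LongTrieTokenStreamSketch extends TokenStreamSketch {
    private long value;

    // Declared only on the subclass, as in the situation described above.
    void setValue(long value) {
        this.value = value;
    }

    long getValue() {
        return value;
    }
}

final class DeclaredTypeDemo {
    public static void main(String[] args) {
        TokenStreamSketch broad = new LongTrieTokenStreamSketch();
        // broad.setValue(42L); // would not compile: setValue is not declared on the base type
        broad.close();

        // Declaring the variable with the subclass type makes setValue visible.
        LongTrieTokenStreamSketch narrow = new LongTrieTokenStreamSketch();
        narrow.setValue(42L);
        System.out.println(narrow.getValue());
    }
}
```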


> Move TrieRange to core
> ----------------------
>
>                 Key: LUCENE-1673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1673
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> TrieRange was iterated many times and seems stable now (LUCENE-1470, 
> LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to 
> its default FieldTypes (SOLR-940) and if possible I want to move it to core 
> before release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how 
> should they be called in core? I would suggest leaving them as they are. On the 
> other hand, if this remains our only numeric query implementation, we could 
> call it LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below; there 
> are problems with that). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be moved into 
> one class: NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The 
> problem here: ctors must be able to pass int, long, double, float as range 
> parameters. For the end user, mixing these 4 types in one class is hard to 
> handle. If somebody forgets to add an L to a long literal, it suddenly instantiates an 
> int version of the range query, returning no hits, and so on. The same goes for the other 
> types. Maybe accept java.lang.Number as the parameter (because it is nullable for 
> half-open bounds) plus an enum for the type.
> # TrieUtils move into o.a.l.util? or document or?
> # Move TokenStreams into o.a.l.analysis, ShiftAttribute into 
> o.a.l.analysis.tokenattributes? Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are 
> different impls)?
> # Maybe add a subclass of AbstractField, that automatically creates these 
> TokenStreams and omits norms/tf per default for easier addition to Document 
> instances?
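
The overload pitfall described in point 2 of the issue (a forgotten L silently
selecting the int variant) can be reproduced in plain Java. The sketch below
is an assumption-laden illustration: NumericRangeSketch and its factory
methods are hypothetical and not part of Lucene. It also shows the suggested
alternative of a java.lang.Number parameter plus an explicit type enum, where
null stands for an open bound.

```java
// Hypothetical sketch; not a Lucene class.
final class NumericRangeSketch {
    enum Type { INT, LONG }

    final Type type;
    final Number min; // null means an open lower bound
    final Number max; // null means an open upper bound

    private NumericRangeSketch(Type type, Number min, Number max) {
        this.type = type;
        this.min = min;
        this.max = max;
    }

    // Overloaded factories: the compiler picks one from the literal types.
    static NumericRangeSketch of(int min, int max) {
        return new NumericRangeSketch(Type.INT, min, max);
    }

    static NumericRangeSketch of(long min, long max) {
        return new NumericRangeSketch(Type.LONG, min, max);
    }

    // Alternative: the caller states the type explicitly.
    static NumericRangeSketch of(Type type, Number min, Number max) {
        return new NumericRangeSketch(type, min, max);
    }

    public static void main(String[] args) {
        System.out.println(of(1, 100).type);             // no L suffix: int overload
        System.out.println(of(1L, 100L).type);           // L suffix: long overload
        System.out.println(of(Type.LONG, 1, null).type); // explicit type, half-open
    }
}
```

With the overloaded factories, a range meant for a long field but written as
of(1, 100) compiles without warning and targets the int encoding. The explicit
enum makes the intended type visible at the call site, at the cost of losing
compile-time checking of the bound types.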


