[ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715529#action_12715529 ]
Uwe Schindler commented on LUCENE-1673:
---------------------------------------

{quote}
(Aside: I just noticed the code fragment in the javadocs for LongTrieTokenStream won't compile, because the setValue method is not available for TokenStream; the stream should be defined as LongTrieTokenStream, I think?; same with IntTrieTokenStream)
{quote}

I fixed this :-) Thanks!

{quote}
bq. If we rename the classes, should Solr stay with Trie (because there are different impls)?

Well, Solr should decide. But: why are there different impls for Solr?
{quote}

I only mentioned this here so that we keep in mind that Solr has already started implementing this. In Solr there are three different impls:
- Trie (of course)
- Text-only numbers (these do not work with range queries, but can be used for sorting etc.)
- A sortable binary encoding (also used by LocalLucene at the moment). This can be used for RangeQueries, but sorting is slow (because there is no parser for it, and at the time it was implemented, SortField had no parser support)

The problem: because of backwards compatibility, all of them need to be preserved (so that old indexes can still be read).

bq. I think separate classes for int, long, float, double is better.

That means two more classes... The problem is that all these classes have exactly the same implementation internally, which is code duplication and hard to maintain. Maybe we could use static factories instead of constructors. The factory names differ, but each one just creates an instance of the same class: NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and so on. Same for the TokenStreams (and the Field?)

{quote}
Ideally, one would simply use, say, LongNumericField (subclass of AbstractField) at indexing time, Lucene would "remember" this in the index (this is obviously missing today), and then when you sort, retrieve value, and create queries from QueryParser, all these places would "know" that this is a trie field and simply do the right thing, by default.
{quote}

For that we need the type information in the index, and for that we need the new Field/Document classes. Hopefully Michael will get this working soon.

{quote}
When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER to your SortField
{quote}

Or add new SortField types. The problem with all this: for old indexes, we need some backwards compatibility. Ideally we would just create numeric fields in the new way and reuse e.g. SortField.INT for them, or even replace the FieldCache parsers by the trie ones. But neither can be done at the moment.

{quote}
I'd also like to rename RangeQuery to something else, with this change. EG TermRangeQuery... to emphasize that you use it for non-numbers. The javadocs of TermRangeQuery should point to Int/LongRangeQuery as strongly preferred for numeric ranges.
{quote}

Cool. The same goes for the others (FieldCacheRangeQuery). There is a lot more to decide; I will keep this issue open a little while to collect ideas before I start working on it!

> Move TrieRange to core
> ----------------------
>
> Key: LUCENE-1673
> URL: https://issues.apache.org/jira/browse/LUCENE-1673
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Affects Versions: 2.9
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 2.9
>
>
> TrieRange was iterated many times and seems stable now (LUCENE-1470,
> LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to
> its default FieldTypes (SOLR-940) and if possible I want to move it to core
> before the release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery and IntTrieRangeQuery; what
> should they be called in core? I would suggest leaving them as they are. On the
> other hand, if this stays our only numeric query implementation, we could
> call them LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below; there
> are problems here). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be merged into
> one class each: NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The
> problem here: the ctors must be able to take int, long, double, and float as range
> parameters. For the end user, mixing these 4 types in one class is hard to
> handle. If somebody forgets to add an L to a long, it instantiates an int
> version of the range query instead, hitting no results and so on. Same with the
> other types. Maybe accept java.lang.Number as parameter (because it is nullable,
> which allows half-open bounds) and one enum for the type.
> # Move TrieUtils into o.a.l.util? Or o.a.l.document? Or elsewhere?
> # Move the TokenStreams into o.a.l.analysis and ShiftAttribute into
> o.a.l.analysis.tokenattributes? Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are
> different impls)?
> # Maybe add a subclass of AbstractField that automatically creates these
> TokenStreams and omits norms/tf by default, for easier addition to Document
> instances?

-- 
This message is automatically generated by JIRA.
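A minimal sketch of the static-factory idea from the comment, combined with the java.lang.Number + enum suggestion from the issue description. Everything here is hypothetical and not Lucene's API: the class NumericRangeSketch, the NumericType enum, the argument order, the matches() helper and the overloaded() demo methods are made up for illustration. The class does no indexing or searching; it only shows how typed factories over one shared implementation avoid both the code duplication and the forgotten-L problem.

```java
import java.util.Objects;

/**
 * Hypothetical sketch, not Lucene's API: a single range-query class for all
 * four numeric types, created only through typed static factories.
 */
public final class NumericRangeSketch {

    /** Records which numeric type the bounds were given as. */
    public enum NumericType { INT, LONG, FLOAT, DOUBLE }

    private final String field;
    private final int precisionStep;
    private final NumericType type;
    private final Number min; // null = open (unbounded) lower end
    private final Number max; // null = open (unbounded) upper end

    // Private: the factory name, not overload resolution, picks the type.
    private NumericRangeSketch(String field, NumericType type, Number min, Number max, int precisionStep) {
        this.field = Objects.requireNonNull(field, "field");
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        this.type = type;
        this.min = min;
        this.max = max;
        this.precisionStep = precisionStep;
    }

    // Boxed parameter types: passing an int literal where a Long is expected
    // (e.g. newLongRange("f", 5, 10, 4)) fails to compile instead of silently
    // building an int range.
    public static NumericRangeSketch newIntRange(String field, Integer min, Integer max, int precisionStep) {
        return new NumericRangeSketch(field, NumericType.INT, min, max, precisionStep);
    }

    public static NumericRangeSketch newLongRange(String field, Long min, Long max, int precisionStep) {
        return new NumericRangeSketch(field, NumericType.LONG, min, max, precisionStep);
    }

    public static NumericRangeSketch newFloatRange(String field, Float min, Float max, int precisionStep) {
        return new NumericRangeSketch(field, NumericType.FLOAT, min, max, precisionStep);
    }

    public static NumericRangeSketch newDoubleRange(String field, Double min, Double max, int precisionStep) {
        return new NumericRangeSketch(field, NumericType.DOUBLE, min, max, precisionStep);
    }

    public String field() { return field; }
    public NumericType type() { return type; }
    public int precisionStep() { return precisionStep; }

    /** Inclusive bounds; a null bound matches everything on that side. */
    public boolean matches(Number value) {
        if (type == NumericType.INT || type == NumericType.LONG) {
            long v = value.longValue();
            return (min == null || v >= min.longValue()) && (max == null || v <= max.longValue());
        }
        double d = value.doubleValue();
        return (min == null || d >= min.doubleValue()) && (max == null || d <= max.doubleValue());
    }

    // The pitfall from the issue description: with one overload per primitive
    // type, a missing L suffix quietly selects the int version.
    static String overloaded(int min, int max) { return "int"; }
    static String overloaded(long min, long max) { return "long"; }

    public static void main(String[] args) {
        System.out.println(overloaded(5, 10));   // int (the L was forgotten)
        System.out.println(overloaded(5L, 10L)); // long

        // Half-open range: value >= 5, no upper bound.
        NumericRangeSketch q = newLongRange("timestamp", 5L, null, 4);
        System.out.println(q.type() + " " + q.matches(42L) + " " + q.matches(3L)); // LONG true false
    }
}
```

The design choice: overloaded ctors let the compiler pick the int version without any warning, while a factory with boxed Long parameters turns the same mistake into a compile error, because Java does not widen an int and then box it to Long. Using Number internally also gives the nullable, half-open bounds mentioned in the description for free.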