> I wonder if we could handle this by adding a setting in FieldInfo?

Do we have an issue open that would allow arbitrary metadata on a per-field basis? This seems like something flexible indexing will require.
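To make the per-field metadata idea a bit more concrete, here is a rough sketch of how a parser could be chosen from a field's recorded encoding. Everything in it is an assumption for illustration: `FieldMeta`, `IntParsers`, the `numericEncoding` key and the `sortable-v1` value are hypothetical names, not Lucene's FieldInfo API, and the "sortable" encoding is a simple stand-in rather than the real trie format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical stand-in for per-field metadata; this is NOT Lucene's FieldInfo.
final class FieldMeta {
    final String name;
    // Free-form key/value attributes recorded per field (assumed design).
    final Map<String, String> attributes = new HashMap<>();

    FieldMeta(String name) {
        this.name = name;
    }
}

final class IntParsers {
    // Assumed attribute key that records how a numeric field was indexed.
    static final String ENCODING_KEY = "numericEncoding";

    // Old indexes: terms are plain decimal text.
    static final ToIntFunction<String> PLAIN = Integer::parseInt;

    // Stand-in sortable encoding (not the trie format): 8 hex digits with
    // the sign bit flipped, so term order matches numeric order.
    static final ToIntFunction<String> SORTABLE_V1 =
            term -> Integer.parseUnsignedInt(term, 16) ^ 0x80000000;

    static String encodeSortableV1(int value) {
        return String.format("%08x", value ^ 0x80000000);
    }

    // Pick the parser from the field's metadata; fields without the
    // attribute are treated as having been indexed the old way.
    static ToIntFunction<String> parserFor(FieldMeta field) {
        String encoding = field.attributes.get(ENCODING_KEY);
        if (encoding == null) {
            return PLAIN;
        }
        switch (encoding) {
            case "sortable-v1":
                return SORTABLE_V1;
            default:
                throw new IllegalArgumentException(
                        "Unknown encoding '" + encoding + "' for field " + field.name);
        }
    }
}
```

With something like this, a later format change would only mean adding a new attribute value and a matching parser, while segments that still carry the old value keep being read with the old parser until they are merged.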
On Tue, Jun 9, 2009 at 10:15 AM, Michael McCandless (JIRA) <j...@apache.org> wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717754#action_12717754 ]
>
> Michael McCandless commented on LUCENE-1673:
> --------------------------------------------
>
> {quote}
> In Solr there are three different impls:
>
> Trie (of course)
> Text-only numbers (do not work with range queries, but can be used for
> sorting etc.)
> A binary encoding (also used by LocalLucene at the moment), that is
> sortable. This can be used for RangeQueries, but sorting is slow (because
> they have no parser, and at the time it was implemented, SortField had no
> parser support)
> {quote}
>
> Ahh OK, this is just Solr's pre-existing numeric field support. (I
> had thought you meant Solr had a different impl for Trie).
>
> bq. The problem, because of backwards compatibility they need to be
> preserved (possibility to read old indexes).
>
> This is indeed quite a challenge. Actually is there anything in Trie
> that encodes which version of the format is indexed in a given
> segment? (So that if we do ever change the indexed format, we can
> bump a version somewhere to keep back compat).
>
> bq. Maybe we use a static factory instead of same Ctor. By this the name is
> different, but it just creates the correct instance of always the same
> class: NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and
> so on. Same for the TokenStreams (and the Field?)
>
> That sounds like a good approach?
>
> {quote}
> > When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
> > to your SortField
>
> Or add new SortField types.
>
> The problem with all this: For old indexes, we need some backwards
> compatibility. Ideally we would just create numeric fields in the new way
> and reuse e.g. SortField.INT for this. But this cannot be done. Or even,
> replace the FieldCache parsers by the trie ones. But this cannot be done at
> the moment.
> {quote}
>
> I wonder if we could handle this by adding a setting in FieldInfo?
> Ie, to record that "this numeric field was indexed as a trie". Then,
> when we need to get the parser for SortField.INT, we'd check the
> FieldInfo to see which parser to use. This could also handle
> back-compat, ie if we change the trie format being written we'd change
> the setting and segment merging would gradually upgrade previously
> indexed fields.
>
> {quote}
> > I'd also like to rename RangeQuery to something else, with this
> > change. EG TermRangeQuery... to emphasize that you use it for
> > non-numbers. The javadocs of TermRangeQuery should point to
> > Int/LongRangeQuery as strongly preferred for numeric ranges.
>
> Cool. For the others, too (FieldCacheRangeQuery).
> {quote}
>
> Yes.
>
> > Move TrieRange to core
> > ----------------------
> >
> >                 Key: LUCENE-1673
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1673
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: Search
> >    Affects Versions: 2.9
> >            Reporter: Uwe Schindler
> >            Assignee: Uwe Schindler
> >             Fix For: 2.9
> >
> >
> > TrieRange was iterated many times and seems stable now (LUCENE-1470,
> > LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to
> > its default FieldTypes (SOLR-940) and if possible I want to move it to core
> > before release of 2.9.
> > Before this can be done, there are some things to think about:
> > # There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how
> > should they be called in core? I would suggest to leave it as it is. On the
> > other hand, if this keeps our only numeric query implementation, we could
> > call it LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below, here
> > are problems). Same for the TokenStreams and Filters.
> > # Maybe the pairs of classes for indexing and searching should be moved
> > into one class: NumericTokenStream, NumericRangeQuery, NumericRangeFilter.
> > The problem here: ctors must be able to pass int, long, double, float as
> > range parameters. For the end user, mixing these 4 types in one class is
> > hard to handle. If somebody forgets to add a L to a long, it suddenly
> > instantiates an int version of range query, hitting no results and so on.
> > Same with other types. Maybe accept java.lang.Number as parameter (because
> > nullable for half-open bounds) and one enum for the type.
> > # TrieUtils move into o.a.l.util? or document or?
> > # Move TokenStreams into o.a.l.analysis, ShiftAttribute into
> > o.a.l.analysis.tokenattributes? Somewhere else?
> > # If we rename the classes, should Solr stay with Trie (because there are
> > different impls)?
> > # Maybe add a subclass of AbstractField, that automatically creates these
> > TokenStreams and omits norms/tf per default for easier addition to Document
> > instances?
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
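
On the static-factory suggestion quoted above, a minimal sketch of what the named factories might look like, assuming one class that records the numeric type in an enum and uses boxed bounds so that null can mean "open". `NumericRange` and its fields are hypothetical; only the `newFloatRange(Float, Float, precisionStep)` shape comes from the comment itself.

```java
// Hypothetical sketch of the named-factory idea; not Lucene's NumericRangeQuery.
final class NumericRange {
    enum Type { INT, LONG, FLOAT, DOUBLE }

    final Type type;
    final Number min;          // null means no lower bound
    final Number max;          // null means no upper bound
    final int precisionStep;

    // Private constructor: callers must go through a named factory,
    // so the numeric type is always stated explicitly at the call site.
    private NumericRange(Type type, Number min, Number max, int precisionStep) {
        this.type = type;
        this.min = min;
        this.max = max;
        this.precisionStep = precisionStep;
    }

    static NumericRange newIntRange(Integer min, Integer max, int precisionStep) {
        return new NumericRange(Type.INT, min, max, precisionStep);
    }

    static NumericRange newLongRange(Long min, Long max, int precisionStep) {
        return new NumericRange(Type.LONG, min, max, precisionStep);
    }

    static NumericRange newFloatRange(Float min, Float max, int precisionStep) {
        return new NumericRange(Type.FLOAT, min, max, precisionStep);
    }

    static NumericRange newDoubleRange(Double min, Double max, int precisionStep) {
        return new NumericRange(Type.DOUBLE, min, max, precisionStep);
    }

    boolean isHalfOpen() {
        return min == null || max == null;
    }
}
```

One side effect of boxed parameters: `NumericRange.newLongRange(5, 10, 4)` does not compile, because an int literal is not boxed to `Long`, so a forgotten `L` is caught by the compiler instead of silently producing an int range.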