markharw00d wrote:
>>(a "write once" schema)
I like this idea. Enforcing consistent field-typing on instances of
fields with the same name does not seem like an unreasonable
restriction - especially given the upsides to this.
And also when it's "opt-in", ie, you can continue to use untyped/
unrestricted fields.
One note of caution is that users may be tempted to store primary
keys as ints or longs and incur the overhead of trie encoding when
there is no use case for range queries on these types of fields.
I've often thought of field types as belonging to these broad
categories (rather than ints/strings/longs etc):
1) Quantifiers - used to express numeric quantities that are often
queried in ranges e.g. price, datetime, longitude
2) Identifiers - designed to be unique e.g. urls, primary keys
3) Controlled vocabularies - enums e.g. male/female or public/private
4) Uncontrolled vocabularies - e.g free text
"Ints" can be used to represent types 1 to 3 but the practical uses
of them differ (range queries vs straight-look ups vs faceted groups)
It seems like the likely use cases are more important than the raw
data format (int vs long etc)
I like working by usage instead of underlying type, and I like this
breakdown. It would allow us to do a better job defaulting the
settings for these fields types.
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org