[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698215#action_12698215 ]
Earwin Burrfoot commented on LUCENE-831:
----------------------------------------
I'm using a similar approach.
There's a FieldType that governs conversion from a Java type into Lucene
strings and declares the 'abilities' of that type: for example, that the
conversion is order-preserving (all numerics plus some others), or that
converted values can be meaningfully prefix-searched (like TreeId, which is
essentially an int[] used to represent things like nested category trees).
Some types can also declare themselves as derivatives of others, like DateType
being derived from LongType.
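
A FieldType along these lines might look roughly like the sketch below. It is
a minimal illustration under assumptions, not the actual classes: the method
names (toTerm, fromTerm, isOrderPreserving, isPrefixSearchable, baseType) and
the encodings (sign-flipped fixed-width hex for longs, '/'-terminated hex
components for tree ids) are hypothetical choices made here.

```java
import java.util.Date;

/** Hypothetical field type: converts Java values to index terms and declares its abilities. */
interface FieldType<T> {
    String toTerm(T value);
    T fromTerm(String term);
    /** True if comparing terms as strings gives the same order as comparing values. */
    boolean isOrderPreserving();
    /** True if a term prefix selects a meaningful group of values (e.g. a subtree). */
    boolean isPrefixSearchable();
}

/** Fixed-width hex with the sign bit flipped, so string order equals numeric order. */
class LongType implements FieldType<Long> {
    public String toTerm(Long value) {
        // XOR with MIN_VALUE maps MIN_VALUE..MAX_VALUE onto 0..2^64-1 read as unsigned.
        String hex = Long.toHexString(value ^ Long.MIN_VALUE);
        // Left-pad to 16 digits so every term has the same length.
        return "0".repeat(16 - hex.length()) + hex;
    }
    public Long fromTerm(String term) {
        return Long.parseUnsignedLong(term, 16) ^ Long.MIN_VALUE;
    }
    public boolean isOrderPreserving() { return true; }
    public boolean isPrefixSearchable() { return false; }
}

/** Derived type: a Date is stored as its epoch millis, reusing LongType's encoding. */
class DateType implements FieldType<Date> {
    private final LongType base = new LongType();
    FieldType<Long> baseType() { return base; }
    public String toTerm(Date value) { return base.toTerm(value.getTime()); }
    public Date fromTerm(String term) { return new Date(base.fromTerm(term)); }
    public boolean isOrderPreserving() { return base.isOrderPreserving(); }
    public boolean isPrefixSearchable() { return false; }
}

/** A path in a category tree; a parent's term is a prefix of its descendants' terms. */
class TreeIdType implements FieldType<int[]> {
    public String toTerm(int[] path) {
        StringBuilder sb = new StringBuilder();
        // Fixed-width components, each terminated by '/', so "1/" never prefixes "12/".
        for (int component : path) sb.append(String.format("%08x/", component));
        return sb.toString();
    }
    public int[] fromTerm(String term) {
        if (term.isEmpty()) return new int[0];
        String[] parts = term.split("/");
        int[] path = new int[parts.length];
        for (int i = 0; i < parts.length; i++) path[i] = Integer.parseUnsignedInt(parts[i], 16);
        return path;
    }
    public boolean isOrderPreserving() { return false; }
    public boolean isPrefixSearchable() { return true; }
}
```

With this encoding a prefix query on the term of [1, 2] matches exactly the
subtree below category 1/2, which is the kind of ability a type would declare.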
Then there's a FieldInfo that defines the field name, the FieldType used for
it, and the actions we're going to take on the field: e.g. sorting on it,
building clusters with certain characteristics, loading values of this field
for each found document, using fast range filters, storing/filtering on the
field being null/not-null, applying transforms to the field before
storing/searching, copying the value of the field to another field (possibly
with a transformation) when indexing, etc. From the FieldType and the desired
actions, FieldInfo is able to deduce tokenize/index/store/cache behaviour, and
can tell that additional Lucene fields are required (e.g. for handling
null/not-null searches, trie ranges, or a special sort form).
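
The deduction step can be pictured as follows. This is a hypothetical sketch,
not the real FieldInfo: the Action enum, the orderPreserving flag (which in the
full design would come from the FieldType) and the "$isset", "$trie" and
"$sort" suffixes for auxiliary fields are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

/** Hypothetical set of actions a field can be declared with. */
enum Action { FULL_TEXT, SEARCH, SORT, LOAD, RANGE_FILTER, NULL_CHECK }

/** Hypothetical FieldInfo: derives index flags and auxiliary fields from declared actions. */
final class FieldInfo {
    final String name;
    final boolean orderPreserving; // in the full design this would come from the FieldType
    final EnumSet<Action> actions;

    FieldInfo(String name, boolean orderPreserving, Action first, Action... rest) {
        this.name = name;
        this.orderPreserving = orderPreserving;
        this.actions = EnumSet.of(first, rest);
        if (actions.contains(Action.RANGE_FILTER) && !orderPreserving) {
            // Term ranges are meaningless if term order differs from value order.
            throw new IllegalArgumentException(name + ": range filter needs an order-preserving type");
        }
    }

    boolean tokenized() { return actions.contains(Action.FULL_TEXT); }

    boolean indexed() {
        return tokenized() || actions.contains(Action.SEARCH)
            || actions.contains(Action.SORT) || actions.contains(Action.RANGE_FILTER);
    }

    boolean stored() { return actions.contains(Action.LOAD); }

    /** Extra Lucene fields the indexer must produce besides the main one. */
    List<String> auxiliaryFields() {
        List<String> extra = new ArrayList<>();
        // Marker field so null/not-null filters become simple term lookups.
        if (actions.contains(Action.NULL_CHECK)) extra.add(name + "$isset");
        // Trie-encoded copy for fast range filtering.
        if (actions.contains(Action.RANGE_FILTER)) extra.add(name + "$trie");
        // Sorting needs one untokenized, order-preserving term per document.
        if (actions.contains(Action.SORT) && (tokenized() || !orderPreserving)) extra.add(name + "$sort");
        return extra;
    }
}
```

The point of the design is that callers only state intent (SORT, NULL_CHECK,
...) and the field declaration decides how many physical fields that takes.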
Then there's an interface that contains the FieldInfo constants plus a special
constant, FieldEnum FIELDS = fieldsOf(ResumeFields.class), which is essentially
a navigable list of all FieldInfos defined in this interface and the
interfaces it extends. This lets me have CommonFields, with ResumeFields and
VacancyFields each extending CommonFields.
FieldType, and consequently FieldInfo, is type-parameterized with the Java type
associated with the field, so you get the benefit of type safety when
storing/loading/searching the field. All Filters/Queries/Sorters/Loaders/Document
accept a FieldInfo instead of a String field name, so, for example,
Filters.Range(field, fromValue, fromInclusive, toValue, toInclusive) knows
whether to use a simple range filter or a trie one, ensures that fromValue and
toValue are of the proper type, and converts them properly.
Filters.IsSet(field) can consult an additional field created during indexing,
or access a FieldCache. DocLoader will get a value for the field either from
the index or from the cache. Etc., etc., etc.
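
The type-safety argument can be shown without any Lucene classes. In the
sketch below, which is an assumption-laden illustration rather than the real
Filters API, range() returns a hypothetical RangeSpec describing what would be
executed instead of building an actual Lucene filter, and the encoder and
hasTrie members of the typed FieldInfo are invented names.

```java
import java.util.function.Function;

/** Hypothetical typed field: T is the Java type of its values. */
final class FieldInfo<T> {
    final String name;
    final Function<T, String> encoder; // order-preserving value -> term conversion
    final boolean hasTrie;             // whether a trie-encoded copy was indexed

    FieldInfo(String name, Function<T, String> encoder, boolean hasTrie) {
        this.name = name;
        this.encoder = encoder;
        this.hasTrie = hasTrie;
    }
}

/** Stand-in for a Lucene range filter: only records what would be executed. */
final class RangeSpec {
    final String field, lower, upper;
    final boolean lowerInclusive, upperInclusive, trie;

    RangeSpec(String field, String lower, boolean lowerInclusive,
              String upper, boolean upperInclusive, boolean trie) {
        this.field = field;
        this.lower = lower;
        this.lowerInclusive = lowerInclusive;
        this.upper = upper;
        this.upperInclusive = upperInclusive;
        this.trie = trie;
    }
}

final class Filters {
    private Filters() {}

    /** Bounds must have the field's type T; a null bound means open-ended. */
    static <T> RangeSpec range(FieldInfo<T> field, T from, boolean fromInclusive,
                               T to, boolean toInclusive) {
        // Conversion lives in the field declaration, so callers never hand-encode terms.
        String lower = from == null ? null : field.encoder.apply(from);
        String upper = to == null ? null : field.encoder.apply(to);
        // The strategy (trie vs. plain term range) also comes from the declaration.
        return new RangeSpec(field.name, lower, fromInclusive, upper, toInclusive, field.hasTrie);
    }
}
```

Because FieldInfo<T> is invariant in T, a call such as
Filters.range(age, "18", true, "65", false) on a FieldInfo<Integer> is rejected
at compile time rather than silently producing a wrong string range.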
While I like the resulting schema style very much, I don't want to see anything
like it within Lucene core. Better to have some contrib/extension/whatever that
builds on core-defined primitives. That way, if someone needs to build their
own, somewhat divergent schema, they can easily do it instead of trying to fit
theirs over Lucene's. For the very same reason I'd like to see field caches
moved out of the core, built on the same in-core IndexReader hooks (segment
creation/deletion/whatever) that users will use to build their own extensions.
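
A field cache living outside the core and driven only by such hooks could look
roughly like this. It is a minimal sketch under stated assumptions:
SegmentListener, ToyReader and PerSegmentCache are hypothetical names invented
here (no such hook API is implied to exist in Lucene), and segments are
identified by name only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical lifecycle hook the core would expose. */
interface SegmentListener {
    void segmentOpened(String segmentName);
    void segmentClosed(String segmentName);
}

/** Stand-in for a core reader: it only announces segment lifecycle events. */
final class ToyReader {
    private final List<SegmentListener> listeners = new ArrayList<>();

    void addListener(SegmentListener listener) { listeners.add(listener); }
    void open(String segment)  { for (SegmentListener l : listeners) l.segmentOpened(segment); }
    void close(String segment) { for (SegmentListener l : listeners) l.segmentClosed(segment); }
}

/** An extension-side cache built purely on the hooks above. */
final class PerSegmentCache<V> implements SegmentListener {
    private final Map<String, V> bySegment = new ConcurrentHashMap<>();
    private final Function<String, V> loader;

    PerSegmentCache(Function<String, V> loader) { this.loader = loader; }

    // Warm eagerly when a segment appears; drop its entry when it goes away.
    public void segmentOpened(String segment) { bySegment.computeIfAbsent(segment, loader); }
    public void segmentClosed(String segment) { bySegment.remove(segment); }

    V get(String segment) { return bySegment.get(segment); }
    int size() { return bySegment.size(); }
}
```

The core only has to fire events; eviction policy, eager or lazy loading, and
the cached data types are left entirely to the extension.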
> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Hoss Man
> Assignee: Mark Miller
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff,
> fieldcache-overhaul.diff, fieldcache-overhaul.diff,
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff,
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completely independent IndexReaders)
> b) allow more customization of cache management (ie: use
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed.
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundant caching as client code
> migrates to new API.
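
Items (a) and (d) of the quoted motivation (a cache owned by each reader
instead of a global static map, plus a listable set of keys for warming a new
reader) can be sketched as below. CacheKey, ToyReader and warmFrom are
hypothetical names for illustration only; they are not taken from the
attached patches.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical cache key: which field, and which parser/datatype loaded it. */
final class CacheKey {
    final String field, parser;

    CacheKey(String field, String parser) { this.field = field; this.parser = parser; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof CacheKey)) return false;
        CacheKey other = (CacheKey) o;
        return other.field.equals(field) && other.parser.equals(parser);
    }

    @Override public int hashCode() { return Objects.hash(field, parser); }
}

/** Stand-in reader that owns its cache: no global static map, no lock shared across readers. */
final class ToyReader {
    final String id;
    private final Map<CacheKey, Object> cache = new ConcurrentHashMap<>();

    ToyReader(String id) { this.id = id; }

    /** Loads on first use; the loader gets this reader because cached data depends on it. */
    Object get(CacheKey key, BiFunction<ToyReader, CacheKey, Object> loader) {
        return cache.computeIfAbsent(key, k -> loader.apply(this, k));
    }

    /** Item (d): expose what is cached so a replacement reader can be warmed the same way. */
    Set<CacheKey> cachedKeys() { return Set.copyOf(cache.keySet()); }

    void warmFrom(ToyReader previous, BiFunction<ToyReader, CacheKey, Object> loader) {
        for (CacheKey key : previous.cachedKeys()) get(key, loader);
    }
}
```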