[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Earwin Burrfoot (JIRA) Sun, 12 Apr 2009 04:48:38 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698215#action_12698215
 ]


Earwin Burrfoot commented on LUCENE-831:
----------------------------------------

I'm using a similar approach.

There's a FieldType that governs conversions from a Java type into Lucene 
strings and declares the 'abilities' of that type. For example: the conversion 
is order-preserving (all numerics plus some others), or converted values can be 
meaningfully prefix-searched (like TreeId, which is essentially an int[] used 
to represent things like nested category trees). Some types can also declare 
themselves as derivatives of others, like DateType being derived from LongType.
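
A minimal sketch of what such a FieldType could look like. Only the names 
FieldType, LongType and DateType come from the description above; the method 
names and the hex encoding are assumptions made for illustration, not the 
actual implementation and not Lucene API.

```java
import java.util.Date;

/** Hypothetical FieldType: converts a Java value to an index string and declares abilities. */
interface FieldType<T> {
    String toIndexString(T value);
    T fromIndexString(String s);
    boolean isOrderPreserving();   // string order matches value order
    boolean isPrefixSearchable();  // prefixes of the string are meaningful
}

final class LongType implements FieldType<Long> {
    static final LongType INSTANCE = new LongType();

    @Override
    public String toIndexString(Long value) {
        // Assumed encoding: flip the sign bit so negatives sort first,
        // then print as 16 zero-padded hex digits.
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    @Override
    public Long fromIndexString(String s) {
        return Long.parseUnsignedLong(s, 16) ^ Long.MIN_VALUE;
    }

    @Override
    public boolean isOrderPreserving() { return true; }

    @Override
    public boolean isPrefixSearchable() { return false; }
}

/** A type derived from LongType: it delegates both conversion and abilities. */
final class DateType implements FieldType<Date> {
    static final DateType INSTANCE = new DateType();
    private final LongType base = LongType.INSTANCE;

    @Override
    public String toIndexString(Date value) { return base.toIndexString(value.getTime()); }

    @Override
    public Date fromIndexString(String s) { return new Date(base.fromIndexString(s)); }

    @Override
    public boolean isOrderPreserving() { return base.isOrderPreserving(); }

    @Override
    public boolean isPrefixSearchable() { return base.isPrefixSearchable(); }
}
```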

Then there's a FieldInfo that defines the field name, the FieldType used for 
it, and the actions we're going to take on the field: e.g. whether we want to 
sort on it, build clusters with certain characteristics, load values for this 
field for each found document, use fast range filters, store/filter on the 
field being null/notnull, apply transforms to the field before 
storing/searching, copy the value of the field to another field (possibly with 
a transformation) when indexing, etc. From the FieldType and the desired 
actions, FieldInfo is able to deduce tokenize/index/store/cache behaviour, and 
can say that additional Lucene fields are required (e.g. for handling 
null/notnull searches, trie ranges, or a special sort form).
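
To make the deduction step concrete, here is a rough sketch under stated 
assumptions. The Action enum, the class name FieldInfoSketch, the "$isSet" / 
"$trie" / "$sort" suffixes and the rules themselves are all hypothetical; the 
comment above only says that such behaviour is derived from the FieldType and 
the requested actions.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical set of actions one might request for a field. */
enum Action { SORT, LOAD_VALUE, RANGE_FILTER, NULL_FILTER }

final class FieldInfoSketch {
    final String name;
    final boolean orderPreserving; // in the real design this would come from the FieldType
    final Set<Action> actions;

    FieldInfoSketch(String name, boolean orderPreserving, Set<Action> actions) {
        this.name = name;
        this.orderPreserving = orderPreserving;
        this.actions = EnumSet.copyOf(actions);
    }

    /** Assumed rule: values are stored only if they must be loaded per document. */
    boolean stored() { return actions.contains(Action.LOAD_VALUE); }

    /** Assumed rule: sorting needs the values cached per reader. */
    boolean cached() { return actions.contains(Action.SORT); }

    /** Extra index fields implied by the requested actions (assumed naming). */
    List<String> additionalFields() {
        List<String> extra = new ArrayList<>();
        if (actions.contains(Action.NULL_FILTER)) {
            extra.add(name + "$isSet");
        }
        if (actions.contains(Action.RANGE_FILTER) && orderPreserving) {
            extra.add(name + "$trie");
        }
        if (actions.contains(Action.SORT) && !orderPreserving) {
            extra.add(name + "$sort"); // special sort form for types whose strings do not sort naturally
        }
        return extra;
    }
}
```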

Then there's an interface that contains FieldInfo constants and a special 
constant, FieldEnum FIELDS = fieldsOf(ResumeFields.class);, that is essentially 
a navigable list of all FieldInfos defined in this interface and the interfaces 
it extends (this allows me to have CommonFields, with ResumeFields extends 
CommonFields and VacancyFields extends CommonFields).
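
One plausible way to build such a FIELDS constant is reflection over the 
interface's public constants, which also picks up constants inherited from 
superinterfaces. The sketch below assumes that approach; the internals of 
FieldEnum, the stripped-down FieldInfo and the example field names are 
hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Stripped-down stand-in for the real FieldInfo; only the name is kept. */
final class FieldInfo {
    final String name;
    FieldInfo(String name) { this.name = name; }
}

final class FieldEnum {
    private final Map<String, FieldInfo> byName;

    private FieldEnum(Map<String, FieldInfo> byName) { this.byName = byName; }

    /** Collects every static FieldInfo constant visible on the holder, inherited ones included. */
    static FieldEnum fieldsOf(Class<?> holder) {
        Map<String, FieldInfo> found = new TreeMap<>();
        for (Field f : holder.getFields()) {
            boolean isFieldInfo = FieldInfo.class.isAssignableFrom(f.getType());
            if (Modifier.isStatic(f.getModifiers()) && isFieldInfo) {
                try {
                    FieldInfo info = (FieldInfo) f.get(null);
                    found.put(info.name, info);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return new FieldEnum(Collections.unmodifiableMap(found));
    }

    FieldInfo get(String name) { return byName.get(name); }

    Set<String> names() { return byName.keySet(); }
}

interface CommonFields {
    FieldInfo ID = new FieldInfo("id");
    FieldInfo CREATED = new FieldInfo("created");
}

interface ResumeFields extends CommonFields {
    FieldInfo SALARY = new FieldInfo("salary");
    // Declared last so the constants above are already initialized when reflected on.
    FieldEnum FIELDS = FieldEnum.fieldsOf(ResumeFields.class);
}
```

With this, ResumeFields.FIELDS.names() contains "created", "id" and "salary", 
and a VacancyFields interface could reuse CommonFields in the same way.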

FieldType, and consequently FieldInfo, is type-parameterized with the Java type 
associated with the field, so you get the benefit of type-safety when 
storing/loading/searching the field. All 
Filters/Queries/Sorters/Loaders/Document accept a FieldInfo instead of a String 
for the field name, so for example Filters.Range(field, fromValue, 
fromInclusive, toValue, toInclusive) knows whether to use a simple range filter 
or a trie one, ensures the from/to values are of the proper type, and converts 
them properly. Filters.IsSet(field) can consult an additional field created 
during indexing, or access a FieldCache. DocLoader will get a value for the 
field either from the index or from the cache. And so on.
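
The type-safety argument can be illustrated with a small sketch. TypedField, 
RangeSpec and the lower-case range method are hypothetical stand-ins (the 
comment names Filters.Range but does not show its signature or return type), 
and the result here is a plain description object rather than a real Lucene 
filter.

```java
import java.util.function.Function;

/** Hypothetical typed field: carries its name, its converter and how it was indexed. */
final class TypedField<T> {
    final String name;
    final Function<T, String> toIndexString;
    final boolean trieIndexed;

    TypedField(String name, Function<T, String> toIndexString, boolean trieIndexed) {
        this.name = name;
        this.toIndexString = toIndexString;
        this.trieIndexed = trieIndexed;
    }
}

/** Plain description of the range that a real implementation would turn into a filter. */
final class RangeSpec {
    final String field;
    final String lower;
    final String upper;
    final boolean lowerInclusive;
    final boolean upperInclusive;
    final boolean useTrie;

    RangeSpec(String field, String lower, String upper,
              boolean lowerInclusive, boolean upperInclusive, boolean useTrie) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
        this.lowerInclusive = lowerInclusive;
        this.upperInclusive = upperInclusive;
        this.useTrie = useTrie;
    }
}

final class Filters {
    private Filters() {}

    /** The bounds must match the field's Java type; conversion is done by the field itself. */
    static <T> RangeSpec range(TypedField<T> field, T from, boolean fromInclusive,
                               T to, boolean toInclusive) {
        return new RangeSpec(field.name,
                field.toIndexString.apply(from),
                field.toIndexString.apply(to),
                fromInclusive, toInclusive,
                field.trieIndexed);
    }
}
```

Given TypedField<Integer> age = new TypedField<>("age", v -> 
String.format("%010d", v), false), the call Filters.range(age, 18, true, 65, 
false) compiles, while passing the string "18" as a bound is rejected by the 
compiler.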

While I like the resulting schema style very much, I don't want to see the 
likes of it within Lucene core. Better to have some contrib/extension/whatever 
that builds on core-defined primitives. That way, if someone needs to build 
their own somewhat divergent schema, they can easily do it instead of trying to 
fit theirs over Lucene's. For the very same reason I'd like to see field caches 
moved out of the core, depending on the same in-core IndexReader segment 
creation/deletion/whatever hooks that users will use to build their extensions.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


