[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092948#comment-13092948 ]
Michael McCandless commented on LUCENE-3312:
--------------------------------------------

bq. Due to the fact that FieldInfo is maintained per field name, if an IndexableField and StorableField are added to a Document separately but with the same name, a single FieldInfo will be created noting the field is both indexed and stored. This isn't a problem; however, a lot of code used to leverage this fact to get metadata about indexed Fields using searcher.document(docId). They would retrieve all the stored fields and then see which were also indexed (and associated metadata). This seems like a bit of a hack, piggybacking stored fields to find out about their indexing attributes. So I guess it cannot continue to go forward? When you pull the StorableFields, you should only be able to access the stored value metadata?

Right, this has been a long-standing problem w/ the Document class you load at search time, ie the fields "pretend" to carry over the details from indexing. But it's buggy now, eg boost is not carried over, and the indexed bit is "global" (comes from field info) while the "tokenized" bit used to be per-doc, before LUCENE-2308.

So I consider this (these indexing details are no longer available when you pull the document) a big benefit of cutting over to StorableField. Ie, it's trappy today since it's buggy, so we'd be removing that trap.

bq. By creating this separation, we will need some notion of a Document in index.* which provides Iterable access to both the IndexableFields and StorableFields. As such, Document itself is becoming more userland. However, by letting it store Indexable and StorableFields separately, the functionality it provides (getBinaryValue for example) becomes quite verbose because it must provide implementations for both kinds of fields. Given that Field is a userland implementation of both Indexable and StorableField, should Document work solely with Fields?
or should we allow people to register both kinds of fields separately and just have a verbose set of functionality?

Good question... I think the userland "Field" (oal.document) should implement both IndexableField and StorableField? And then oal.document.Document holds Field instances?

Maybe we can name the new class oal.index.Indexable? It's a trivial class, just exposing .indexableFieldsIterator and .storableFieldsIterator?

> Break out StorableField from IndexableField
> -------------------------------------------
>
>                 Key: LUCENE-3312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3312
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>             Fix For: Field Type branch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter. This frees apps up
> to use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
> But, maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields. Either can be null.
> One downside is possible perf hit for fields that are both indexed &
> stored (ie, we visit them twice, lookup their name in a hash twice,
> etc.). But the upside is a cleaner separation of concerns in API....
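
For illustration only, here is a rough sketch of the shape discussed above: a userland Field implementing both IndexableField and StorableField, a Document holding Field instances, and a trivial Indexable exposing the two iterators. The interfaces are simplified stand-ins, not the real Lucene ones, and the name()/stringValue() methods, the indexed/stored flags and the SeparationDemo class are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the indexing-side view of a field (assumed methods).
interface IndexableField {
    String name();
    String stringValue();
}

// Simplified stand-in for the stored-value view of a field (assumed methods).
interface StorableField {
    String name();
    String stringValue();
}

// The "trivial class" idea: it only exposes the two iterators.
interface Indexable {
    Iterator<IndexableField> indexableFieldsIterator();
    Iterator<StorableField> storableFieldsIterator();
}

// Userland field implementing both views; the two flags are hypothetical.
final class Field implements IndexableField, StorableField {
    private final String name;
    private final String value;
    final boolean indexed;
    final boolean stored;

    Field(String name, String value, boolean indexed, boolean stored) {
        this.name = name;
        this.value = value;
        this.indexed = indexed;
        this.stored = stored;
    }

    @Override public String name() { return name; }
    @Override public String stringValue() { return value; }
}

// Userland document that works solely with Field instances.
final class Document implements Indexable {
    private final List<Field> fields = new ArrayList<>();

    void add(Field field) { fields.add(field); }

    @Override
    public Iterator<IndexableField> indexableFieldsIterator() {
        List<IndexableField> out = new ArrayList<>();
        for (Field f : fields) {
            if (f.indexed) out.add(f);
        }
        return out.iterator();
    }

    @Override
    public Iterator<StorableField> storableFieldsIterator() {
        List<StorableField> out = new ArrayList<>();
        for (Field f : fields) {
            if (f.stored) out.add(f);
        }
        return out.iterator();
    }
}

class SeparationDemo {
    public static void main(String[] args) {
        Document doc = new Document();
        doc.add(new Field("title", "hello", true, true));   // indexed and stored
        doc.add(new Field("body", "some text", true, false)); // indexed only
        doc.add(new Field("id", "42", false, true));          // stored only

        // A consumer of the stored side sees only names and stored values.
        Iterator<StorableField> stored = doc.storableFieldsIterator();
        while (stored.hasNext()) {
            StorableField f = stored.next();
            System.out.println(f.name() + " = " + f.stringValue());
        }
    }
}
```

Note that a field which is both indexed and stored is visited once by each iterator, which is the possible perf hit mentioned in the issue description; in exchange, code that walks the stored side has no way to reach indexing attributes through the StorableField type.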