[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

Michael McCandless (JIRA) Mon, 29 Aug 2011 09:15:02 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092948#comment-13092948 ]


Michael McCandless commented on LUCENE-3312:
--------------------------------------------

bq. Due to the fact that FieldInfo is maintained per field name, if an 
IndexableField and StorableField are added to a Document separately but with 
the same name, a single FieldInfo will be created noting the field is both 
indexed and stored. This isn't a problem; however, a lot of code used to 
leverage this fact to get metadata about indexed Fields using 
searcher.document(docId). They would retrieve all the stored fields and then 
see which were also indexed (and associated metadata). This seems like a bit of 
a hack, piggybacking stored fields to find out about their indexing attributes. 
So I guess this cannot continue going forward? When you pull the StorableFields, 
you should only be able to access the stored value metadata?

Right, this has been a long-standing problem with the Document class you
load at search time, i.e., the fields "pretend" to carry over the
details from indexing.  But it's buggy now: e.g., boost is not carried
over, and the indexed bit is "global" (it comes from FieldInfo) while
the "tokenized" bit used to be per-doc, before LUCENE-2308.

So I consider this (these indexing details are no longer available
when you pull the document) a big benefit of cutting over to
StorableField.  I.e., it's trappy today since it's buggy, so we'd be
removing that trap.
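To make the "global" indexed bit concrete, here is a minimal Java sketch of per-name metadata merging. It assumes a heavily simplified model: FieldInfoSketch and FieldInfosSketch are hypothetical stand-ins, not Lucene's actual FieldInfo/FieldInfos, and each added field instance just ORs its flags into the single entry for its name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified per-name field metadata (not Lucene's FieldInfo).
final class FieldInfoSketch {
    boolean indexed;
    boolean stored;
}

// Hypothetical registry keeping exactly one FieldInfoSketch per field name.
final class FieldInfosSketch {
    private final Map<String, FieldInfoSketch> byName = new HashMap<>();

    // Called once per field instance as each document is added. Flags are
    // ORed together, so the result describes the name, not a single instance.
    void add(String name, boolean indexed, boolean stored) {
        FieldInfoSketch fi = byName.computeIfAbsent(name, n -> new FieldInfoSketch());
        fi.indexed |= indexed;
        fi.stored |= stored;
    }

    FieldInfoSketch get(String name) {
        return byName.get(name);
    }
}
```

So if doc 1 indexes and stores "title" but doc 2 only stores it, the one "title" entry still says indexed, which is what a Document loaded for doc 2 would (wrongly) report.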

bq. By creating this separation, we will need some notion of a Document in 
index.* which provides Iterable access to both the IndexableFields and 
StorableFields. As such, Document itself is becoming more userland. However by 
letting it store Indexable and StorableFields separately, the functionality it 
provides (getBinaryValue for example) becomes quite verbose because it must 
provide implementations for both kinds of fields. Given that Field is a 
userland implementation of both Indexable and StorableField, should Document 
work solely with Fields? Or should we allow people to register both kinds of 
fields separately and just have a verbose set of functionality?

Good question... I think the userland "Field" (oal.document) should
implement both IndexableField and StorableField?  And then
oal.document.Document holds Field instances?
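One way the "Document holds Field instances" idea could look, as a minimal Java sketch. IndexableField, StorableField, Field and Document here are hypothetical, stripped-down stand-ins (not the oal.document or oal.index classes), and the indexed/stored flags on Field are an assumption about how a single class would play both roles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical narrow interfaces, reduced to what this sketch needs.
interface IndexableField {
    String name();
    String stringValue();
}

interface StorableField {
    String name();
    String stringValue();
}

// Hypothetical userland Field implementing both interfaces; the two flags
// record which role(s) a given instance plays.
final class Field implements IndexableField, StorableField {
    private final String name;
    private final String value;
    final boolean indexed;
    final boolean stored;

    Field(String name, String value, boolean indexed, boolean stored) {
        this.name = name;
        this.value = value;
        this.indexed = indexed;
        this.stored = stored;
    }

    @Override public String name() { return name; }
    @Override public String stringValue() { return value; }
}

// Hypothetical userland Document that holds only Field instances and derives
// the indexable and storable views from them on demand.
final class Document {
    private final List<Field> fields = new ArrayList<>();

    void add(Field field) {
        fields.add(field);
    }

    Iterable<IndexableField> indexableFields() {
        List<IndexableField> out = new ArrayList<>();
        for (Field f : fields) {
            if (f.indexed) out.add(f);
        }
        return out;
    }

    Iterable<StorableField> storableFields() {
        List<StorableField> out = new ArrayList<>();
        for (Field f : fields) {
            if (f.stored) out.add(f);
        }
        return out;
    }

    // Convenience accessors stay simple because every entry is a Field;
    // no separate code path per field kind is needed.
    String get(String name) {
        for (Field f : fields) {
            if (f.name().equals(name)) return f.stringValue();
        }
        return null;
    }
}
```

The payoff is in accessors like get(): since every entry is a Field, there is one code path instead of one per field kind.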

Maybe we can name the new class oal.index.Indexable?  It's a trivial
class, just exposing .indexableFieldsIterator and
.storableFieldsIterator?
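A minimal Java sketch of the proposed oal.index.Indexable, assuming it simply wraps two Iterables. The single-method IndexableField/StorableField interfaces are hypothetical placeholders, and the wildcard types are a design guess so that a userland list of Field objects could be passed in directly.

```java
import java.util.Iterator;

// Hypothetical minimal field interfaces; real ones would expose values, types, etc.
interface IndexableField { String name(); }
interface StorableField { String name(); }

// Hypothetical sketch of the proposed oal.index.Indexable: a trivial holder that
// exposes one iterator per concern, so IndexWriter sees only these two views.
final class Indexable {
    private final Iterable<? extends IndexableField> indexableFields;
    private final Iterable<? extends StorableField> storableFields;

    Indexable(Iterable<? extends IndexableField> indexableFields,
              Iterable<? extends StorableField> storableFields) {
        this.indexableFields = indexableFields;
        this.storableFields = storableFields;
    }

    // Fields to invert (index).
    Iterator<? extends IndexableField> indexableFieldsIterator() {
        return indexableFields.iterator();
    }

    // Fields to write to the stored-fields file.
    Iterator<? extends StorableField> storableFieldsIterator() {
        return storableFields.iterator();
    }
}
```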


> Break out StorableField from IndexableField
> -------------------------------------------
>
>                 Key: LUCENE-3312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3312
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>             Fix For: Field Type branch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter.  This frees apps up
> to use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
>
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
>
> But maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields.  Either can be null.
>
> One downside is a possible perf hit for fields that are both indexed &
> stored (i.e., we visit them twice, look up their name in a hash twice,
> etc.).  But the upside is a cleaner separation of concerns in the API.
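A rough Java sketch of an indexer entry point taking the two Iterables, to show the null handling and where the perf downside comes from. TwoIterableIndexer, KeywordField and the name-only interfaces are hypothetical; the HashMap stands in for whatever per-name lookup the real indexer does, and no actual indexing happens.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical narrow interfaces, reduced to a field name.
interface IndexableField { String name(); }
interface StorableField { String name(); }

// Hypothetical userland field that is both indexed and stored.
final class KeywordField implements IndexableField, StorableField {
    private final String name;
    KeywordField(String name) { this.name = name; }
    @Override public String name() { return name; }
}

// Hypothetical, heavily simplified indexer accepting the two Iterables
// proposed in this issue. It only records the per-field work it would do.
final class TwoIterableIndexer {
    // Stand-in for per-name metadata: [0] = times inverted, [1] = times stored.
    private final Map<String, int[]> perField = new HashMap<>();
    int hashLookups = 0;

    void addDocument(Iterable<? extends IndexableField> indexed,
                     Iterable<? extends StorableField> stored) {
        // Either Iterable may be null, e.g. a document with nothing stored.
        if (indexed != null) {
            for (IndexableField f : indexed) {
                lookup(f.name())[0]++;   // a real indexer would invert the value here
            }
        }
        if (stored != null) {
            for (StorableField f : stored) {
                lookup(f.name())[1]++;   // a real indexer would write the stored value here
            }
        }
    }

    // Every visit pays for its own hash lookup, so a field that is both
    // indexed and stored is looked up twice (the perf downside in this issue).
    private int[] lookup(String name) {
        hashLookups++;
        return perField.computeIfAbsent(name, n -> new int[2]);
    }

    int indexedCount(String name) {
        int[] counts = perField.get(name);
        return counts == null ? 0 : counts[0];
    }

    int storedCount(String name) {
        int[] counts = perField.get(name);
        return counts == null ? 0 : counts[1];
    }
}
```

Passing the same KeywordField in both Iterables costs two hash lookups for one field, which is the trade-off the description weighs against the cleaner API.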

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
