[
https://issues.apache.org/jira/browse/LUCENE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Simon Willnauer resolved LUCENE-2935.
-------------------------------------
Resolution: Fixed
The main infrastructure has been committed to the docvalues branch - moving out
here.
> Let Codec consume entire document
> ---------------------------------
>
> Key: LUCENE-2935
> URL: https://issues.apache.org/jira/browse/LUCENE-2935
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/codecs, core/index
> Affects Versions: CSF branch, 4.0
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Fix For: CSF branch, 4.0
>
>
> Currently the codec API is limited to consuming Terms & Postings upon a
> segment flush. To enable stored fields & DocValues to make use of the Codec
> abstraction, codecs should allow pulling a consumer ahead of flush time and
> consuming all values from a document's fields through a consumer API. An
> alternative to consuming the entire document would be extending
> FieldsConsumer to return a StoredValueConsumer / DocValuesConsumer, as is
> done in the DocValues branch right now, side by side with the TermsConsumer.
> Yet, extending this has proven to be very tricky and error-prone for several
> reasons:
> * FieldsConsumer requires a SegmentWriteState, which might be different upon
> flush compared to when the document is consumed. The SegmentWriteState must
> therefore be created twice: (1) when the first docvalues field is indexed
> and (2) when flushed.
> * FieldsConsumers are currently pulled for each indexed field, whether or not
> there are terms to be indexed. Yet, if we use something like DocValuesCodec,
> which essentially wraps another codec and creates a FieldConsumer on demand,
> the wrapped codec's consumer might not be initialized even if the field is
> indexed. This causes problems once such a field is opened but is missing the
> required files for that codec. I added some harsh logic to work around this,
> which should be avoided.
> * SegmentCodecs are created for each SegmentWriteState, which might yield
> wrong codec IDs depending on how field numbers are assigned. We currently
> depend on the fact that all fields for a segment, and therefore their codecs,
> are known when SegmentCodecs are built. To enable consuming perDoc values in
> codecs, we need to do that incrementally.
> Codecs should instead provide a DocumentConsumer side by side with the
> FieldsConsumer created prior to flush. This is also a prerequisite for
> LUCENE-2621.
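
To illustrate the proposal above, here is a minimal sketch of what a
per-document consumer pulled ahead of flush time might look like. It is not
the code committed to the docvalues branch: the method names, signatures, and
the SketchCodec and InMemoryCodec types are assumptions made up for this
example, and only the name DocumentConsumer is taken from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these are NOT the actual Lucene types or signatures.
interface DocumentConsumer {
    // Called for each field value while a document is being indexed,
    // i.e. before any segment flush happens.
    void consume(int docId, String field, Object value);

    // Called once at flush time to make the buffered values durable.
    void flush();
}

// Hypothetical codec abstraction that hands out a DocumentConsumer
// ahead of flush time, side by side with its terms/postings consumer.
interface SketchCodec {
    DocumentConsumer documentConsumer(String segmentName);
}

// Toy implementation that "writes" values into an in-memory map on flush.
final class InMemoryCodec implements SketchCodec {
    // Values become visible here only after flush() has been called.
    final Map<String, List<Object>> flushed = new LinkedHashMap<>();

    @Override
    public DocumentConsumer documentConsumer(String segmentName) {
        return new DocumentConsumer() {
            // Per-field buffers are created lazily, on the first value seen.
            private final Map<String, List<Object>> pending = new LinkedHashMap<>();

            @Override
            public void consume(int docId, String field, Object value) {
                pending.computeIfAbsent(segmentName + "/" + field,
                        k -> new ArrayList<>()).add(value);
            }

            @Override
            public void flush() {
                pending.forEach((key, values) ->
                        flushed.computeIfAbsent(key, k -> new ArrayList<>())
                               .addAll(values));
                pending.clear();
            }
        };
    }
}
```

In this sketch the consumer is obtained once, before any document arrives, so
nothing comparable to a second SegmentWriteState has to be created when the
first per-document field shows up.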