[ 
https://issues.apache.org/jira/browse/LUCENE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-2935.
-------------------------------------

    Resolution: Fixed

The main infrastructure has been committed to the docvalues branch, so I am 
moving on from this issue.

> Let Codec consume entire document
> ---------------------------------
>
>                 Key: LUCENE-2935
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2935
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/codecs, core/index
>    Affects Versions: CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: CSF branch, 4.0
>
>
> Currently the codec API is limited to consuming Terms & Postings upon a 
> segment flush. To enable stored fields & DocValues to make use of the Codec 
> abstraction, codecs should allow pulling a consumer ahead of flush time and 
> consuming all values from a document's fields through a consumer API. An 
> alternative to consuming the entire document would be extending 
> FieldsConsumer to return a StoredValueConsumer / DocValuesConsumer, as is 
> currently done in the DocValues branch, side by side with the TermsConsumer. 
> Yet extending this has proven to be very tricky and error-prone for several 
> reasons:
> * FieldsConsumer requires a SegmentWriteState, which might be different upon 
> flush compared to when the document is consumed. SegmentWriteState must 
> therefore be created twice: (1) when the first docvalues field is indexed and 
> (2) when flushed.
> * FieldsConsumers are currently pulled for each indexed field, no matter 
> whether there are terms to be indexed or not. Yet if we use something like 
> DocValuesCodec, which essentially wraps another codec and creates a 
> FieldsConsumer on demand, the wrapped codec's consumer might not be 
> initialized even if the field is indexed. This causes problems once such a 
> field is opened but the files required by that codec are missing. I added 
> some harsh logic to work around this, which should not be necessary.
> * SegmentCodecs are created for each SegmentWriteState, which might yield 
> wrong codec IDs depending on how field numbers are assigned. We currently 
> depend on the fact that all fields for a segment, and therefore their codecs, 
> are known when SegmentCodecs are built. To enable consuming perDoc values in 
> codecs, we need to do that incrementally.
> Codecs should instead provide a DocumentConsumer side by side with the 
> FieldsConsumer created prior to flush. This is also a prerequisite for 
> LUCENE-2621
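
The proposal above could be pictured roughly as follows. This is only a 
minimal sketch under stated assumptions: the Codec and DocumentConsumer 
interfaces, their method names, and BufferingCodec are hypothetical and 
self-contained, and are not the API that was committed to the docvalues 
branch. It shows a consumer being pulled from the codec before flush, fed 
each document's field values as they are indexed, and finished at flush time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Lucene's actual classes.
public class DocumentConsumerSketch {

    /** Hypothetical per-document consumer, pulled from the codec ahead of flush. */
    interface DocumentConsumer {
        /** Called once per document while indexing, with all of its field values. */
        void consume(int docId, Map<String, Object> fields);

        /** Called once at flush time; returns what would be written for the segment. */
        List<String> finish();
    }

    /** Hypothetical codec that hands out a DocumentConsumer for a segment. */
    interface Codec {
        DocumentConsumer documentConsumer(String segmentName);
    }

    /** Toy codec that buffers every field value in memory until flush. */
    static final class BufferingCodec implements Codec {
        @Override
        public DocumentConsumer documentConsumer(String segmentName) {
            return new DocumentConsumer() {
                private final List<String> buffered = new ArrayList<>();

                @Override
                public void consume(int docId, Map<String, Object> fields) {
                    for (Map.Entry<String, Object> e : fields.entrySet()) {
                        buffered.add(segmentName + ":" + docId + ":"
                                + e.getKey() + "=" + e.getValue());
                    }
                }

                @Override
                public List<String> finish() {
                    return new ArrayList<>(buffered);
                }
            };
        }
    }

    /** Indexes two small documents and returns what the consumer saw. */
    static List<String> demo() {
        Codec codec = new BufferingCodec();
        // The consumer is pulled before any flush happens.
        DocumentConsumer consumer = codec.documentConsumer("_0");

        Map<String, Object> doc0 = new LinkedHashMap<>();
        doc0.put("title", "hello");
        doc0.put("price", 42L);
        consumer.consume(0, doc0);

        Map<String, Object> doc1 = new LinkedHashMap<>();
        doc1.put("price", 7L);
        consumer.consume(1, doc1);

        // Flush time: the consumer is finished exactly once.
        return consumer.finish();
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

In this sketch the consumer is created once, before the first document 
arrives, so nothing in it depends on flush-time state. That is the property 
the description asks for, though a real codec would write to segment files 
rather than buffer strings in memory.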

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
