[
https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008777#comment-13008777
]
Michael McCandless commented on LUCENE-2881:
--------------------------------------------
{quote}
bq. Why do we default SegmentInfos.format now...? Seems spooky?
This hasn't been used in SIS before, so the default didn't matter until now.
Now, though, I check the format in files(), so if you create the SIS without
reading it, the format is 0. I could certainly make that work with a default of 0,
but it just seemed natural to assign it the current format. I think it's fine...
{quote}
Ahh, I see: it's for the case where we make a new SIS() in RAM, because we'll
now look at the format in files(). OK, this sounds right then.
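To make the default-format reasoning concrete, here is a minimal Java sketch. It assumes a heavily simplified, hypothetical SegmentInfosSketch class (not Lucene's SegmentInfos); the constant value, the files() signature, and the file extensions are illustrative only. It shows why leaving an int format field at Java's default of 0 can make files() misreport an in-RAM instance that was never read from disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-in for SegmentInfos; the constant,
// the method signature, and the file extensions are illustrative, not Lucene's.
class SegmentInfosSketch {
    // In this sketch format numbers are negative and newer formats are
    // more negative, so "at least as new as X" means format <= X.
    static final int CURRENT_FORMAT = -10;

    // Without an initializer this int would default to 0, which files()
    // would treat as older than any known format. Defaulting to
    // CURRENT_FORMAT covers instances created in RAM, never read from disk.
    int format = CURRENT_FORMAT;

    List<String> files(String segmentName) {
        List<String> result = new ArrayList<>();
        result.add(segmentName + ".cfs");
        // Hypothetical per-segment file that only the newer format writes.
        if (format <= CURRENT_FORMAT) {
            result.add(segmentName + ".fnm");
        }
        return result;
    }
}
```

With the initializer in place, a freshly constructed instance lists the per-segment file; forcing the field back to 0 silently drops it, which is the "spooky" behavior the question was probing.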
{quote}
bq. Should we backport this to 3.x (after sufficient aging)?
I think we should let it bake in first, though. Maybe we can also factor out
hasVectors in another issue and then backport both once they have been
random-tested for a little while.
{quote}
Definitely let it bake!
Also, I have lots of pending backports to 3.2, which this patch likely
overlaps with, so I think we should try to do them "in order" to reduce conflicts.
> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
> Key: LUCENE-2881
> URL: https://issues.apache.org/jira/browse/LUCENE-2881
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: Realtime Branch, CSF branch, 4.0
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Fix For: Realtime Branch, CSF branch, 4.0
>
> Attachments: LUCENE-2881.patch, LUCENE-2881.patch, LUCENE-2881.patch,
> lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch,
> lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global
> field naming / ordering. IW carries FI instances over from previous segments,
> which also carries over field properties like isIndexed etc. While having
> consistent field ordering per IW session appears to be important due to bulk
> merging of stored fields etc., carrying over other properties might become
> problematic with Lucene's Codec support. Codecs that rely on consistent
> properties in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment
> and field (using the field id within the file name). Yet if a field has no
> DocValues indexed in a particular segment but a previous segment in the same
> IW session had DocValues, FieldInfo#docValues will be true, since those
> values are reused from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like
> hasVectors or hasProx, which are really something we should manage per Codec &
> Segment. Ideally FieldInfo would be managed per Segment and Codec such that
> its properties are valid per segment. It also seems necessary to bind
> FieldInfos to SegmentInfo logically, since it's really just per-segment
> metadata.
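The carry-over problem described in the issue can be modeled in a few lines of Java. This is a minimal sketch under stated assumptions: FieldInfoSketch, FieldInfosSketch, and GlobalFieldNumbers are hypothetical names, not Lucene classes, and splitting a session-wide name-to-number map from per-segment field properties is one possible design, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the LUCENE-2881 problem; not Lucene's real API.
class FieldInfoSketch {
    final String name;
    final int number;
    boolean hasDocValues;

    FieldInfoSketch(String name, int number) {
        this.name = name;
        this.number = number;
    }
}

// Session-wide name -> number mapping, so field numbering stays consistent
// across all segments written in one IW session.
class GlobalFieldNumbers {
    private final Map<String, Integer> numbers = new HashMap<>();

    int numberFor(String name) {
        Integer n = numbers.get(name);
        if (n == null) {
            n = numbers.size(); // next unused number
            numbers.put(name, n);
        }
        return n;
    }
}

// Field metadata whose properties (here only hasDocValues) describe exactly
// the documents added to this instance; numbers come from the shared map.
class FieldInfosSketch {
    private final Map<String, FieldInfoSketch> byName = new HashMap<>();
    private final GlobalFieldNumbers globalNumbers;

    FieldInfosSketch(GlobalFieldNumbers globalNumbers) {
        this.globalNumbers = globalNumbers;
    }

    // Adds a field or updates it. Flags are OR-ed, so once set within this
    // instance they stay set: reusing one instance across segments makes
    // the flag "sticky" for later segments.
    FieldInfoSketch add(String name, boolean hasDocValues) {
        FieldInfoSketch fi = byName.get(name);
        if (fi == null) {
            fi = new FieldInfoSketch(name, globalNumbers.numberFor(name));
            byName.put(name, fi);
        }
        fi.hasDocValues |= hasDocValues;
        return fi;
    }

    FieldInfoSketch get(String name) {
        return byName.get(name);
    }
}
```

Reusing one FieldInfosSketch for two segments reproduces the issue: after segment _0 adds "price" with DocValues, segment _1 still reports hasDocValues == true even though it indexed none. Giving each segment its own FieldInfosSketch backed by a shared GlobalFieldNumbers keeps "price" at the same number in both segments, while hasDocValues reflects only the segment being written.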