[ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001372#comment-13001372 ]
Michael McCandless commented on LUCENE-2881:
--------------------------------------------

bq. I had the same thought while adding the ref to FieldInfos to SegmentInfo. Actually this is probably the right thing to do. At the same time we could switch to a human-readable format

Human-readable format would be sweet :)

Though I'm still generally nervous that this means just opening the segments file will become quite a bit more costly. Apps that have many fields will be especially penalized, though apps really should not be creating so many fields.

bq. We could also store the global map on disk?

That's an interesting idea. That'd ensure stability of the bindings, even for pre-4.0 indices. This way a pre-4.0 index would gradually work itself towards being fully consistent...

bq. addIndexes() would have to ignore the global map from the external index(es).

Well, addIndexes(IR[]) would get fully remapped to the correct bindings (since it's a real merge). But, yes, addIndexes(Dir[]) would not -- they are just file-copied.

Hmm, they'd also presumably have a different global map, so if we stored the index global map in the Directory, how would we resolve conflicts on the incoming addIndexes...? I guess the local mapping for the incoming segments would override the global one, on conflict.

> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
>                 Key: LUCENE-2881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2881
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Michael Busch
>             Fix For: Realtime Branch, CSF branch, 4.0
>
>         Attachments: LUCENE-2881.patch, lucene-2881.patch, lucene-2881.patch,
> lucene-2881.patch, lucene-2881.patch, lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global
> field-naming / ordering.
> IW carries FI instances over from previous segments,
> which also carries over field properties like isIndexed etc. While having
> consistent field ordering per IW session appears to be important due to bulk
> merging of stored fields etc., carrying over other properties might become
> problematic with Lucene's Codec support. Codecs that rely on consistent
> properties in FI will fail if FI properties are carried over.
>
> The DocValuesCodec (DocValuesBranch) for instance writes files per segment
> and field (using the field id within the file name). Yet, if a particular
> segment has no DocValues indexed but a previous segment in the same
> IW session had DocValues, FieldInfo#docValues will be true since those
> values are reused from previous segments.
>
> We already work around this "limitation" in SegmentInfo with properties like
> hasVectors or hasProx, which is really something we should manage per Codec &
> Segment. Ideally FieldInfo would be managed per Segment and Codec such that
> its properties are valid per segment. It also seems to be necessary to bind
> FieldInfos to SegmentInfo logically since it's really just per-segment
> metadata.
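
For illustration only, here is a rough sketch of the conflict rule floated in the comment: a global field name/number map, where a real merge (the addIndexes(IR[]) case) remaps incoming fields to the global bindings, while a file-copied segment (the addIndexes(Dir[]) case) keeps its own local mapping, which wins on conflict. The class GlobalFieldNumbers and its methods addOrGet and resolve are hypothetical names made up for this sketch; they are not Lucene's API and not what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a global field name/number map; not Lucene code. */
final class GlobalFieldNumbers {
    private final Map<String, Integer> nameToNumber = new HashMap<>();
    private final Map<Integer, String> numberToName = new HashMap<>();
    private int nextNumber = 0;

    /**
     * Returns the number already bound to the field name. Otherwise binds the
     * preferred number if it is free, or the next free number if it is not
     * (pass a negative preferred number to always take the next free one).
     */
    synchronized int addOrGet(String name, int preferredNumber) {
        Integer existing = nameToNumber.get(name);
        if (existing != null) {
            return existing;
        }
        int number = preferredNumber;
        if (number < 0 || numberToName.containsKey(number)) {
            while (numberToName.containsKey(nextNumber)) {
                nextNumber++;
            }
            number = nextNumber;
        }
        nameToNumber.put(name, number);
        numberToName.put(number, name);
        return number;
    }

    /**
     * Resolves a field number for one segment. The segment's local mapping
     * overrides the global one on conflict; the global map is the fallback.
     */
    synchronized String resolve(int number, Map<Integer, String> segmentLocal) {
        String local = segmentLocal.get(number);
        return local != null ? local : numberToName.get(number);
    }

    public static void main(String[] args) {
        GlobalFieldNumbers global = new GlobalFieldNumbers();
        global.addOrGet("title", -1); // bound to 0
        global.addOrGet("body", -1);  // bound to 1

        // A file-copied segment whose own numbering clashes with the global map.
        Map<Integer, String> copiedSegment = new HashMap<>();
        copiedSegment.put(0, "author");

        // Local mapping wins for the copied segment: prints "author".
        System.out.println(global.resolve(0, copiedSegment));
        // A real merge remaps instead: "author" gets the next free number, 2.
        System.out.println(global.addOrGet("author", 0));
    }
}
```

Under this sketch a copied segment stays readable with its old numbers, and its fields only move to the global bindings once the segment is rewritten by a merge, which matches the "gradually work itself towards being fully consistent" idea above.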