[ https://issues.apache.org/jira/browse/LUCENE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010798#comment-13010798 ]
Simon Willnauer commented on LUCENE-2985:
-----------------------------------------

bq. I wonder if we should pass the segmentCodecsBuilder to FieldInfos? This way, FieldInfos.add/update could set the codecID, instead of caller doing it after the fact (in DocFieldProcessorPerThread)?

Here is the thing: I first added it to FieldInfos, since that appears to be the place for that kind of thing. The first problem is that DocFieldProcessorPerThread caches the FieldInfo for each DFPPerField, so I would really need to add it to each FieldInfo (FI, not FIs). Further, having another invariant in FieldInfos that only applies when we are writing is something I tried to avoid in the first place. In the end, SegmentCodecs is somewhat internal to the SegmentInfo, not to the FieldInfos, and I tried to couple the two only through the codec ID.

I agree this would be easier and less disruptive in the code, and I'd love to find a better way to do it. Apart from this part in DocFieldProcessorPerThread, everything is smooth though :/

> Build SegmentCodecs incrementally for consistent codecIDs during indexing
> -------------------------------------------------------------------------
>
> Key: LUCENE-2985
> URL: https://issues.apache.org/jira/browse/LUCENE-2985
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Codecs, Index
> Affects Versions: CSF branch, 4.0
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Fix For: CSF branch, 4.0
>
> Attachments: LUCENE-2985.patch
>
>
> Currently we build the SegmentCodecs during flush, which is fine as long as
> no codec needs to know which fields it should handle. This will change with
> DocValues or when we expose StoredFields / TermVectors via Codec (see
> LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a
> consistent view of which codec belongs to which field during indexing, and
> the codec ID of every FieldInfo instance is unassigned (set to -1).
> Instead we should build the SegmentCodecs incrementally as fields come in, so
> that no matter when a codec needs to be selected to process a document /
> field, we have the right codec ID assigned.
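
To make the idea in the description concrete, here is a minimal sketch of assigning codec IDs incrementally as fields come in, rather than at flush time. It is not the code from LUCENE-2985.patch and does not use Lucene's classes: SketchFieldInfo, SketchSegmentCodecsBuilder, assignIfUnassigned, and the codec names are all hypothetical, and the only detail taken from the issue is that an unassigned codec ID is -1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo driver; run with: java IncrementalCodecIdsDemo.java */
class IncrementalCodecIdsDemo {
    public static void main(String[] args) {
        SketchSegmentCodecsBuilder builder = new SketchSegmentCodecsBuilder();
        SketchFieldInfo title = new SketchFieldInfo("title");
        SketchFieldInfo price = new SketchFieldInfo("price");

        // IDs are assigned the first time a field is seen, not at flush.
        builder.assignIfUnassigned(title, "CodecA");
        builder.assignIfUnassigned(price, "CodecB");

        System.out.println(title.name + " -> " + title.codecId);
        System.out.println(price.name + " -> " + price.codecId);
        System.out.println("codecs in ID order: " + builder.build());
    }
}

/** Hypothetical stand-in for a per-field record; NOT Lucene's FieldInfo. */
final class SketchFieldInfo {
    static final int UNASSIGNED = -1; // the issue states unassigned IDs are -1
    final String name;
    int codecId = UNASSIGNED;

    SketchFieldInfo(String name) {
        this.name = name;
    }
}

/** Hypothetical builder that hands out one stable ID per distinct codec name. */
final class SketchSegmentCodecsBuilder {
    private final Map<String, Integer> idsByCodecName = new HashMap<>();
    private final List<String> codecNames = new ArrayList<>();

    /**
     * Sets the field's codec ID if it has none yet. A codec name that was
     * seen before reuses its existing ID; a new one gets the next free ID.
     */
    void assignIfUnassigned(SketchFieldInfo field, String codecName) {
        if (field.codecId != SketchFieldInfo.UNASSIGNED) {
            return; // already assigned; keep the ID stable
        }
        Integer id = idsByCodecName.get(codecName);
        if (id == null) {
            id = codecNames.size();
            idsByCodecName.put(codecName, id);
            codecNames.add(codecName);
        }
        field.codecId = id;
    }

    /** Returns the codec names seen so far; the list index is the codec ID. */
    List<String> build() {
        return new ArrayList<>(codecNames);
    }
}
```

In this sketch the caller sets the ID after the field is created, which mirrors the "caller doing it after the fact" arrangement discussed in the comment. Moving the assignIfUnassigned call into the field registry itself would correspond to the alternative of passing the builder to FieldInfos.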