While upgrading I ran afoul of some inconsistencies in our schema usage, and to fix them I've ended up having to add data to our index that I'd rather not. Let me give a little context: we have a parent/child document structure. Some fields are shared across parent and child docs, others are not. Our index has a sort key, and in order for all the parent/child docs to sort together correctly, we add the same (doc values) fields that are part of the sort key to both parent and child docs. Some of these fields are *also* indexed as postings (StringField) under the same name, but we only index the postings field on the parent document, since child documents are never searched for on their own - always in conjunction with a parent.
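To make the shape of the problem concrete, here is a toy model in plain Java. It is not Lucene's code and does not use Lucene's classes: FieldSchemaSketch, addField and the allowNone flag are all hypothetical names, and the check is only my simplified reading of the per-field consistency rule described below. The allowNone flag stands in for the relaxation I ask about at the end.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Lucene's implementation. All names here are hypothetical.
final class FieldSchemaSketch {
    enum IndexOptions { NONE, DOCS }

    // Index options recorded per field name, standing in for index-wide field metadata.
    private final Map<String, IndexOptions> schema = new HashMap<>();
    // Hypothetical relaxation: a field added with NONE never conflicts.
    private final boolean allowNone;

    FieldSchemaSketch(boolean allowNone) {
        this.allowNone = allowNone;
    }

    // Registers one field of one document; throws if it disagrees with earlier documents.
    void addField(String name, IndexOptions options) {
        if (allowNone && options == IndexOptions.NONE) {
            return; // relaxed mode: doc-values-only usage is always accepted
        }
        IndexOptions previous = schema.putIfAbsent(name, options);
        if (previous != null && previous != options) {
            throw new IllegalArgumentException(
                "field \"" + name + "\": index options " + previous + " vs " + options);
        }
    }

    // Parent indexes postings + doc values; child indexes doc values only (modelled as NONE).
    static boolean acceptsParentAndChild(boolean allowNone) {
        FieldSchemaSketch index = new FieldSchemaSketch(allowNone);
        try {
            index.addField("sortKey", IndexOptions.DOCS); // parent document
            index.addField("sortKey", IndexOptions.NONE); // child document
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

In this model, acceptsParentAndChild(false) returns false (the child is rejected), while acceptsParentAndChild(true) returns true.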
The schema-checking code we added in Lucene 9 does not allow this: it enforces that all documents having a field should have the same "index options", and failing to index the postings gets interpreted as having index options = NONE (because of the presence of the doc values field of the same name, I think?).

Our current solution is to also index the postings for the child document (but just with an empty string value). This seems gross, and creates postings in the index that we will never use.

Another possibility would be to rename the fields so that the postings and doc values fields have different names. But in that case our application-level schema diverges from our Lucene schema, adding a layer of complexity we'd rather not introduce.

Finally, could we relax this constraint, always allowing index options = NONE regardless of how other docs are indexed? Would it cause problems?

-Mike
