neoremind opened a new pull request, #16171: URL: https://github.com/apache/lucene/pull/16171
#15886 is a nice optimization, when a field keeps using the same frozen FieldType instance inter-documents, we can skip the whole `schema.reset() + updateDocFieldSchema() + assertSameSchema()` validation path. I found a corner case, the validatedFrozenFieldType cache never recovers after switching between two different frozen FieldType instances that share the same schema. Once we switch from fieldTypeA to fieldTypeB (same schema, different instance), the validatedFrozenFieldType cache gets nulled and stays null for the rest of the segment, then every subsequent document still pays the cost of `schema.reset() + updateDocFieldSchema() + assertSameSchema()` as before. I verified this by adding an `InfoStream` log in the assertion schema slow path and counting how many times invoked, see [test link](https://github.com/neoremind/lucene/commit/4ab6832fe8ecd8facbaa261ad1c14636334b8da9#diff-e4a05dae452b3cb44726589d1cec355df5c43f02d2f4c63c8a7699a68d9e6427R24-R315 ), I don't want to introduce the heavy `InfoStream` just for this simple logic here. The existing test `testFrozenFieldTypeCacheRestabilizes` reproduces the problem, but since it doesn't assert how many times we hit the slow path, it still passes. To illustrate: ``` FieldType ftA = new FieldType(TextField.TYPE_STORED); ftA.freeze(); // insert Doc 0 - 9 with ftA FieldType ftB = new FieldType(TextField.TYPE_STORED); // same schema ftB.freeze(); // insert Doc 10 - 20 with ftB // Doc 0: INIT path → cache = ftA // Docs 1–9: ftA == cache → fast path, no duplication on schema.reset() + updateDocFieldSchema() + assertSameSchema() // Doc 10: ftB != ftA → cache = null, validate path, assertSameSchema passes // BUT nobody re-promotes ftB → cache stays null // Docs 11–20: cache is null → slow path assert schema for all subsequent calls ``` In detail, two issues: 1. In `PerField.reset()`, `candidateFieldType` is only set when `fieldInfo == null`, which is the very first document for that field in the segment. After that first doc, no matter what frozen type comes in, it's never recorded as a candidate. So there's simply nothing available to promote into the cache later. 2. `trySetValidatedFrozenFieldType()`, the method that actually promotes a candidate into `validatedFrozenFieldType` cache only runs inside the `fieldInfo == null` branch of `initAndValidateFields()`. It's never called in the else-branch where we already have a `fieldInfo` and just run `assertSameSchema()`. Also adds test coverage for `multiValueForcesDeoptimize`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
