neoremind opened a new pull request, #16171:
URL: https://github.com/apache/lucene/pull/16171

   #15886 is a nice optimization, when a field keeps using the same frozen 
FieldType instance inter-documents, we can skip the whole `schema.reset() + 
updateDocFieldSchema() + assertSameSchema()` validation path.
   
   I found a corner case, the validatedFrozenFieldType cache never recovers 
after switching between two different frozen FieldType instances that share the 
same schema.
   
   Once we switch from fieldTypeA to fieldTypeB (same schema, different 
instance), the validatedFrozenFieldType cache gets nulled and stays null for 
the rest of the segment, then every subsequent document still pays the cost of 
`schema.reset() + updateDocFieldSchema() + assertSameSchema()` as before.
   
   I verified this by adding an `InfoStream` log in the assertion schema slow 
path and counting how many times invoked, see [test 
link](https://github.com/neoremind/lucene/commit/4ab6832fe8ecd8facbaa261ad1c14636334b8da9#diff-e4a05dae452b3cb44726589d1cec355df5c43f02d2f4c63c8a7699a68d9e6427R24-R315
   ), I don't want to introduce the heavy `InfoStream` just for this simple 
logic here. The existing test `testFrozenFieldTypeCacheRestabilizes` reproduces 
the problem, but since it doesn't assert how many times we hit the slow path, 
it still passes.
   
   To illustrate:
   
   ```
   FieldType ftA = new FieldType(TextField.TYPE_STORED);
   ftA.freeze();
   // insert Doc 0 - 9 with ftA
   FieldType ftB = new FieldType(TextField.TYPE_STORED); // same schema
   ftB.freeze();
   // insert Doc 10 - 20 with ftB
   
   // Doc 0: INIT path → cache = ftA
   // Docs 1–9: ftA == cache → fast path, no duplication on schema.reset() + 
updateDocFieldSchema() + assertSameSchema()
   // Doc 10: ftB != ftA → cache = null, validate path, assertSameSchema passes
   //         BUT nobody re-promotes ftB → cache stays null
   // Docs 11–20: cache is null → slow path assert schema for all subsequent 
calls 
   ```
   
   In detail, two issues:
   
   1. In `PerField.reset()`, `candidateFieldType` is only set when `fieldInfo 
== null`, which is the very first document for that field in the segment. After 
that first doc, no matter what frozen type comes in, it's never recorded as a 
candidate. So there's simply nothing available to promote into the cache later.
   
   2. `trySetValidatedFrozenFieldType()`, the method that actually promotes a 
candidate into `validatedFrozenFieldType` cache only runs inside the `fieldInfo 
== null` branch of `initAndValidateFields()`. It's never called in the 
else-branch where we already have a `fieldInfo` and just run 
`assertSameSchema()`.
   
   Also adds test coverage for `multiValueForcesDeoptimize`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to