krickert commented on PR #1003: URL: https://github.com/apache/opennlp/pull/1003#issuecomment-4268032693
@mawiesne @rzo1 — thanks for the review, all three concerns addressed in three follow-up commits, branch tip is now `19a2ac68`. ### 1. `POSTaggerNameFeatureGenerator` race (P0) — `cb4d7f2e` Confirmed reproducible. Fix: - Per-sentence cache (`cachedTokens` / `cachedTags`) is now held in a per-thread `CacheState` via `ThreadLocal`, so each thread indexes into a tag array that always belongs to the sentence it just tagged. No more length-mismatched interleavings between threads. - Class is now `@ThreadSafe`; the only behavioral change vs the original is per-thread isolation. The existing `Arrays.equals(cachedTokens, toks)` cache-hit semantics are preserved. - New unit test `testConcurrentCreateFeaturesIsThreadSafe` reproduces the original failure shape: 16 threads × 200 reps × sentences of differing lengths on one shared generator. Verified that test fails on the unfixed class (with exactly the `AIOOBE` you described) and passes after the fix. ### 2. `DictionaryFeatureGenerator` — `c6274c46` Right — in normal use it is set once at construction and never replaced, so it's not a runtime race. But the field needed safe publication for both the constructor write and any later `setDictionary()` write. So: - `isg` is now `volatile`; class annotated `@ThreadSafe` (the underlying `InSpanGenerator` is already `@ThreadSafe`). - `createFeatures()` reads `isg` into a local before delegating, so an in-flight call finishes against whichever dictionary it observed first. - Javadoc now explicitly says `setDictionary()` is for setup time, not the hot path: it does not coordinate with in-flight reads beyond the volatile publish, so callers swapping mid-flight may observe either dictionary. ### 3. Nested `ThreadLocal` in `NameFinderME` — `19a2ac68` Good catch. Two changes here: - `AdditionalContextFeatureGenerator` is now one shared instance per `NameFinderME` (was: one freshly-allocated AFG per thread inside `NameFinderState`). The AFG's existing internal `ThreadLocal<String[][]>` handles per-thread context just as before, with no nesting and no per-thread allocation of the wrapper. - The `bestSequence` slot is now a `LastResultOwnerOrThreadLocal<Sequence>` instead of `ThreadLocal<NameFinderState>`, matching the pattern `POSTaggerME` / `SentenceDetectorME` / `TokenizerME` already use. Single-threaded short-lived `NameFinderME` instances now get the owner-fast-path (no `ThreadLocal` map entry until a second thread shows up). - Added `AdditionalContextFeatureGenerator.clearForCurrentThread()` so `NameFinderME.clearThreadLocalState()` releases the AFG's per-thread slot too. - Rewrote `clearThreadLocalState()` Javadoc to spell out the per-thread (not per-instance) contract, same as on the other ME classes. ### Verification 686/686 unit tests pass on `opennlp-core/opennlp-runtime` (the +1 vs the previous 685 is the new `POSTaggerNameFeatureGenerator` concurrency test). I also bisected the new test against the unfixed class to confirm it actually catches the described race. @rzo1 — would you mind re-triggering [eval-tests-configurable](https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-configurable/) against `19a2ac68`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
