krickert commented on PR #1003:
URL: https://github.com/apache/opennlp/pull/1003#issuecomment-4268032693

   @mawiesne @rzo1 — thanks for the review, all three concerns addressed in 
three follow-up commits, branch tip is now `19a2ac68`.
   
   ### 1. `POSTaggerNameFeatureGenerator` race (P0) — `cb4d7f2e`
   
   Confirmed reproducible. Fix:
   - Per-sentence cache (`cachedTokens` / `cachedTags`) is now held in a 
per-thread `CacheState` via `ThreadLocal`, so each thread indexes into a tag 
array that always belongs to the sentence it just tagged. No more 
length-mismatched interleavings between threads.
   - Class is now `@ThreadSafe`; the only behavioral change vs the original is 
per-thread isolation. The existing `Arrays.equals(cachedTokens, toks)` 
cache-hit semantics are preserved.
   - New unit test `testConcurrentCreateFeaturesIsThreadSafe` reproduces the 
original failure shape: 16 threads × 200 reps × sentences of differing lengths 
on one shared generator. Verified that test fails on the unfixed class (with 
exactly the `AIOOBE` you described) and passes after the fix.
   
   ### 2. `DictionaryFeatureGenerator` — `c6274c46`
   
   Right — in normal use it is set once at construction and never replaced, so 
it's not a runtime race. But the field needed safe publication for both the 
constructor write and any later `setDictionary()` write. So:
   - `isg` is now `volatile`; class annotated `@ThreadSafe` (the underlying 
`InSpanGenerator` is already `@ThreadSafe`).
   - `createFeatures()` reads `isg` into a local before delegating, so an 
in-flight call finishes against whichever dictionary it observed first.
   - Javadoc now explicitly says `setDictionary()` is for setup time, not the 
hot path: it does not coordinate with in-flight reads beyond the volatile 
publish, so callers swapping mid-flight may observe either dictionary.
   
   ### 3. Nested `ThreadLocal` in `NameFinderME` — `19a2ac68`
   
   Good catch. Two changes here:
   - `AdditionalContextFeatureGenerator` is now one shared instance per 
`NameFinderME` (was: one freshly-allocated AFG per thread inside 
`NameFinderState`). The AFG's existing internal `ThreadLocal<String[][]>` 
handles per-thread context just as before, with no nesting and no per-thread 
allocation of the wrapper.
   - The `bestSequence` slot is now a `LastResultOwnerOrThreadLocal<Sequence>` 
instead of `ThreadLocal<NameFinderState>`, matching the pattern `POSTaggerME` / 
`SentenceDetectorME` / `TokenizerME` already use. Single-threaded short-lived 
`NameFinderME` instances now get the owner-fast-path (no `ThreadLocal` map 
entry until a second thread shows up).
   - Added `AdditionalContextFeatureGenerator.clearForCurrentThread()` so 
`NameFinderME.clearThreadLocalState()` releases the AFG's per-thread slot too.
   - Rewrote `clearThreadLocalState()` Javadoc to spell out the per-thread (not 
per-instance) contract, same as on the other ME classes.
   
   ### Verification
   
   686/686 unit tests pass on `opennlp-core/opennlp-runtime` (the +1 vs the 
previous 685 is the new `POSTaggerNameFeatureGenerator` concurrency test). I 
also bisected the new test against the unfixed class to confirm it actually 
catches the described race.
   
   @rzo1 — would you mind re-triggering 
[eval-tests-configurable](https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-configurable/)
 against `19a2ac68`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to