krickert commented on PR #1003:
URL: https://github.com/apache/opennlp/pull/1003#issuecomment-4267147245
@mawiesne @rzo1 — pushed two review-follow-up commits, branch is now at
`26ef5152`:
1. **`OPENNLP-1816: Drop strong Thread reference in
LastResultOwnerOrThreadLocal`** (`7e442322`)
- Track ownership via `Thread#threadId()` (long) instead of a `volatile
Thread` reference, matching what `OwnerOrPerThreadState` already does. Holding
a strong `Thread` reference in a long-lived component would pin the worker
thread's context classloader in container environments — exactly the leak this
class was designed to avoid. Worst-case under id-based tracking is that a
recycled-id thread sees a stale `ownerValue` instead of `null`, which is no
worse than the documented `get()` contract (callers `set()` before `get()` on
every thread).
- Added focused unit tests for both `LastResultOwnerOrThreadLocal` and
`OwnerOrPerThreadState`: owner-fast-path, second-thread isolation, owner
reclaim after `clearForCurrentThread()`, one-way multi-threaded transition,
16-thread concurrent stress, scoped clear semantics. 12 new tests, all passing.
Previously these helpers were only exercised indirectly through
`ThreadSafetyBenchmarkIT`.
- Class Javadoc on `LastResultOwnerOrThreadLocal` now documents the three
thread-safety strategies side-by-side (owner-fast-path / volatile-swap / plain
`ThreadLocal`) and which ME classes use which, so the reasoning isn't spread
across seven files.
2. **`OPENNLP-1816: BeamSearch tempScores reuse and thread-safety Javadoc
polish`** (`26ef5152`)
- `BeamSearch.CacheState` now stashes a per-thread `tempScores` buffer of
length `numOutcomes` instead of allocating a fresh `double[]` inside the inner
beam loop on every iteration. Functionally equivalent; removes one allocation
per beam step per token on the cache hit/miss hot path.
- `BeamSearch.close()` Javadoc spells out the per-thread cleanup
contract: a single `close()` only releases the calling thread's `CacheState`,
not every per-thread slot held by long-lived `BeamSearch` instances shared
across pool threads.
- `POSTaggerME` class Javadoc clarifies that sharing one tagger saves
both memory **and model load time** (the dominant startup cost).
- `CachedFeatureGenerator` class Javadoc calls out that the "is this
still the same sentence?" check uses reference identity (`tokens ==
prevTokens`) — passing a freshly allocated `String[]` with the same contents
triggers a cache miss + clear.
Full local `verify` is green: **685 / 685 unit tests** + **19 / 19
integration tests** (including all 8 `ThreadSafetyBenchmarkIT` cases),
checkstyle clean.
@rzo1 — would you mind re-triggering
[eval-tests-configurable](https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-configurable/)
against the new tip `26ef5152`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]