krickert commented on PR #1084: URL: https://github.com/apache/opennlp/pull/1084#issuecomment-4702648323
These are all great points. It's never easy to make things thread safe :) So thank you for your patience. I just updated to addresses items 1 & 2 and the API/close notes. ### 1. Chunking consistency (`DocumentCategorizerDL` / `NameFinderDL`) Took your preferred path: extracted shared chunking into `AbstractDL.chunkRanges(...)` and wired **both** `tokenize()` implementations through it. For `len=10, split=8, overlap=4`, both now produce `[0,8)` and `[4,10)` — no redundant trailing `[8,10)` chunk. `NameFinderDL` is updated to match the corrected tail behavior rather than leaving the siblings on different loops. Added `ChunkRangesTest` with explicit coverage for that case (plus a few adjacent scenarios). ### 2. Comment drift Updated both `tokenize()` comments to describe overlapping chunks driven by `InferenceOptions` (`documentSplitSize` / `splitOverlapSize`), and removed the stale hardcoded "200 word / 50 overlapping" wording. ### 3. API surface / `static` Reverted `loadVocab(...)` and `createTokenizer(...)` to **instance** methods (`public` / `protected`), so there’s no `public static` break for external callers. The actual implementation lives in package-private static helpers (`loadVocabFile`, `createBertTokenizer`, `createWordpieceTokenizer`) used internally and by in-module tests. On the implicit no-arg `AbstractDL` constructor removal: unchanged in this update — in-repo subclasses/tests already use the required-args constructor. Happy to call that out in release notes if you think it’s worth documenting for downstream extenders. ### Minor: terminal `close()` `close()` now treats the first attempt as terminal: `closed` is flipped via `compareAndSet` **before** `session.close()`, and we no longer reset it on failure. A failed native close won’t be retried into an “already closed” state. ### Verification ``` ./mvnw -pl opennlp-core/opennlp-ml/opennlp-dl test # Tests run: 41, Failures: 0, Errors: 0, Skipped: 0 — BUILD SUCCESS ``` Let me know if you’d like anything else adjusted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
