rzo1 commented on PR #1105:
URL: https://github.com/apache/opennlp/pull/1105#issuecomment-4799034648
This PR carries the only backward-compatibility impacts in the stack. All are
intentional, but each is worth an explicit release note / migration entry:
- Removal of public `NameFinderDL.I_PER` / `B_PER` (compile-time-inlined
constants - source and binary break for external referrers).
- `find()` now returns offsets into the normalized text, which no longer line
up with the input when the length-changing dash fold applies; `findInOriginal()`
returns offsets in original-text coordinates.
- `DocumentCategorizerDL.categorize()` now throws on empty/tokenless input.
- DL chunking moved from `split("\\s+")` to the full Unicode `White_Space`
set, which affects all DL callers (not just opt-in users) and can shift chunk
boundaries on non-ASCII whitespace.
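To make the chunking change concrete for callers, here is a minimal sketch using plain `java.util.regex`. It is not the PR's code, whose exact pattern isn't shown here. It assumes the new rule is equivalent to the regex `\p{IsWhite_Space}+`, which is Java's spelling of the Unicode `White_Space` property. `WhitespaceChunkingDemo` and `chunk` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class WhitespaceChunkingDemo {

    // Old rule: "\\s+" without UNICODE_CHARACTER_CLASS matches only [ \t\n\x0B\f\r].
    static final Pattern ASCII_WS = Pattern.compile("\\s+");

    // Assumed new rule: the Unicode White_Space binary property, which also
    // covers no-break space (U+00A0), em space (U+2003), ideographic space (U+3000), etc.
    static final Pattern UNICODE_WS = Pattern.compile("\\p{IsWhite_Space}+");

    // Splits text into chunks on the given whitespace pattern.
    static String[] chunk(Pattern ws, String text) {
        return ws.split(text);
    }

    public static void main(String[] args) {
        // "New" and "York" joined by a no-break space, then an em space before "City".
        String text = "New\u00A0York\u2003City";

        String[] before = chunk(ASCII_WS, text);
        String[] after = chunk(UNICODE_WS, text);

        System.out.println(before.length); // 1: the whole string is one chunk
        System.out.println(after.length);  // 3: New, York, City
        System.out.println(Arrays.toString(after));
    }
}
```

Note that `Character.isWhitespace` is a third, different definition (it excludes U+00A0, U+2007 and U+202F), so code that mixes it with either pattern can see yet another set of boundaries.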
Minor:
- New exception messages are lowercase, while the surrounding thrown
messages in both files are capitalized.
- `mergeOverlappingSpans` returns the input list by reference when `size <
2` rather than a copy.
- The `distribution.length != categories.size()` guard and the `infer()`
null-output path aren't unit-tested.
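For the `mergeOverlappingSpans` point, here is a minimal sketch of the copy-always variant. It rests on several assumptions. It uses a hypothetical `Span` record, not OpenNLP's `opennlp.tools.util.Span`. Spans are treated as half-open `[start, end)`, and spans that only touch at a boundary are not merged. The PR's actual signature and overlap rule may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SpanMergeDemo {

    // Hypothetical minimal span type with half-open [start, end) offsets.
    record Span(int start, int end) {}

    // Merges overlapping spans. Always returns a fresh list, so callers can
    // mutate the result without aliasing the input, even when size < 2.
    static List<Span> mergeOverlappingSpans(List<Span> spans) {
        if (spans.size() < 2) {
            return new ArrayList<>(spans); // defensive copy instead of returning `spans`
        }
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(Span::start));

        List<Span> merged = new ArrayList<>();
        Span current = sorted.get(0);
        for (int i = 1; i < sorted.size(); i++) {
            Span next = sorted.get(i);
            if (next.start() < current.end()) {
                // Overlap: extend the current span to cover both.
                current = new Span(current.start(), Math.max(current.end(), next.end()));
            } else {
                merged.add(current);
                current = next;
            }
        }
        merged.add(current);
        return merged;
    }

    public static void main(String[] args) {
        List<Span> single = new ArrayList<>(List.of(new Span(0, 3)));
        List<Span> result = mergeOverlappingSpans(single);
        result.add(new Span(5, 7)); // mutating the result...
        System.out.println(single.size()); // 1: ...leaves the input untouched

        System.out.println(mergeOverlappingSpans(
                List.of(new Span(0, 3), new Span(2, 5), new Span(6, 8))));
    }
}
```

Because the record is immutable, a shallow copy is enough here. With a mutable span type, the copy alone would not prevent aliasing of the elements.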