Hi,
I’d like to propose releasing OpenNLP 2.5.6.1 to address a regression in the
Sentence Detector introduced in 2.5.6.
When an abbreviation appears at the beginning of a sentence, SentenceDetectorME
in OpenNLP 2.5.6 throws a java.lang.StringIndexOutOfBoundsException.
I've fixed that issue on main / opennlp-2.x.
A practical case where this occurs is when using ICD-10 codes or other
abbreviations at the start of a sentence (e.g. in a medical text).
This currently breaks sentence detection for affected users if an abbreviation
dictionary is used.
I therefore propose releasing a 2.5.6.1 patch version containing the fix
for this issue (already addressed with the overlap handling improvement).
The change is small, localized, and it is IMHO important to restore sentence
detection with abbreviation support for users.
Unless there are objections, I would like to prepare an RC soon so users can
upgrade safely without waiting for 2.5.7.
WDYT?
Regards
Richard