Hi,
I’d like to propose releasing OpenNLP 2.5.6.1 to address a regression in the 
Sentence Detector introduced in 2.5.6.
When an abbreviation appears at the beginning of a sentence, SentenceDetectorME 
in OpenNLP 2.5.6 throws a java.lang.StringIndexOutOfBoundsException.
I've fixed that issue on main / opennlp-2.x.
A practical case where this occurs is when using ICD-10 codes or other 
abbreviations at the start of a sentence (e.g. in a medical text). 
This currently breaks sentence detection for affected users if an abbreviation 
dictionary is used.
I would therefore propose to release a 2.5.6.1 patch version containing the fix 
for this issue (already addressed with the overlap handling improvement).
The change is small, localized, and it is IMHO important to restore sentence 
detection with abbreviation support for users.
Unless there are objections, I would like to prepare an RC soon so users can 
upgrade safely without waiting for 2.5.7.
WDYT?
Regards
Richard
