Martin Wiesner created OPENNLP-1767:
---------------------------------------
Summary: Fix sentence detection when an abbreviation overlaps at
sentence end
Key: OPENNLP-1767
URL: https://issues.apache.org/jira/browse/OPENNLP-1767
Project: OpenNLP
Issue Type: Bug
Components: Sentence Detector
Affects Versions: 2.5.5
Reporter: Martin Wiesner
Assignee: Martin Wiesner
Fix For: 2.5.6, 3.0.0
Currently, sentence detection works incorrectly when an abbreviation dictionary
containing common abbreviations is loaded. If an abbreviation such as "S."
(German abbreviation for "page") overlaps with the end of a sentence, the
actual sentence boundary is not detected and the subsequent sentence is glued
to the previous one. As a result, the detected sentences do not match the
expected ones.
Examples for the German language:
- "Die Frage wurde gestellt. Sie wurde beantwortet."
- "Es lag am DBMS. Die Performance muss verbessert werden."
A reproducer can easily be constructed via a JUnit test for
{{SentenceDetectorMEGermanTest}}.
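To illustrate what "overlaps at the sentence end" can mean, here is a minimal,
self-contained sketch. It is NOT OpenNLP code and not the JUnit reproducer
mentioned above; the class {{AbbreviationOverlapSketch}} and its methods are
hypothetical. It assumes the problem resembles a suffix match: "DBMS." ends
with the dictionary entry "S.", so a naive check treats the period as part of
an abbreviation and suppresses the sentence boundary. The stricter variant only
accepts an abbreviation that covers the whole whitespace-delimited token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration only; names and logic are assumptions, not OpenNLP internals.
final class AbbreviationOverlapSketch {

    // Naive check: the text up to the period merely ENDS WITH an abbreviation.
    static boolean naiveMatch(String text, int periodIdx, Set<String> abbreviations) {
        String prefix = text.substring(0, periodIdx + 1);
        for (String abbreviation : abbreviations) {
            if (prefix.endsWith(abbreviation)) {
                return true; // "DBMS." ends with "S." and is wrongly accepted
            }
        }
        return false;
    }

    // Stricter check: the abbreviation must equal the whole token before the period.
    static boolean strictMatch(String text, int periodIdx, Set<String> abbreviations) {
        int tokenStart = text.lastIndexOf(' ', periodIdx) + 1;
        String token = text.substring(tokenStart, periodIdx + 1);
        return abbreviations.contains(token);
    }

    // Splits at every period that is not classified as part of an abbreviation.
    static List<String> split(String text, Set<String> abbreviations, boolean strict) {
        List<String> sentences = new ArrayList<>();
        int begin = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '.') {
                continue;
            }
            boolean isAbbreviation = strict
                    ? strictMatch(text, i, abbreviations)
                    : naiveMatch(text, i, abbreviations);
            if (!isAbbreviation) {
                sentences.add(text.substring(begin, i + 1).trim());
                begin = i + 1;
            }
        }
        String rest = text.substring(begin).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest); // trailing text without a final period
        }
        return sentences;
    }
}
```

With the dictionary {{Set.of("S.")}} and the second example sentence pair from
above, {{split(text, abbreviations, false)}} returns a single glued sentence,
while {{split(text, abbreviations, true)}} returns the two expected sentences.
A genuine use such as "Siehe S. 5 im Buch." is still kept together by the
strict variant.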
Note:
This affects all other languages as well. The issue therefore has a higher
priority than usual.