[
https://issues.apache.org/jira/browse/OPENNLP-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Wiesner updated OPENNLP-1767:
------------------------------------
Priority: Major (was: Critical)
> Fix sentence detection when an abbreviation overlaps at sentence end
> --------------------------------------------------------------------
>
> Key: OPENNLP-1767
> URL: https://issues.apache.org/jira/browse/OPENNLP-1767
> Project: OpenNLP
> Issue Type: Bug
> Components: Sentence Detector
> Affects Versions: 2.5.5
> Reporter: Martin Wiesner
> Assignee: Martin Wiesner
> Priority: Major
> Fix For: 2.5.6, 3.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, sentence detection works incorrectly when an abbreviation
> dictionary containing common abbreviations is loaded. If an abbreviation
> such as "S." (German for "page") overlaps at the sentence end, the actual
> sentence boundary is not respected and the subsequent sentence is glued to
> the previous one, which causes a mismatch with the expected sentence
> boundaries.
> Examples for the German language:
> - "Die Frage wurde gestellt. Sie wurde beantwortet."
> - "Es lag am DBMS. Die Performance muss verbessert werden."
> A reproducer can easily be constructed via a JUnit test for
> {{SentenceDetectorMEGermanTest}}.
> Note:
> This affects all other languages as well, so the issue has a higher
> priority than usual.
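
The failure mode described above can be illustrated with a small, self-contained sketch. This is not OpenNLP code and does not use the OpenNLP API. It assumes that "overlaps" means the abbreviation matches the tail of a longer token (for example, "S." matching the end of "DBMS."), which the issue text does not state explicitly. The class `AbbreviationOverlapSketch`, its methods, and the one-entry abbreviation set are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AbbreviationOverlapSketch {

    // Hypothetical stand-in for an abbreviation dictionary.
    // "S." is the German abbreviation for "Seite" (page).
    static final Set<String> ABBREVIATIONS = Set.of("S.");

    // Suffix match (assumed cause of the bug): "DBMS." ends with "S."
    // and is therefore mistaken for an abbreviation.
    static boolean endsWithAbbreviation(String token) {
        for (String abbreviation : ABBREVIATIONS) {
            if (token.endsWith(abbreviation)) {
                return true;
            }
        }
        return false;
    }

    // Whole-token match: only the exact token "S." counts as an abbreviation.
    static boolean isAbbreviation(String token) {
        return ABBREVIATIONS.contains(token);
    }

    // Naive splitter: a token ending in '.' closes the sentence
    // unless the chosen check classifies it as an abbreviation.
    static List<String> split(String text, boolean wholeTokenMatch) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : text.split(" ")) {
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(token);
            if (token.endsWith(".")) {
                boolean abbreviation = wholeTokenMatch
                        ? isAbbreviation(token)
                        : endsWithAbbreviation(token);
                if (!abbreviation) {
                    sentences.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        // Keep any trailing text that did not end with a sentence boundary.
        if (current.length() > 0) {
            sentences.add(current.toString());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Es lag am DBMS. Die Performance muss verbessert werden.";
        System.out.println(split(text, false)); // suffix match
        System.out.println(split(text, true));  // whole-token match
    }
}
```

With the suffix match, the second example sentence from the issue stays glued to the first; with the whole-token match, the two sentences are separated. A real reproducer would instead load a German sentence model together with an abbreviation dictionary in a JUnit test, as the issue suggests.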