[ 
https://issues.apache.org/jira/browse/OPENNLP-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Wiesner updated OPENNLP-1767:
------------------------------------
    Priority: Major  (was: Critical)

> Fix sentence detection when an abbreviation overlaps at sentence end
> --------------------------------------------------------------------
>
>                 Key: OPENNLP-1767
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1767
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Sentence Detector
>    Affects Versions: 2.5.5
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.5.6, 3.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, sentence detection works incorrectly when an abbreviation 
> dictionary containing common abbreviations is loaded. If an abbreviation 
> such as "S." ("Seite", German for "page") overlaps with the end of a 
> sentence, the sentence end is not detected and the subsequent sentence is 
> glued to the previous one. The detected sentences then no longer match the 
> actual sentence boundaries.
> Examples for the German language:
> - "Die Frage wurde gestellt. Sie wurde beantwortet."
> - "Es lag am DBMS. Die Performance muss verbessert werden."
> A reproducer can easily be constructed via a JUnit test for 
> {{SentenceDetectorMEGermanTest}}.
> Note:
> The problem is not specific to German; all other languages are affected as 
> well. Therefore, this issue has a higher priority than usual.
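
To illustrate how such an overlap can swallow a sentence boundary, here is a minimal, self-contained sketch. It does not use OpenNLP and is not the library's implementation; the class AbbreviationOverlapSketch and all of its methods are hypothetical. It assumes a naive suffix check that treats any text ending in a dictionary abbreviation as an abbreviation, and contrasts it with a check that requires the abbreviation to be the complete last token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is NOT OpenNLP code.
public class AbbreviationOverlapSketch {

    // Naive assumption: the text "ends with an abbreviation" if any
    // dictionary entry is a suffix of it. "DBMS." ends with "S.".
    public static boolean endsWithAbbreviationNaive(String text, Set<String> abbreviations) {
        for (String abbr : abbreviations) {
            if (text.endsWith(abbr)) {
                return true;
            }
        }
        return false;
    }

    // Stricter assumption: the abbreviation must be the whole last
    // space-delimited token, so "DBMS." does not match "S.".
    public static boolean endsWithAbbreviationStrict(String text, Set<String> abbreviations) {
        String lastToken = text.substring(text.lastIndexOf(' ') + 1);
        return abbreviations.contains(lastToken);
    }

    // Splits at every '.' that is not considered part of an abbreviation.
    public static List<String> split(String text, Set<String> abbreviations, boolean strict) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '.') {
                continue;
            }
            String candidate = text.substring(start, i + 1);
            boolean isAbbreviation = strict
                    ? endsWithAbbreviationStrict(candidate, abbreviations)
                    : endsWithAbbreviationNaive(candidate, abbreviations);
            if (!isAbbreviation) {
                sentences.add(candidate.trim());
                start = i + 1;
            }
        }
        // Keep any trailing text that did not end with a period.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        Set<String> abbreviations = Set.of("S.");
        String text = "Es lag am DBMS. Die Performance muss verbessert werden.";
        System.out.println(split(text, abbreviations, false).size());
        System.out.println(split(text, abbreviations, true).size());
    }
}
```

With the single dictionary entry "S.", the naive variant returns the second example from the description as one sentence, while the strict variant returns two. A genuine use such as "Siehe S. 5." stays unsplit at "S." in both variants.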



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
