[ 
https://issues.apache.org/jira/browse/OPENNLP-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Wiesner updated OPENNLP-1809:
------------------------------------
        Fix Version/s: 2.5.8
    Affects Version/s: 2.5.7
                           (was: 2.5.8)

> SentenceDetector misses multi-letter abbreviations at sentence start
> --------------------------------------------------------------------
>
>                 Key: OPENNLP-1809
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1809
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Sentence Detector
>    Affects Versions: 2.5.7, 3.0.0-M1
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.5.8, 3.0.0-M2
>
>
> As a follow-up to OPENNLP-1781, a deeper inspection with real-world data 
> revealed that _SentenceDetectorME_ does not work as expected when a sentence 
> starts with a multi-letter abbreviation.
> Example text:
> "Bek. Problem: Schlafmangel. Über die letzten Tage hinweg war sie zunehmend 
> müde."
> Expected: 2 sentences: 
>  * "Bek. Problem: Schlafmangel."
>  * "Über die letzten Tage hinweg war sie zunehmend müde."
> However, 3 sentences are returned, even though "Bek." (Bekanntes -> "known") 
> has been added to the abbreviation XML file for the German language.
> Goal:
> The fix shall resolve this bug.
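
The expected segmentation described in the issue can be illustrated with a small standalone sketch. It is not OpenNLP's SentenceDetectorME logic and does not use the OpenNLP API or its abbreviation XML file; the class name AbbreviationAwareSplitter, the split method, and the plain Set of abbreviations are hypothetical and exist only for this illustration. The sketch assumes a sentence ends at '.', '!' or '?' followed by whitespace or end of text, unless the token ending there is a known abbreviation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is NOT how OpenNLP's SentenceDetectorME works.
class AbbreviationAwareSplitter {

    // Splits at '.', '!' or '?' when followed by whitespace or end of text,
    // unless the whitespace-delimited token ending there is a known abbreviation.
    static List<String> split(String text, Set<String> abbreviations) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '.' && c != '!' && c != '?') {
                continue;
            }
            boolean atEnd = i + 1 == text.length();
            if (!atEnd && !Character.isWhitespace(text.charAt(i + 1))) {
                continue;
            }
            // Walk back to the start of the token that ends at position i.
            int tokenStart = i;
            while (tokenStart > start
                    && !Character.isWhitespace(text.charAt(tokenStart - 1))) {
                tokenStart--;
            }
            String token = text.substring(tokenStart, i + 1);
            if (abbreviations.contains(token)) {
                continue; // a known abbreviation does not end the sentence
            }
            String sentence = text.substring(start, i + 1).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
            start = i + 1;
        }
        // Keep any trailing text that has no end-of-sentence character.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Bek. Problem: Schlafmangel. "
                + "Über die letzten Tage hinweg war sie zunehmend müde.";
        for (String s : split(text, Set.of("Bek."))) {
            System.out.println(s);
        }
    }
}
```

With "Bek." in the abbreviation set, the sketch yields the 2 sentences listed as expected above. With an empty set it yields 3, which matches the count reported in this issue.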



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
