Martin Wiesner created OPENNLP-1809:
---------------------------------------

             Summary: SentenceDetector misses multi-letter abbreviations at 
sentence start
                 Key: OPENNLP-1809
                 URL: https://issues.apache.org/jira/browse/OPENNLP-1809
             Project: OpenNLP
          Issue Type: Bug
          Components: Sentence Detector
    Affects Versions: 3.0.0-M1, 2.5.8
            Reporter: Martin Wiesner
            Assignee: Martin Wiesner
             Fix For: 3.0.0-M2


As a follow-up to OPENNLP-1781, a deeper inspection with real-world data 
revealed that SentenceDetectorME does not work as expected when a sentence 
starts with a multi-letter abbreviation.

Example text:

"Bek. Problem: Schlafmangel. Über die letzten Tage hinweg zunehmend müde."

Expected: 2 sentences: 
 * "Bek. Problem: Schlafmangel."
 * "Über die letzten Tage hinweg zunehmend müde."

However, 3 sentences are returned, even though "Bek." (short for "Bekanntes", 
i.e. "known") has been added to the abbreviation XML file for the German language.
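
To make the expected segmentation concrete, here is a minimal, self-contained 
sketch of a dictionary-based splitter. It is only an illustration of the 
desired outcome and not how SentenceDetectorME works internally (that is a 
statistical model). The class name AbbreviationAwareSplitter and the method 
split are hypothetical and do not exist in OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy splitter, NOT part of OpenNLP: it breaks after every '.'
// unless the token ending in that '.' is a known abbreviation.
final class AbbreviationAwareSplitter {

    static List<String> split(String text, Set<String> abbreviations) {
        List<String> sentences = new ArrayList<>();
        int start = 0; // index where the current sentence begins
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '.') {
                continue;
            }
            // Walk back to the beginning of the token that ends with this period.
            int tokenStart = i;
            while (tokenStart > start
                    && !Character.isWhitespace(text.charAt(tokenStart - 1))) {
                tokenStart--;
            }
            String token = text.substring(tokenStart, i + 1);
            if (abbreviations.contains(token)) {
                continue; // known abbreviation, also at sentence start: no break
            }
            sentences.add(text.substring(start, i + 1).trim());
            start = i + 1;
        }
        // Keep any trailing text that does not end with a period.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Bek. Problem: Schlafmangel. "
                + "Über die letzten Tage hinweg zunehmend müde.";
        for (String sentence : split(text, Set.of("Bek."))) {
            System.out.println(sentence);
        }
    }
}
```

With "Bek." in the abbreviation set, the sketch yields the 2 expected 
sentences. With an empty set it yields 3 segments, which is analogous to the 
behaviour reported here.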

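Goal:
The fix shall ensure that a multi-letter abbreviation at the start of a 
sentence is not treated as a sentence boundary, so that the example text 
above is split into the 2 expected sentences.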



--
This message was sent by Atlassian Jira
(v8.20.10#820010)