NishantShri4 commented on PR #792: URL: https://github.com/apache/opennlp/pull/792#issuecomment-2973009730
Dear Reviewers,

This PR is opened to clarify a few points about the usage of the `useTokenEnd` flag in SentenceDetector.

**1.** The following issue is prioritized for release 2.6.0:

_https://issues.apache.org/jira/browse/OPENNLP-205 (Refactor the SentenceDetectorME class to do the mapping of end-of-sent positions to spans better)_

That issue says the code fails in some scenarios when `useTokenEnd` is false. However, a fix for the usage of this flag was already made in https://issues.apache.org/jira/browse/OPENNLP-711. I have added a simple test that demonstrates the use of the `useTokenEnd` flag.

**Question**: Could someone please clarify which changes are still required to fix OPENNLP-205?

**2.** The Sentence Detector documentation says the following about training:

_"The data must be converted to the OpenNLP Sentence Detector training format. Which is one sentence per line."_

However, the German test data sample at https://github.com/apache/opennlp/blob/main/opennlp-tools/src/test/resources/opennlp/tools/sentdetect/Sentences_DE.txt contains lines with two sentences, e.g.:

`Ein älterer Herr gesellt sich zu ihm und schimpft über den König von Italien. Am Ende der Anhöhe geht er dann viel leichter.`

**3.** Can we add some documentation for this flag to the manual?

Best Regards.
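
Regarding point 2, a rough way to find lines in a training file that appear to hold more than one sentence is sketched below. This is only a heuristic and not part of OpenNLP: the class name `SentencePerLineCheck`, its methods, and the boundary pattern (an end-of-sentence character, then whitespace, then an uppercase letter) are assumptions made for illustration. Abbreviations such as "Dr. Müller" would be reported as false positives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenNLP class.
class SentencePerLineCheck {

    // Assumed heuristic: '.', '!' or '?' followed by whitespace and an uppercase letter
    // marks a sentence boundary in the middle of a line.
    private static final Pattern INNER_BOUNDARY = Pattern.compile("[.!?]\\s+\\p{Lu}");

    // Counts apparent sentence boundaries inside a single line.
    static int countInnerBoundaries(String line) {
        Matcher m = INNER_BOUNDARY.matcher(line.trim());
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns the 1-based numbers of lines that seem to contain more than one sentence.
    static List<Integer> suspiciousLines(List<String> lines) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (countInnerBoundaries(lines.get(i)) > 0) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "Ein älterer Herr gesellt sich zu ihm und schimpft über den König von Italien. "
                + "Am Ende der Anhöhe geht er dann viel leichter.",
            "Am Ende der Anhöhe geht er dann viel leichter.");
        // Only the first sample line has a boundary in the middle of the line.
        System.out.println(suspiciousLines(sample));
    }
}
```

Running a check like this over `Sentences_DE.txt` would show how many lines deviate from the documented format, which may help decide whether the documentation or the test data should change.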