NishantShri4 commented on PR #792:
URL: https://github.com/apache/opennlp/pull/792#issuecomment-2973009730
Dear Reviewers,
This PR is opened to clarify a few things around the usage of the
'useTokenEnd' flag in SentenceDetector.
**1.** The issue below is prioritized for release 2.6.0.
_https://issues.apache.org/jira/browse/OPENNLP-205
(Refactor the SentenceDetectorME class to do the mapping of
end-of-sent positions to spans better)_
The issue above says that the code fails in some scenarios when
useTokenEnd is false.
However, I see that a fix for the usage of this flag was already made in
https://issues.apache.org/jira/browse/OPENNLP-711.
I have added a simple test that demonstrates the use of the useTokenEnd
flag.
**Question**: Could someone please clarify which changes are still
required to fix OPENNLP-205?
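To make the question concrete, here is a minimal sketch of how I understand the flag's effect. It is plain Java written for illustration only, not the SentenceDetectorME code; the class `TokenEndSketch` and its helper methods are hypothetical names. The assumed behavior is that with useTokenEnd set to true the sentence ends at the end of the whitespace-delimited token containing the end-of-sentence character, while with false it ends directly after that character.

```java
public class TokenEndSketch {

    // Index of the first whitespace character at or after 'from',
    // or text.length() if there is none.
    static int firstWhitespace(String text, int from) {
        int i = from;
        while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        return i;
    }

    // Index of the first non-whitespace character at or after 'from',
    // or text.length() if there is none.
    static int firstNonWhitespace(String text, int from) {
        int i = from;
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        return i;
    }

    // Start offset of the next sentence, given the index of an accepted
    // end-of-sentence character (assumed behavior, see the note above).
    static int nextSentenceStart(String text, int eosIndex, boolean useTokenEnd) {
        int from = useTokenEnd
                ? firstWhitespace(text, eosIndex + 1) // skip to the end of the token
                : eosIndex + 1;                       // cut right after the EOS char
        return firstNonWhitespace(text, from);
    }

    public static void main(String[] args) {
        String text = "He said \"Stop.\" Then he left.";
        int eos = text.indexOf('.'); // the period inside the quotation

        int withTokenEnd = nextSentenceStart(text, eos, true);
        int withoutTokenEnd = nextSentenceStart(text, eos, false);

        // The closing quote stays with the first sentence.
        System.out.println("[" + text.substring(0, withTokenEnd).trim() + "]");
        // The closing quote becomes the start of the second sentence.
        System.out.println("[" + text.substring(0, withoutTokenEnd).trim() + "]");
    }
}
```

In this sketch the two settings differ only when the end-of-sentence character is followed by further non-whitespace characters in the same token, such as a closing quote or bracket.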
**2.** The Sentence Detector documentation says the following about training:
_" The data must be converted to the OpenNLP Sentence Detector training
format. Which is one sentence per line. "_
However, the test data sample for German text,
https://github.com/apache/opennlp/blob/main/opennlp-tools/src/test/resources/opennlp/tools/sentdetect/Sentences_DE.txt
contains examples of two sentences on one line, e.g.
` Ein älterer Herr gesellt sich zu ihm und schimpft über den König von
Italien. Am Ende der Anhöhe geht er dann viel leichter.`
**3.** Can we add some documentation to the manual for the useTokenEnd flag?
Best Regards.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.