NishantShri4 commented on PR #792:
URL: https://github.com/apache/opennlp/pull/792#issuecomment-2973009730

   Dear Reviewers,
   
   This PR is opened to clarify a few things around the usage of the 'useTokenEnd' 
flag in SentenceDetector.
   
   **1.** We have the issue below prioritized for release 2.6.0. 
        
        _https://issues.apache.org/jira/browse/OPENNLP-205
         (Refactor the SentenceDetectorME class to do the mapping of 
end-of-sent  positions to spans better)_
   
       The above issue says that the code fails in some scenarios when useTokenEnd 
is false. 
       However, I see that a fix for the usage of this flag was already made in 
       https://issues.apache.org/jira/browse/OPENNLP-711.
       I have added a simple test which demonstrates the use of the useTokenEnd 
flag.
   
      **Question**: Could someone please provide some clarification on the 
changes required to fix OPENNLP-205?
   
   **2.** The Sentence Detector documentation says the following about training:
       _"The data must be converted to the OpenNLP Sentence Detector training 
format. Which is one sentence per line."_
   
       However, in the test data sample for German text:
       
https://github.com/apache/opennlp/blob/main/opennlp-tools/src/test/resources/opennlp/tools/sentdetect/Sentences_DE.txt
       
       we see examples of two sentences on one line, e.g.:
   
   `Ein älterer Herr gesellt sich zu ihm und schimpft über den König von 
Italien. Am Ende der Anhöhe geht er dann viel leichter.`
   
   **3.** Can we add some documentation to the manual for the useTokenEnd flag?
   
   Best Regards.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@opennlp.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
