Hi Alan,

Do you have a PR for the implementation?

Thank you,
William

On Tue, Jan 19, 2021 at 23:52, Alan Wang <wp_s...@163.com> wrote:

> Hi all,
>
> I created a rule-based sentence detector for OpenNLP
> <https://issues.apache.org/jira/browse/OPENNLP-912>.
>
> There are two kinds of rules:
>
> 1. break rules: specify where a sentence break is allowed
> 2. no-break rules: disallow a sentence break
>
> Every rule has two parts:
>
> Before the break
> After the break
>
> The idea of the algorithm:
>
> Retrieve the break rules.
> If none of the no-break rules matches at the break location, the text
> is marked as split and a new segment is created.
>
> Features:
>
> Text cleanup and preprocessing
> Easy to extend to other languages
>
> Reference:
>
> This library uses the "Golden Rules" test of pragmatic_segmenter
> <https://github.com/diasks2/pragmatic_segmenter#the-golden-rules>.
>
> Currently, the pass rate of the test cases is 92.31%. The following test
> cases fail: 39, 50, 53, 52.
> For details, see the attachment.
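
For anyone following the thread without the attachment, here is a rough sketch of the break / no-break idea described in the quoted message. It is not Alan's implementation and not OpenNLP code. The `Rule` class, the `segment` function, and the two example rule lists are hypothetical names made up for illustration, and the regular expressions are toy English rules rather than the real rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule with the two parts named in the thread."""
    before: str  # regex that must match the text ending at the candidate break
    after: str   # regex that must match the text starting at the candidate break

    def matches_at(self, text: str, pos: int) -> bool:
        # "before" is anchored to end exactly at pos, "after" to start at pos.
        before_ok = re.search(f"(?:{self.before})\\Z", text[:pos]) is not None
        after_ok = re.match(self.after, text[pos:]) is not None
        return before_ok and after_ok


# Toy rules, assumed for this example only.
BREAK_RULES = [Rule(before=r"[.!?]", after=r"\s+[A-Z]")]
NO_BREAK_RULES = [Rule(before=r"\b(?:Mr|Mrs|Dr|etc)\.", after=r"\s")]


def segment(text, break_rules=BREAK_RULES, no_break_rules=NO_BREAK_RULES):
    """Split text where a break rule matches and no no-break rule matches."""
    sentences = []
    start = 0
    for pos in range(1, len(text)):
        if not any(rule.matches_at(text, pos) for rule in break_rules):
            continue  # not a candidate break location
        if any(rule.matches_at(text, pos) for rule in no_break_rules):
            continue  # a no-break rule vetoes this location
        sentences.append(text[start:pos].strip())
        start = pos
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    print(segment("Dr. Smith arrived. He was late."))
    # ['Dr. Smith arrived.', 'He was late.']
```

The sketch checks every character position and re-slices the text each time, which keeps it short but makes it quadratic in the input length. A real implementation would more likely scan for break-rule matches once and test the no-break rules only at those locations.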