Hi Alan,

Do you have a PR for the implementation?

Thank you,
William

On Tue, Jan 19, 2021 at 23:52, Alan Wang <wp_s...@163.com> wrote:

> Hi all,
>
> I created a rule-based sentence detector for OpenNLP
> <https://issues.apache.org/jira/browse/OPENNLP-912>.
> There are two kinds of rules:
>
> 1. break rules: specifying the sentence break
> 2. no-break rules: disallowing the sentence break
>
> All rules have two parts:
>
> Before the break
> After the break
>
> The idea of the algorithm:
>
> Retrieve the break rules and find the locations where they match.
> If none of the no-break rules matches at a break location, the text
> is split there and a new segment is created.
>
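For readers following the thread, here is a minimal Java sketch of the break / no-break idea described above. It is not the code attached to OPENNLP-912: the class names (Rule, RuleBasedSplitter), the two example rules, and the position-by-position scan are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical rule type: one regex for the text before the break,
// one regex for the text after it.
final class Rule {
    private final Pattern before; // must match text ending exactly at the position
    private final Pattern after;  // must match text starting exactly at the position

    Rule(String beforeRegex, String afterRegex) {
        // \z anchors the "before" part to the end of the prefix.
        this.before = Pattern.compile("(?:" + beforeRegex + ")\\z");
        this.after = Pattern.compile(afterRegex);
    }

    boolean matchesAt(String text, int pos) {
        return before.matcher(text.substring(0, pos)).find()
            && after.matcher(text.substring(pos)).lookingAt();
    }
}

final class RuleBasedSplitter {
    // Illustrative rules only; a real rule set would be far larger.
    static final List<Rule> BREAK_RULES = Arrays.asList(
        new Rule("[.!?]", "\\s+\\p{Lu}"));
    static final List<Rule> NO_BREAK_RULES = Arrays.asList(
        new Rule("\\b(?:Mr|Mrs|Dr)\\.", "\\s"));

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int pos = 1; pos < text.length(); pos++) {
            // Split only where a break rule matches and no no-break rule vetoes it.
            if (anyMatch(BREAK_RULES, text, pos) && !anyMatch(NO_BREAK_RULES, text, pos)) {
                sentences.add(text.substring(start, pos).trim());
                start = pos;
            }
        }
        String tail = text.substring(start).trim();
        if (!tail.isEmpty()) {
            sentences.add(tail);
        }
        return sentences;
    }

    private static boolean anyMatch(List<Rule> rules, String text, int pos) {
        for (Rule rule : rules) {
            if (rule.matchesAt(text, pos)) {
                return true;
            }
        }
        return false;
    }
}
```

Calling RuleBasedSplitter.split("Dr. Smith arrived. He sat down.") yields two segments: the no-break rule suppresses the split after "Dr.", while the split after "arrived." goes through. The substring-based matching is quadratic in the text length and is kept only for clarity.
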
> Features:
>
> Text Cleanup and Preprocessing
> Easy to extend to other languages
>
> Reference:
>
> This library uses the "Golden Rules" tests of pragmatic_segmenter
> <https://github.com/diasks2/pragmatic_segmenter#the-golden-rules>
>
> Currently, the pass rate of test cases is 92.31%. The following test cases
> fail: 39, 50, 53, 52
> For details, see the attachment.
>
