OPENNLP-1185: Tokenizers should be able to output a new line token

Jeff Zemerick Tue, 29 Mar 2022 07:17:40 -0700

There is a JIRA task [1], written by Jörn a few years ago, that calls for
allowing the tokenizers to output new line tokens, and there is a PR [2]
for it.


The PR does not change any interfaces; it only adds a keepNewLines boolean
to the tokenizers, so it should not affect existing applications that use
OpenNLP. I have built and tested the branch.
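For readers who have not looked at the PR, here is a minimal, hypothetical sketch of what a keepNewLines flag on a whitespace tokenizer means in practice. It is not the code from PR [2] and not OpenNLP's WhitespaceTokenizer. The class name NewlineAwareWhitespaceTokenizer, the constructor-based flag, and the use of "\n" as the new line token are all assumptions made for illustration. With the flag off, a line break is treated like any other whitespace. With it on, each line break is emitted as its own token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: NOT OpenNLP's tokenizer or the PR's code.
class NewlineAwareWhitespaceTokenizer {

    // Assumed representation of a new line token.
    static final String NEW_LINE_TOKEN = "\n";

    private final boolean keepNewLines;

    NewlineAwareWhitespaceTokenizer(boolean keepNewLines) {
        this.keepNewLines = keepNewLines;
    }

    String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Any whitespace (including '\n' and '\r') ends the current token.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                // Only '\n' becomes a token, and only when the flag is set;
                // so "\r\n" yields a single new line token.
                if (c == '\n' && keepNewLines) {
                    tokens.add(NEW_LINE_TOKEN);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "First line\nSecond line";
        for (boolean keep : new boolean[] {false, true}) {
            String[] tokens = new NewlineAwareWhitespaceTokenizer(keep).tokenize(text);
            // Escape the new line token so the output stays on one line.
            System.out.println("keepNewLines=" + keep + ": "
                    + String.join(" | ", tokens).replace("\n", "\\n"));
        }
    }
}
```

Assuming the flag defaults to off, callers that never set it get exactly the old token stream, which is why an opt-in boolean like this is unlikely to break existing applications.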

I'd appreciate another set of eyes on this one so we can approve and merge
it and close the task.

[1] https://issues.apache.org/jira/browse/OPENNLP-1185
[2] https://github.com/apache/opennlp/pull/337

Thanks,
Jeff
