There is a JIRA task [1], opened by Jörn a few years ago, that asks for
the tokenizers to be able to output newline tokens, and there is now a
PR [2] that implements it.

The PR does not change the interfaces and just adds a keepNewLines boolean
to the tokenizers. It doesn't look like this change would affect any
existing applications using OpenNLP. I have built and tested the branch.
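
For reviewers who have not opened the PR yet, here is a rough,
self-contained sketch of the kind of behavior a keepNewLines flag
implies for a whitespace tokenizer. It does not use any OpenNLP classes
and is not the PR's code: the class name NewlineAwareWhitespaceTokenizer,
its constructor, and its tokenize method are hypothetical and exist only
for illustration. See the PR [2] for the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; NOT an OpenNLP class.
final class NewlineAwareWhitespaceTokenizer {

    // Assumed flag, named after the boolean described in the PR.
    private final boolean keepNewLines;

    NewlineAwareWhitespaceTokenizer(boolean keepNewLines) {
        this.keepNewLines = keepNewLines;
    }

    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Any whitespace ends the token being built, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                // Emit the line break as its own token only when requested.
                if (keepNewLines && c == '\n') {
                    tokens.add("\n");
                }
            } else {
                current.append(c);
            }
        }
        // Flush the trailing token, if the text did not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

In this sketch, constructing the tokenizer with false gives plain
whitespace tokenization, so callers that never set the flag see no
difference, while passing true adds a "\n" token at each line break.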

I'd appreciate another set of eyes on this one so we can decide whether
to merge it and close the task.

[1] https://issues.apache.org/jira/browse/OPENNLP-1185
[2] https://github.com/apache/opennlp/pull/337

Thanks,
Jeff
