There is a JIRA task [1], written by Jörn a few years ago, that calls for allowing the tokenizers to output newline tokens, and there is a PR [2] for it.
The PR does not change the interfaces and just adds a keepNewLines boolean to the tokenizers. It doesn't look like this change would affect any existing applications using OpenNLP. I have built and tested the branch. I'd appreciate another set of approval eyes on this one to see if we can merge it and close the task.

[1] https://issues.apache.org/jira/browse/OPENNLP-1185
[2] https://github.com/apache/opennlp/pull/337

Thanks,
Jeff