The sentence detector always ends a sentence at a newline. This is a problem for some notes (e.g. MIMIC radiology notes), where lines are wrapped at specified character offsets and a line break can therefore fall in the middle of a sentence. Judging from the comments in SentenceDetector, the logic is cleanly separated: it first runs the OpenNLP sentence detector, then breaks any detected sentence wherever there is a newline.

Questions:
1) Would it be good to add a boolean parameter that controls whether sentences are broken on newlines?
2) If that step were removed or skipped, would the OpenNLP sentence detector give good results with our model? Or was the model trained on text that always breaks at line breaks?
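
To make question 1 concrete, here is a rough Python sketch of the two-stage logic described above with the newline step made optional. It is not the actual SentenceDetector code: the names `detect_sentences`, `naive_detector` and `break_on_newlines` are made up for illustration, and `naive_detector` is only a punctuation-based stand-in for the OpenNLP model.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the note text


def naive_detector(text: str) -> List[Span]:
    """Hypothetical stand-in for the statistical detector.

    Ends a sentence after '.', '!' or '?' when followed by whitespace
    or end of text. Newlines are treated like any other whitespace.
    """
    spans = []
    start = 0
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        spans.append((start, match.end()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def _trim(text: str, start: int, end: int) -> Span:
    """Shrink a span so it does not begin or end with whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


def detect_sentences(
    text: str,
    detector: Callable[[str], List[Span]],
    break_on_newlines: bool = True,
) -> List[Span]:
    """Stage 1: run the detector. Stage 2 (optional): split at newlines."""
    sentences = []
    for start, end in detector(text):
        pieces = []
        if break_on_newlines:
            piece_start = start
            for match in re.finditer(r"\r?\n", text[start:end]):
                pieces.append((piece_start, start + match.start()))
                piece_start = start + match.end()
            pieces.append((piece_start, end))
        else:
            pieces.append((start, end))
        for piece in pieces:
            s, e = _trim(text, *piece)
            if s < e:  # drop spans that were only whitespace
                sentences.append((s, e))
    return sentences


if __name__ == "__main__":
    # A note whose first sentence wraps in the middle.
    note = "No acute\ncardiopulmonary process. Heart size is normal.\n"
    for flag in (True, False):
        spans = detect_sentences(note, naive_detector, break_on_newlines=flag)
        print(flag, [note[s:e] for s, e in spans])
```

With the flag on, the wrapped sentence comes back as two pieces ("No acute" and "cardiopulmonary process."). With it off, it stays as one span that still contains the newline. Defaulting the flag to true would keep the current behavior for everyone who does not set it. Whether the real model copes with mid-sentence newlines is question 2, which this sketch does not answer.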
Tim