The sentence detector always ends a sentence at a newline.
This is a problem for some notes (e.g. MIMIC radiology notes), where a
line can wrap in the middle of a sentence at fixed character
offsets. Judging from the comments in SentenceDetector, the logic is
cleanly split into two steps: it first runs the OpenNLP sentence detector,
then breaks any detected sentence wherever there is a newline. Questions:
1) Would it be worth adding a boolean parameter that controls breaking on
   newlines?
2) If that second step were removed or skipped, does the OpenNLP sentence
   detector give good results with our model? Or was the model trained on
   text in which sentences always end at carriage returns?
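For question 1, here is a rough sketch of what such a flag could look like.
Everything in it is hypothetical: the class name NewlineSplitSketch, the method
postProcess and the breakOnNewlines parameter are illustrative and are not the
actual SentenceDetector code. It does not call OpenNLP. It assumes the detector
has already produced sentence spans as half-open [begin, end) character offsets
into the note text, and it only models the second, newline-splitting step.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the newline post-processing step, gated by a flag.
 * Spans are half-open [begin, end) character offsets into the note text,
 * assumed to come from a statistical sentence detector run beforehand.
 */
public class NewlineSplitSketch {

    /** Splits each span at '\n' when breakOnNewlines is true; otherwise copies spans unchanged. */
    public static List<int[]> postProcess(String text, List<int[]> spans, boolean breakOnNewlines) {
        List<int[]> result = new ArrayList<>();
        for (int[] span : spans) {
            if (!breakOnNewlines) {
                result.add(new int[] {span[0], span[1]});
                continue;
            }
            int begin = span[0];
            // Walk up to and including the span end so the last piece is flushed.
            for (int i = span[0]; i <= span[1]; i++) {
                boolean atEnd = i == span[1];
                if (atEnd || text.charAt(i) == '\n') {
                    addTrimmed(text, begin, i, result);
                    begin = i + 1;
                }
            }
        }
        return result;
    }

    /** Adds [begin, end) with surrounding whitespace trimmed; skips all-whitespace pieces. */
    private static void addTrimmed(String text, int begin, int end, List<int[]> out) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) begin++;
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) end--;
        if (begin < end) out.add(new int[] {begin, end});
    }

    public static void main(String[] args) {
        // A sentence that wraps mid-line, as in a fixed-width radiology note.
        String note = "The lungs are clear. No acute\nprocess is seen.";
        List<int[]> spans = new ArrayList<>();
        spans.add(new int[] {0, 20});
        spans.add(new int[] {21, note.length()});
        for (boolean flag : new boolean[] {true, false}) {
            System.out.println("breakOnNewlines=" + flag);
            for (int[] s : postProcess(note, spans, flag)) {
                System.out.println("  [" + note.substring(s[0], s[1]) + "]");
            }
        }
    }
}
```

With the flag on, the wrapped second sentence comes out as two fragments
("No acute" and "process is seen."); with it off, the detector's spans pass
through untouched, so the result depends entirely on how well the model
handles the line breaks, which is really question 2.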

Tim
