Hi,

On 02/08/2012 05:52 PM, Katrin Tomanek wrote:
> [...] I realized that only these EOS (end of sentence) characters are currently supported: '.', '!', '?'. However, in our case we have many other EOS characters (":" being one of the most common).

I believe our situation is even worse, because we want to treat line breaks as possible EOS markers. We use OpenNLP through UIMA, where this should not be an issue, but as I understand it the algorithms are designed to work with training files that use line breaks to represent sentence boundaries, i.e. the line break serves as a meta character that cannot actually occur within the document.
When introducing configurability of EOS characters, it would be good to take that into account and provide a way to deal with line breaks in the documents.
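
To make the idea concrete, here is a rough sketch in Python of what I have in mind. It is not OpenNLP code and does not use any OpenNLP API; the function names, the `eos_chars` parameter, and the placeholder character are all assumptions made up for illustration. The first function finds candidate sentence boundaries from a configurable character set (which may include "\n"); the other two show one possible way to keep in-document line breaks apart from the line breaks that delimit sentences in a one-sentence-per-line training file.

```python
# Hypothetical sketch, not OpenNLP's implementation or API.

DEFAULT_EOS_CHARS = frozenset(".!?")

# Assumed placeholder for line breaks that occur inside a document.
# U+2424 (SYMBOL FOR NEWLINE) is an arbitrary choice; any character that
# never appears in the corpus would do.
NEWLINE_PLACEHOLDER = "\u2424"


def candidate_boundaries(text, eos_chars=DEFAULT_EOS_CHARS):
    """Return the index of every character that may end a sentence."""
    return [i for i, ch in enumerate(text) if ch in eos_chars]


def encode_training_sentence(sentence):
    """Replace in-document line breaks so the sentence fits on one line."""
    return sentence.replace("\n", NEWLINE_PLACEHOLDER)


def decode_training_sentence(line):
    """Restore the original line breaks when reading a training line."""
    return line.replace(NEWLINE_PLACEHOLDER, "\n")


if __name__ == "__main__":
    text = "Agenda:\nfirst item\nDone. Really?"
    # Default set: only '.', '!' and '?' are considered.
    print(candidate_boundaries(text))
    # Extended set: ':' and the line break become candidates as well.
    print(candidate_boundaries(text, DEFAULT_EOS_CHARS | {":", "\n"}))
```

With something like this, the character set is plain configuration, and the training file can still use one sentence per line without forbidding line breaks in the documents themselves. A model would of course still have to decide which candidates are real boundaries.
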
Jens
