Re: sentence detector newline behavior

Jörn Kottmann Sat, 25 Jan 2014 09:05:09 -0800

On 01/25/2014 01:33 PM, Miller, Timothy wrote:

Thanks Joern,
I'll try it. My understanding is I just need to give it my training
data, with the special character I used replaced with the literal string
"<LF>" and each line in the file is an example sentence.


Yes, exactly.
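
A minimal sketch of that preparation step, using only the Java standard library (no OpenNLP calls). The class name NewlineTagEncoder and the pilcrow placeholder are hypothetical, chosen purely for illustration; substitute whatever special character the existing training data uses. It rewrites each sentence so that the placeholder, and any raw newline characters, become the literal tags "<LF>" and "<CR>", giving one sentence per line in the training file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of OpenNLP or cTAKES.
final class NewlineTagEncoder {

    // Assumed placeholder character in the existing training data.
    static final char PLACEHOLDER = '\u00B6';

    // Replace the placeholder and any raw CR/LF characters with literal tags,
    // so the sentence stays on a single line of the training file.
    static String encodeSentence(String sentence, char placeholder) {
        return sentence
                .replace(String.valueOf(placeholder), "<LF>")
                .replace("\r", "<CR>")
                .replace("\n", "<LF>");
    }

    // Encode a list of sentences; each result is one line of training data.
    static List<String> encodeAll(List<String> sentences, char placeholder) {
        List<String> lines = new ArrayList<>();
        for (String sentence : sentences) {
            lines.add(encodeSentence(sentence, placeholder));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        sentences.add("This is a sentence terminated by a new line" + PLACEHOLDER);
        sentences.add("And this is one more sentence.");
        for (String line : encodeAll(sentences, PLACEHOLDER)) {
            System.out.println(line);
        }
    }
}
```

Only the training file carries the tags; text passed to the detector at run time keeps its real newline characters.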

Just thinking about the cTAKES wrapper -- do your changes make it so
that we wouldn't need to add the special characters (<LF>,<CR>) to a
document within the cTAKES sentence detector wrapper?


Right, the sentence detector expects the chars as input, not the tags.

For example:
"This is a sentence terminated by a new line\nAnd this is one more sentence."

It sounds like we would need to add <CR> and <LF> to our eosChars value,
it's early (for my brain) but I wonder whether that could be a default on
the opennlp end?

If you pass them in during training, they are stored in the model package. All you need to
do is instantiate the Sentence Detector and it should be ready to use.

BTW, there is also a UIMA integration in opennlp-uima; maybe that could work quite well for cTAKES.


Jörn
