On 01/25/2014 01:33 PM, Miller, Timothy wrote:
Thanks Joern,
I'll try it. My understanding is I just need to give it my training
data, with the special character I used replaced with the literal string
"<LF>" and each line in the file is an example sentence.

Yes, exactly.
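
As a rough illustration of that preparation step, here is a minimal sketch in plain Java (no OpenNLP calls). The placeholder character and the class and method names are hypothetical, since the thread does not say which special character was used; the only thing taken from the discussion is that each placeholder becomes the literal string "<LF>" and each sentence goes on its own line.

```java
import java.util.ArrayList;
import java.util.List;

public class TrainingDataPrep {
    // Hypothetical placeholder; the thread does not name the actual special character.
    static final char PLACEHOLDER = '\u00B6';

    // Replace every placeholder in one example sentence with the literal tag "<LF>".
    static String toTrainingLine(String sentence) {
        return sentence.replace(String.valueOf(PLACEHOLDER), "<LF>");
    }

    // Convert a list of example sentences into training-file lines, one per sentence.
    static List<String> toTrainingLines(List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (String sentence : sentences) {
            lines.add(toTrainingLine(sentence));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> examples = new ArrayList<>();
        examples.add("Chief complaint" + PLACEHOLDER);
        examples.add("The patient was seen today.");
        // Each printed line is one example sentence for the training file.
        for (String line : toTrainingLines(examples)) {
            System.out.println(line);
        }
    }
}
```

The output lines would then be written to the training file as they are, one sentence per line.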

Just thinking about the cTAKES wrapper -- do your changes make it so
that we wouldn't need to add the special characters (<LF>,<CR>) to a
document within the cTAKES sentence detector wrapper?

Right, at detection time the sentence detector expects the actual characters as input, not the tags.

For example:
"This is a sentence terminated by a new line\nAnd this is one more sentence."


It sounds like we
would need to add <CR> and <LF> to our eosChars value, it's early (for
my brain) but I wonder whether that could be a default on the opennlp end?

If you pass them in during training, they are stored in the model package. After that, all you need to
do is instantiate the Sentence Detector with that model and it should be ready to use.

BTW, there is also a UIMA integration in opennlp-uima; maybe that could work quite well for cTAKES.

Jörn

