+1 There's an example of the configs here :) https://issues.apache.org/jira/browse/CTAKES-98
I think we should be able to use OpenNLP's Sentence Annotator directly if we no longer need the custom newline rule(s) [Or if we find that a fixed rule is still required, perhaps OpenNLP can support it via config as well; there doesn't seem to be anything cTAKES specific about it]. Pending the results of Tim's retraining/evaluation of the new models??

--Pei

> -----Original Message-----
> From: Jörn Kottmann [mailto:kottm...@gmail.com]
> Sent: Wednesday, January 29, 2014 3:55 PM
> To: dev@ctakes.apache.org
> Subject: Re: sentence detector newline behavior
>
> On 01/27/2014 08:44 PM, Tim Miller wrote:
> >
> > That is a good point, and something I was wondering about. Having now
> > looked at both the ctakes and opennlp code for the sentence splitter
> > it seems like there is a lot of overlap. I would've thought it was
> > just a matter of converting annotations into our type system. So I'm
> > curious if there is some justification for why there seems to be
> > duplication (or if I'm hallucinating it).
>
> It should be possible (and if not we should make it possible) to directly use
> the opennlp-uima integration. It supports dynamic types which can be
> mapped in the descriptor.
> This would also give you a smooth transition, your existing integration could
> be labeled as deprecated and be removed in one of the future releases.
>
> Jörn