RE: sentence detector newline behavior

Chen, Pei Wed, 29 Jan 2014 13:06:36 -0800

+1
There's an example of the configs here :)
https://issues.apache.org/jira/browse/CTAKES-98


I think we should be able to use OpenNLP's Sentence Annotator directly if we no
longer need the custom newline rule(s).
[Or, if we find that a fixed rule is still required, perhaps OpenNLP can support
it via config as well; there doesn't seem to be anything cTAKES-specific about
it.]
Presumably this is pending the results of Tim's retraining/evaluation of the new models?
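If a fixed newline rule does turn out to be needed, it is small enough to live outside either codebase. Below is a minimal sketch, assuming the statistical detector has already produced character spans. The hypothetical `splitAtNewlines` method splits each span at every line break it contains, trims whitespace, and drops empty pieces. The `Span` record is a stand-in for illustration, not OpenNLP's own span class, and this is not the actual cTAKES rule.

```java
import java.util.ArrayList;
import java.util.List;

public class NewlineSentenceSplitter {

    /** Hypothetical half-open character span [start, end); not OpenNLP's Span class. */
    public record Span(int start, int end) {}

    /**
     * Splits each model-produced sentence span at every '\n' it contains,
     * so that no resulting span crosses a line break.
     */
    public static List<Span> splitAtNewlines(String text, List<Span> sentences) {
        List<Span> result = new ArrayList<>();
        for (Span s : sentences) {
            int pieceStart = s.start();
            for (int i = s.start(); i <= s.end(); i++) {
                // The end of the span also closes the current piece; the
                // short-circuit avoids reading past the span (or the text).
                boolean atBoundary = (i == s.end()) || text.charAt(i) == '\n';
                if (atBoundary) {
                    addTrimmed(text, pieceStart, i, result);
                    pieceStart = i + 1;
                }
            }
        }
        return result;
    }

    /** Trims surrounding whitespace and keeps the piece only if it is non-empty. */
    private static void addTrimmed(String text, int start, int end, List<Span> out) {
        while (start < end && Character.isWhitespace(text.charAt(start))) start++;
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) end--;
        if (start < end) out.add(new Span(start, end));
    }

    public static void main(String[] args) {
        String text = "Patient denies chest pain\nBP 120/80. Follow up in 2 weeks.";
        // Pretend the statistical model found two sentences, ending at each period.
        List<Span> modelSpans = List.of(new Span(0, 36), new Span(37, text.length()));
        for (Span s : splitAtNewlines(text, modelSpans)) {
            System.out.println("[" + text.substring(s.start(), s.end()) + "]");
        }
        // prints:
        // [Patient denies chest pain]
        // [BP 120/80.]
        // [Follow up in 2 weeks.]
    }
}
```

Keeping the rule as a post-processing step over spans, rather than inside the model, is one way it could be offered as a configurable option without retraining.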

--Pei
> -----Original Message-----
> From: Jörn Kottmann [mailto:kottm...@gmail.com]
> Sent: Wednesday, January 29, 2014 3:55 PM
> To: dev@ctakes.apache.org
> Subject: Re: sentence detector newline behavior
> 
> On 01/27/2014 08:44 PM, Tim Miller wrote:
> >
> > That is a good point, and something I was wondering about. Having now
> > looked at both the cTAKES and OpenNLP code for the sentence splitter,
> > it seems like there is a lot of overlap. I would've thought it was
> > just a matter of converting annotations into our type system. So I'm
> > curious whether there is some justification for the apparent
> > duplication (or if I'm hallucinating it).
> 
> It should be possible (and if not we should make it possible) to directly use
> the opennlp-uima integration. It supports dynamic types which can be
> mapped in the descriptor.
> This would also give you a smooth transition: your existing integration could
> be marked as deprecated and removed in a future release.
> 
> Jörn
