I have been working on the sentence detector newline issue, training a model to 
probabilistically split sentences on newlines rather than forcing sentence 
breaks. I have checked a model into the repo under ctakes-core-res, and I have 
attached a patch against ctakes-core to the JIRA issue for people to test:
https://issues.apache.org/jira/browse/CTAKES-41

In my testing so far, the new model doesn't seem to break 
on notes where ctakes worked well before (those where newlines are always 
sentence breaks), and is a slight improvement on notes where newlines may or 
may not be sentence breaks. Once the change is checked in we can continue 
improving the model by adding more data and features, but the first hurdle I'd 
like to get past is making sure it runs well enough on the type of data that 
the old model worked well on. Let me know if you have any questions.
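
To illustrate the idea only, here is a toy sketch of treating each newline as 
a candidate break that gets a probability instead of always being a break. It 
is not the checked-in model or the attached patch: the function names, the 
hand-written scoring rules, and the 0.5 threshold are all assumptions made up 
for the example, standing in for a trained classifier.

```python
def newline_break_probability(before: str, after: str) -> float:
    """Hypothetical stand-in for a trained model.

    Returns a score in [0, 1] for how likely the newline between
    `before` and `after` is a sentence break. The rules and weights
    here are invented for illustration only.
    """
    prev = before.rstrip()
    nxt = after.lstrip()
    if not prev or not nxt:
        return 1.0  # a blank line on either side is treated as a break

    score = 0.5
    # Terminal punctuation before the newline suggests a break.
    score += 0.3 if prev[-1] in ".?!:" else -0.2
    # A capital letter or digit after the newline suggests a new sentence.
    score += 0.15 if (nxt[0].isupper() or nxt[0].isdigit()) else -0.25
    return min(1.0, max(0.0, score))


def split_sentences(text: str, threshold: float = 0.5) -> list[str]:
    """Split on newlines only where the break score reaches `threshold`."""
    lines = text.split("\n")
    sentences = []
    current = lines[0]
    for nxt in lines[1:]:
        if newline_break_probability(current, nxt) >= threshold:
            if current.strip():
                sentences.append(current.strip())
            current = nxt
        else:
            # Not a break: the newline was just line wrapping, so join.
            current = current.rstrip() + " " + nxt.lstrip()
    if current.strip():
        sentences.append(current.strip())
    return sentences


note = "Patient denies chest\npain or dyspnea.\nVitals stable."
print(split_sentences(note))
# Setting threshold=0.0 reproduces the old behavior: every newline is a break.
print(split_sentences(note, threshold=0.0))
```

In this sketch, a threshold of 0.0 gives the old forced-break behavior, which 
is one way to check that notes the old approach handled well still split the 
same way.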

Thanks
Tim