Hi all,

I've noticed that the HistoryCleartkAnalysisEngine misses many common forms of subject history, including the obvious "h/o" prefix.

Looking into the distribution, there's a model.jar and what appears to be a weights file containing trigger words, resources/org/apache/ctakes/assertion/models/history.txt, in which "h", "o" and "/" are each given their own weight. I'm not sure they are actually used that way, though: see below. There is also a tiny file, /org/apache/ctakes/assertion/semantic_classes/history.txt, which does contain a few entries including "h/o". I assume it is used for training, but it is never referred to anywhere.
Here's the behavior I'm seeing:

Example input             | Condition                                            | Term found | History feature marked | Range text
--------------------------+------------------------------------------------------+------------+------------------------+-----------------------
history of pregnancies    | "history of" included in the cu_term and prefterm    | yes        | no                     | history of pregnancies
history of adenopathy     | "history of" not included in the cu_term or prefterm | yes        | yes                    | adenopathy
H/O postpartum psychosis  | "h/o" not included in the prefterm or cu_term        | yes        | yes                    | postpartum psychosis
H/O: postpartum psychosis | "h/o" not included in the prefterm or cu_term        | yes        | no                     | postpartum psychosis
H/O pregnancies           | "h/o" included in the cu_term                        | yes        | no                     | h/o pregnancies

You can see that it is quite perverse: there is a pattern suggesting that if the concept definition occupies the history words, then they cannot be seen by the history annotation engine.

Has anyone else noticed this, and have they done anything about it?

Peter
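
To make the span-overlap hypothesis concrete, here is a toy Python sketch. It is not cTAKES code and says nothing about how HistoryCleartkAnalysisEngine is actually implemented: the trigger pattern, the 20-character window, and the function names trigger_outside_span and check are all assumptions made up for illustration. It only models the idea that a history trigger is visible when it sits to the left of the concept span and invisible when the span has swallowed it.

```python
import re

# Hypothetical trigger pattern, for illustration only; not the cTAKES model.
HISTORY_TRIGGER = re.compile(r"\b(?:history of|h/o)", re.IGNORECASE)


def trigger_outside_span(text, begin, window=20):
    """True if a history trigger appears in the text just before the mention."""
    # Only characters left of the span are inspected; the span itself is ignored.
    left_context = text[max(0, begin - window):begin]
    return HISTORY_TRIGGER.search(left_context) is not None


def check(text, range_text):
    """Locate range_text (the span the concept occupies) and test its left context."""
    begin = text.lower().index(range_text.lower())
    return trigger_outside_span(text, begin)


if __name__ == "__main__":
    # (input text, range text) pairs taken from the observations above.
    cases = [
        ("history of pregnancies", "history of pregnancies"),
        ("history of adenopathy", "adenopathy"),
        ("H/O postpartum psychosis", "postpartum psychosis"),
        ("H/O: postpartum psychosis", "postpartum psychosis"),
        ("H/O pregnancies", "h/o pregnancies"),
    ]
    for text, range_text in cases:
        print(f"{text!r}: trigger visible = {check(text, range_text)}")
```

This toy model agrees with four of the five observations, but it reports the trigger as visible for "H/O: postpartum psychosis", which the engine did not mark. Span overlap alone therefore does not account for the colon case.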