Hi Peter,

I have noticed this too. I just added a follow-on engine that recognizes the
history text within event spans. It is a lazy solution, but it fit my needs and
available time.
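
A minimal sketch of this kind of follow-on pass, in Python for brevity (cTAKES itself is Java). It is not the actual engine: the names has_history_cue and mark_history are made up for illustration, spans are assumed to be plain (begin, end) character offsets, and the cue list holds only the two phrases mentioned in this thread.

```python
import re

# Cue phrases taken from the examples in this thread; a real list would be longer.
HISTORY_CUES = re.compile(r"\b(?:history of|h/o)\b", re.IGNORECASE)


def has_history_cue(text, begin, end):
    """Return True if the covered text of the span [begin, end) contains a cue."""
    return HISTORY_CUES.search(text[begin:end]) is not None


def mark_history(text, spans):
    """Map each (begin, end) event span to a history flag."""
    return {span: has_history_cue(text, *span) for span in spans}
```

A pass like this only catches cues that sit inside the event span itself, such as "H/O pregnancies"; cues outside the span are left to the existing history engine.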

Sean
________________________________________
From: Peter Abramowitsch <pabramowit...@gmail.com>
Sent: Monday, January 3, 2022 5:03 PM
To: dev@ctakes.apache.org
Subject: Performance of the cleartk history module [EXTERNAL]

Hi All

I've noticed that the HistoryCleartkAnalysisEngine misses many common forms
of subject history, including the obvious "h/o" prefix. Looking into the
distribution, there's a model.jar and what appears to be a weights file
containing trigger words,
resources/org/apache/ctakes/assertion/models/history.txt, where h, o, and /
are each given their own weight. But I'm not sure that they're actually
used in this way (see below). There's also a tiny file,
/org/apache/ctakes/assertion/semantic_classes/history.txt,
which does contain a few entries including "h/o". I assume it is used for
training, but it is never referred to anywhere.

Here's the behavior I'm seeing:
example input             | condition                                            | term found | history feature marked | range text
--------------------------+------------------------------------------------------+------------+------------------------+-----------------------
history of pregnancies    | "history of" included in the cu_term and prefterm    | yes        | no                     | history of pregnancies
history of adenopathy     | "history of" not included in the cu_term or prefterm | yes        | yes                    | adenopathy
H/O postpartum psychosis  | "h/o" not included in the prefterm or cu_term        | yes        | yes                    | postpartum psychosis
H/O: postpartum psychosis | "h/o" not included in the prefterm or cu_term        | yes        | no                     | postpartum psychosis
H/O pregnancies           | "h/o" included in the cu_term                        | yes        | no                     | h/o pregnancies

You can see that it is quite perverse: the pattern suggests that
if the concept term itself covers the history words, then they cannot be
seen by the history annotation engine.
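
As a rough check, the suspected rule can be run against the five examples above with a small Python sketch. The rows are transcribed from those examples; the names predicted_marked and unexplained are made up for illustration.

```python
# (input, history cue inside the concept term?, history feature marked?)
OBSERVATIONS = [
    ("history of pregnancies", True, False),
    ("history of adenopathy", False, True),
    ("H/O postpartum psychosis", False, True),
    ("H/O: postpartum psychosis", False, False),
    ("H/O pregnancies", True, False),
]


def predicted_marked(cue_in_term):
    """Suspected rule: history is marked only when the cue lies outside the term."""
    return not cue_in_term


def unexplained(observations):
    """Return the inputs whose observed result differs from the suspected rule."""
    return [
        text
        for text, cue_in_term, marked in observations
        if predicted_marked(cue_in_term) != marked
    ]
```

Read this way, the rule accounts for four of the five rows. The "H/O:" case with the colon is the exception: the cue is outside the term and history is still not marked, so the rule alone does not explain it.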

Has anyone else noticed this - and have they done anything about it?

Peter
