Hi Sean and fellow Fast Dictionary Lookup fans,

I notice that the UmlsJdbcRareWordDictionary doesn't seem to index terms
from PREFTERM, only CUI_TERMS.

For instance, my custom dictionary script has the term "Angina Pectoris" as
a PREFTERM,
but "angina pectoris" isn't among the CUI_TERMS inserts:

INSERT INTO PREFTERM VALUES(2962,'Angina Pectoris')
INSERT INTO CUI_TERMS VALUES(2962,0,3,'ischemic chest pain','ischemic')
INSERT INTO CUI_TERMS VALUES(2962,0,3,'ischaemic chest pain','ischaemic')
INSERT INTO CUI_TERMS VALUES(2962,0,1,'angina','angina')
INSERT INTO CUI_TERMS VALUES(2962,4,5,'pain ; chest , ischemic','ischemic')
INSERT INTO CUI_TERMS VALUES(2962,0,2,'anginal discomfort','anginal')
INSERT INTO CUI_TERMS VALUES(2962,4,5,'chest ; pain , ischemic','ischemic')
INSERT INTO CUI_TERMS VALUES(2962,0,2,'anginal syndrome','anginal')
INSERT INTO CUI_TERMS VALUES(2962,2,3,'syndrome ; anginal','anginal')
INSERT INTO CUI_TERMS VALUES(2962,0,1,'stenocardia','stenocardia')
INSERT INTO CUI_TERMS VALUES(2962,0,3,'anginal ; syndrome','anginal')
INSERT INTO CUI_TERMS VALUES(2962,0,2,'angor pectoris','angor')
INSERT INTO CUI_TERMS VALUES(2962,0,1,'stenocardias','stenocardias')
INSERT INTO CUI_TERMS VALUES(2962,2,3,'chest pain ischemic','ischemic')
INSERT INTO CUI_TERMS VALUES(2962,0,2,'anginal pain','anginal')
INSERT INTO CUI_TERMS VALUES(2962,0,1,'anginas','anginas')

So, in text containing the phrase "angina pectoris", the UmlsLookup
annotators identify only the CUI_TERMS term "angina" as a
SignSymptomMention.

First off, am I missing something?
I haven't used the default ctakessnorx.script dictionary for years.
Is this a peculiarity of the custom dictionaries I've been building?
Is there an option to include PREFTERMs in the rare-word index?
Or is there some reason *not* to include PREFTERMs -- would they mess up
the rare-word indexing somehow?

Certainly many PREFTERMs would never occur in the wild (e.g. "Benign
essential hypertension (disorder)"), but there are quite a few common
clinical terms that are in PREFTERM but not CUI_TERMS. Off the top of my head:
C0017168 gastroesophageal reflux disease, C0018802 congestive heart
failure, C0022104 irritable bowel syndrome, ...
We've been adding these to a supplementary BSV file as they come up, but
there are many more. This HSQL query for PREFTERM-only disorders on my
custom dictionary returns 175K+ rows; at first blush, 20% look legit.

select p.cui, lcase(p.prefterm) as prefterm
from tui t join prefterm p on p.cui = t.cui
where t.tui in
(19,20,33,34,37,40,41,42,43,44,45,46,47,48,49,50,56,57,184,190,191)
except
select c.cui, c.text from cui_terms c;
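
If PREFTERMs can safely go into the rare-word index, here is a minimal Python sketch of how a missing CUI_TERMS row might be derived from a PREFTERM. Judging only from the INSERT rows above, the columns appear to be (CUI, token index of the rare word, token count, lowercased text, rare word). The function names and the token frequencies are hypothetical, and the real dictionary builder may well tokenize and choose rare words differently.

```python
def cui_terms_row(cui, term, token_counts):
    """Build a (cui, rindex, tcount, text, rword) tuple for one term.

    token_counts maps a token to how often it occurs across the whole
    dictionary (hypothetical input; unseen tokens count as 0).
    """
    tokens = term.lower().split()
    # Only tokens containing a letter or digit can be the rare word,
    # so punctuation tokens such as ';' and ',' are skipped.
    candidates = [
        (token_counts.get(tok, 0), i)
        for i, tok in enumerate(tokens)
        if any(ch.isalnum() for ch in tok)
    ]
    if not candidates:
        return None
    # Lowest frequency wins; ties go to the earliest token.
    _, rindex = min(candidates)
    return (cui, rindex, len(tokens), " ".join(tokens), tokens[rindex])


def to_insert(row):
    """Render a row as an INSERT statement, doubling single quotes."""
    cui, rindex, tcount, text, rword = row
    text = text.replace("'", "''")
    rword = rword.replace("'", "''")
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"'{text}','{rword}')"
    )


if __name__ == "__main__":
    # Made-up frequencies, for illustration only.
    counts = {"angina": 40, "pectoris": 3}
    print(to_insert(cui_terms_row(2962, "Angina Pectoris", counts)))
    # INSERT INTO CUI_TERMS VALUES(2962,1,2,'angina pectoris','pectoris')
```

With frequencies like these, "pectoris" would become the rare word, so "angina pectoris" would be looked up through a different rare word than the bare "angina" row.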

Thanks for any thoughts, and happy holidays.

Kean Kaufmann
RecordsOne
