Hi Jeff et al,

To take up the thread from a few days ago: when a simple English word such as "bed", "soft", or "shop" also maps to a legitimate but rarely used acronym and shows up in the same POS as a potentially interesting entity, what mechanism would you use to disambiguate?
This problem only started once I constructed a SNO+RX+HGNC dictionary from the 2020A UMLS dump. Adding more TUIs where a more conventional word sense of the target word occurs does not fix this problem.

For instance, why does the sno_rx dictionary not contain this disease, which aliases to "bed"?

ucsf_dict_v1 $ grep 3159311 *.script
*INSERT INTO CUI_TERMS VALUES(3159311,0,1,'bed','bed')*
INSERT INTO CUI_TERMS VALUES(3159311,5,8,'myopia , high , with nonprogressive cone dysfunction','nonprogressive')
INSERT INTO CUI_TERMS VALUES(3159311,0,3,'bornholm eye disease','bornholm')
INSERT INTO CUI_TERMS VALUES(3159311,5,6,'x-linked cone dysfunction syndrome with myopia','myopia')
INSERT INTO TUI VALUES(3159311,47)
*INSERT INTO PREFTERM VALUES(3159311,'BORNHOLM EYE DISEASE')*
INSERT INTO SNOMEDCT_US VALUES(3159311,718718009)

sno_rx_16ab $ grep 3159311 *.script
nada

Solutions good or evil?
- Strip the relevant lines out of the dict.script file?
- Blacklist the text?
- Add to my stopCUI list (a little feature I added)?
- Some other configuration I don't know about?

For instance, is there a CUI:ACRONYM table? I'm tempted to create one. This would require the matching term to be present in upper case.

Peter
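
P.S. To make the CUI:ACRONYM idea concrete, here is a rough sketch of the filter I have in mind. It is not tied to any existing lookup code: the `ACRONYM_TERMS` table, the `Match` class, and both function names are hypothetical, and only the CUI 3159311 / 'bed' pair comes from the grep output above. The assumption is that each dictionary hit carries its CUI plus the covered text exactly as it appears in the note.

```python
from dataclasses import dataclass

# Hypothetical CUI:ACRONYM table: CUI -> lower-cased acronym surface forms.
# Only the 3159311 / 'bed' pair is taken from the dictionary dump; the
# table itself does not exist anywhere yet.
ACRONYM_TERMS = {3159311: {"bed"}}


@dataclass(frozen=True)
class Match:
    """One dictionary hit (assumed shape, for illustration only)."""
    cui: int
    text: str  # covered text with its original casing


def keep_match(match: Match) -> bool:
    """Keep a hit unless it is an acronym term that was not written in upper case."""
    acronyms = ACRONYM_TERMS.get(match.cui)
    if acronyms is None or match.text.lower() not in acronyms:
        return True  # not an acronym entry, so leave it alone
    return match.text.isupper()


def filter_matches(matches):
    """Drop acronym hits whose covered text is not all upper case."""
    return [m for m in matches if keep_match(m)]
```

With this, a hit on "bed" or "Bed" for CUI 3159311 would be dropped, while "BED" and the full synonym "bornholm eye disease" would still come through. The obvious cost is that a genuine acronym typed in lower case gets lost.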