As a thank you for your suggestions, here's a little file that may help.

It's a command file for sed that removes all short HGNC gene synonyms
that collide with common English words of 2, 3, or 4 characters in
length.  You will only need it if you've included HGNC in your
vocabularies and the Gene & Receptor TUIs in your dictionary.

The common words list is a bit weird: it contains some contemporary
acronyms that are not, strictly speaking, words.  Feel free to improve it.

https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english-usa.txt

sed -f deletion_short_gene_terms_script < original_dict.script > scrubbed_dict.script
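
If you want to rebuild the script against a different word list, here is a
minimal Python sketch of one way it could be done.  It is not necessarily
how the attached file was produced.  The function names and the
`'{term}'` line pattern are hypothetical: the pattern assumes each term
appears in the dictionary script as a single-quoted field, so adjust it to
whatever your file actually looks like.  The `I` (case-insensitive) flag on
the address is a GNU sed extension.

```python
def colliding_terms(common_words, gene_synonyms, min_len=2, max_len=4):
    """Return the gene synonyms that equal a short common word, ignoring case."""
    # Keep only common words whose length is within [min_len, max_len].
    common = {
        w.strip().lower()
        for w in common_words
        if min_len <= len(w.strip()) <= max_len
    }
    # Purely alphabetic synonyms only, so nothing needs regex escaping later.
    return sorted({s for s in gene_synonyms if s.isalpha() and s.lower() in common})


def sed_delete_commands(terms, line_pattern="'{term}'"):
    """Build one sed delete command per term.

    line_pattern is an ASSUMPTION about how a term appears on a dictionary
    line (here: as a single-quoted field); it is not taken from any real file.
    """
    return ["/" + line_pattern.format(term=t) + "/Id" for t in terms]


if __name__ == "__main__":
    # Tiny illustrative inputs; in practice read the word list and the
    # HGNC synonyms from files.
    words = ["the", "cat", "rest", "was", "impact"]
    synonyms = ["CAT", "REST", "WAS", "BRCA1", "IMPACT"]
    for command in sed_delete_commands(colliding_terms(words, synonyms)):
        print(command)
```

With the toy inputs above this prints three delete commands, for CAT, REST
and WAS; IMPACT is skipped because it is longer than 4 characters.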

Peter

Attachment: deletion_short_gene_terms_script.gz
Description: GNU Zip compressed data
