As a thank you for your suggestions, here's a little file that may help. It's a sed command file that removes all short HGNC gene synonyms that collide with common English words 2, 3, or 4 characters long. You will only need it if you've included HGNC in your vocabularies and the Gene & Receptor TUIs in your dictionary.
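If you edit the word list, you could regenerate a similar deletion file from it. Below is a hypothetical sketch of that idea in POSIX shell. It is not the attached script, and the function name short_word_deletions is made up. The sketch assumes that each dictionary term appears in its line wrapped in single quotes (e.g. 'cat'), which may not match your dictionary format. It also deletes every matching line, not only HGNC gene rows. Restricting it to Gene & Receptor TUI rows depends on how your dictionary stores TUIs.

```shell
#!/bin/sh
# Hypothetical sketch, not the attached script.
# ASSUMPTION: each dictionary term appears in its line wrapped in single
# quotes, e.g. ...,'cat',...  Adjust the final sed call to match your format.
# NOTE: the generated commands delete every matching line, not only HGNC rows.
export LC_ALL=C   # byte-wise [a-z] matching and sort order

# Read a word list (one word per line) on stdin; print one sed delete
# command per distinct alphabetic word of 2 to 4 characters.
short_word_deletions() {
  tr -d '\r' |                    # tolerate CRLF line endings
    tr '[:upper:]' '[:lower:]' |  # fold case
    grep -E '^[a-z]{2,4}$' |      # keep 2-, 3- and 4-letter words only
    sort -u |                     # drop duplicates
    sed "s|.*|/'&'/d|"            # "cat" becomes /'cat'/d
}
```

As an example, after sourcing the sketch you could run curl -s <word-list URL> | short_word_deletions > short_word_deletions.sed. The resulting file can be passed to sed -f in the same way as the attached script. The output file name is also hypothetical.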
The common-words list is a bit odd: it contains some contemporary acronyms that are not, strictly speaking, words. Feel free to improve it:
https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english-usa.txt

To use the script, decompress the attachment with gunzip, then run:

sed -f deletion_short_gene_terms_script < original_dict.script > scrubbed_dict.script

Peter
Attachment: deletion_short_gene_terms_script.gz (GNU Zip compressed data)