I'm using the UMLS fast dictionary out of the box and mammography certainly appears:
{ "_type": "UmlsConcept", "codingScheme": "SNOMEDCT_US", "code": "71651007", "score": 0.0, "disambiguated": false, "cui": "C0024671", "tui": "T060", "preferredText": "Mammography" },

The problem with pap smear is not that a concept isn't found, but that PAP is also an acronym for something else: Prostatic acid phosphatase

{ "_type": "UmlsConcept", "codingScheme": "SNOMEDCT_US", "code": "59518007", "score": 0.0, "disambiguated": false, "cui": "C0523444", "tui": "T059", "preferredText": "Prostatic acid phosphatase measurement" }

Oddly enough, I can't get it to recognize any of its forms except for "cervical smear test".

On Fri, May 29, 2020 at 8:54 AM Remy Sanouillet <re...@foreseemed.com> wrote:

> Hello Abad,
>
> The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of
> caveats are that any mistake can stop all recognition and you will lose all
> your mods on updates. So an additional dictionary is a recommended approach.
>
> There are two cases. Either the CUI you are adding already exists and you
> are just adding a synonym. In that case, you only need to add one line:
>
>> INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
>
> where:
>
>    - CUI is the cui, 'nuff said
>    - TEXT is the tokenized lowercase string for the entry. In your case
>    'pap smear'. Most punctuation is a separate token. Single quotes are
>    escaped by doubling them
>    - RWORD is the one token in TEXT that is the most indicative (least
>    common) which will be used as the index in the lookup. In your case
>    probably 'pap' since it is not as common as 'smear'
>    - RINDEX is the index of RWORD in TEXT. First token is 0 which is the
>    case for 'pap'
>    - TCOUNT is the token count for TEXT. In your case, 2
>
> So you would want to add:
>
>> INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
>
> If the entry is a non-existing one, you will need to add a few more
> lines.
> Their positions are unimportant as long as they are below the header
> lines (below the final "SET SCHEMA PUBLIC" line).
>
>    1. INSERT INTO TUI VALUES(CUI,TUI)
>    One line for each TUI in the taxonomy
>    2. INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
>    assuming you are adding a SNOMED
>    3. INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
>    where PREFTERM is the pretty string to describe the entry. It need not
>    correspond to any indexed entry. It is used for display once the lookup has
>    been successful.
>
> That's it. Use at your own discretion. No guarantees.
>
> *Rémy Sanouillet*
> NLP Engineer
> re...@foreseemed.com <xx...@foreseemed.com>
>
> ForeSee Medical, Inc.
> 12555 High Bluff Drive, Suite 100
> San Diego, CA 92130
>
> NOTICE: This e-mail message and all attachments transmitted with it are
> intended solely for the use of the addressee and may contain legally
> privileged and confidential information. If the reader of this message is
> not the intended recipient, or an employee or agent responsible for
> delivering this message to the intended recipient, you are hereby notified
> that any dissemination, distribution, copying, or other use of this message
> or its attachments is strictly prohibited. If you have received this
> message in error, please notify the sender immediately by replying to this
> message and please delete it from your computer.
>
> On Fri, May 29, 2020 at 7:34 AM <abad.ay...@cognizant.com> wrote:
>
>> Hi Team,
>>
>> We set up cTAKES4.0.0 as our NLP engine for our profile recently. We
>> have faced situations where some of the expected tokens are not picked up
>> by cTAKES during clinical text extraction. So our first thought process was
>> to identify where the dictionary is configured and how that can be updated.
>> After some code analysis it was found that the dictionary is configured in
>> the below path under ctakes/resources for sources RxNorm and SNOMEDCT_US
>>
>> We were able to open the hsqldb using the hsql db gui and found out that
>> some of our required entries are already there. Coming specifically
>> to our current problem: Pap Smear and Mammogram are two clinical terms
>> which are not currently recognized by cTAKES in our profile.
>>
>> · If I look into the .script file, Pap Smear and
>> Mammogram/Mammography are already present in the .script file and in the
>> respective tables. PFB a snapshot as below
>>
>> But still these were not recognized by cTAKES. I see there are some
>> filters working on top of the available entries in the dictionary (ctakes-gui
>> and ctakes-gui-res). Could it be because of these filters that the tokens are
>> not recognized as expected? Could you pls. share with us what exactly these
>> filters do? This will help us in future also when we are trying to add new
>> terms into the dictionary
>>
>> · What are the steps to follow if we need to add/edit entries in the
>> existing dictionaries? I see we can add/edit the existing values in the
>> .script files, but our primary doubt is: suppose I have a term ‘xyz’ to
>> be added to the dictionary, how can I get the CUI and other values like
>> TUI, RINDEX, TCOUNT and PREFTERM? Is it fine if I give any random value
>> for the TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom
>> bsv dictionaries but couldn’t see much documentation for it. Kindly advise
>> which is the better option from the below 3.
>>
>> o Generate a custom dictionary using the MetamorphoSys UMLS installation
>> tool (where we provide sources as ICD10, RxNORM, SNOMEDCT_US) and leverage the
>> full set of .rrf files in the meta folder. Is this approach better if the
>> entries to be populated are maximal?
>>
>> o Add/edit the available dictionary sno_rx_16ab, and in that case how
>> to provide valid values for each column like CUI, TUI, RINDEX, TCOUNT and
>> PREFTERM? If the entries to be populated are minimal, would this approach
>> be better?
>>
>> o Use a custom bsv; in that case how should we add values to the custom
>> bsv? Could you also provide a sample in that case?
>>
>> I found a Metathesaurus browser at the below url, where I can search for
>> the terms and get the CUI and the respective source like ICD/CPT/MDR. But
>> still I was unable to get the other required attributes to be populated
>> like TUI, RINDEX, TCOUNT and PREFTERM. Could you pls. brief what these
>> attributes signify?
>>
>> https://uts.nlm.nih.gov//metathesaurus.html
>> <https://uts.nlm.nih.gov/metathesaurus.html>
>>
>> Kindly advise us on how to proceed on this and correct us if we went
>> wrong somewhere. This would be of great help for us
>>
>> P.S : We comply with UMLS license
>>
>> Thanks & Regards
>>
>> *Abad Ayyub*
>>
>> Vnet: 406170 | Cell : +91-9447379028
>>
>> This e-mail and any files transmitted with it are for the sole use of the
>> intended recipient(s) and may contain confidential and privileged
>> information. If you are not the intended recipient(s), please reply to the
>> sender and destroy all copies of the original message. Any unauthorized
>> review, use, disclosure, dissemination, forwarding, printing or copying of
>> this email, and/or any action taken in reliance on the contents of this
>> e-mail is strictly prohibited and may be unlawful. Where permitted by
>> applicable law, this e-mail and other e-mail communications sent to and
>> from Cognizant e-mail addresses may be monitored.
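
For reference, the CUI_TERMS recipe from Remy's reply above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name cui_terms_insert is made up for this example, splitting on whitespace stands in for however cTAKES actually tokenizes an entry (the text is assumed to be already tokenized, with punctuation as separate tokens), and the CUI value is simply the one used in the thread.

```python
def cui_terms_insert(cui: int, text: str, rword: str) -> str:
    """Build one INSERT line for the CUI_TERMS table (hypothetical helper)."""
    # Assumption: the entry is already tokenized, so tokens are space-separated.
    tokens = text.lower().split()
    rword = rword.lower()
    if rword not in tokens:
        raise ValueError(f"rword {rword!r} is not a token of {text!r}")
    rindex = tokens.index(rword)  # 0-based position of the lookup token in TEXT
    tcount = len(tokens)          # number of tokens in TEXT
    # Single quotes are escaped by doubling them, as described in the thread.
    text_sql = " ".join(tokens).replace("'", "''")
    rword_sql = rword.replace("'", "''")
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"'{text_sql}','{rword_sql}')"
    )


if __name__ == "__main__":
    # Prints: INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
    print(cui_terms_insert(200845, "pap smear", "pap"))
```

The printed line matches the one suggested in the thread. It would still have to be added to the .script file below the header lines, and the same caveats apply: a malformed line may stop recognition, and edits are lost on dictionary updates.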