Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Remy Sanouillet Fri, 29 May 2020 08:55:19 -0700

Hello Abad,

The short answer is yes, sno_rx_16ab can be "hacked". Two caveats: any
mistake can stop all recognition, and you will lose all your modifications
on updates. So an additional dictionary is the recommended approach.


There are two cases. In the first, the CUI you are adding already exists and
you are just adding a synonym. In that case, you only need to add one line:

> INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)

where:

   - CUI is the CUI, 'nuff said
   - TEXT is the tokenized lowercase string for the entry. In your case
   'pap smear'. Most punctuation is a separate token. Single quotes are
   escaped by doubling them
   - RWORD is the one token in TEXT that is the most indicative (least
   common) which will be used as the index in the lookup. In your case
   probably 'pap' since it is not as common as 'smear'
   - RINDEX is the index of RWORD in TEXT. First token is 0 which is the
   case for 'pap'
   - TCOUNT is the token count for TEXT. In your case, 2

So you would want to add:

> INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
>
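
As a rough sketch of the field rules above, the Python helper below builds such
a line from a term and a chosen RWORD. It is not part of cTAKES: the function
names and the regex tokenizer are assumptions made for illustration. The
tokenizer only approximates cTAKES tokenization (lowercase, every punctuation
mark as its own token), so check the result against existing entries before
trusting it.

```python
import re


def tokenize(term):
    """Lowercase the term and split it so each punctuation mark is its own token.

    Assumption: this only approximates how cTAKES tokenizes dictionary text.
    """
    return re.findall(r"\w+|[^\w\s]", term.lower())


def cui_terms_insert(cui, term, rword):
    """Build one CUI_TERMS line: VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)."""
    tokens = tokenize(term)
    rindex = tokens.index(rword)  # raises ValueError if rword is not a token of term
    text = " ".join(tokens).replace("'", "''")  # single quotes are escaped by doubling
    rword_sql = rword.replace("'", "''")
    return "INSERT INTO CUI_TERMS VALUES(%d,%d,%d,'%s','%s')" % (
        cui, rindex, len(tokens), text, rword_sql)


print(cui_terms_insert(200845, "Pap smear", "pap"))
```

Choosing RWORD is left to the caller on purpose, since "least common token" is
a judgment about the corpus rather than something the term itself reveals.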

If the CUI does not exist yet, you will need to add a few more lines in
addition to the CUI_TERMS one. Their positions are unimportant as long as
they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

   1. INSERT INTO TUI VALUES(CUI,TUI)
   One line for each TUI in the taxonomy
   2. INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
   assuming you are adding a SNOMED
   3. INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
   where PREFTERM is the pretty string to describe the entry. It need not
   correspond to any indexed entry. It is used for display once the lookup has
   been successful.
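
The three extra lines can be sketched in Python as follows. Everything here is
illustrative: the function names are hypothetical, the CUI, TUI and SNOMED
values are made-up placeholders, and writing TUI and SNOMED codes as unquoted
integers is an assumption that should be checked against the CREATE TABLE
lines in your own .script file. The second function places the new lines
directly after the final "SET SCHEMA PUBLIC" line, which is one position that
satisfies the "below the header" rule.

```python
def new_concept_inserts(cui, tuis, snomed_codes, prefterm):
    """Build the TUI, SNOMEDCT_US and PREFTERM lines for a CUI that does not exist yet."""
    pref = prefterm.replace("'", "''")  # escape single quotes by doubling them
    lines = ["INSERT INTO TUI VALUES(%d,%d)" % (cui, tui) for tui in tuis]
    lines += ["INSERT INTO SNOMEDCT_US VALUES(%d,%d)" % (cui, code)
              for code in snomed_codes]
    lines.append("INSERT INTO PREFTERM VALUES(%d,'%s')" % (cui, pref))
    return lines


def splice_after_header(script_lines, new_lines):
    """Return a copy of script_lines with new_lines placed right after the
    last SET SCHEMA PUBLIC line. Raises ValueError if no such line exists."""
    last = max(i for i, line in enumerate(script_lines)
               if line.strip().upper().startswith("SET SCHEMA PUBLIC"))
    return script_lines[:last + 1] + new_lines + script_lines[last + 1:]


# Tiny stand-in for a real .script file; all values below are placeholders.
script = [
    "SET SCHEMA PUBLIC",
    "INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')",
]
new = new_concept_inserts(9999999, [60], [123456], "Example procedure")
for line in splice_after_header(script, new):
    print(line)
```

Work on a copy of the .script file, since a malformed line can stop all
recognition.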

That's it. Use at your own discretion. No guarantees.


*Rémy Sanouillet*
NLP Engineer
[email protected] <[email protected]>


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130


On Fri, May 29, 2020 at 7:34 AM <[email protected]> wrote:

> Hi Team,
>
>
>
> We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have
> faced situations where some of the expected tokens are not picked up by
> cTAKES during clinical text extraction. So our first thought was to
> identify where the dictionary is configured and how it can be updated.
> After some code analysis we found that the dictionary is configured in
> the path below under ctakes/resources for the sources RxNorm and SNOMEDCT_US
>
>
>
> We were able to open the HSQLDB database using the HSQLDB GUI and found
> that some of our required entries are already there. Coming specifically
> to our current problem: Pap Smear and Mammogram are two clinical terms
> which are not currently recognized by cTAKES in our profile.
>
> ·       If I look into the .script file, Pap Smear and
> Mammogram/Mammography are already present in the .script file and in the
> respective tables. Please find a snapshot below.
>
>
>
> But these were still not recognized by cTAKES. I see there are some filters
> working on top of the available dictionary entries (ctakes-gui and
> ctakes-gui-res). Could these filters be the reason the tokens are not
> recognized as expected? Could you please tell us what exactly these filters
> do? This will also help us in future when we try to add new terms
> to the dictionary.
>
>
>
>
>
> ·       What are the steps if we need to add/edit entries in the
> existing dictionaries? I see we can add/edit the existing values in the
> .script files, but our main question is: if I have a term ‘xyz’ to
> be added to the dictionary, how can I get the CUI and other values like
> TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value
> for the TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom
> BSV dictionaries but couldn’t find much documentation for them. Kindly
> advise which of the 3 options below is the better one.
>
>
>
> o   Generate a custom dictionary using the MetamorphoSys UMLS installation
> tool (where we provide sources such as ICD10, RxNorm, SNOMEDCT_US) and
> leverage the full set of .rrf files in the meta folder. Is this approach
> better if many entries need to be populated?
>
> o   Add/edit the available dictionary sno_rx_16ab. In that case, how do we
> provide valid values for each column, like CUI, TUI, RINDEX, TCOUNT and
> PREFTERM? Is this approach better if only a few entries need to be
> populated?
>
> o   Use a custom BSV. In that case, how should we add values to the custom
> BSV? Could you also provide a sample?
>
>
>
> I found a Metathesaurus browser at the URL below, where I can search for
> terms and get the CUI and the respective source like ICD/CPT/MDR. But
> I was still unable to get the other required attributes, like TUI,
> RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these
> attributes signify?
>
>
>
> https://uts.nlm.nih.gov//metathesaurus.html
> <https://uts.nlm.nih.gov/metathesaurus.html>
>
>
>
> Kindly advise us on how to proceed on this and correct us if we went wrong
> somewhere. This would be of great help for us
>
>
>
> P.S : We comply with UMLS license
>
>
>
>
>
> Thanks & Regards
>
>
> *Abad Ayyub*
>
> Vnet: 406170 | Cell : +91-9447379028
>
>
>
>
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
>
