I'm using the UMLS fast dictionary out of the box and mammography certainly
appears:
{
"_type": "UmlsConcept",
"codingScheme": "SNOMEDCT_US",
"code": "71651007",
"score": 0.0,
"disambiguated": false,
"cui": "C0024671",
"tui": "T060",
"preferredText": "Mammography"
},
The problem with pap smear is not that a concept isn't found, but that PAP
is also an acronym for something else, prostatic acid phosphatase:
{
"_type": "UmlsConcept",
"codingScheme": "SNOMEDCT_US",
"code": "59518007",
"score": 0.0,
"disambiguated": false,
"cui": "C0523444",
"tui": "T059",
"preferredText": "Prostatic acid phosphatase measurement"
}
Oddly enough, I can't get it to recognize any form of the term except for
"cervical smear test".
On Fri, May 29, 2020 at 8:54 AM Remy Sanouillet <[email protected]>
wrote:
> Hello Abad,
>
> The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of
> caveats are that any mistake can stop all recognition and you will lose all
> your mods on updates. So an additional dictionary is a recommended approach.
>
> There are two cases. In the first, the CUI you are adding already exists
> and you are just adding a synonym. In that case, you only need to add one line:
>
>> INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
>
> where:
>
> - CUI is the cui, nuf'said
> - TEXT is the tokenized lowercase string for the entry. In your case
> 'pap smear'. Most punctuation is a separate token. Single quotes are
> escaped by doubling them
> - RWORD is the one token in TEXT that is the most indicative (least
> common) which will be used as the index in the lookup. In your case
> probably 'pap' since it is not as common as 'smear'
> - RINDEX is the index of RWORD in TEXT. First token is 0 which is the
> case for 'pap'
> - TCOUNT is the token count for TEXT. In your case, 2
>
> So you would want to add:
>
>> INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
>>
>
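
To make the column rules above concrete, here is a minimal Python sketch that
builds a CUI_TERMS line from a term. The function names and the regex
tokenizer are assumptions for illustration only; the real cTAKES tokenizer may
split some strings differently, so compare the result with existing rows in
the .script file before relying on it.

```python
import re


def tokenize(term: str) -> list[str]:
    # Rough approximation (an assumption, not the cTAKES tokenizer):
    # lowercase, keep alphanumeric runs, make each punctuation mark a token.
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", term.lower())


def cui_terms_insert(cui: int, term: str, rare_word: str) -> str:
    tokens = tokenize(term)
    rword = rare_word.lower()
    # RINDEX: zero-based position of RWORD in TEXT.
    # Raises ValueError if rare_word is not one of the tokens.
    rindex = tokens.index(rword)
    # TEXT: tokens joined by single spaces, single quotes doubled.
    text = " ".join(tokens).replace("'", "''")
    rword_sql = rword.replace("'", "''")
    # TCOUNT: number of tokens in TEXT.
    tcount = len(tokens)
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"'{text}','{rword_sql}')"
    )


if __name__ == "__main__":
    print(cui_terms_insert(200845, "Pap smear", "pap"))
```

Run as a script, this prints the same line as the hand-written example above.
The rare word is passed in explicitly because picking the least common token
needs frequency information that this sketch does not have.
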
> In the second case, the entry does not exist yet and you will need to add
> a few more lines. Their positions are unimportant as long as they are below
> the header lines (below the final "SET SCHEMA PUBLIC" line).
>
> 1. INSERT INTO TUI VALUES(CUI,TUI)
> One line for each TUI in the taxonomy
> 2. INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
> assuming you are adding a SNOMED
> 3. INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
> where PREFTERM is the pretty string to describe the entry. It need not
> correspond to any indexed entry. It is used for display once the lookup has
> been successful.
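
The three extra line types for a new entry can be generated the same way. This
is only a sketch: the function name and all values in the usage example are
hypothetical placeholders, and writing the TUI and SNOMED values unquoted is an
assumption that mirrors the numeric CUI in the example above. Check how
existing rows in your .script file are written before copying the format.

```python
def new_concept_inserts(cui: int, tuis: list[int], snomed_code: str,
                        pref_term: str) -> list[str]:
    # One TUI line per semantic type of the new concept.
    lines = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]
    # One line for the source code (SNOMED here).
    # Assumption: the code is written unquoted, like the CUI.
    lines.append(f"INSERT INTO SNOMEDCT_US VALUES({cui},{snomed_code})")
    # PREFTERM is display text; single quotes are escaped by doubling them.
    escaped = pref_term.replace("'", "''")
    lines.append(f"INSERT INTO PREFTERM VALUES({cui},'{escaped}')")
    return lines


if __name__ == "__main__":
    # Hypothetical placeholder values, not real UMLS or SNOMED identifiers.
    for line in new_concept_inserts(9000001, [60], "123456",
                                    "Example procedure"):
        print(line)
```

The CUI_TERMS line for the new concept is still needed in addition to these;
without it there is no indexed text for the lookup to match.
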
>
> That's it. Use at your own discretion. No guarantees.
>
>
> *Rémy Sanouillet*
> NLP Engineer
> [email protected] <[email protected]>
>
>
> ForeSee Medical, Inc.
> 12555 High Bluff Drive, Suite 100
> San Diego, CA 92130
>
> NOTICE: This e-mail message and all attachments transmitted with it are
> intended solely for the use of the addressee and may contain legally
> privileged and confidential information. If the reader of this message is
> not the intended recipient, or an employee or agent responsible for
> delivering this message to the intended recipient, you are hereby notified
> that any dissemination, distribution, copying, or other use of this message
> or its attachments is strictly prohibited. If you have received this
> message in error, please notify the sender immediately by replying to this
> message and please delete it from your computer.
>
>
> On Fri, May 29, 2020 at 7:34 AM <[email protected]> wrote:
>
>> Hi Team,
>>
>>
>>
>> We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We
>> have run into situations where some expected tokens are not picked up by
>> cTAKES during clinical text extraction. Our first step was to identify
>> where the dictionary is configured and how it can be updated. After some
>> code analysis, we found that the dictionary is configured in the path
>> below, under ctakes/resources, for the sources RxNorm and SNOMEDCT_US.
>>
>>
>>
>> We were able to open the HSQLDB database using the HSQLDB GUI and found
>> that some of the entries we need are already there. To come to our
>> specific problem: Pap Smear and Mammogram are two clinical terms that
>> cTAKES does not currently recognize in our profile.
>>
>> · If I look into the .script file, Pap Smear and
>> Mammogram/Mammography are already present there and in the
>> respective tables. Please find a snapshot below.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> But cTAKES still did not recognize them. I see there are some filters
>> working on top of the available dictionary entries (ctakes-gui and
>> ctakes-gui-res). Could these filters be the reason the tokens are not
>> recognized as expected? Could you please explain what exactly these
>> filters do? This will also help us in the future when we try to add new
>> terms to the dictionary.
>>
>>
>>
>>
>>
>> · What are the steps to add or edit entries in the existing
>> dictionaries? I see we can add or edit values in the .script files, but
>> our main doubt is this: if I have a term ‘xyz’ to add to the dictionary,
>> how do I get the CUI and the other values such as TUI, RINDEX, TCOUNT and
>> PREFTERM? Is it fine to give arbitrary values for TUI/CUI/RINDEX/TCOUNT?
>> I could also see options to create custom BSV dictionaries but couldn’t
>> find much documentation for them. Kindly advise which of the 3 options
>> below is better.
>>
>>
>>
>> o Generate a custom dictionary using the MetamorphoSys UMLS installation
>> tool (where we provide the sources ICD10, RxNorm, SNOMEDCT_US) and use the
>> full set of .rrf files in the meta folder. Is this approach better if
>> many entries need to be added?
>>
>> o Add/edit the available dictionary sno_rx_16ab. In that case, how do
>> we provide valid values for columns such as CUI, TUI, RINDEX, TCOUNT and
>> PREFTERM? Is this approach better if only a few entries need to be
>> added?
>>
>> o Use a custom BSV. In that case, how should we add values to the custom
>> BSV? Could you also provide a sample?
>>
>>
>>
>> I found a Metathesaurus browser at the URL below, where I can search for
>> terms and get the CUI and the respective source, such as ICD/CPT/MDR. But
>> I was still unable to get the other required attributes, such as
>> TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what
>> these attributes signify?
>>
>>
>>
>> https://uts.nlm.nih.gov//metathesaurus.html
>> <https://uts.nlm.nih.gov/metathesaurus.html>
>>
>>
>>
>> Kindly advise us on how to proceed and correct us if we went
>> wrong somewhere. This would be of great help to us.
>>
>>
>>
>> P.S : We comply with UMLS license
>>
>>
>>
>>
>>
>> Thanks & Regards
>>
>>
>> *Abad Ayyub*
>>
>> Vnet: 406170 | Cell : +91-9447379028
>>
>>
>>
>>
>> This e-mail and any files transmitted with it are for the sole use of the
>> intended recipient(s) and may contain confidential and privileged
>> information. If you are not the intended recipient(s), please reply to the
>> sender and destroy all copies of the original message. Any unauthorized
>> review, use, disclosure, dissemination, forwarding, printing or copying of
>> this email, and/or any action taken in reliance on the contents of this
>> e-mail is strictly prohibited and may be unlawful. Where permitted by
>> applicable law, this e-mail and other e-mail communications sent to and
>> from Cognizant e-mail addresses may be monitored.
>>
>