I'm using the UMLS fast dictionary out of the box and mammography certainly
appears:

    {
      "_type": "UmlsConcept",
      "codingScheme": "SNOMEDCT_US",
      "code": "71651007",
      "score": 0.0,
      "disambiguated": false,
      "cui": "C0024671",
      "tui": "T060",
      "preferredText": "Mammography"
    },

The problem with pap smear is not that no concept is found, but that PAP
is also an acronym for something else, prostatic acid phosphatase:

    {
      "_type": "UmlsConcept",
      "codingScheme": "SNOMEDCT_US",
      "code": "59518007",
      "score": 0.0,
      "disambiguated": false,
      "cui": "C0523444",
      "tui": "T059",
      "preferredText": "Prostatic acid phosphatase measurement"
    }

Oddly enough, I can't get it to recognize any form of pap smear except for
"cervical smear test".

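In case it helps with the CUI_TERMS line described in Remy's message quoted
below, here is a minimal Python sketch that derives RINDEX and TCOUNT from
the tokenized text instead of counting by hand. The function names sql_quote
and cui_terms_insert are made up for illustration and are not part of cTAKES.
It assumes TEXT is already lowercase with tokens separated by spaces, and
that you pick RWORD yourself.

```python
def sql_quote(text: str) -> str:
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def cui_terms_insert(cui: int, text: str, rword: str) -> str:
    """Build one CUI_TERMS row for an already tokenized, lowercase entry.

    Column order follows the quoted message: CUI, RINDEX, TCOUNT, TEXT, RWORD.
    """
    tokens = text.split()
    if rword not in tokens:
        raise ValueError(f"{rword!r} is not a token of {text!r}")
    rindex = tokens.index(rword)  # zero-based position of the lookup word
    tcount = len(tokens)          # number of tokens in the entry
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"{sql_quote(text)},{sql_quote(rword)})"
    )


if __name__ == "__main__":
    # Same values as the example in the quoted message.
    print(cui_terms_insert(200845, "pap smear", "pap"))
```

The generated line still has to be pasted into the .script file below the
header lines by hand, with the same caveats Remy lists.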




On Fri, May 29, 2020 at 8:54 AM Remy Sanouillet <re...@foreseemed.com>
wrote:

> Hello Abad,
>
> The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of
> caveats are that any mistake can stop all recognition and you will lose all
> your mods on updates. So an additional dictionary is a recommended approach.
>
> There are two cases. In the first, the CUI you are adding already exists and
> you are just adding a synonym. In that case, you only need to add one line:
>
>> INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
>
> where:
>
>    - CUI is the cui, 'nuff said
>    - TEXT is the tokenized lowercase string for the entry. In your case
>    'pap smear'. Most punctuation is a separate token. Single quotes are
>    escaped by doubling them
>    - RWORD is the one token in TEXT that is the most indicative (least
>    common) which will be used as the index in the lookup. In your case
>    probably 'pap' since it is not as common as 'smear'
>    - RINDEX is the index of RWORD in TEXT. First token is 0 which is the
>    case for 'pap'
>    - TCOUNT is the token count for TEXT. In your case, 2
>
> So you would want to add:
>
>> INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
>>
>
> In the second case, the CUI does not exist yet and you will need to add a
> few more lines. Their positions are unimportant as long as they are below
> the header lines (below the final "SET SCHEMA PUBLIC" line).
>
>    1. INSERT INTO TUI VALUES(CUI,TUI)
>    One line for each TUI in the taxonomy
>    2. INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
>    assuming you are adding a SNOMED
>    3. INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
>    where PREFTERM is the pretty string to describe the entry. It need not
>    correspond to any indexed entry. It is used for display once the lookup has
>    been successful.
>
> That's it. Use at your own discretion. No guarantees.
>
>
> *Rémy Sanouillet*
> NLP Engineer
> re...@foreseemed.com <xx...@foreseemed.com>
>
>
> ForeSee Medical, Inc.
> 12555 High Bluff Drive, Suite 100
> San Diego, CA 92130
>
>
> On Fri, May 29, 2020 at 7:34 AM <abad.ay...@cognizant.com> wrote:
>
>> Hi Team,
>>
>>
>>
>> We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We
>> have hit situations where some expected tokens are not picked up by
>> cTAKES during clinical text extraction. Our first step was to identify
>> where the dictionary is configured and how it can be updated. After some
>> code analysis we found that the dictionary is configured in the path
>> below, under ctakes/resources, for the sources RxNorm and SNOMEDCT_US.
>>
>>
>>
>> We were able to open the HSQLDB database with the HSQLDB GUI and found
>> that some of the entries we need are already there. Coming specifically
>> to our current problem: Pap Smear and Mammogram are two clinical terms
>> that cTAKES does not currently recognize in our profile.
>>
>> ·       In the .script file, Pap Smear and Mammogram/Mammography are
>> already present, both in the file and in the respective tables. Please
>> find a snapshot below.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Still, these terms were not recognized by cTAKES. I see there are some
>> filters working on top of the available dictionary entries (ctakes-gui
>> and ctakes-gui-res). Could these filters be the reason the tokens are
>> not recognized as expected? Could you please tell us what exactly these
>> filters do? This will also help us in future when we try to add new
>> terms to the dictionary.
>>
>>
>>
>>
>>
>> ·       What are the steps if we need to add or edit entries in the
>> existing dictionaries? I see we can add or edit values in the .script
>> files, but our main doubt is this: if I have a term ‘xyz’ to add to the
>> dictionary, how do I get the CUI and the other values such as TUI,
>> RINDEX, TCOUNT and PREFTERM? Is it fine to give arbitrary values for
>> TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom bsv
>> dictionaries but couldn’t find much documentation for them. Kindly advise
>> which of the 3 options below is best.
>>
>>
>>
>> o   Generate a custom dictionary using the MetamorphoSys UMLS installation
>> tool (where we provide the sources ICD10, RxNorm, SNOMEDCT_US) and leverage
>> the full set of .rrf files in the meta folder. Is this approach better if
>> there are many entries to add?
>>
>> o   Add/edit the available dictionary sno_rx_16ab. In that case, how do we
>> provide valid values for columns like CUI, TUI, RINDEX, TCOUNT and
>> PREFTERM? Is this approach better if there are only a few entries to add?
>>
>> o   Use a custom bsv. In that case, how should we add values to the custom
>> bsv? Could you also provide a sample?
>>
>>
>>
>> I found a Metathesaurus browser at the URL below, where I can search for
>> terms and get the CUI and the respective source such as ICD/CPT/MDR. But
>> I was still unable to get the other required attributes, such as TUI,
>> RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these
>> attributes signify?
>>
>>
>>
>> https://uts.nlm.nih.gov//metathesaurus.html
>> <https://uts.nlm.nih.gov/metathesaurus.html>
>>
>>
>>
>> Kindly advise us on how to proceed and correct us if we went wrong
>> somewhere. This would be of great help to us.
>>
>>
>>
>> P.S.: We comply with the UMLS license.
>>
>>
>>
>>
>>
>> Thanks & Regards
>>
>> *Abad Ayyub*
>>
>> Vnet: 406170 | Cell : +91-9447379028
>>
>>
>>
>>
>> This e-mail and any files transmitted with it are for the sole use of the
>> intended recipient(s) and may contain confidential and privileged
>> information. If you are not the intended recipient(s), please reply to the
>> sender and destroy all copies of the original message. Any unauthorized
>> review, use, disclosure, dissemination, forwarding, printing or copying of
>> this email, and/or any action taken in reliance on the contents of this
>> e-mail is strictly prohibited and may be unlawful. Where permitted by
>> applicable law, this e-mail and other e-mail communications sent to and
>> from Cognizant e-mail addresses may be monitored.
>>
>
