You're right, it should have gotten "CIN I". That's a strange one and probably needs to be debugged further...
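As a rough illustration of the "exists as CIN 2 but not as CIN II" gap, here is a minimal Python sketch of trying the Roman/Arabic grade variant of a term before giving up on a lookup. This is not how cTAKES performs its lookup; the toy dictionary, `exact_lookup`, `grade_variants`, and `lookup_with_variants` are hypothetical names made up for this example, and the only real data are the three SNOMED CT codes quoted in this thread. It also does not explain the "CIN I" miss, which still needs debugging.

```python
import re

# Toy stand-in for a bundled dictionary (an assumption for illustration only).
# Keys are lowercased terms; values are the SNOMED CT codes quoted in this thread.
TOY_DICTIONARY = {
    "cin i": "285836003",
    "cin 2": "285838002",
    "cin iii": "20365006",
}

ROMAN_TO_ARABIC = {"i": "1", "ii": "2", "iii": "3"}
ARABIC_TO_ROMAN = {arabic: roman for roman, arabic in ROMAN_TO_ARABIC.items()}


def exact_lookup(term):
    """Return the code for an exact (case-insensitive) match, or None."""
    return TOY_DICTIONARY.get(term.strip().lower())


def grade_variants(term):
    """Return the normalized term plus its Roman/Arabic grade variant, if any."""
    normalized = term.strip().lower()
    variants = [normalized]
    # Only a trailing grade of 1-3 (or I-III) is handled in this sketch.
    match = re.fullmatch(r"(.+?)\s+(iii|ii|i|[123])", normalized)
    if match:
        head, grade = match.groups()
        swapped = ROMAN_TO_ARABIC.get(grade) or ARABIC_TO_ROMAN.get(grade)
        variants.append(f"{head} {swapped}")
    return variants


def lookup_with_variants(term):
    """Try the term as written, then its grade variant; return the first hit."""
    for variant in grade_variants(term):
        code = exact_lookup(variant)
        if code is not None:
            return code
    return None
```

With this toy dictionary, `exact_lookup("CIN II")` finds nothing, while `lookup_with_variants("CIN II")` falls back to the "CIN 2" entry. Whether a variant step like this belongs in pre-processing or in the dictionary itself is a separate design question.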
On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> (though I don't fully understand what all the symbols mean in the UMLS
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookup window), but some of
>> the terms do not exist in SNOMED:
>> CIN 2 - Cervical intraepithelial neoplasia 2
>> [A3002688/SNOMEDCT/SY/285838002] exists, but not CIN II.
>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists, which is why it was
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>> MEDCIN and CCPSS, though. However, the bundled cTAKES dictionaries only
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9), IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>> <timothy.mil...@childrens.harvard.edu> wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm not
>>> sure if that is a correct context, but I was able to duplicate your
>>> findings. (It finds a CUI for CIN III but not if you change it to CIN II.)
>>>
>>> My first thought was that it is the chunker. But the chunker seems to
>>> get it right, as CIN II and CIN III are both called NPs, and similarly
>>> the LookupWindowAnnotator handles them both identically. So that
>>> suggests it is a problem with the actual lookup of the tokens in the
>>> LookupWindow.
>>>
>>> That's all I can do for now, but maybe someone else who knows more about
>>> its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades: 1, 2, and 3. Sometimes this is reported with Roman
>>>> numerals: I, II, and III.
>>>>
>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140:
>>>> "Carcinoma in situ of uterine cervix."
>>>>
>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as
>>>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and
>>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>
>>>> Is there a way to tune the detection of UMLS concepts?
>>>>
>>>> --------------------------------------------
>>>> Ted Assur
>>>> IT Solutions Architect for Cancer Research
>>>> Providence Health & Services
>>>> ted.as...@providence.org
>>>> 503-215-6476
>>>>
>>>> Crede, ut intelligas.
>>>> Intellego, ut credam.