RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Hi Ted,

In addition to performing searches, the HyperSQL ( http://hsqldb.org/ ) database tool should allow you to perform inserts into the UMLS dictionary database used by cTAKES. You can also create your own customized dictionary and run cTAKES using only that dictionary, or with UMLS plus that dictionary.

There are several ways to create a custom dictionary, and I think that you can start by looking in the resources/ ... /dictionary/lookup/ directory for examples. It can be a little overwhelming if you just want to add one or two terms, and I am in the process of trying to make this a little easier for any user. It may be a while before I can add my work to the trunk. Until then, if you decide to go with the csv approach you can probably make it through with the examples in the cTAKES resources. If you want to create a new hsql database then I can send you my (old) instructions on that process, but it might be overkill.

If you really want to know what lies behind the mask of the cTAKES UMLS dictionary then I highly recommend that you just interface with it directly using the hsql tool.

Sean

From: Assur, Ted [theodore.as...@providence.org]
Sent: Friday, November 01, 2013 5:36 PM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

OK, kind of resurfacing the original topic on this one, after I redirected it towards ICD codes last month:

I have several examples, like the one below, where it would be very helpful to be able to include UMLS terms that are in the UMLS 2011AB release, e.g. "CIN 1" (CUI = C0349458). So if I have particular UMLS concepts I want to make sure to include, is there a way for me to *add* them to the UMLS dictionary used by cTAKES?
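Sean's suggestion of inserting a term directly into the dictionary table might look roughly like the sketch below. It is only an illustration: Python's sqlite3 stands in for the HSQLDB database, the column layout (fword, term, cui, tui, sourcetype, code) is an assumption rather than the real schema, and the helper add_term is hypothetical. The CUI C0349458 is the one Ted gives for "CIN 1"; the TUI, source and code values are placeholders.

```python
import sqlite3

# SQLite in memory is only a stand-in for the HSQLDB dictionary database.
# The column layout is assumed for illustration, not taken from cTAKES.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE umls_ms_2011ab "
    "(fword TEXT, term TEXT, cui TEXT, tui TEXT, sourcetype TEXT, code TEXT)"
)


def add_term(conn, term, cui, tui, sourcetype, code):
    """Insert one dictionary row; fword is taken to be the first token of the term."""
    fword = term.split()[0]
    conn.execute(
        "INSERT INTO umls_ms_2011ab (fword, term, cui, tui, sourcetype, code) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (fword, term, cui, tui, sourcetype, code),
    )
    conn.commit()


# CUI from Ted's message; the TUI, source and code values are placeholders.
add_term(conn, "CIN 1", "C0349458", "T191", "CUSTOM", "CUSTOM-0001")

# The new row is now found by the same first-word lookup used for searches.
added = conn.execute(
    "SELECT term, cui FROM umls_ms_2011ab WHERE fword = ?", ("CIN",)
).fetchall()
```

Against the real dictionary the same INSERT would be issued through the hsql tool, after checking the actual column names with a select first.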
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
OK, kind of resurfacing the original topic on this one, after I redirected it towards ICD codes last month:

I have several examples, like the one below, where it would be very helpful to be able to include UMLS terms that are in the UMLS 2011AB release, e.g. "CIN 1" (CUI = C0349458). So if I have particular UMLS concepts I want to make sure to include, is there a way for me to *add* them to the UMLS dictionary used by cTAKES?

Ted

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 04, 2013 9:37 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

I don't know if this is exactly what you want, but you can use the HyperSQL ( http://hsqldb.org/ ) database tool to perform searches on the UMLS dictionary used by cTAKES. For instance

  select * from UMLS_MS_2011AB where FWORD = 'CIN'

will provide all the available terms starting with CIN. In the result you'll see that there is no term "CIN I", and you'll also see that the only listing from ICD9 is for "CIN III" [C0851140, T191, MTHICD9 233.1].

If you want an ICD9 code that isn't in the cTAKES UMLS dictionary then you can find it online ... but that won't do you much good wrt cTAKES.

Sean

-----Original Message-----
From: Assur, Ted [mailto:theodore.as...@providence.org]
Sent: Wednesday, September 04, 2013 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar with how to access that information: In the example I've described below, where would I locate the ICD9 for a specific entity?

Thank you

Ted

-----Original Message-----
From: Pei Chen [mailto:chen...@apache.org]
Sent: Tuesday, September 03, 2013 7:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I". That's a strange one, probably needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> (though I don't fully understand what all the symbols mean in the umls
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1
>> [A3002690/SNOMEDCT/SY/285836003]
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow), but some of
>> the terms do not exist in SNOMED. CIN 2 - Cervical intraepithelial
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists; that's why it was
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other UMLS thesauruses such as
>> MEDCIN, CCPSS though. However, the bundled cTAKES dictionaries only
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>> not sure if that is a correct context but I was able to duplicate
>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>> CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems
>>> to get it right, as CIN II and CIN III are both called NPs, and
>>> similarly the LookupWindowAnnotator handles them both identically.
>>> So that suggests it is a problem with the actual lookup of the
>>> tokens in the LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more
>>> about its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades, 1, 2
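The first-word search Sean describes in the quoted message above can be tried out with a small sketch. This is an illustration under stated assumptions: sqlite3 replaces HSQLDB, the column names other than fword are guesses, and the helper terms_starting_with is hypothetical. The CIN III row uses the values Sean reports; the second row is invented purely to show that the filter works on the first word.

```python
import sqlite3

# In-memory SQLite stands in for the HSQLDB dictionary; column layout is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE umls_ms_2011ab "
    "(fword TEXT, term TEXT, cui TEXT, tui TEXT, sourcetype TEXT, code TEXT)"
)
rows = [
    # Values reported in the thread for the only ICD9-derived CIN entry.
    ("CIN", "CIN III", "C0851140", "T191", "MTHICD9", "233.1"),
    # Hypothetical unrelated row, present only to show the filter at work.
    ("EXAMPLE", "EXAMPLE TERM", "C0000000", "T000", "EXAMPLE", "0"),
]
conn.executemany("INSERT INTO umls_ms_2011ab VALUES (?, ?, ?, ?, ?, ?)", rows)


def terms_starting_with(conn, first_word):
    """Return every dictionary row whose first word matches exactly."""
    cur = conn.execute(
        "SELECT term, cui, tui, sourcetype, code "
        "FROM umls_ms_2011ab WHERE fword = ?",
        (first_word,),
    )
    return cur.fetchall()


cin_terms = terms_starting_with(conn, "CIN")
# A term is only found by the lookup if it appears in this result set.
has_cin_i = any(term == "CIN I" for term, *_ in cin_terms)
```

Checking the result set this way is a quick test of whether a missed EntityMention is a dictionary gap or a lookup problem.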
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Hi James,

Glad you were able to make cTAKES work for your use case. The UMLS subset that is currently included in the resources should be:

* International Classification of Diseases, Ninth Revision, Clinical Modification, 2012 ICD9CM_2012 ICD9CM ENG 0 20997
* International Classification of Diseases, Ninth Revision, Clinical Modification, Metathesaurus additional entry terms, 2012 MTHICD9_2012 ICD9CM ENG 0 16304
* Medical Subject Headings, 2012_2011_09_09 MSH2012_2011_09_09 MSH ENG 0 321367
* NCI Thesaurus, 2011_02D NCI2011_02D NCI ENG 0 90135
* SNOMED Clinical Terms, 2011_07_31 SNOMEDCT_2011_07_31 SNOMEDCT ENG 9 324494

And also RxNorm for the rxnorm_index folder.
(I think there was a readme about it; if not, let's at least add it to the User FAQs?)

--Pei

> -----Original Message-----
> From: Vogel, James [mailto:jvo...@activehealth.net]
> Sent: Monday, September 30, 2013 11:41 AM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
>
> That worked and I see how I can change the code to do both SNOMED and ICD9.
> I added an index by doing: CREATE INDEX 'umls_ms_2011ab_cui' ON umls_ms_2011ab (cui);
> I needed to change the database from 'read-only'; is that going to cause any other problems?
>
> What subset of ICD9 is in the dictionary?
>
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Friday, September 27, 2013 11:26 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
>
> James,
> Obviously it would be best to customize the code and/or the dictionary for your particular case.
> But if you want to try something that will work without any code changes, you can try the below in your LookupDesc_Db.xml. Essentially, what it will do is take advantage of the fact that the UmlsToSnomedDbConsumerImpl will allow you to specify an SQL statement that maps the CUIs to codes.
> Couple that with the fact that there is already a table called umls_ms_2011ab, which contains the codes and CUIs from many different sources, including ICD9CM.
> What you could do is just reuse the table as the mapping table as well and specify the source, such as:
>
>   select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'
>
> (The downside is that I don't think there is an index on sourcetype, so performance may suck.)
> I've attached an example to normalize to ICD9CM codes instead of SNOMEDCT.
>
> className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
> key="cuiMetaField" value="cui"/>
> value="tui"/>
> value="T021,T022,T023,T024,T025,T026,T029,T030"/>
> key="disorderTuis" value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
> value="T033,T034,T040,T041,T042,T043,T044,T045,T046,T056,T057,T184"/>
> key="mapPrepStmt" value="select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"/>
>
> On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen <chen...@apache.org> wrote:
> James,
> One can try the NamedEntityLookupConsumerImpl instead of UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that have SNOMED codes.
> Will you need to preserve the TUI? One thing is that NamedEntityLookupConsumerImpl will return all of the hits, except that it'll create OntologyConcepts (w/o TUIs) instead of UMLSConcepts. Perhaps we should make the NamedEntityLookupConsumerImpl a bit more general.
>
> --Pei
>
> On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James <jvo...@activehealth.net> wrote:
> I now see that I use a query on umls_ms_2011ab where sourcetype = 'ICD9CM'. Is there a way to use an existing AE or class to add additional ICD9CM annotations / concepts or do I change the code in consumeHits() or getSnomedCodes()?
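One way to see for oneself which source vocabularies a dictionary table holds, and roughly how large each is, is to group the rows by source. The sketch below is only an illustration: sqlite3 stands in for HSQLDB, the assumption that the sourcetype column carries values such as ICD9CM and MTHICD9 is inferred from the queries in this thread, and all rows except the MTHICD9 one are hypothetical.

```python
import sqlite3

# In-memory stand-in for the dictionary; only the columns needed here are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, sourcetype TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO umls_ms_2011ab VALUES (?, ?, ?)",
    [
        ("C0851140", "MTHICD9", "233.1"),  # row reported in the thread
        ("C0000001", "ICD9CM", "000.1"),   # hypothetical
        ("C0000002", "ICD9CM", "000.2"),   # hypothetical
        ("C0000003", "SNOMEDCT", "1"),     # hypothetical
    ],
)


def entries_per_source(conn):
    """Count dictionary rows for each source vocabulary."""
    cur = conn.execute(
        "SELECT sourcetype, COUNT(*) FROM umls_ms_2011ab "
        "GROUP BY sourcetype ORDER BY sourcetype"
    )
    return dict(cur.fetchall())


counts = entries_per_source(conn)
```

Run against the real table, the same GROUP BY query would show how many ICD9CM rows are actually present.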
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
That worked and I see how I can change the code to do both SNOMED and ICD9.

I added an index by doing:

  CREATE INDEX 'umls_ms_2011ab_cui' ON umls_ms_2011ab (cui);

I needed to change the database from 'read-only'; is that going to cause any other problems?

What subset of ICD9 is in the dictionary?

From: Pei Chen [mailto:chen...@apache.org]
Sent: Friday, September 27, 2013 11:26 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

James,
Obviously it would be best to customize the code and/or the dictionary for your particular case.
But if you want to try something that will work without any code changes, you can try the below in your LookupDesc_Db.xml. Essentially, what it will do is take advantage of the fact that the UmlsToSnomedDbConsumerImpl will allow you to specify an SQL statement that maps the CUIs to codes. Couple that with the fact that there is already a table called umls_ms_2011ab, which contains the codes and CUIs from many different sources, including ICD9CM. What you could do is just reuse the table as the mapping table as well and specify the source, such as:

  select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'

(The downside is that I don't think there is an index on sourcetype, so performance may suck.)
I've attached an example to normalize to ICD9CM codes instead of SNOMEDCT.

On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen <chen...@apache.org> wrote:
James,
One can try the NamedEntityLookupConsumerImpl instead of UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that have SNOMED codes.
Will you need to preserve the TUI? One thing is that NamedEntityLookupConsumerImpl will return all of the hits, except that it'll create OntologyConcepts (w/o TUIs) instead of UMLSConcepts. Perhaps we should make the NamedEntityLookupConsumerImpl a bit more general.
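As a small illustration of the indexing step James mentions, the sketch below creates an index and then confirms it exists. It uses sqlite3 as a stand-in, so the PRAGMA call is SQLite-specific and would not work in HSQLDB. The composite index on (cui, sourcetype) and its name are assumptions, chosen because the mapping query in this thread filters on both columns; the thread itself only reports an index on cui.

```python
import sqlite3

# In-memory stand-in for the dictionary table; only the mapping columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, sourcetype TEXT, code TEXT)")

# Hypothetical index name, written as an unquoted identifier.
# Covering both columns lets the lookup "cui=? and sourcetype=..." use one index.
conn.execute(
    "CREATE INDEX umls_ms_2011ab_cui_src ON umls_ms_2011ab (cui, sourcetype)"
)

# SQLite-specific check that the index was created (name is the second column).
index_names = [
    row[1] for row in conn.execute("PRAGMA index_list('umls_ms_2011ab')")
]
```

Whether reopening the shipped database as writable has side effects is not answered in this thread, so working on a copy of the database files is the cautious option.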
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
James,
Obviously it would be best to customize the code and/or the dictionary for your particular case.
But if you want to try something that will work without any code changes, you can try the below in your LookupDesc_Db.xml. Essentially, what it will do is take advantage of the fact that the UmlsToSnomedDbConsumerImpl will allow you to specify an SQL statement that maps the CUIs to codes. Couple that with the fact that there is already a table called umls_ms_2011ab, which contains the codes and CUIs from many different sources, including ICD9CM. What you could do is just reuse the table as the mapping table as well and specify the source, such as:

  select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'

(The downside is that I don't think there is an index on sourcetype, so performance may suck.)
I've attached an example to normalize to ICD9CM codes instead of SNOMEDCT.

On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen wrote:
> James,
> One can try the NamedEntityLookupConsumerImpl instead of
> UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs
> that have SNOMED codes.
> Will you need to preserve the TUI? One thing is that
> NamedEntityLookupConsumerImpl will return all of the hits, except that
> it'll create OntologyConcepts (w/o TUIs) instead of UMLSConcepts. Perhaps
> we should make the NamedEntityLookupConsumerImpl a bit more general.
>
> --Pei
>
> On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James wrote:
>
>> I now see that I use a query on umls_ms_2011ab where sourcetype =
>> 'ICD9CM'. Is there a way to use an existing AE or class to add additional
>> ICD9CM annotations / concepts or do I change the code in consumeHits() or
>> getSnomedCodes()?
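The mapping idea in Pei's message, reusing umls_ms_2011ab as the CUI-to-code table with a parameterised statement, can be sketched outside cTAKES as follows. This is an illustration only: sqlite3 stands in for HSQLDB, the helper codes_for_cui is hypothetical, and every row is invented. The SQL string is the one quoted in the thread.

```python
import sqlite3

# The statement quoted in the thread; "?" is bound to the CUI at lookup time.
MAP_PREP_STMT = (
    "select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"
)

# In-memory stand-in for the dictionary; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, sourcetype TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO umls_ms_2011ab VALUES (?, ?, ?)",
    [
        ("C0000001", "ICD9CM", "000.1"),
        ("C0000001", "SNOMEDCT", "111"),
        ("C0000002", "SNOMEDCT", "222"),
    ],
)


def codes_for_cui(conn, cui):
    """Return the ICD9CM codes mapped to one CUI (empty list if none)."""
    return [code for (code,) in conn.execute(MAP_PREP_STMT, (cui,))]


icd9_hit = codes_for_cui(conn, "C0000001")
icd9_miss = codes_for_cui(conn, "C0000002")
```

A CUI that has only SNOMEDCT rows yields an empty list here, which mirrors the behaviour described in the thread: a consumer that keeps only concepts with a mapped code would drop it.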
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
James,
One can try the NamedEntityLookupConsumerImpl instead of UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that have SNOMED codes.
Will you need to preserve the TUI? One thing is that NamedEntityLookupConsumerImpl will return all of the hits, except that it'll create OntologyConcepts (w/o TUIs) instead of UMLSConcepts. Perhaps we should make the NamedEntityLookupConsumerImpl a bit more general.

--Pei

On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James wrote:
> I now see that I use a query on umls_ms_2011ab where sourcetype =
> 'ICD9CM'. Is there a way to use an existing AE or class to add additional
> ICD9CM annotations / concepts or do I change the code in consumeHits() or
> getSnomedCodes()?
>
> -----Original Message-----
> From: Vogel, James
> Sent: Friday, September 27, 2013 6:30 PM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> Is anyone able to provide any more detailed guidance on what I'd need to
> change to add the ICD9 codes as tags, e.g., where do I look for the tables
> in the hsql database that would contain the ICD9 data?
>
> Thanks.
>
> -----Original Message-----
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Monday, September 16, 2013 7:25 AM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> James,
> I haven't done it myself, so I don't know exactly how the config
> changes, but I know roughly where to look. In the LookupDesc_Db.xml,
> the tag with the idRef = DICT_UMLS_MS. Then look under
> the section, and you'll see the codingScheme is SNOMED.
> I believe this is where the actual dictionary filtering is done. There
> is also a consumer class called
> org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
> mapPrepStmt field with a SQL query that might need changing.
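The difference between the two consumers Pei contrasts can be pictured with a toy model. Nothing below is cTAKES code: the Hit class and both functions are hypothetical, the hits are invented, and the sketch only shows the general idea of a consumer that keeps hits with a code in one source versus one that passes every hit through.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical dictionary hit: a term, its CUI, and codes per source."""
    term: str
    cui: str
    codes: dict = field(default_factory=dict)  # source vocabulary -> list of codes


def filter_by_source(hits, source):
    """Keep only hits that carry at least one code from the given source."""
    return [hit for hit in hits if hit.codes.get(source)]


def keep_all(hits):
    """Pass every hit through, regardless of which sources have codes."""
    return list(hits)


# Invented hits: one with an ICD9CM code only, one with a SNOMEDCT code only.
hits = [
    Hit("term A", "C0000001", {"ICD9CM": ["000.1"]}),
    Hit("term B", "C0000002", {"SNOMEDCT": ["222"]}),
]

snomed_only = filter_by_source(hits, "SNOMEDCT")
everything = keep_all(hits)
```

In this model "term A" disappears under the SNOMEDCT filter but survives the pass-through, which is the trade-off the message describes: more hits come back, at the cost of the richer concept type.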
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
I now see that I use a query on umls_ms_2011ab where sourcetype = 'ICD9CM'. Is there a way to use an existing AE or class to add additional ICD9CM annotations / concepts or do I change the code in consumeHits() or getSnomedCodes()?

-----Original Message-----
From: Vogel, James
Sent: Friday, September 27, 2013 6:30 PM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Is anyone able to provide any more detailed guidance on what I'd need to change to add the ICD9 codes as tags, e.g., where do I look for the tables in the hsql database that would contain the ICD9 data?

Thanks.

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, September 16, 2013 7:25 AM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

James,
I haven't done it myself, so I don't know exactly how the config changes, but I know roughly where to look. In the LookupDesc_Db.xml, the tag with the idRef = DICT_UMLS_MS. Then look under the section, and you'll see the codingScheme is SNOMED. I believe this is where the actual dictionary filtering is done. There is also a consumer class called org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a mapPrepStmt field with a SQL query that might need changing. That is where I would start looking. I'm not sure whether you would need to write a new consumer class, and what values the codingScheme field can take, but hopefully this helps you get started until someone else chimes in with more detailed info!

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config
> and impl that would need to be made to get the ICD9 codes?
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Is anyone able to provide any more detailed guidance on what I'd need to change to add the ICD9 codes as tags, e.g., where do I look for the tables in the hsql database that would contain the ICD9 data?

Thanks.

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, September 16, 2013 7:25 AM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

James,
I haven't done it myself, so I don't know exactly how the config changes, but I know roughly where to look. In the LookupDesc_Db.xml, the tag with the idRef = DICT_UMLS_MS. Then look under the section, and you'll see the codingScheme is SNOMED. I believe this is where the actual dictionary filtering is done. There is also a consumer class called org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a mapPrepStmt field with a SQL query that might need changing. That is where I would start looking. I'm not sure whether you would need to write a new consumer class, and what values the codingScheme field can take, but hopefully this helps you get started until someone else chimes in with more detailed info!

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config
> and impl that would need to be made to get the ICD9 codes?
>
> -----Original Message-----
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Wednesday, September 04, 2013 1:02 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> Ted,
>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've
>> described below, where would I locate the ICD9 for a specific entity?
>
> Even though ICD9 is include in the lookup, IRRC, cTAKES by default is > configured[1] only returns/stores concepts [2] that have a SNOMEDCT code or > RxNorm code. > > [1] > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml > > [2] > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java > > If you would like it to return ICD9 codes, one would need to > modify/configure the above... > > --Pei > > > On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted > wrote: > >> Thanks for looking into this, it's been puzzling me. >> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not >> familiar with how to access that information: In the example I've described >> below, where would I locate the ICD9 for a specific entity? >> >> Thank you >> >> Ted >> >> -Original Message- >> From: Pei Chen [mailto:chen...@apache.org] >> Sent: Tuesday, September 03, 2013 7:13 PM >> To: dev@ctakes.apache.org >> Subject: Re: specificity in selecting EntityMentions when using >> AggregatePlaintextUMLSProcessor >> >> You're right, it should have gotten "CIN I"- that's a strange one, >> probably needs to be debugged/looked into further... >> >> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy < >> timothy.mil...@childrens.harvard.edu> wrote: >>> Ah. So it will get >>> CIN 2 (in SNOMED) >>> CIN III (in SNOMED) >>> CIN 3 (in SNOMED) >>> >>> but the rest are not in SNOMED? >>> >>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED >>> (though I don't fully understand what all the symbols mean in the umls >>> browser). 
>>> >>>> CIN I - Cervical intraepithelial neoplasia 1 >>>> [A3002690/SNOMEDCT/SY/285836003] >>> >>> On 09/03/2013 09:55 PM, Pei Chen wrote: >>>> It has the correct parse (POS, chunks, and lookupwindow)- but some of >>>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial >>>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II. >>>> CIN III [A965/SNOMEDCT/SY/20365006] also exists that's why it was >>>> able to perform the lookup successfully. >>>> Note that CIN II synonyms do exist in other umls thersauses such as >>>> MEDCIN, CCPSS though. However, the bundled cTAKES dictionaries only >>>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IRRC. >>>> >>>> --Pei >>>> >>>> On Tue, Sep 3, 2013 at 9:44 PM, Mi
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
James,

I haven't done it myself, so I don't know exactly how the config changes, but I know roughly where to look. In LookupDesc_Db.xml, find the tag with the idRef = DICT_UMLS_MS. Then look under the section, and you'll see the codingScheme is SNOMED. I believe this is where the actual dictionary filtering is done. There is also a consumer class called org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a mapPrepStmt field with a SQL query that might need changing.

That is where I would start looking. I'm not sure whether you would need to write a new consumer class, or what values the codingScheme field can take, but hopefully this helps you get started until someone else chimes in with more detailed info!

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config
> and impl that would need to be made to get the ICD9 codes?
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Any more guidance you can give about the nature of the changes to the config and impl that would need to be made to get the ICD9 codes?

-----Original Message-----
From: Pei Chen [mailto:chen...@apache.org]
Sent: Wednesday, September 04, 2013 1:02 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Ted,

> On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar
> with how to access that information: In the example I've described below,
> where would I locate the ICD9 for a specific entity?

Even though ICD9 is included in the lookup, IIRC cTAKES by default is configured [1] to return/store only concepts [2] that have a SNOMEDCT code or RxNorm code.

[1] http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
[2] http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java

If you would like it to return ICD9 codes, one would need to modify/configure the above...

--Pei
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Although cTAKES uses ICD9 entries when finding named entities, out of the box it doesn't assign ICD9 codes to those entities; it assigns SNOMED-CT codes. If some text matches an ICD9 term, and that ICD9 term has the same CUI as one or more SNOMED-CT terms, the SNOMED-CT codes for those terms are assigned to the annotation (along with the UMLS CUI), even if the SNOMED-CT term and the ICD9 term don't share any words.

Hope that helps

-- James

From: dev-return-1961-Masanz.James=mayo@ctakes.apache.org [dev-return-1961-Masanz.James=mayo@ctakes.apache.org] on behalf of Assur, Ted [theodore.as...@providence.org]
Sent: Wednesday, September 04, 2013 10:55 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar with how to access that information: In the example I've described below, where would I locate the ICD9 for a specific entity?

Thank you

Ted
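The CUI-based mapping James describes in this message can be pictured with a small in-memory sketch. This is not cTAKES code: the class and method names are hypothetical, and pairing the SNOMED-CT code 20365006 with CUI C0851140 is an assumption made only for illustration (both values are quoted elsewhere in this thread, but not explicitly as belonging to the same concept).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of "match any term, report SNOMED-CT codes via the shared CUI". */
public class CuiToSnomedSketch {

    // Dictionary term (from any source vocabulary, e.g. ICD9) -> UMLS CUI.
    private final Map<String, String> termToCui = new HashMap<>();

    // UMLS CUI -> SNOMED-CT codes that share that CUI.
    private final Map<String, List<String>> cuiToSnomed = new HashMap<>();

    void addTerm(String term, String cui) {
        termToCui.put(term.toLowerCase(Locale.ROOT), cui);
    }

    void addSnomedCode(String cui, String snomedCode) {
        cuiToSnomed.computeIfAbsent(cui, k -> new ArrayList<>()).add(snomedCode);
    }

    /** Returns the SNOMED-CT codes for the matched text, or an empty list if nothing maps. */
    List<String> snomedCodesFor(String matchedText) {
        String cui = termToCui.get(matchedText.toLowerCase(Locale.ROOT));
        if (cui == null) {
            return Collections.emptyList();
        }
        return cuiToSnomed.getOrDefault(cui, Collections.emptyList());
    }

    public static void main(String[] args) {
        CuiToSnomedSketch dict = new CuiToSnomedSketch();
        // Term and CUI as quoted in the thread for the MTHICD9 entry.
        dict.addTerm("CIN III", "C0851140");
        // Assumption for illustration: this SNOMED-CT code shares the CUI above.
        dict.addSnomedCode("C0851140", "20365006");

        // The text matched an ICD9-sourced term, yet only SNOMED-CT codes come back.
        System.out.println(dict.snomedCodesFor("CIN III"));
        // A term missing from the dictionary yields nothing.
        System.out.println(dict.snomedCodesFor("CIN II"));
    }
}
```

In this picture, returning ICD9 codes would mean keeping a second CUI-to-code map for ICD9 next to the SNOMED-CT one, which is roughly what changing the consumer would have to achieve.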
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Ted,

> On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar
> with how to access that information: In the example I've described below,
> where would I locate the ICD9 for a specific entity?

Even though ICD9 is included in the lookup, IIRC cTAKES by default is configured [1] to return/store only concepts [2] that have a SNOMEDCT code or RxNorm code.

[1] http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
[2] http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java

If you would like it to return ICD9 codes, one would need to modify/configure the above...

--Pei

On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted wrote:
> Thanks for looking into this, it's been puzzling me.
>
> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> familiar with how to access that information: In the example I've described
> below, where would I locate the ICD9 for a specific entity?
>
> Thank you
>
> Ted
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
I don't know if this is exactly what you want, but you can use the hyperSql ( http://hsqldb.org/ ) database tool to perform searches on the umls dictionary used by cTakes.

For instance, the query " select * from UMLS_MS_2011AB where FWORD = 'CIN' " will return all the available terms starting with CIN. In the result you'll see that there is no term "CIN I", and you'll also see that the only listing from ICD9 is for "CIN III" [C0851140, T191, MTHICD9 233.1].

If you want an icd9 code that isn't in the cTakes umls dictionary then you can find it online ... but that won't do you much good wrt cTakes.

Sean

-----Original Message-----
From: Assur, Ted [mailto:theodore.as...@providence.org]
Sent: Wednesday, September 04, 2013 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar with how to access that information: In the example I've described below, where would I locate the ICD9 for a specific entity?

Thank you

Ted
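For anyone who would rather run Sean's query from code than from the HSQLDB tool, here is a minimal JDBC sketch. It uses only the standard java.sql API; the table name UMLS_MS_2011AB and the column FWORD come from the query quoted above, while the class name, the command-line arguments, and the default "SA" account with an empty password are assumptions (the HSQLDB driver jar must also be on the classpath, and the JDBC URL has to point at wherever your copy of the dictionary database lives).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

/** Hypothetical helper: lists dictionary rows whose first word matches a given token. */
public class UmlsFirstWordQuery {

    // Table and column names are taken from the query quoted in the thread.
    static final String QUERY = "SELECT * FROM UMLS_MS_2011AB WHERE FWORD = ?";

    /** Prints every row whose FWORD column equals firstWord, one row per line. */
    static void printTerms(Connection conn, String firstWord) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setString(1, firstWord);
            try (ResultSet rs = stmt.executeQuery()) {
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        if (i > 1) {
                            row.append(" | ");
                        }
                        row.append(meta.getColumnLabel(i)).append('=').append(rs.getString(i));
                    }
                    System.out.println(row);
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: UmlsFirstWordQuery <jdbc-url> [user] [password]");
            return;
        }
        // Assumption: fall back to HSQLDB's default account if no credentials are given.
        String user = args.length > 1 ? args[1] : "SA";
        String password = args.length > 2 ? args[2] : "";
        try (Connection conn = DriverManager.getConnection(args[0], user, password)) {
            printTerms(conn, "CIN");
        }
    }
}
```

Scanning the printed rows for an ICD9 source is then a quick way to check whether a given ICD9 term is in the bundled dictionary at all.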
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar with how to access that information: In the example I've described below, where would I locate the ICD9 for a specific entity?

Thank you

Ted

-----Original Message-----
From: Pei Chen [mailto:chen...@apache.org]
Sent: Tuesday, September 03, 2013 7:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I"- that's a strange one, probably needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> (though I don't fully understand what all the symbols mean in the umls
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1
>> [A3002690/SNOMEDCT/SY/285836003]
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists, that's why it was
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other umls thesauruses such as
>> MEDCIN, CCPSS though. However, the bundled cTAKES dictionaries only
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>> wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>> not sure if that is a correct context but I was able to duplicate
>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>> CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems
>>> to get it right, as CIN II and CIN III are both called NPs, and
>>> similarly the LookupWindowAnnotator handles them both identically.
>>> So that suggests it is a problem with the actual lookup of the
>>> tokens in the LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more
>>> about its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman
>>>> Numerals, I, II, and III.
>>>>
>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140:
>>>> "Carcinoma in situ of uterine cervix."
>>>>
>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as
>>>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and
>>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>
>>>> Is there a way to tune the detection of UMLS concepts?
>>>>
>>>> Ted Assur
>>>> IT Solutions Architect for Cancer Research Providence Health &
>>>> Services ted.as...@providence.org
>>>> 503-215-6476
>>>>
>>>> Crede, ut intelligas.
>>>> Intellego, ut credam.
>>>>
>>>> This message is intended for the sole use of the addressee, and may
>>>> contain information that is privileged, confidential and exempt from
>>>> disclosure under applicable law. If you are not the addressee you are
>>>> hereby notified that you may not use, copy, disclose, or distribute to
>>>> anyone the message or any information contained in the message. If you
>>>> have received this message in error, please immediately advise the sender
>>>> by reply email and delete this message.
RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
This may sound strange, but SNOMED does not contain the term "CIN I". It contains the terms "CIN I - Cervical intraepithelial neoplasia 1" and "CIN I - mild dyskaryosis".
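A minimal Python sketch of the point above, assuming the lookup compares a candidate span against whole dictionary term strings. The term list holds only strings quoted in this thread, and the helper names (normalize, exact_match, terms_starting_with) are hypothetical; this is not the cTAKES lookup code.

```python
# Toy term list: only strings quoted in this thread, not the real dictionary.
SNOMED_TERMS = [
    "CIN I - Cervical intraepithelial neoplasia 1",
    "CIN I - mild dyskaryosis",
    "CIN III",
]


def normalize(text):
    """Lower-case and collapse whitespace so only token text is compared."""
    return " ".join(text.lower().split())


# Index of whole term strings (assumption: a hit needs the entire term).
INDEX = {normalize(term) for term in SNOMED_TERMS}


def exact_match(span):
    """True only if the span equals a complete dictionary term."""
    return normalize(span) in INDEX


def terms_starting_with(prefix):
    """Longer terms that begin with the given tokens followed by a space."""
    wanted = normalize(prefix) + " "
    return [term for term in SNOMED_TERMS if normalize(term).startswith(wanted)]


if __name__ == "__main__":
    for span in ("CIN I", "CIN III"):
        print(span, "->", exact_match(span), terms_starting_with(span))
```

Under that assumption, "CIN I" only appears as the start of longer strings, so a span consisting of just "CIN I" has nothing to match, while "CIN III" matches as a whole term.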
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
You're right, it should have gotten "CIN I"- that's a strange one, probably needs to be debugged/looked into further...
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Ah. So it will get
CIN 2 (in SNOMED)
CIN III (in SNOMED)
CIN 3 (in SNOMED)

but the rest are not in SNOMED?

I wonder why it doesn't get CIN I? It looks like that exists in SNOMED (though I don't fully understand what all the symbols mean in the UMLS browser).

> CIN I - Cervical intraepithelial neoplasia 1
> [A3002690/SNOMEDCT/SY/285836003]
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
It has the correct parse (POS, chunks, and lookupwindow), but some of the terms do not exist in SNOMED:
CIN 2 - Cervical intraepithelial neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists, but not CIN II.
CIN III [A965/SNOMEDCT/SY/20365006] also exists; that's why it was able to perform the lookup successfully.

Note that CIN II synonyms do exist in other UMLS thesauri such as MEDCIN and CCPSS, though. However, the bundled cTAKES dictionaries only contain MeSH, SNOMEDCT, RxNORM, NCI, and ICD9, IIRC.

--Pei
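A rough Python sketch of the source restriction described above. The source names are written as in this message rather than as official UMLS abbreviations, the two "CIN II" rows are hypothetical stand-ins for the MEDCIN/CCPSS synonyms (their exact strings are not given in the thread), and bundled_terms is an illustrative helper, not part of cTAKES.

```python
# Sources named in the message above as bundled with cTAKES (stated "IIRC").
BUNDLED_SOURCES = {"MeSH", "SNOMEDCT", "RxNORM", "NCI", "ICD9"}

# (term, source) rows. The SNOMEDCT rows are quoted in the thread;
# the "CIN II" rows are hypothetical placeholders.
ROWS = [
    ("CIN 2 - Cervical intraepithelial neoplasia 2", "SNOMEDCT"),
    ("CIN III", "SNOMEDCT"),
    ("CIN II", "MEDCIN"),  # hypothetical row
    ("CIN II", "CCPSS"),   # hypothetical row
]


def bundled_terms(rows, sources=BUNDLED_SOURCES):
    """Return the distinct terms whose source vocabulary is in `sources`."""
    return sorted({term for term, source in rows if source in sources})


if __name__ == "__main__":
    print(bundled_terms(ROWS))
    # Widening the source set makes the extra synonyms visible.
    print(bundled_terms(ROWS, BUNDLED_SOURCES | {"MEDCIN", "CCPSS"}))
```

With only the bundled sources, "CIN II" never reaches the dictionary, so no amount of correct chunking can produce a match for it.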
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
That is a good question, Ted!

I tried it with a simple context: "The patient has a CIN III." I'm not sure if that is a correct context, but I was able to duplicate your findings. (It finds a CUI for CIN III, but not if you change it to CIN II.)

My first thought was that it is the chunker. But the chunker seems to get it right, as CIN II and CIN III are both called NPs, and similarly the LookupWindowAnnotator handles them both identically. So that suggests it is a problem with the actual lookup of the tokens in the LookupWindow.

That's all I can do for now, but maybe someone else who knows more about its behavior offhand will have an idea.

Tim
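The reasoning above (identical chunks and lookup windows, different results) can be mimicked with a toy lookup. This is only a sketch: it assumes the lookup tries every contiguous token run in the window against a dictionary keyed by token tuples, and the two entries use the CUIs mentioned in this thread. DICTIONARY and lookup_window are hypothetical names, not cTAKES classes.

```python
# Toy dictionary keyed by lower-cased token tuples; CUIs are those cited
# in this thread for "CIN" in general and for "CIN III".
DICTIONARY = {
    ("cin",): "C0206708",
    ("cin", "iii"): "C0851140",
}


def lookup_window(tokens):
    """Return (text, CUI) for every contiguous token run found in DICTIONARY."""
    lowered = [token.lower() for token in tokens]
    hits = []
    for start in range(len(lowered)):
        for end in range(start + 1, len(lowered) + 1):
            cui = DICTIONARY.get(tuple(lowered[start:end]))
            if cui is not None:
                hits.append((" ".join(tokens[start:end]), cui))
    return hits


if __name__ == "__main__":
    # Both windows are built the same way; only the dictionary content differs.
    print(lookup_window(["CIN", "III"]))
    print(lookup_window(["CIN", "II"]))
```

Both windows go through the same code path, so any difference in the hits comes from which term strings the dictionary holds, which is consistent with the problem being in the lookup step rather than the chunker.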
Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Hi Ted,

Setting aside detecting the stage/grade and other attributes and asserting those relationships to the cancer (that's probably a separate discussion): in your example, since there are distinct SNOMEDCT concepts and direct matches, it was able to identify
"Cervical intraepithelial neoplasia grade 1" cui = "C0349458" code = "285836003"
as well as
"Cervical intraepithelial neoplasia" cui = "C0206708" code = "285636001", etc.

It should also be able to identify "CIN 2", as there should be an exact match in SNOMEDCT:
CIN 2 - Cervical intraepithelial neoplasia 2 [A3002688/SNOMEDCT/SY/285838002]

Please see the attached xml output. I am using the out-of-the-box AggregatePlaintextUMLSProcessor from 3.1RC3.

--Pei

On Tue, Sep 3, 2013 at 8:24 PM, Assur, Ted wrote:
> I'm trying to understand what would prevent the
> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems
> that are defined in the UMLS version used by cTAKES.
>
> For example,
> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed out
> as UMLS CUI C0206708.
>
> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman
> Numerals, I, II, and III.
>
> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140:
> "Carcinoma in situ of uterine cervix."
>
> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as their
> correct concepts, "Cervical intraepithelial neoplasia grade 1" and "Cervical
> intraepithelial neoplasia grade 2" respectively.
>
> Is there a way to tune the detection of UMLS concepts?
>
> Ted Assur
> IT Solutions Architect for Cancer Research
> Providence Health & Services
> ted.as...@providence.org
> 503-215-6476
>
> Crede, ut intelligas.
> Intellego, ut credam.