RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-11-04 Thread Finan, Sean
Hi Ted,

In addition to performing searches, the hyperSql ( http://hsqldb.org/ ) database 
tool should allow you to perform inserts into the umls dictionary database used 
by cTakes.
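
As a rough illustration of such an insert, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for HSQLDB, so it runs without the cTakes database. The column names TERM and TUI are assumptions (FWORD, CUI, CODE and SOURCETYPE appear elsewhere in this thread), and the pairing of CUI, SNOMED code and TUI in the sample row is illustrative only. Check the real table layout with the hsql tool before writing to it.

```python
import sqlite3

# In-memory stand-in for the cTakes dictionary database (the real one is HSQLDB).
conn = sqlite3.connect(":memory:")

# Assumed column layout; verify it against the actual UMLS_MS_2011AB table.
conn.execute(
    "CREATE TABLE UMLS_MS_2011AB "
    "(CUI TEXT, FWORD TEXT, TERM TEXT, CODE TEXT, SOURCETYPE TEXT, TUI TEXT)"
)


def add_term(conn, cui, term, code, sourcetype, tui):
    """Insert one dictionary row; FWORD is taken to be the first word of the term."""
    fword = term.split()[0]
    conn.execute(
        "INSERT INTO UMLS_MS_2011AB (CUI, FWORD, TERM, CODE, SOURCETYPE, TUI) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (cui, fword, term, code, sourcetype, tui),
    )
    conn.commit()


# Illustrative row: CUI and SNOMED code are quoted in this thread, the TUI is a guess.
add_term(conn, "C0349458", "CIN 1", "285836003", "SNOMEDCT", "T191")

rows = conn.execute(
    "SELECT CUI, TERM FROM UMLS_MS_2011AB WHERE FWORD = 'CIN'"
).fetchall()
print(rows)
```

The parameterized INSERT keeps quoting issues out of the term text; the same statement shape should carry over to HSQLDB once the column names are confirmed.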

You can also create your own customized dictionary and run cTakes using only 
that dictionary or with umls plus that dictionary.  There are several ways to 
create a custom dictionary, and I think that you can start by looking in the 
resources/ ... /dictionary/lookup/ directory for examples.  It can be a little 
overwhelming if you just want to add one or two terms, and I am in the process 
of trying to make this a little easier for any user.  It may be a while before 
I can add my work to the trunk.   Until then, if you decide to go with the csv 
approach you can probably make it through with the examples in cTakes 
resources.  If you want to create a new hsql database then I can send you my 
(old) instructions on that process - but it might be overkill.

If you really want to know what lies behind the mask of the cTakes umls 
dictionary then I highly recommend that you just interface with it directly 
using the hsql tool.

Sean


From: Assur, Ted [theodore.as...@providence.org]
Sent: Friday, November 01, 2013 5:36 PM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

OK, Kind of resurfacing the original topic on this one, after I redirected it 
towards ICD codes last month:

I have several examples, like the one below, where it would be very helpful to 
be able to include UMLS terms that are in the UMLS 2011AB release, e.g. "CIN 1" 
(CUI = C0349458).

So if I have particular UMLS concepts I want to make sure and include, is there 
a way for me to *add* them to the umls dictionary used by cTAKES?

Ted


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 04, 2013 9:37 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

I don't know if this is exactly what you want, but you can use the hyperSql ( 
http://hsqldb.org/ ) database tool to perform searches on the umls dictionary 
used by cTakes.
For instance " select * from UMLS_MS_2011AB where FWORD = 'CIN' " will provide 
all the available terms starting with CIN.  In the result you'll see that there 
is no term "CIN I", and you'll also see that the only listing from ICD9 is for 
"CIN III" [C0851140, T191, MTHICD9 233.1]

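The search described above can be tried out on a toy table. This is only a sketch: sqlite3 stands in for HSQLDB, the TERM and TUI column names are assumptions, and the rows marked as placeholders are invented so that the query has something to filter.

```python
import sqlite3

# In-memory stand-in for the cTakes HSQLDB dictionary.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UMLS_MS_2011AB "
    "(CUI TEXT, FWORD TEXT, TERM TEXT, CODE TEXT, SOURCETYPE TEXT, TUI TEXT)"
)
conn.executemany(
    "INSERT INTO UMLS_MS_2011AB VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Row quoted in this thread.
        ("C0851140", "CIN", "CIN III", "233.1", "MTHICD9", "T191"),
        # Placeholder CUI; only the term and SNOMED code come from the thread.
        ("CUI_PLACEHOLDER", "CIN", "CIN 2", "285838002", "SNOMEDCT", "T191"),
        # Placeholder code; this term does not start with CIN.
        ("C0206708", "cervical", "cervical intraepithelial neoplasia",
         "CODE_PLACEHOLDER", "SNOMEDCT", "T191"),
    ],
)


def terms_with_first_word(conn, fword):
    """Return (term, sourcetype, code) for every term whose first word matches."""
    return conn.execute(
        "SELECT TERM, SOURCETYPE, CODE FROM UMLS_MS_2011AB "
        "WHERE FWORD = ? ORDER BY TERM",
        (fword,),
    ).fetchall()


cin_terms = terms_with_first_word(conn, "CIN")
# Keep only the listings that come from an ICD9 source.
icd9_terms = [row for row in cin_terms if "ICD9" in row[1]]
print(cin_terms)
print(icd9_terms)
```
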
If you want an icd9 code that isn't in the cTakes umls dictionary then you can 
find it online ... but that won't do you much good wrt cTakes.

Sean

-Original Message-
From: Assur, Ted [mailto:theodore.as...@providence.org]
Sent: Wednesday, September 04, 2013 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar 
with how to access that information: In the example I've described below, where 
would I locate the ICD9 for a specific entity?

Thank you

Ted

-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Tuesday, September 03, 2013 7:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I"- that's a strange one, probably 
needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy 
 wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> (though I don't fully understand what all the symbols mean in the umls
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow), but some of
>> the terms do not exist in SNOMED: CIN 2 - Cervical intraepithelial
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other umls thesauruses such as
>> MEDCIN, CCPSS though.  However, the bundled cTAKES dictionaries only
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>>  wrote:
>>> That is a good question, Ted!
>>>
>>> I tried 

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-11-01 Thread Assur, Ted
OK, Kind of resurfacing the original topic on this one, after I redirected it 
towards ICD codes last month:

I have several examples, like the one below, where it would be very helpful to 
be able to include UMLS terms that are in the UMLS 2011AB release, e.g. "CIN 1" 
(CUI = C0349458).

So if I have particular UMLS concepts I want to make sure and include, is there 
a way for me to *add* them to the umls dictionary used by cTAKES?

Ted


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 04, 2013 9:37 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

I don't know if this is exactly what you want, but you can use the hyperSql ( 
http://hsqldb.org/ ) database tool to perform searches on the umls dictionary 
used by cTakes.
For instance " select * from UMLS_MS_2011AB where FWORD = 'CIN' " will provide 
all the available terms starting with CIN.  In the result you'll see that there 
is no term "CIN I", and you'll also see that the only listing from ICD9 is for 
"CIN III" [C0851140, T191, MTHICD9 233.1]

If you want an icd9 code that isn't in the cTakes umls dictionary then you can 
find it online ... but that won't do you much good wrt cTakes.

Sean

-Original Message-
From: Assur, Ted [mailto:theodore.as...@providence.org]
Sent: Wednesday, September 04, 2013 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar 
with how to access that information: In the example I've described below, where 
would I locate the ICD9 for a specific entity?

Thank you

Ted

-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Tuesday, September 03, 2013 7:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I"- that's a strange one, probably 
needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy 
 wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> (though I don't fully understand what all the symbols mean in the umls
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow), but some of
>> the terms do not exist in SNOMED: CIN 2 - Cervical intraepithelial
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other umls thesauruses such as
>> MEDCIN, CCPSS though.  However, the bundled cTAKES dictionaries only
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>>  wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>> not sure if that is a correct context but I was able to duplicate
>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>> CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems
>>> to get it right, as CIN II and CIN III are both called NPs, and
>>> similarly the LookupWindowAnnotator handles them both identically.
>>> So that suggests it is a problem with the actual lookup of the
>>> tokens in the LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more
>>> about its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>>
>>>
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the 
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific 
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed 
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades, 1, 2 

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-30 Thread Chen, Pei
Hi James,
Glad you were able to make cTAKES work for your use case.  

The UMLS subset that is currently included in the resources should be:
*   International Classification of Diseases, Ninth Revision, Clinical
    Modification, 2012
    ICD9CM_2012   ICD9CM   ENG   0   20997
*   International Classification of Diseases, Ninth Revision, Clinical
    Modification, Metathesaurus additional entry terms, 2012
    MTHICD9_2012   ICD9CM   ENG   0   16304
*   Medical Subject Headings, 2012_2011_09_09
    MSH2012_2011_09_09   MSH   ENG   0   321367
*   NCI Thesaurus, 2011_02D
    NCI2011_02D   NCI   ENG   0   90135
*   SNOMED Clinical Terms, 2011_07_31
    SNOMEDCT_2011_07_31   SNOMEDCT   ENG   9   324494

And also RxNorm for the rxnorm_index folder.
(I think there was a readme about it; if not, let's at least add it to the User 
FAQs?)

--Pei

> -Original Message-
> From: Vogel, James [mailto:jvo...@activehealth.net]
> Sent: Monday, September 30, 2013 11:41 AM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
> 
> That worked and I see how I can change the code to do both SNOMED and
> ICD9.
> I added an index by doing: CREATE INDEX 'umls_ms_2011ab_cui' ON
> umls_ms_2011ab (cui);  I needed to change the database from 'read-only', is
> that going to cause any other problems?
> 
> What subset of ICD9 is in the dictionary?
> 
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Friday, September 27, 2013 11:26 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
> 
> James,
> Obviously it would be best to customize the code and/or the dictionary for
> your particular case.
> But if you want to try something that will work without any code changes,
> you can try the below in your LookupDesc_Db.xml.  Essentially, what it will do
> is take advantage of the fact that the UmlsToSnomedDbConsumerImpl will
> allow you to specify an SQL statement that maps the CUIs to codes.  Couple
> that with the fact that there already is a table called umls_ms_2011ab which
> contains the codes and CUIs from many different sources including ICD9CM.
> What you could do is just reuse the table as the mapping table as well and
> specify the source such as:
> select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'
> 
> (The downside is that I don't think there is an index on sourcetype so
> performance may suck).
> I've attached an example to normalize to ICD9CM codes instead of
> SNOMEDCT.
>  className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
>   key="cuiMetaField" value="cui"/>
>   value="tui"/>
>   value="T021,T022,T023,T024,T025,T026,T029,T030"/>
>   key="disorderTuis"
>   value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
>   value="T033,T034,T040,T041,T042,T043,T044,T045,T046,T056,T057,T184"/>
>   key="mapPrepStmt"
>   value="select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"/>
> 
> On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen <chen...@apache.org> wrote:
> James,
> One can try the NamedEntityLookupConsumerImpl instead of
> UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that
> have SNOMED codes.
> Will you need to preserve the TUI?  One thing is that
> NamedEntityLookupConsumerImpl will return back all of the hits, except that
> it'll create OntologyConcepts (w/o TUI's) instead of UMLSConcepts.  Perhaps
> we should make the NamedEntityLookupConsumerImpl a bit more general.
> 
> --Pei
> 
> On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James
> <jvo...@activehealth.net> wrote:
> I now see that I use a query on umls_ms_2011ab where sourcetype =
> 'ICD9CM'.  Is there a way to use an existing AE or class to add additional
> ICD9CM annotations / concepts or do I change the code in consumeHits() or
> getSnomedCodes()?
> 
> -Original Message-
> From: Vogel, James
> Sent: Friday, September 27, 2013 6:30 PM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
> 
> Is anyone able to provide any more detailed guidance on what I'd need to
> change to add the ICD9 codes as tags, e.g., where do I look for the tables in
> the hsql database that would contain the ICD9 data?
> 
> Thanks.
> 
> -Original Message-
> From: Miller, Timothy
> [mailt

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-30 Thread Vogel, James
That worked and I see how I can change the code to do both SNOMED and ICD9.
I added an index by doing: CREATE INDEX 'umls_ms_2011ab_cui' ON umls_ms_2011ab 
(cui);  I needed to change the database from 'read-only', is that going to 
cause any other problems?
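
For reference, a minimal sketch of the index creation, again with Python's sqlite3 standing in for HSQLDB and a three-column table that is only an assumption about the real layout. The index name is written unquoted here because standard SQL reserves single quotes for string literals and uses double quotes for identifiers; whether the hsql tool accepts the single-quoted form is not something this sketch checks.

```python
import sqlite3

# Toy stand-in for the dictionary table; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)")

# Index on cui, so the cui=? part of the mapping query does not scan the table.
conn.execute("CREATE INDEX umls_ms_2011ab_cui ON umls_ms_2011ab (cui)")

# List the indexes on the table; the second field of each row is the index name.
index_names = [
    row[1] for row in conn.execute("PRAGMA index_list('umls_ms_2011ab')")
]

# Ask SQLite how it would run the mapping query (output format is SQLite-specific).
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT code FROM umls_ms_2011ab WHERE cui = ? AND sourcetype = 'ICD9CM'",
    ("C0851140",),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

print(index_names)
print(plan_text)
```

PRAGMA and EXPLAIN QUERY PLAN are SQLite features; HSQLDB has its own ways of inspecting indexes and plans.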

What subset of ICD9 is in the dictionary?

From: Pei Chen [mailto:chen...@apache.org]
Sent: Friday, September 27, 2013 11:26 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

James,
Obviously it would be best to customize the code and/or the dictionary for your 
particular case.
But if you want to try something that will work without any code changes, you 
can try the below in your LookupDesc_Db.xml
Essentially, what it will do is take advantage of the fact that the 
UmlsToSnomedDbConsumerImpl will allow you to specify an SQL statement that maps 
the CUIs to codes.  Couple that with the fact that there already is a table 
called umls_ms_2011ab which contains the codes and CUIs from many different 
sources including ICD9CM.
What you could do is just reuse the table as the mapping table as well and 
specify the source such as:
select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'

(The downside is that I don't think there is an index on sourcetype so 
performance may suck).
I've attached an example to normalize to ICD9CM codes instead of SNOMEDCT.

On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen <chen...@apache.org> wrote:
James,
One can try the NamedEntityLookupConsumerImpl instead of 
UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that 
have SNOMED codes.
Will you need to preserve the TUI?  One thing is that 
NamedEntityLookupConsumerImpl will return back all of the hits, except that 
it'll create OntologyConcepts (w/o TUI's) instead of UMLSConcepts.  Perhaps we 
should make the NamedEntityLookupConsumerImpl a bit more general.

--Pei

On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James <jvo...@activehealth.net> wrote:
I now see that I use a query on umls_ms_2011ab where sourcetype = 'ICD9CM'.  Is 
there a way to use an existing AE or class to add additional ICD9CM annotations 
/ concepts or do I change the code in consumeHits() or getSnomedCodes()?

-Original Message-
From: Vogel, James
Sent: Friday, September 27, 2013 6:30 PM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

Is anyone able to provide any more detailed guidance on what I'd need to change 
to add the ICD9 codes as tags, e.g., where do I look for the tables in the hsql 
database that would contain the ICD9 data?

Thanks.

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, September 16, 2013 7:25 AM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

James,
I haven't done it myself, so I don't know exactly how the config
changes, but I know roughly where to look.  In the LookupDesc_Db.xml,
the  tag with the idRef = DICT_UMLS_MS. Then look under
the  section, and you'll see the codingScheme is SNOMED.
I believe this is where the actual dictionary filtering is done. There
is also a consumer class called
org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
mapPrepStmt field with a SQL query that might need changing. That is
where I would start looking, I'm not sure whether you would need to
write a new consumer class, and what values the codingScheme field can
take, but hopefully this helps you get started until someone else chimes
in with more detailed info!

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config 
> and impl that would need to be made to get the ICD9 codes?
>
> -Original Message-
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Wednesday, September 04, 2013 1:02 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using 
> AggregatePlaintextUMLSProcessor
>
> Ted,
>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've
>> described below, where would I locate the ICD9 for a specific entity?
>
> Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> configured [1] to only return/store concepts [2] that have a SNOMEDCT code or
> RxNorm code.
>
> [1]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctak

Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-27 Thread Pei Chen
James,
Obviously it would be best to customize the code and/or the dictionary for
your particular case.
But if you want to try something that will work without any code changes,
you can try the below in your LookupDesc_Db.xml
Essentially, what it will do is take advantage of the fact that the
UmlsToSnomedDbConsumerImpl will allow you to specify an SQL statement that
maps the CUIs to codes.  Couple that with the fact that there already is a
table called umls_ms_2011ab which contains the codes and CUIs from many
different sources including ICD9CM.
What you could do is just reuse the table as the mapping table as well and
specify the source such as:
select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'

(The downside is that I don't think there is an index on sourcetype so
performance may suck).
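
To make the mapping query concrete, here is a small sketch of the CUI-to-code lookup, with Python's sqlite3 standing in for HSQLDB. Unlike the statement above, it takes the source type as a second parameter, which is a variation made only for this illustration. The sample rows are illustrative: the MTHICD9 row is the one quoted earlier in this thread, and attaching the SNOMED code to the same CUI is an assumption.

```python
import sqlite3

# Variation on the mapping statement: sourcetype is a parameter here.
MAP_STMT = "select code from umls_ms_2011ab where cui=? and sourcetype=?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)")
conn.executemany(
    "INSERT INTO umls_ms_2011ab VALUES (?, ?, ?)",
    [
        ("C0851140", "233.1", "MTHICD9"),      # quoted in this thread
        ("C0851140", "20365006", "SNOMEDCT"),  # illustrative pairing
    ],
)


def codes_for_cui(conn, cui, sourcetype):
    """Return the sorted codes that one source assigns to a CUI."""
    rows = conn.execute(MAP_STMT, (cui, sourcetype)).fetchall()
    return sorted(code for (code,) in rows)


icd9cm_codes = codes_for_cui(conn, "C0851140", "ICD9CM")
mthicd9_codes = codes_for_cui(conn, "C0851140", "MTHICD9")
print(icd9cm_codes, mthicd9_codes)
```

If the real table tags some ICD9 rows as MTHICD9 rather than ICD9CM, as the search result quoted in this thread suggests, a filter on sourcetype='ICD9CM' alone would miss them, so it may be worth checking both values.
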
I've attached an example to normalize to ICD9CM codes instead of SNOMEDCT.

On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen  wrote:

> James,
> One can try the NamedEntityLookupConsumerImpl instead of
> UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that
> have SNOMED codes.
> Will you need to preserve the TUI?  One thing is that
> NamedEntityLookupConsumerImpl will return back all of the hits, except that
> it'll create OntologyConcepts (w/o TUI's) instead of UMLSConcepts.  Perhaps
> we should make the NamedEntityLookupConsumerImpl a bit more general.
>
> --Pei
>
>
> On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James wrote:
>
>> I now see that I use a query on umls_ms_2011ab where sourcetype =
>> 'ICD9CM'.  Is there a way to use an existing AE or class to add additional
>> ICD9CM annotations / concepts or do I change the code in consumeHits() or
>> getSnomedCodes()?
>>
>> -----Original Message-
>> From: Vogel, James
>> Sent: Friday, September 27, 2013 6:30 PM
>> To: dev@ctakes.apache.org
>> Subject: RE: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> Is anyone able to provide any more detailed guidance on what I'd need to
>> change to add the ICD9 codes as tags, e.g., where do I look for the tables
>> in the hsql database that would contain the ICD9 data?
>>
>> Thanks.
>>
>> -Original Message-
>> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
>> Sent: Monday, September 16, 2013 7:25 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> James,
>> I haven't done it myself, so I don't know exactly how the config
>> changes, but I know roughly where to look.  In the LookupDesc_Db.xml,
>> the  tag with the idRef = DICT_UMLS_MS. Then look under
>> the  section, and you'll see the codingScheme is SNOMED.
>> I believe this is where the actual dictionary filtering is done. There
>> is also a consumer class called
>> org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
>> mapPrepStmt field with a SQL query that might need changing. That is
>> where I would start looking, I'm not sure whether you would need to
>> write a new consumer class, and what values the codingScheme field can
>> take, but hopefully this helps you get started until someone else chimes
>> in with more detailed info!
>>
>> Tim
>>
>> On 09/15/2013 08:39 PM, Vogel, James wrote:
>> > Any more guidance you can give about the nature of the changes to the
>> > config and impl that would need to be made to get the ICD9 codes?
>> >
>> > -Original Message-
>> > From: Pei Chen [mailto:chen...@apache.org]
>> > Sent: Wednesday, September 04, 2013 1:02 PM
>> > To: dev@ctakes.apache.org
>> > Subject: Re: specificity in selecting EntityMentions when using
>> > AggregatePlaintextUMLSProcessor
>> >
>> > Ted,
>> >
>> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> >> familiar with how to access that information: In the example I've
>> >> described below, where would I locate the ICD9 for a specific entity?
>> >
>> > Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
>> > configured [1] to only return/store concepts [2] that have a SNOMEDCT
>> > code or RxNorm code.
>> >
>> > [1]
>> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>> >
>> > [2]
>> 

Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-27 Thread Pei Chen
James,
One can try the NamedEntityLookupConsumerImpl instead of
UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that
have SNOMED codes.
Will you need to preserve the TUI?  One thing is that
NamedEntityLookupConsumerImpl will return back all of the hits, except that
it'll create OntologyConcepts (w/o TUI's) instead of UMLSConcepts.  Perhaps
we should make the NamedEntityLookupConsumerImpl a bit more general.
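
The difference between the two consumers can be pictured with a toy model. This is a conceptual sketch in Python, not the cTAKES API: the Hit class and both functions are hypothetical names, and the sample data is illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hit:
    """Hypothetical dictionary hit: a CUI, an optional TUI, and codes per scheme."""
    cui: str
    tui: Optional[str]
    codes: Dict[str, List[str]]


def filtering_consumer(hits, scheme="SNOMEDCT"):
    """Keep only hits that have at least one code in the given scheme."""
    return [h for h in hits if h.codes.get(scheme)]


def pass_through_consumer(hits):
    """Keep every hit but drop the TUI, like a plain ontology concept."""
    return [Hit(h.cui, None, h.codes) for h in hits]


hits = [
    Hit("C0851140", "T191", {"MTHICD9": ["233.1"]}),
    Hit("C0206708", "T191", {"SNOMEDCT": ["CODE_PLACEHOLDER"]}),
]

filtered = filtering_consumer(hits)
passed = pass_through_consumer(hits)
print([h.cui for h in filtered])
print([(h.cui, h.tui) for h in passed])
```

The first function loses the hit that only has an ICD9 code, while the second keeps it at the cost of the TUI, which is the trade-off raised above.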

--Pei


On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James wrote:

> I now see that I use a query on umls_ms_2011ab where sourcetype =
> 'ICD9CM'.  Is there a way to use an existing AE or class to add additional
> ICD9CM annotations / concepts or do I change the code in consumeHits() or
> getSnomedCodes()?
>
> -Original Message-
> From: Vogel, James
> Sent: Friday, September 27, 2013 6:30 PM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> Is anyone able to provide any more detailed guidance on what I'd need to
> change to add the ICD9 codes as tags, e.g., where do I look for the tables
> in the hsql database that would contain the ICD9 data?
>
> Thanks.
>
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Monday, September 16, 2013 7:25 AM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> James,
> I haven't done it myself, so I don't know exactly how the config
> changes, but I know roughly where to look.  In the LookupDesc_Db.xml,
> the  tag with the idRef = DICT_UMLS_MS. Then look under
> the  section, and you'll see the codingScheme is SNOMED.
> I believe this is where the actual dictionary filtering is done. There
> is also a consumer class called
> org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
> mapPrepStmt field with a SQL query that might need changing. That is
> where I would start looking, I'm not sure whether you would need to
> write a new consumer class, and what values the codingScheme field can
> take, but hopefully this helps you get started until someone else chimes
> in with more detailed info!
>
> Tim
>
> On 09/15/2013 08:39 PM, Vogel, James wrote:
> > Any more guidance you can give about the nature of the changes to the
> > config and impl that would need to be made to get the ICD9 codes?
> >
> > -Original Message-----
> > From: Pei Chen [mailto:chen...@apache.org]
> > Sent: Wednesday, September 04, 2013 1:02 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: specificity in selecting EntityMentions when using
> > AggregatePlaintextUMLSProcessor
> >
> > Ted,
> >
> > >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> > >> familiar with how to access that information: In the example I've
> > >> described below, where would I locate the ICD9 for a specific entity?
> >
> > Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> > configured [1] to only return/store concepts [2] that have a SNOMEDCT code
> > or RxNorm code.
> >
> > [1]
> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
> >
> > [2]
> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
> >
> >  If you would like it to return ICD9 codes, one would need to
> > modify/configure the above...
> >
> > --Pei
> >
> >
> > On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> > wrote:
> >
> >> Thanks for looking into this, it's been puzzling me.
> >>
> > >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> > >> familiar with how to access that information: In the example I've
> > >> described below, where would I locate the ICD9 for a specific entity?
> >>
> >> Thank you
> >>
> >> Ted
> >>
> >> -Original Message-
> >> From: Pei Chen [mailto:chen...@apache.org]
> >> Sent: Tuesday, September 03, 2013 7:13 PM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: specificity in selecting EntityMentions when using
> >> AggregatePlaintextUMLSProcessor
> >>
> >> You're right, it should have gotten "CIN I"- that's a strange one,
> >> probably needs to be debugged/looked into further...
> >>
> 

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-27 Thread Vogel, James
I now see that I use a query on umls_ms_2011ab where sourcetype = 'ICD9CM'.  Is 
there a way to use an existing AE or class to add additional ICD9CM annotations 
/ concepts or do I change the code in consumeHits() or getSnomedCodes()?

-Original Message-
From: Vogel, James
Sent: Friday, September 27, 2013 6:30 PM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

Is anyone able to provide any more detailed guidance on what I'd need to change 
to add the ICD9 codes as tags, e.g., where do I look for the tables in the hsql 
database that would contain the ICD9 data?

Thanks.

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, September 16, 2013 7:25 AM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

James,
I haven't done it myself, so I don't know exactly how the config
changes, but I know roughly where to look.  In the LookupDesc_Db.xml,
the  tag with the idRef = DICT_UMLS_MS. Then look under
the  section, and you'll see the codingScheme is SNOMED.
I believe this is where the actual dictionary filtering is done. There
is also a consumer class called
org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
mapPrepStmt field with a SQL query that might need changing. That is
where I would start looking, I'm not sure whether you would need to
write a new consumer class, and what values the codingScheme field can
take, but hopefully this helps you get started until someone else chimes
in with more detailed info!

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config 
> and impl that would need to be made to get the ICD9 codes?
>
> -Original Message-
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Wednesday, September 04, 2013 1:02 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using 
> AggregatePlaintextUMLSProcessor
>
> Ted,
>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've
>> described below, where would I locate the ICD9 for a specific entity?
>
> Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> configured [1] to only return/store concepts [2] that have a SNOMEDCT code or
> RxNorm code.
>
> [1]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>
> [2]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
>
>  If you would like it to return ICD9 codes, one would need to
> modify/configure the above...
>
> --Pei
>
>
> On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> wrote:
>
>> Thanks for looking into this, it's been puzzling me.
>>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've described
>> below, where would I locate the ICD9 for a specific entity?
>>
>> Thank you
>>
>> Ted
>>
>> -Original Message-
>> From: Pei Chen [mailto:chen...@apache.org]
>> Sent: Tuesday, September 03, 2013 7:13 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> You're right, it should have gotten "CIN I"- that's a strange one,
>> probably needs to be debugged/looked into further...
>>
>> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
>> timothy.mil...@childrens.harvard.edu> wrote:
>>> Ah. So it will get
>>> CIN 2 (in SNOMED)
>>> CIN III (in SNOMED)
>>> CIN 3 (in SNOMED)
>>>
>>> but the rest are not in SNOMED?
>>>
>>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
>>> (though I don't fully understand what all the symbols mean in the umls
>>> browser).
>>>
>>>> CIN I - Cervical intraepithelial neoplasia 1
>>>> [A3002690/SNOMEDCT/SY/285836003]
>>>
>>> On 09/03/2013 09:55 PM, Pei Chen wrote:
>>>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>>>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
>>>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>>>> CIN III [A965/SNO

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-27 Thread Vogel, James
Is anyone able to provide any more detailed guidance on what I'd need to change 
to add the ICD9 codes as tags, e.g., where do I look for the tables in the hsql 
database that would contain the ICD9 data?

Thanks.

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, September 16, 2013 7:25 AM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

James,
I haven't done it myself, so I don't know exactly how the config
changes, but I know roughly where to look.  In LookupDesc_Db.xml, find
the tag with idRef = DICT_UMLS_MS. Then look at the section below it,
and you'll see the codingScheme is SNOMED.
I believe this is where the actual dictionary filtering is done. There
is also a consumer class called
org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl, which has a
mapPrepStmt field with a SQL query that might need changing. That is
where I would start looking. I'm not sure whether you would need to
write a new consumer class, or what values the codingScheme field can
take, but hopefully this helps you get started until someone else chimes
in with more detailed info!

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config 
> and impl that would need to be made to get the ICD9 codes?
>
> -Original Message-
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Wednesday, September 04, 2013 1:02 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using 
> AggregatePlaintextUMLSProcessor
>
> Ted,
>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've
>> described below, where would I locate the ICD9 for a specific entity?
>
> Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> configured [1] to return/store only concepts [2] that have a SNOMEDCT code or
> RxNorm code.
>
> [1]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>
> [2]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
>
>  If you would like it to return ICD9 codes, one would need to
> modify/configure the above...
>
> --Pei
>
>
> On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> wrote:
>
>> Thanks for looking into this, it's been puzzling me.
>>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've described
>> below, where would I locate the ICD9 for a specific entity?
>>
>> Thank you
>>
>> Ted
>>
>> -Original Message-
>> From: Pei Chen [mailto:chen...@apache.org]
>> Sent: Tuesday, September 03, 2013 7:13 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> You're right, it should have gotten "CIN I"- that's a strange one,
>> probably needs to be debugged/looked into further...
>>
>> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
>> timothy.mil...@childrens.harvard.edu> wrote:
>>> Ah. So it will get
>>> CIN 2 (in SNOMED)
>>> CIN III (in SNOMED)
>>> CIN 3 (in SNOMED)
>>>
>>> but the rest are not in SNOMED?
>>>
>>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
>>> (though I don't fully understand what all the symbols mean in the umls
>>> browser).
>>>
>>>> CIN I - Cervical intraepithelial neoplasia 1
>>>> [A3002690/SNOMEDCT/SY/285836003]
>>>
>>> On 09/03/2013 09:55 PM, Pei Chen wrote:
>>>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>>>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
>>>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>>>> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
>>>> able to perform the lookup successfully.
>>>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>>>> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
>>>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>>>
>>>> --Pei
>>>>
>>>> On Tue, Sep 3, 2013 at 9:44 PM, Mi

Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-16 Thread Miller, Timothy
James,
I haven't done it myself, so I don't know exactly how the config
changes, but I know roughly where to look.  In LookupDesc_Db.xml, find
the tag with idRef = DICT_UMLS_MS. Then look at the section below it,
and you'll see the codingScheme is SNOMED.
I believe this is where the actual dictionary filtering is done. There
is also a consumer class called
org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl, which has a
mapPrepStmt field with a SQL query that might need changing. That is
where I would start looking. I'm not sure whether you would need to
write a new consumer class, or what values the codingScheme field can
take, but hopefully this helps you get started until someone else chimes
in with more detailed info!
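
To make the idea of a scheme-specific mapping query concrete, here is a
minimal sketch in Python. It is not the cTAKES consumer and not its SQL: the
table cui_code_map, its columns, the CUIs and the codes are all made up for
illustration, and an in-memory SQLite database stands in for the real
dictionary database. It only shows how one parameterized query can return
codes for whichever coding scheme is requested.

```python
import sqlite3


def build_demo_map():
    """Create a tiny in-memory CUI-to-code table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cui_code_map (cui TEXT, scheme TEXT, code TEXT)")
    conn.executemany(
        "INSERT INTO cui_code_map VALUES (?, ?, ?)",
        [
            ("C0000001", "SNOMED", "111111"),  # placeholder values, not real codes
            ("C0000001", "ICD9", "999.9"),
            ("C0000002", "SNOMED", "222222"),
        ],
    )
    return conn


def codes_for_cui(conn, cui, coding_scheme):
    """Return the codes of one coding scheme that are mapped to the given CUI."""
    cur = conn.execute(
        "SELECT code FROM cui_code_map WHERE cui = ? AND scheme = ? ORDER BY code",
        (cui, coding_scheme),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = build_demo_map()
    print(codes_for_cui(conn, "C0000001", "SNOMED"))
    print(codes_for_cui(conn, "C0000001", "ICD9"))
```

In this sketch, switching the returned codes from SNOMED to ICD9 is only a
change of the coding_scheme argument; whether the real consumer can be
reconfigured that simply is exactly the open question above.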

Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config 
> and impl that would need to be made to get the ICD9 codes?
>
> -Original Message-
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Wednesday, September 04, 2013 1:02 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using 
> AggregatePlaintextUMLSProcessor
>
> Ted,
>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've
>> described below, where would I locate the ICD9 for a specific entity?
>
> Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> configured [1] to return/store only concepts [2] that have a SNOMEDCT code or
> RxNorm code.
>
> [1]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>
> [2]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
>
>  If you would like it to return ICD9 codes, one would need to
> modify/configure the above...
>
> --Pei
>
>
> On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> wrote:
>
>> Thanks for looking into this, it's been puzzling me.
>>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've described
>> below, where would I locate the ICD9 for a specific entity?
>>
>> Thank you
>>
>> Ted
>>
>> -Original Message-
>> From: Pei Chen [mailto:chen...@apache.org]
>> Sent: Tuesday, September 03, 2013 7:13 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> You're right, it should have gotten "CIN I"- that's a strange one,
>> probably needs to be debugged/looked into further...
>>
>> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
>> timothy.mil...@childrens.harvard.edu> wrote:
>>> Ah. So it will get
>>> CIN 2 (in SNOMED)
>>> CIN III (in SNOMED)
>>> CIN 3 (in SNOMED)
>>>
>>> but the rest are not in SNOMED?
>>>
>>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
>>> (though I don't fully understand what all the symbols mean in the umls
>>> browser).
>>>
>>>> CIN I - Cervical intraepithelial neoplasia 1
>>>> [A3002690/SNOMEDCT/SY/285836003]
>>>
>>> On 09/03/2013 09:55 PM, Pei Chen wrote:
>>>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>>>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
>>>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>>>> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
>>>> able to perform the lookup successfully.
>>>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>>>> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
>>>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>>>
>>>> --Pei
>>>>
>>>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>>>>  wrote:
>>>>> That is a good question, Ted!
>>>>>
>>>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>>>> not sure if that is a correct context but I was able to duplicate
>>>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>>>> CIN II)
>>>>>
>>>>> My first thought was

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-15 Thread Vogel, James
Any more guidance you can give about the nature of the changes to the config 
and impl that would need to be made to get the ICD9 codes?

-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Wednesday, September 04, 2013 1:02 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

Ted,

> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> familiar with how to access that information: In the example I've
> described below, where would I locate the ICD9 for a specific entity?

Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
configured [1] to return/store only concepts [2] that have a SNOMEDCT code or
RxNorm code.

[1]
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml

[2]
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java

 If you would like it to return ICD9 codes, one would need to
modify/configure the above...

--Pei


On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
wrote:

> Thanks for looking into this, it's been puzzling me.
>
> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> familiar with how to access that information: In the example I've described
> below, where would I locate the ICD9 for a specific entity?
>
> Thank you
>
> Ted
>
> -Original Message-
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Tuesday, September 03, 2013 7:13 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> You're right, it should have gotten "CIN I"- that's a strange one,
> probably needs to be debugged/looked into further...
>
> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
> > Ah. So it will get
> > CIN 2 (in SNOMED)
> > CIN III (in SNOMED)
> > CIN 3 (in SNOMED)
> >
> > but the rest are not in SNOMED?
> >
> > I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> > (though I don't fully understand what all the symbols mean in the umls
> > browser).
> >
> >> CIN I - Cervical intraepithelial neoplasia 1
> >> [A3002690/SNOMEDCT/SY/285836003]
> >
> >
> > On 09/03/2013 09:55 PM, Pei Chen wrote:
> >> It has the correct parse (POS, chunks, and lookupwindow)- but some of
> >> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
> >> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
> >> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
> >> able to perform the lookup successfully.
> >> Note that CIN II synonyms do exist in other UMLS thesauri such as
> >> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
> >> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
> >>
> >> --Pei
> >>
> >> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
> >>  wrote:
> >>> That is a good question, Ted!
> >>>
> >>> I tried it with a simple context: "The patient has a CIN III." I'm
> >>> not sure if that is a correct context but I was able to duplicate
> >>> your findings. (Finds a CUI for CIN III but not if you change it to
> >>> CIN II)
> >>>
> >>> My first thought was that it is the chunker. But the chunker seems
> >>> to get it right, as CIN II and CIN III are both called NPs, and
> >>> similarly the LookupWindowAnnotator handles them both identically.
> >>> So that suggests it is a problem with the actual lookup of the
> >>> tokens in the LookupWindow.
> >>>
> >>> That's all I can do for now but maybe someone else who knows more
> >>> about its behavior offhand will have an idea.
> >>>
> >>> Tim
> >>>
> >>>
> >>>
> >>>
> >>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
> >>>> I'm trying to understand what would prevent the
> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems
> that are defined in the UMLS version used by cTAKES.
> >>>>
> >>>> For example,
> >>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
> parsed out as UMLS CUI C0206708.
> >>>>
> >>>> CIN comes in 3 grades, 1, 2 and 3.

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-04 Thread Masanz, James J.
Although cTAKES uses ICD9 entries when finding Named Entities, out of the box 
it doesn't assign ICD9 codes to the named entities; it assigns SNOMED-CT codes.
If some text matches an ICD9 term, and the ICD9 term has the same CUI as one or 
more SNOMED-CT terms, the SNOMED-CT codes for those terms are assigned to 
the annotation (along with the UMLS CUI), even if the SNOMED-CT term and the 
ICD9 term don't share any words.
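
The behavior described above can be sketched roughly as follows. This is an
illustration in Python, not cTAKES code: the function name, the row layout,
the CUIs and the codes are all placeholders invented for the example. It shows
text matching an ICD9 term and picking up the SNOMED-CT code of a differently
worded term that shares its CUI, while a term whose CUI has no SNOMED-CT code
yields nothing.

```python
from collections import defaultdict

# Hypothetical dictionary rows: (term text, CUI, source vocabulary, code).
# Every value here is a placeholder, not a real UMLS entry.
ROWS = [
    ("icd style term", "C0000001", "ICD9CM", "999.9"),
    ("totally different snomed wording", "C0000001", "SNOMEDCT", "111111"),
    ("icd only term", "C0000002", "ICD9CM", "888.8"),
]


def snomed_codes_for_match(matched_text, rows):
    """Return (CUI, sorted SNOMED-CT codes) for each CUI whose terms equal matched_text.

    CUIs that have no SNOMED-CT code at all are dropped, mirroring the
    "only concepts with a SNOMED code are kept" behavior described above.
    """
    snomed_by_cui = defaultdict(set)
    for _term, cui, source, code in rows:
        if source == "SNOMEDCT":
            snomed_by_cui[cui].add(code)

    wanted = matched_text.lower()
    results = []
    seen = set()
    for term, cui, _source, _code in rows:
        if term.lower() == wanted and cui not in seen:
            seen.add(cui)
            codes = sorted(snomed_by_cui.get(cui, ()))
            if codes:
                results.append((cui, codes))
    return results


if __name__ == "__main__":
    print(snomed_codes_for_match("ICD style term", ROWS))
    print(snomed_codes_for_match("icd only term", ROWS))
```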

Hope that helps

-- James





From: dev-return-1961-Masanz.James=mayo@ctakes.apache.org 
[dev-return-1961-Masanz.James=mayo@ctakes.apache.org] on behalf of Assur, 
Ted [theodore.as...@providence.org]
Sent: Wednesday, September 04, 2013 10:55 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar 
with how to access that information: In the example I've described below, where 
would I locate the ICD9 for a specific entity?

Thank you

Ted

-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Tuesday, September 03, 2013 7:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I"- that's a strange one, probably 
needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy 
 wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> (though I don't fully understand what all the symbols mean in the umls
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>>  wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>> not sure if that is a correct context but I was able to duplicate
>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>> CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems
>>> to get it right, as CIN II and CIN III are both called NPs, and
>>> similarly the LookupWindowAnnotator handles them both identically.
>>> So that suggests it is a problem with the actual lookup of the
>>> tokens in the LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more
>>> about its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>>
>>>
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the 
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific 
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed 
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
>>>> Numerals, I,II, and III.
>>>>
>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
>>>> "Carcinoma in situ of uterine cervix."
>>>>
>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as 
>>>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and 
>>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>
>>>> Is there a way to tune the detection of UMLS concepts?
>>>>
>>>>
>>>>
>>>>
>>>> 

Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-04 Thread Pei Chen
Ted,

> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> familiar with how to access that information: In the example I've
> described below, where would I locate the ICD9 for a specific entity?

Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
configured [1] to return/store only concepts [2] that have a SNOMEDCT code or
RxNorm code.

[1]
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml

[2]
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java

 If you would like it to return ICD9 codes, one would need to
modify/configure the above...

--Pei


On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
wrote:

> Thanks for looking into this, it's been puzzling me.
>
> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> familiar with how to access that information: In the example I've described
> below, where would I locate the ICD9 for a specific entity?
>
> Thank you
>
> Ted
>
> -Original Message-
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Tuesday, September 03, 2013 7:13 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> You're right, it should have gotten "CIN I"- that's a strange one,
> probably needs to be debugged/looked into further...
>
> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
> > Ah. So it will get
> > CIN 2 (in SNOMED)
> > CIN III (in SNOMED)
> > CIN 3 (in SNOMED)
> >
> > but the rest are not in SNOMED?
> >
> > I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> > (though I don't fully understand what all the symbols mean in the umls
> > browser).
> >
> >> CIN I - Cervical intraepithelial neoplasia 1
> >> [A3002690/SNOMEDCT/SY/285836003]
> >
> >
> > On 09/03/2013 09:55 PM, Pei Chen wrote:
> >> It has the correct parse (POS, chunks, and lookupwindow)- but some of
> >> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
> >> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
> >> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
> >> able to perform the lookup successfully.
> >> Note that CIN II synonyms do exist in other UMLS thesauri such as
> >> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
> >> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
> >>
> >> --Pei
> >>
> >> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
> >>  wrote:
> >>> That is a good question, Ted!
> >>>
> >>> I tried it with a simple context: "The patient has a CIN III." I'm
> >>> not sure if that is a correct context but I was able to duplicate
> >>> your findings. (Finds a CUI for CIN III but not if you change it to
> >>> CIN II)
> >>>
> >>> My first thought was that it is the chunker. But the chunker seems
> >>> to get it right, as CIN II and CIN III are both called NPs, and
> >>> similarly the LookupWindowAnnotator handles them both identically.
> >>> So that suggests it is a problem with the actual lookup of the
> >>> tokens in the LookupWindow.
> >>>
> >>> That's all I can do for now but maybe someone else who knows more
> >>> about its behavior offhand will have an idea.
> >>>
> >>> Tim
> >>>
> >>>
> >>>
> >>>
> >>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
> >>>> I'm trying to understand what would prevent the
> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems
> that are defined in the UMLS version used by cTAKES.
> >>>>
> >>>> For example,
> >>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
> parsed out as UMLS CUI C0206708.
> >>>>
> >>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with
> Roman Numerals, I,II, and III.
> >>>>
> >>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI
> C0851140: "Carcinoma in situ of uterine cervix."
> >>>>
> >>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II
> a

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-04 Thread Finan, Sean
I don't know if this is exactly what you want, but you can use the hyperSql ( 
http://hsqldb.org/ ) database tool to perform searches on the umls dictionary 
used by cTakes.  
For instance " select * from UMLS_MS_2011AB where FWORD = 'CIN' " will provide 
all the available terms starting with CIN.  In the result you'll see that there 
is no term "CIN I", and you'll also see that the only listing from ICD9 is for 
"CIN III" [C0851140, T191, MTHICD9 233.1]

If you want an icd9 code that isn't in the cTakes umls dictionary then you can 
find it online ... but that won't do you much good wrt cTakes.
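
For anyone who wants to try the shape of that query without opening the real
database, here is a small self-contained sketch in Python. It uses an
in-memory SQLite table as a stand-in for the HSQLDB dictionary. Only the table
name UMLS_MS_2011AB and the FWORD column come from the query above; the other
column names (TERM, SOURCE, CODE) are assumptions, and the rows are a handful
of entries quoted in this thread plus one placeholder row.

```python
import sqlite3

# A few rows quoted in this thread, plus one placeholder non-CIN row.
# Column layout is an assumption for this sketch, not the real schema.
DEMO_ROWS = [
    ("CIN", "CIN III", "MTHICD9", "233.1"),
    ("CIN", "CIN III", "SNOMEDCT", "20365006"),
    ("CIN", "CIN I - Cervical intraepithelial neoplasia 1", "SNOMEDCT", "285836003"),
    ("CIN", "CIN 2 - Cervical intraepithelial neoplasia 2", "SNOMEDCT", "285838002"),
    ("cervical", "cervical placeholder term", "EXAMPLE", "0000"),
]


def build_demo_db():
    """Create an in-memory SQLite stand-in for the dictionary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UMLS_MS_2011AB (FWORD TEXT, TERM TEXT, SOURCE TEXT, CODE TEXT)"
    )
    conn.executemany("INSERT INTO UMLS_MS_2011AB VALUES (?, ?, ?, ?)", DEMO_ROWS)
    return conn


def terms_starting_with(conn, first_word):
    """Return (term, source, code) rows whose first word equals first_word."""
    cur = conn.execute(
        "SELECT TERM, SOURCE, CODE FROM UMLS_MS_2011AB "
        "WHERE FWORD = ? ORDER BY TERM, SOURCE",
        (first_word,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    for term, source, code in terms_starting_with(build_demo_db(), "CIN"):
        print(term, source, code)
```

With these demo rows, the result contains no bare "CIN I" term and only one
ICD9 row, which matches what the real query is described as showing.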

Sean

-Original Message-
From: Assur, Ted [mailto:theodore.as...@providence.org] 
Sent: Wednesday, September 04, 2013 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar 
with how to access that information: In the example I've described below, where 
would I locate the ICD9 for a specific entity?

Thank you

Ted

-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Tuesday, September 03, 2013 7:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I"- that's a strange one, probably 
needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy 
 wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED 
> (though I don't fully understand what all the symbols mean in the umls 
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1 
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow)- but some of 
>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial 
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was 
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other UMLS thesauri such as 
>> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only 
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy 
>>  wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm 
>>> not sure if that is a correct context but I was able to duplicate 
>>> your findings. (Finds a CUI for CIN III but not if you change it to 
>>> CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems 
>>> to get it right, as CIN II and CIN III are both called NPs, and 
>>> similarly the LookupWindowAnnotator handles them both identically.
>>> So that suggests it is a problem with the actual lookup of the 
>>> tokens in the LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more 
>>> about its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>>
>>>
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the 
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific 
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed 
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
>>>> Numerals, I,II, and III.
>>>>
>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
>>>> "Carcinoma in situ of uterine cervix."
>>>>
>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as 
>>>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and 
>>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>
>>>> Is there a way to tune the detection of UMLS concepts?
>>>>

RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-04 Thread Assur, Ted
Thanks for looking into this, it's been puzzling me.

On another note, I know the cTAKES dictionary uses ICD9, but I'm not familiar 
with how to access that information: In the example I've described below, where 
would I locate the ICD9 for a specific entity?

Thank you

Ted

-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Tuesday, September 03, 2013 7:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I"- that's a strange one, probably 
needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy 
 wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> (though I don't fully understand what all the symbols mean in the umls
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>>  wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>> not sure if that is a correct context but I was able to duplicate
>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>> CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems
>>> to get it right, as CIN II and CIN III are both called NPs, and
>>> similarly the LookupWindowAnnotator handles them both identically.
>>> So that suggests it is a problem with the actual lookup of the
>>> tokens in the LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more
>>> about its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>>
>>>
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the 
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific 
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed 
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
>>>> Numerals, I,II, and III.
>>>>
>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
>>>> "Carcinoma in situ of uterine cervix."
>>>>
>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as 
>>>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and 
>>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>
>>>> Is there a way to tune the detection of UMLS concepts?
>>>>
>>>>
>>>>
>>>>
>>>> 
>>>> Ted Assur
>>>> IT Solutions Architect for Cancer Research Providence Health &
>>>> Services ted.as...@providence.org
>>>> 503-215-6476
>>>>
>>>> Crede, ut intelligas.
>>>> Intellego, ut credam.
>>>>
>>>>
>>>>
>>>>
>>>>   
>>>>
>>>> This message is intended for the sole use of the addressee, and may 
>>>> contain information that is privileged, confidential and exempt from 
>>>> disclosure under applicable law. If you are not the addressee you are 
>>>> hereby notified that you may not use, copy, disclose, or distribute to 
>>>> anyone the message or any information contained in the message. If you 
>>>> have received this message in error, please immediately advise the sender 
>>>> by reply email and delete this message.
>>>>
>







RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-04 Thread Finan, Sean
This may sound strange, but SNOMED does not contain the bare term "CIN I".  It 
contains the terms "CIN I - Cervical intraepithelial neoplasia 1" and "CIN I - 
mild dyskaryosis".
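
A small sketch of why that matters, assuming the lookup requires the text to
equal a whole dictionary term (an assumption made for this illustration, not a
description of the actual cTAKES matching code). The three terms are the ones
quoted in this thread; the function names are invented for the example, which
is written in Python.

```python
# Terms quoted in this thread as present in SNOMED.
TERMS = {
    "CIN I - Cervical intraepithelial neoplasia 1",
    "CIN I - mild dyskaryosis",
    "CIN III",
}


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(text.lower().split())


def exact_term_match(candidate, terms):
    """True only if the candidate equals a complete dictionary term."""
    wanted = normalize(candidate)
    return any(normalize(term) == wanted for term in terms)


def terms_with_prefix(candidate, terms):
    """Dictionary terms that start with the candidate followed by more words."""
    wanted = normalize(candidate) + " "
    return sorted(term for term in terms if normalize(term).startswith(wanted))


if __name__ == "__main__":
    print(exact_term_match("CIN III", TERMS))
    print(exact_term_match("CIN I", TERMS))
    print(terms_with_prefix("CIN I", TERMS))
```

Under that whole-term assumption, "CIN III" matches but "CIN I" does not, even
though two longer terms begin with "CIN I".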

-Original Message-
From: Pei Chen [mailto:chen...@apache.org] 
Sent: Tuesday, September 03, 2013 10:13 PM
To: dev@ctakes.apache.org
Subject: Re: specificity in selecting EntityMentions when using 
AggregatePlaintextUMLSProcessor

You're right, it should have gotten "CIN I"- that's a strange one, probably 
needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy 
 wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED 
> (though I don't fully understand what all the symbols mean in the umls 
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1 
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow)- but some of 
>> the terms do not exist in SNOMED- CIN 2 - Cervical intraepithelial 
>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists that's why it was 
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other UMLS thesauri such as 
>> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only 
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9), IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm 
>>> not sure if that is a correct context but I was able to duplicate 
>>> your findings. (Finds a CUI for CIN III but not if you change it to 
>>> CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems 
>>> to get it right, as CIN II and CIN III are both called NPs, and 
>>> similarly the LookupWindowAnnotator handles them both identically. 
>>> So that suggests it is a problem with the actual lookup of the 
>>> tokens in the LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more 
>>> about its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>>
>>>
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the 
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific 
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed 
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
>>>> Numerals, I,II, and III.
>>>>
>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
>>>> "Carcinoma in situ of uterine cervix."
>>>>
>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as 
>>>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and 
>>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>
>>>> Is there a way to tune the detection of UMLS concepts?
>>>>
>>>>
>>>>
>>>>
>>>> 
>>>> Ted Assur
>>>> IT Solutions Architect for Cancer Research
>>>> Providence Health & Services
>>>> ted.as...@providence.org
>>>> 503-215-6476
>>>>
>>>> Crede, ut intelligas.
>>>> Intellego, ut credam.
>>>>
>>>>
>>>>
>>>>
>>>>   
>>>>
>>>>
>


Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-03 Thread Pei Chen
You're right, it should have gotten "CIN I"- that's a strange one,
probably needs to be debugged/looked into further...

On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy wrote:
> Ah. So it will get
> CIN 2 (in SNOMED)
> CIN III (in SNOMED)
> CIN 3 (in SNOMED)
>
> but the rest are not in SNOMED?
>
> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
> (though I don't fully understand what all the symbols mean in the umls
> browser).
>
>> CIN I - Cervical intraepithelial neoplasia 1
>> [A3002690/SNOMEDCT/SY/285836003]
>
>
> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> It has the correct parse (POS, chunks, and lookupwindow)- but some of
>> the terms do not exist in SNOMED-
>> CIN 2 - Cervical intraepithelial neoplasia 2
>> [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
>> CIN III [A965/SNOMEDCT/SY/20365006] also exists that's why it was
>> able to perform the lookup successfully.
>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9), IIRC.
>>
>> --Pei
>>
>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy wrote:
>>> That is a good question, Ted!
>>>
>>> I tried it with a simple context: "The patient has a CIN III." I'm not
>>> sure if that is a correct context but I was able to duplicate your
>>> findings. (Finds a CUI for CIN III but not if you change it to CIN II)
>>>
>>> My first thought was that it is the chunker. But the chunker seems to
>>> get it right, as CIN II and CIN III are both called NPs, and similarly
>>> the LookupWindowAnnotator handles them both identically. So that
>>> suggests it is a problem with the actual lookup of the tokens in the
>>> LookupWindow.
>>>
>>> That's all I can do for now but maybe someone else who knows more about
>>> its behavior offhand will have an idea.
>>>
>>> Tim
>>>
>>>
>>>
>>>
>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>> I'm trying to understand what would prevent the 
>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific 
>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>
>>>> For example,
>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed 
>>>> out as UMLS CUI C0206708.
>>>>
>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
>>>> Numerals, I,II, and III.
>>>>
>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
>>>> "Carcinoma in situ of uterine cervix."
>>>>
>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as 
>>>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and 
>>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>>
>>>> Is there a way to tune the detection of UMLS concepts?
>>>>
>>>>
>>>> Ted Assur
>>>> IT Solutions Architect for Cancer Research
>>>> Providence Health & Services
>>>> ted.as...@providence.org
>>>> 503-215-6476
>>>>
>>>> Crede, ut intelligas.
>>>> Intellego, ut credam.
>>>>

>


Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-03 Thread Miller, Timothy
Ah. So it will get
CIN 2 (in SNOMED)
CIN III (in SNOMED)
CIN 3 (in SNOMED)

but the rest are not in SNOMED?

I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
(though I don't fully understand what all the symbols mean in the umls
browser).

> CIN I - Cervical intraepithelial neoplasia 1
> [A3002690/SNOMEDCT/SY/285836003]


On 09/03/2013 09:55 PM, Pei Chen wrote:
> It has the correct parse (POS, chunks, and lookupwindow)- but some of
> the terms do not exist in SNOMED-
> CIN 2 - Cervical intraepithelial neoplasia 2
> [A3002688/SNOMEDCT/SY/285838002] exists but not CIN II.
> CIN III [A965/SNOMEDCT/SY/20365006] also exists that's why it was
> able to perform the lookup successfully.
> Note that CIN II synonyms do exist in other UMLS thesauri such as
> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9), IIRC.
>
> --Pei
>
> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy wrote:
>> That is a good question, Ted!
>>
>> I tried it with a simple context: "The patient has a CIN III." I'm not
>> sure if that is a correct context but I was able to duplicate your
>> findings. (Finds a CUI for CIN III but not if you change it to CIN II)
>>
>> My first thought was that it is the chunker. But the chunker seems to
>> get it right, as CIN II and CIN III are both called NPs, and similarly
>> the LookupWindowAnnotator handles them both identically. So that
>> suggests it is a problem with the actual lookup of the tokens in the
>> LookupWindow.
>>
>> That's all I can do for now but maybe someone else who knows more about
>> its behavior offhand will have an idea.
>>
>> Tim
>>
>>
>>
>>
>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>> I'm trying to understand what would prevent the 
>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems 
>>> that are defined in the UMLS version used by cTAKES.
>>>
>>> For example,
>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed out 
>>> as UMLS CUI C0206708.
>>>
>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
>>> Numerals, I,II, and III.
>>>
>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
>>> "Carcinoma in situ of uterine cervix."
>>>
>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as 
>>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and 
>>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>>
>>> Is there a way to tune the detection of UMLS concepts?
>>>
>>>
>>>
>>>
>>> 
>>> Ted Assur
>>> IT Solutions Architect for Cancer Research
>>> Providence Health & Services
>>> ted.as...@providence.org
>>> 503-215-6476
>>>
>>> Crede, ut intelligas.
>>> Intellego, ut credam.
>>>
>>>
>>>
>>>
>>>   
>>>
>>>



Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-03 Thread Pei Chen
It has the correct parse (POS, chunks, and LookupWindow), but some of
the terms do not exist in SNOMED:
CIN 2 - Cervical intraepithelial neoplasia 2
[A3002688/SNOMEDCT/SY/285838002] exists, but CIN II does not.
CIN III [A965/SNOMEDCT/SY/20365006] also exists, which is why it was
able to perform the lookup successfully.
Note that CIN II synonyms do exist in other UMLS thesauri such as
MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
contain MeSH, SNOMEDCT, RxNORM, NCI, and ICD9, IIRC.
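
A minimal Java sketch of what restricting the dictionary to certain source
vocabularies does to a lookup follows.  The rows, class name, and method are
hypothetical illustrations, not the real cTAKES dictionary build, and the
source list is simply the one recalled above.

```java
import java.util.HashSet;
import java.util.Set;

final class SourceFilteredDictionary {
    // Source vocabularies recalled above as bundled with cTAKES (unverified).
    static final Set<String> BUNDLED =
        Set.of("MeSH", "SNOMEDCT", "RxNORM", "NCI", "ICD9");

    // Illustrative (term, source) rows based on the terms discussed in this
    // thread; a real UMLS release has many more rows and columns.
    static final String[][] ROWS = {
        {"CIN III", "SNOMEDCT"},
        {"CIN II", "MEDCIN"},
        {"CIN II", "CCPSS"},
    };

    // Keep only terms whose source vocabulary is in the bundled set.
    static Set<String> buildTerms() {
        Set<String> terms = new HashSet<>();
        for (String[] row : ROWS) {
            if (BUNDLED.contains(row[1])) {
                terms.add(row[0]);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> terms = buildTerms();
        System.out.println(terms.contains("CIN III")); // true
        System.out.println(terms.contains("CIN II"));  // false
    }
}
```

In this sketch "CIN II" is dropped at build time because its only rows come
from sources outside the bundled set, so no later lookup can find it.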

--Pei

On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy wrote:
> That is a good question, Ted!
>
> I tried it with a simple context: "The patient has a CIN III." I'm not
> sure if that is a correct context but I was able to duplicate your
> findings. (Finds a CUI for CIN III but not if you change it to CIN II)
>
> My first thought was that it is the chunker. But the chunker seems to
> get it right, as CIN II and CIN III are both called NPs, and similarly
> the LookupWindowAnnotator handles them both identically. So that
> suggests it is a problem with the actual lookup of the tokens in the
> LookupWindow.
>
> That's all I can do for now but maybe someone else who knows more about
> its behavior offhand will have an idea.
>
> Tim
>
>
>
>
> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>> I'm trying to understand what would prevent the 
>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems 
>> that are defined in the UMLS version used by cTAKES.
>>
>> For example,
>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed out 
>> as UMLS CUI C0206708.
>>
>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
>> Numerals, I,II, and III.
>>
>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
>> "Carcinoma in situ of uterine cervix."
>>
>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as 
>> their correct concepts, "Cervical intraepithelial neoplasia grade 1" and 
>> "Cervical intraepithelial neoplasia grade 2" respectively.
>>
>> Is there a way to tune the detection of UMLS concepts?
>>
>>
>>
>>
>> 
>> Ted Assur
>> IT Solutions Architect for Cancer Research
>> Providence Health & Services
>> ted.as...@providence.org
>> 503-215-6476
>>
>> Crede, ut intelligas.
>> Intellego, ut credam.
>>
>>
>>
>>
>>   
>>
>>
>


Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-03 Thread Miller, Timothy
That is a good question, Ted!

I tried it with a simple context: "The patient has a CIN III." I'm not
sure if that is a correct context but I was able to duplicate your
findings. (Finds a CUI for CIN III but not if you change it to CIN II)

My first thought was that it is the chunker. But the chunker seems to
get it right, as CIN II and CIN III are both called NPs, and similarly
the LookupWindowAnnotator handles them both identically. So that
suggests it is a problem with the actual lookup of the tokens in the
LookupWindow.

That's all I can do for now but maybe someone else who knows more about
its behavior offhand will have an idea.

Tim




On 09/03/2013 08:24 PM, Assur, Ted wrote:
> I'm trying to understand what would prevent the 
> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems 
> that are defined in the UMLS version used by cTAKES.
>
> For example,
> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed out 
> as UMLS CUI C0206708.
>
> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
> Numerals, I,II, and III.
>
> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
> "Carcinoma in situ of uterine cervix."
>
> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as their 
> correct concepts, "Cervical intraepithelial neoplasia grade 1" and "Cervical 
> intraepithelial neoplasia grade 2" respectively.
>
> Is there a way to tune the detection of UMLS concepts?
>
>
>
>
> 
> Ted Assur
> IT Solutions Architect for Cancer Research
> Providence Health & Services
> ted.as...@providence.org
> 503-215-6476
>
> Crede, ut intelligas.
> Intellego, ut credam.
>
>
>
>
>   
>
>



Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor

2013-09-03 Thread Pei Chen
Hi Ted,
Setting aside the detection of stage/grade and other attributes, and
asserting their relationships to the cancer (that's probably a separate
discussion): in your example, since there are distinct SNOMEDCT
concepts and direct matches, it was able to identify "Cervical
intraepithelial neoplasia grade 1"
cui = "C0349458"
code = "285836003"
as well as "Cervical intraepithelial neoplasia"
cui = "C0206708"
code = "285636001"
etc.
It should also be able to identify "CIN 2", as there should be an exact
match in SNOMEDCT: CIN 2 - Cervical intraepithelial neoplasia 2
[A3002688/SNOMEDCT/SY/285838002].
Please see the attached XML output.

I am using the out-of-the-box AggregatePlaintextUMLSProcessor from 3.1RC3.
--Pei

On Tue, Sep 3, 2013 at 8:24 PM, Assur, Ted wrote:
> I'm trying to understand what would prevent the 
> AggregatePlaintextUMLSProcessor AE from correctly parsing specific problems 
> that are defined in the UMLS version used by cTAKES.
>
> For example,
> CIN (Cervical Intraepithelial Neoplasia) in its general usage is parsed out 
> as UMLS CUI C0206708.
>
> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with Roman 
> Numerals, I,II, and III.
>
> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI C0851140: 
> "Carcinoma in situ of uterine cervix."
>
> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II as their 
> correct concepts, "Cervical intraepithelial neoplasia grade 1" and "Cervical 
> intraepithelial neoplasia grade 2" respectively.
>
> Is there a way to tune the detection of UMLS concepts?
>
>
>
>
> 
> Ted Assur
> IT Solutions Architect for Cancer Research
> Providence Health & Services
> ted.as...@providence.org
> 503-215-6476
>
> Crede, ut intelligas.
> Intellego, ut credam.
>
>
>
>
>   
>