Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-06 Thread Miller, Timothy
No, there is unfortunately no documentation and that code is a total mess. It 
in fact combines evaluations with training, and then we just set aside the best 
evaluated model and made that the default[1]. We really should have separate 
training code in the project and eval code can probably be outside (since it 
usually evolves a lot during development and will be messy). I can walk you 
through it if you're willing to put in some effort but unfortunately it's not a 
trivial thing.

You basically need to run this class: 
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/eval/AssertionEvaluation.java?view=markup

It reads the gold standard from a Knowtator scheme developed under the SHARP
project [2]. But if you point it to already-generated xmi files (in
train, dev, and test sub-directories) it can be used for gold standards in
other formats. You would probably need to write a class that reads whatever
format your data is in and generates your own xmi. You can look here:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/pipelines/GoldEntityAndAttributeReaderPipelineForSeedCorpus.java?view=markup

to see a few methods for generating xmi from different data formats.
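
Before running the evaluation class on your own data, it may help to confirm the xmi files are laid out as described. The following is only a minimal sketch, not part of cTAKES: the directory names `train`, `dev`, `test` and the `.xmi` extension are assumptions taken from the description above, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed layout (not verified against the cTAKES code):
#   <corpus_root>/train/*.xmi
#   <corpus_root>/dev/*.xmi
#   <corpus_root>/test/*.xmi
EXPECTED_SPLITS = ("train", "dev", "test")


def count_xmi_per_split(corpus_root):
    """Return {split: number of .xmi files}; a missing split counts as 0."""
    root = Path(corpus_root)
    counts = {}
    for split in EXPECTED_SPLITS:
        split_dir = root / split
        if split_dir.is_dir():
            counts[split] = len(list(split_dir.glob("*.xmi")))
        else:
            counts[split] = 0
    return counts


def missing_splits(corpus_root):
    """List the splits that contain no .xmi files at all."""
    counts = count_xmi_per_split(corpus_root)
    return [split for split, n in counts.items() if n == 0]
```

Calling `missing_splits("/path/to/corpus")` returns an empty list when every split has at least one xmi file; the path here is a placeholder.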

Hopefully that is enough information for you to figure out whether it's worth 
pursuing!

Tim


[1] There is some training-specific code but I'm not sure that's kept up with 
the eval code
[2] It can also read a few other formats like i2b2 challenge and mipacq, in 
case your data looks like that.


From: Harish Kulkarni 
Sent: Saturday, January 6, 2018 1:32 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in 
IdentifiedAnnotations [EXTERNAL]

Is there any documentation or tutorial on how to train cTAKES for negation,
history, etc.?
I have some data with which to train the system.

Thanks
Harish

Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-05 Thread Harish Kulkarni
> >DiscoveryTechnique is a way to flag which entities were annotated by
> >which annotator, since it's possible to have, e.g., multiple clinical
> >concept taggers. We use it occasionally internally
> >to separate gold standard entities from system-discovered entities (in a
> >machine learning evaluation) but I don't know if any standard pipeline
> >components set it.
> >
> >Tim

Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-05 Thread Abramowitsch, Peter
Sorry for the very wordy contribution here. Following on Tim's
answer, I've found the historyOf mechanism to be very weak in its ability
to detect more than just a few possible permutations, and one frequent
issue is span, where historyOf modifies a series of concepts.



Here's an example from real notes

"The patient is a 57-year-old woman with a past medical history of OSA ,
asthma , CAD status post CABG..."

Using my CAS post processor on the output I get this.


PROBLEMS
  [,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive
OSA  
past medical history of OSA , asthma , CAD status po

  [,,,SNOMEDCT_US:195967001/C0004096] Asthma asthma
edical history of OSA , asthma , CAD status post CABG.>

  [,,,SNOMEDCT_US:414024009/C1956346] Coronary Artery Disease CAD
story of OSA , asthma , CAD status post CABG.>--

=MEDICATIONS==
==SIGNS=
=PROCEDURES==
  [,,History of ,SNOMEDCT_US:90205004/C0010055] Coronary Artery Bypass
Surgery CABG  
sthma , CAD status post CABG.>--


I print the history of, confidence, and polarity flags in the [,,, ]
section before the SNOMED code of each identified annotation. Notice
that it found History of for OSA but not for asthma or CAD. It did find
History of again for the procedure CABG because of the word POST.
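
To make the gap easier to see, here is a small sketch that reads report lines in the bracketed format shown above and lists the concepts that did not get the History of flag. It is not the CAS post processor itself; the regular expression, the function name, and the assumption that the code is always the last comma-separated field inside the brackets are mine.

```python
import re

# Matches report lines such as:
#   [,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive
ENTRY = re.compile(r"^\s*\[(?P<fields>[^\]]*)\]\s*(?P<label>.*)$")


def parse_entry(line):
    """Split one report line into flag, code, and label; None if no match."""
    match = ENTRY.match(line)
    if match is None:
        return None
    fields = [f.strip() for f in match.group("fields").split(",")]
    # Assumption: the last field is the code, everything before it is a flag.
    *flags, code = fields
    return {
        "history_of": "History of" in flags,
        "code": code,
        "label": match.group("label").strip(),
    }


# Lines copied from the first example above.
report = [
    "  [,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive",
    "  [,,,SNOMEDCT_US:195967001/C0004096] Asthma asthma",
    "  [,,,SNOMEDCT_US:414024009/C1956346] Coronary Artery Disease CAD",
]
entries = [parse_entry(line) for line in report]
missed = [e["label"] for e in entries if not e["history_of"]]
print(missed)
```

For the three lines above, `missed` holds the asthma and CAD entries, which are exactly the concepts in the series that the flag did not reach.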


Here's another example

"Cardiac transplant 15 years ago as stated above with chronic
immunosuppressives , history of gout , hypertension , renal insufficiency."


PROBLEMS
  [,,History of ,SNOMEDCT_US:90560007/C0018099] Gout gout  2018-01-01
15:00:00 +0100~
ppressives , history of gout , hypertension , renal i

  [,,,SNOMEDCT_US:28119000/C0020544] Renal hypertension hypertension ,
renal  2018-01-01 15:00:00 +0100~
ves , history of gout , hypertension , renal insufficiency.>--

  [,,,SNOMEDCT_US:236423003/C1565489] Renal Insufficiency renal
insufficiency  2018-01-01 15:00:00 +0100~
f gout , hypertension , renal insufficiency.>--

=MEDICATIONS==
  [,,,SNOMEDCT_US:372823004/C0021081] Immunosuppressive Agents
immunosuppressives  2018-01-01 15:00:00 +0100~
ated above with chronic immunosuppressives , history of gout , hype

==SIGNS=
  [,,,SNOMEDCT_US:161451004/C0455492] H/O: gout history of gout
2018-01-01 15:00:00 +0100~
ic immunosuppressives , history of gout , hypertension , renal i

=PROCEDURES==
  [,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation Cardiac
transplant 15 2018-01-01 15:00:00 +0100~
-- wrote:

>Uncertainty is when the text indicates some hedging about the concept:
>"possible asthma" should have asthma as an IdentifiedAnnotation with the
>uncertainty flag set to 1.
>This is done by machine learning and it is not easy so it is not perfect.
>
>HistoryOf is for concepts that are explicitly in patient history, often
>in a history section.
>"history of lymphoma as a child"
>lymphoma should have its history flag set to 1.
>This is done by machine learning and it is not easy so it is not perfect.
>
>Confidence is a field that I don't believe gets set by any current
>annotators, but in theory it is for methods that might use statistical
>methods that output a score to set the score there.
>The cTAKES dictionary lookup either hits or doesn't, so it doesn't set
>that score.
>
>DiscoveryTechnique is a way to flag which entities were annotated by
>which annotator, since it's possible to have, e.g., multiple clinical
>concept taggers. We use it occasionally internally
>to separate gold standard entities from system-discovered entities (in a
>machine learning evaluation) but I don't know if any standard pipeline
>components set it.
>
>Tim
>

Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-05 Thread Miller, Timothy
Uncertainty is when the text indicates some hedging about the concept:
"possible asthma" should have asthma as an IdentifiedAnnotation with the 
uncertainty flag set to 1.
This is done by machine learning and it is not easy so it is not perfect.

HistoryOf is for concepts that are explicitly in patient history, often in a 
history section.
"history of lymphoma as a child"
lymphoma should have its history flag set to 1.
This is done by machine learning and it is not easy so it is not perfect.

Confidence is a field that I don't believe gets set by any current annotators,
but in theory it gives statistical methods that output a score a place to
store that score.
The cTAKES dictionary lookup either hits or doesn't, so it doesn't set that
score.

DiscoveryTechnique is a way to flag which entities were annotated by which 
annotator, since it's possible to have, e.g., multiple clinical concept 
taggers. We use it occasionally internally
to separate gold standard entities from system-discovered entities (in a 
machine learning evaluation) but I don't know if any standard pipeline 
components set it.
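
As an illustration of how these flags might be used downstream, here is a minimal sketch. The `Mention` class is a hypothetical stand-in, not the cTAKES IdentifiedAnnotation type, and the field names and the unflagged "hypertension" mention are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical stand-in for a cTAKES IdentifiedAnnotation."""
    text: str
    uncertainty: int = 0          # 1 when the text hedges, e.g. "possible asthma"
    history_of: int = 0           # 1 when the concept is in patient history
    confidence: float = 0.0       # usually left unset, as described above
    discovery_technique: int = 0  # identifies which annotator made the mention


def current_certain(mentions):
    """Keep only mentions that are neither hedged nor historical."""
    return [m for m in mentions if m.uncertainty == 0 and m.history_of == 0]


mentions = [
    Mention("asthma", uncertainty=1),    # from "possible asthma"
    Mention("lymphoma", history_of=1),   # from "history of lymphoma as a child"
    Mention("hypertension"),             # made-up mention with no flags set
]
print([m.text for m in current_certain(mentions)])
```

A filter like this is one way to separate findings asserted as current from hedged or historical ones; since the flags come from imperfect machine learning, the result should be treated as approximate.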

Tim


From: Kumari,Puja 
Sent: Friday, January 5, 2018 12:03 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in 
IdentifiedAnnotations [EXTERNAL]

Hi,



Thanks for the replies but I am still not able to understand the significance 
of the attributes such as Uncertainty, HistoryOf, Confidence, 
DiscoveryTechniques.

Can anyone give some examples or any information which will help me to 
understand these concepts in more depth?



Thanks.

Puja Kumari



On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan"  
wrote:



Try out this link -
"https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Assertion"



Regards,

Gandhi





-Original Message-

From: Kumari,Puja [mailto:puja.kuma...@cerner.com]

Sent: Thursday, January 04, 2018 3:11 PM

To: dev@ctakes.apache.org

Subject: Re: Unable to understand the importance of attributes in 
IdentifiedAnnotations



Hi,



Thanks for your reply Krishnareddy but the link given says “page not
found”. Any other suggestions/links that you can share would be appreciated.



Thanks

Puja Kumari



On 1/4/18, 2:51 PM, "Krishnareddy"  wrote:



Hi,



  You can find related information about these attributes at the following
link




_*https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Assertion*_





Thank You



Krishna Reddy





On Thursday 04 January 2018 12:31 PM, Kumari,Puja wrote:

> Hi,

> I am working on IdentifiedAnnotations in Apache cTAKES and I am not able to
> interpret the meaning of the following attributes under
> IdentifiedAnnotations:

> 1.Uncertainty

> 2.History

> 3.Confidence

> 4.Discovery Techniques

>

> What is the importance of these attributes?

> How can we make use of these to make our work efficient?

> Any suggestion / link to understand more would be helpful.

>

>

> Thanks.

> Puja Kumari

> puja.kuma...@cerner.com<mailto:puja.kuma...@cerner.com>
