Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-06 Thread Miller, Timothy
No, there is unfortunately no documentation, and that code is a total mess. It
in fact combines evaluation with training: we just set aside the best-evaluated
model and made that the default[1]. We really should have separate training
code in the project, and the eval code can probably live outside it (since it
usually evolves a lot during development and will be messy). I can walk you
through it if you're willing to put in some effort, but unfortunately it's not
a trivial thing.

You basically need to run this class: 
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/eval/AssertionEvaluation.java?view=markup

It reads the gold standard from a Knowtator scheme developed under the SHARP
project [2]. But if you point it to already-generated XMI files (in train,
dev, and test subdirectories), it can be used with gold standards in other
formats. You would probably need to write a class that reads whatever format
your data is in and generates your own XMI. You can look here:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/pipelines/GoldEntityAndAttributeReaderPipelineForSeedCorpus.java?view=markup

to see a few methods for generating XMI from different data formats.
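As a rough illustration of the train/dev/test layout described above, here is a minimal Python sketch that copies a flat directory of converted gold-standard `.xmi` files into `train`, `dev` and `test` subdirectories. The function name `split_xmi_corpus`, the default `percents=(70, 15, 15)` and the seed are hypothetical choices, not requirements of AssertionEvaluation; check that class for how it expects to be pointed at the directories.

```python
import random
import shutil
from pathlib import Path


def split_xmi_corpus(src_dir, out_dir, percents=(70, 15, 15), seed=13):
    """Copy *.xmi files from src_dir into out_dir/train, out_dir/dev and out_dir/test.

    Hypothetical helper: the percentages and seed are illustrative defaults.
    Returns a dict mapping each split name to the sorted file names copied there.
    """
    if len(percents) != 3 or sum(percents) != 100:
        raise ValueError("percents must be three integers summing to 100")

    # Sort before shuffling so the split depends only on the seed,
    # not on the order in which the filesystem lists files.
    files = sorted(Path(src_dir).glob("*.xmi"))
    random.Random(seed).shuffle(files)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(files) * percents[0] // 100
    n_dev = len(files) * percents[1] // 100
    splits = {
        "train": files[:n_train],
        "dev": files[n_train:n_train + n_dev],
        "test": files[n_train + n_dev:],  # any rounding remainder lands in test
    }

    copied = {}
    for name, members in splits.items():
        target = Path(out_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        for f in members:
            shutil.copy2(f, target / f.name)
        copied[name] = sorted(f.name for f in members)
    return copied
```

For example, `split_xmi_corpus("gold_xmi", "corpus")` would populate `corpus/train`, `corpus/dev` and `corpus/test`. Sorting before the seeded shuffle keeps the split reproducible, which matters when you compare models trained at different times.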

Hopefully that is enough information for you to figure out whether it's worth 
pursuing!

Tim


[1] There is some training-specific code, but I'm not sure it has kept up with
the eval code.
[2] It can also read a few other formats, such as the i2b2 challenge and MiPACQ
formats, in case your data looks like those.
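Tim's point that training and evaluation are tangled together is easier to follow with a picture of what scoring an attribute involves. The sketch below is a generic illustration, not AssertionEvaluation's actual logic: it pairs gold and system mentions by exact character span and computes precision, recall and F1 for the historyOf flag. The `Mention` type, the function name `score_history_of` and the choice to skip unmatched spans are all assumptions.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical stand-in for an IdentifiedAnnotation."""
    begin: int
    end: int
    history_of: int  # 1 = flagged as history, 0 = not


def score_history_of(gold, system):
    """Precision, recall and F1 for historyOf == 1 on exact-span matches.

    Gold mentions with no system mention at the same span are skipped, so
    missed entities count against entity detection, not against the attribute.
    """
    predicted = {(m.begin, m.end): m.history_of for m in system}
    tp = fp = fn = 0
    for g in gold:
        p = predicted.get((g.begin, g.end))
        if p is None:
            continue
        if p == 1 and g.history_of == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif g.history_of == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Peter's first example with made-up offsets: suppose the gold standard flags
# OSA, asthma, CAD and CABG as history, while the system flagged only OSA and CABG.
gold = [Mention(66, 69, 1), Mention(72, 78, 1), Mention(81, 84, 1), Mention(97, 101, 1)]
system = [Mention(66, 69, 1), Mention(72, 78, 0), Mention(81, 84, 0), Mention(97, 101, 1)]
print(score_history_of(gold, system))  # precision 1.0, recall 0.5, F1 about 0.667
```

Scoring only matched spans is one common convention for attribute evaluation; counting missed entities as attribute errors instead would lower recall.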


From: Harish Kulkarni 
Sent: Saturday, January 6, 2018 1:32 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in 
IdentifiedAnnotations [EXTERNAL]

Is there any documentation or tutorial on how to train cTAKES for negation,
history, etc.?
I have some data with which to train the system.

Thanks
Harish


Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-05 Thread Harish Kulkarni
Is there any documentation or tutorial on how to train cTAKES for negation,
history, etc.?
I have some data with which to train the system.

Thanks
Harish

On Jan 5, 2018 18:19, "Abramowitsch, Peter" 
wrote:

> Again, it picked up historyOf only for the first concept after "history of",
> but not for the subsequent ones, nor for a concept placed in the past by a
> time expression ("Cardiac transplant 15 years ago").
>
> I'd like to work on this one day, but I think I'll do it in my CAS
> post-processor rather than in the annotator itself, because the problem
> really calls for a new approach that looks at the semantics of the whole
> sentence and not just "history of (x)". For that we'd start looking at the
> ConllDependencyNode (dependency parse) nodes, time annotations, and more.
>
> Peter
>
>
>
>
>
> On 1/5/18, 12:58 PM, "Miller, Timothy"
>  wrote:
>
> >Uncertainty is when the text indicates some hedging about the concept:
> >"possible asthma" should have asthma as an IdentifiedAnnotation with the
> >uncertainty flag set to 1.
> >This is done by machine learning; it is a hard problem, so it is not perfect.
> >
> >HistoryOf is for concepts that are explicitly in the patient's history,
> >often in a history section.
> >"history of lymphoma as a child"
> >lymphoma should have its historyOf flag set to 1.
> >This is also done by machine learning, and it is not perfect either.
> >
> >Confidence is a field that I don't believe any current annotator sets.
> >In theory it is for statistical methods that output a score, which they
> >would store there. The cTAKES dictionary lookup either hits or doesn't,
> >so it doesn't set that score.
> >
> >DiscoveryTechnique is a way to flag which entities were annotated by
> >which annotator, since it's possible to have, e.g., multiple clinical
> >concept taggers. We occasionally use it internally to separate
> >gold-standard entities from system-discovered entities (in a machine
> >learning evaluation), but I don't know whether any standard pipeline
> >components set it.
> >
> >Tim
> >
> >
> >From: Kumari,Puja 
> 
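To inspect the attributes Tim describes (uncertainty, historyOf, confidence, discoveryTechnique) on your own output, one option is to read them back from a cTAKES XMI file. The following is a minimal Python sketch using only the standard library. It assumes the usual UIMA XMI conventions: textsem types serialized under the namespace `http:///org/apache/ctakes/typesystem/type/textsem.ecore`, with each feature written as an XML attribute of the same name. It also assumes that negated mentions carry polarity -1. The sample document is made up, and the names `read_mentions` and `describe` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace, following the UIMA XMI convention of "http:///" + package
# path + ".ecore" for org.apache.ctakes.typesystem.type.textsem.
TEXTSEM_NS = "http:///org/apache/ctakes/typesystem/type/textsem.ecore"
INT_FEATURES = ("polarity", "uncertainty", "historyOf", "discoveryTechnique")


def read_mentions(xmi_text):
    """Return one dict per textsem element that has begin/end offsets."""
    prefix = "{%s}" % TEXTSEM_NS
    mentions = []
    for elem in ET.fromstring(xmi_text).iter():
        if not elem.tag.startswith(prefix) or "begin" not in elem.attrib:
            continue
        row = {
            "type": elem.tag[len(prefix):],
            "begin": int(elem.get("begin")),
            "end": int(elem.get("end")),
            "confidence": float(elem.get("confidence", "0")),
        }
        # A missing int feature is read as 0, UIMA's value for an unset int.
        for name in INT_FEATURES:
            row[name] = int(elem.get(name, "0"))
        mentions.append(row)
    return mentions


def describe(row):
    """Turn raw flags into labels; treating polarity -1 as negated is an assumption."""
    labels = []
    if row["polarity"] < 0:
        labels.append("negated")
    if row["uncertainty"] == 1:
        labels.append("uncertain")
    if row["historyOf"] == 1:
        labels.append("history of")
    return labels


# Made-up XMI fragment for illustration; real cTAKES output has many more elements.
SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore"
    xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore" xmi:version="2.0">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
      sofaString="possible asthma; history of lymphoma as a child"/>
  <textsem:DiseaseDisorderMention xmi:id="10" sofa="1" begin="9" end="15"
      polarity="1" uncertainty="1" historyOf="0" discoveryTechnique="1" confidence="0.0"/>
  <textsem:DiseaseDisorderMention xmi:id="11" sofa="1" begin="28" end="36"
      polarity="1" uncertainty="0" historyOf="1" discoveryTechnique="1" confidence="0.0"/>
</xmi:XMI>"""

for m in read_mentions(SAMPLE):
    print(m["type"], m["begin"], m["end"], describe(m))
```

On the sample, "asthma" (offsets 9 to 15) comes out as uncertain and "lymphoma" (28 to 36) as history of, matching Tim's two examples. A confidence of 0.0 on dictionary-lookup mentions is consistent with his note that the lookup does not set a score.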

Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-05 Thread Abramowitsch, Peter
Sorry for the very wordy contribution here. Following on Tim's
answer, I've found the historyOf mechanism to be very weak in its ability
to detect more than just a few possible permutations. One frequent
issue is the span of the cue, where "history of" modifies a series of concepts.



Here's an example from real notes

"The patient is a 57-year-old woman with a past medical history of OSA ,
asthma , CAD status post CABG..."

Running my CAS post-processor on the output, I get this:


==PROBLEMS==
  [,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive OSA
    past medical history of OSA , asthma , CAD status po

  [,,,SNOMEDCT_US:195967001/C0004096] Asthma asthma
    edical history of OSA , asthma , CAD status post CABG.>

  [,,,SNOMEDCT_US:414024009/C1956346] Coronary Artery Disease CAD
    story of OSA , asthma , CAD status post CABG.>--

==MEDICATIONS==
==SIGNS==
==PROCEDURES==
  [,,History of ,SNOMEDCT_US:90205004/C0010055] Coronary Artery Bypass Surgery CABG
    sthma , CAD status post CABG.>--


I print the historyOf, confidence, and polarity flags in the [,,, ]
section before the SNOMED code of each identified annotation. Notice
that it found history of for OSA but not for asthma or CAD. It did find
history of again for the procedure CABG, because of the word "post".
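One way to attack this list problem in a CAS post-processor, rather than in the annotator, is a heuristic that extends a "history of" cue to every later mention in the same sentence until a clause boundary. This is a hypothetical sketch, not how the cTAKES assertion module works: the cue pattern, the boundary characters `. ; :` and the function name `propagate_history` are assumptions, and it works on plain character spans rather than CAS annotations.

```python
import re

# Hypothetical cue and boundary patterns; tune them for your own notes.
HISTORY_CUE = re.compile(r"\bhistory of\b", re.IGNORECASE)
CLAUSE_BOUNDARY = re.compile(r"[.;:]")


def propagate_history(sentence, spans):
    """Return the subset of mention spans that a history cue should cover.

    sentence: the sentence text
    spans: (begin, end) character offsets of mentions in that sentence
    A mention is covered when a "history of" cue ends before it and no
    clause boundary (. ; :) appears between the cue and the mention.
    """
    covered = set()
    for cue in HISTORY_CUE.finditer(sentence):
        for begin, end in spans:
            if begin < cue.end():
                continue  # mention precedes (or overlaps) the cue
            if not CLAUSE_BOUNDARY.search(sentence[cue.end():begin]):
                covered.add((begin, end))
    return covered


note = ("Cardiac transplant 15 years ago as stated above with chronic "
        "immunosuppressives , history of gout , hypertension , renal insufficiency.")
terms = ["Cardiac transplant", "gout", "hypertension", "renal insufficiency"]
spans = [(note.index(t), note.index(t) + len(t)) for t in terms]
for begin, end in sorted(propagate_history(note, spans)):
    print(note[begin:end])  # gout, then hypertension, then renal insufficiency
```

On the next example's sentence it covers gout, hypertension and renal insufficiency but not the cardiac transplant, which precedes the cue; catching "15 years ago" would need the time annotations. It will also over-extend in sentences like "history of gout, no fever", so treat it as a starting point.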


Here's another example

"Cardiac transplant 15 years ago as stated above with chronic
immunosuppressives , history of gout , hypertension , renal insufficiency."


==PROBLEMS==
  [,,History of ,SNOMEDCT_US:90560007/C0018099] Gout gout  2018-01-01 15:00:00 +0100~
    ppressives , history of gout , hypertension , renal i

  [,,,SNOMEDCT_US:28119000/C0020544] Renal hypertension hypertension , renal  2018-01-01 15:00:00 +0100~
    ves , history of gout , hypertension , renal insufficiency.>--

  [,,,SNOMEDCT_US:236423003/C1565489] Renal Insufficiency renal insufficiency  2018-01-01 15:00:00 +0100~
    f gout , hypertension , renal insufficiency.>--

==MEDICATIONS==
  [,,,SNOMEDCT_US:372823004/C0021081] Immunosuppressive Agents immunosuppressives  2018-01-01 15:00:00 +0100~
    ated above with chronic immunosuppressives , history of gout , hype

==SIGNS==
  [,,,SNOMEDCT_US:161451004/C0455492] H/O: gout history of gout  2018-01-01 15:00:00 +0100~
    ic immunosuppressives , history of gout , hypertension , renal i

==PROCEDURES==
  [,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation Cardiac transplant 15 2018-01-01 15:00:00 +0100~
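--


Again, it picked up historyOf only for the first concept after "history of",
but not for the subsequent ones, nor for a concept placed in the past by a
time expression ("Cardiac transplant 15 years ago").

I'd like to work on this one day, but I think I'll do it in my CAS
post-processor rather than in the annotator itself, because the problem
really calls for a new approach that looks at the semantics of the whole
sentence and not just "history of (x)". For that we'd start looking at the
ConllDependencyNode (dependency parse) nodes, time annotations, and more.

Peter


On 1/5/18, 12:58 PM, "Miller, Timothy" wrote: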

>Uncertainty is when the text indicates some hedging about the concept:
>"possible asthma" should have asthma as an IdentifiedAnnotation with the
>uncertainty flag set to 1.
>This is done by machine learning; it is a hard problem, so it is not perfect.
>
>HistoryOf is for concepts that are explicitly in the patient's history,
>often in a history section.
>"history of lymphoma as a child"
>lymphoma should have its historyOf flag set to 1.
>This is also done by machine learning, and it is not perfect either.
>
>Confidence is a field that I don't believe any current annotator sets.
>In theory it is for statistical methods that output a score, which they
>would store there. The cTAKES dictionary lookup either hits or doesn't,
>so it doesn't set that score.
>
>DiscoveryTechnique is a way to flag which entities were annotated by
>which annotator, since it's possible to have, e.g., multiple clinical
>concept taggers. We occasionally use it internally to separate
>gold-standard entities from system-discovered entities (in a machine
>learning evaluation), but I don't know whether any standard pipeline
>components set it.
>
>Tim
>
>
>From: Kumari,Puja 
>Sent: Friday, January 5, 2018 12:03 AM
>To: dev@ctakes.apache.org
>Subject: Re: Unable to understand the importance of attributes in
>IdentifiedAnnotations [EXTERNAL]
>
>Hi,
>
>Thanks for the replies, but I still don't understand the significance
>of attributes such as Uncertainty, HistoryOf, Confidence and
>DiscoveryTechnique.
>
>Can anyone give some examples or other information that would help me
>understand these concepts in more depth?
>
>Thanks.
>
>Puja Kumari
>
>
>
>On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan"
> wrote:
>
>
>
>Try out this link:
>https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Assertion