Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]
No, there is unfortunately no documentation and that code is a total mess. It in fact combines evaluations with training, and then we just set aside the best evaluated model and made that the default[1]. We really should have separate training code in the project and eval code can probably be outside (since it usually evolves a lot during development and will be messy).

I can walk you through it if you're willing to put in some effort but unfortunately it's not a trivial thing. You basically need to run this class:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/eval/AssertionEvaluation.java?view=markup

It reads the gold standard from a Knowtator scheme developed under the SHARP project [2]. But if you point it to already-generated xmi files (in train, dev, test sub-directories) it can be used for gold standards in other formats. Probably you would need to write a class to generate your own xmi that reads whatever format your data is in. You can look here:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/pipelines/GoldEntityAndAttributeReaderPipelineForSeedCorpus.java?view=markup
to see a few methods for generating xmi from different data formats.

Hopefully that is enough information for you to figure out whether it's worth pursuing!
Tim

[1] There is some training-specific code but I'm not sure that's kept up with the eval code.
[2] It can also read a few other formats like i2b2 challenge and mipacq, in case your data looks like that.

From: Harish Kulkarni
Sent: Saturday, January 6, 2018 1:32 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

Is there any documentation or tutorial on how to train ctakes for negation history etc.
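The layout mentioned in the reply above (already-generated xmi files in train, dev and test sub-directories) can be sanity-checked before pointing AssertionEvaluation at a corpus. The following is a minimal sketch, not part of cTAKES: the helper name `count_xmi_files` is hypothetical, and both the exact sub-directory names and the `.xmi` file suffix are assumptions.

```python
from pathlib import Path

# Assumed sub-directory names, taken from the description in the reply above.
SPLITS = ("train", "dev", "test")


def count_xmi_files(corpus_dir):
    """Return {split: number of .xmi files}; raise if a split directory is missing.

    Hypothetical helper for illustration; the ".xmi" suffix is an assumption.
    """
    root = Path(corpus_dir)
    counts = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing sub-directory: {split_dir}")
        # Count only files whose suffix is exactly ".xmi".
        counts[split] = sum(1 for p in split_dir.iterdir() if p.suffix == ".xmi")
    return counts
```

An empty `dev` or `test` count is a quick hint that the converter did not write into the directory the evaluation will read.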
Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]
Is there any documentation or tutorial on how to train ctakes for negation history etc. I have some data with which to train the system

Thanks
Harish

On Jan 5, 2018 18:19, "Abramowitsch, Peter" wrote:

> Sorry for the very wordy contribution here
>
> Following on Tim's answer, I've found the historyOf mechanism to be very weak in its ability to detect more than just a few possible permutations, and one frequent issue is span where history Of modifies a series of concepts.
>
> Here's an example from real notes
>
> "The patient is a 57-year-old woman with a past medical history of OSA , asthma , CAD status post CABG..."
>
> Using my CAS post processor on the output I get this.
>
> PROBLEMS
> [,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive OSA
> past medical history of OSA , asthma , CAD status po
>
> [,,,SNOMEDCT_US:195967001/C0004096] Asthma asthma
> edical history of OSA , asthma , CAD status post CABG.>
>
> [,,,SNOMEDCT_US:414024009/C1956346] Coronary Artery Disease CAD
> story of OSA , asthma , CAD status post CABG.>--
>
> =MEDICATIONS==
> ==SIGNS=
> =PROCEDURES==
> [,,History of ,SNOMEDCT_US:90205004/C0010055] Coronary Artery Bypass Surgery CABG
> sthma , CAD status post CABG.>--
>
> I print the history of, confidence, and polarity flags in the [,,, ] section before the SNOMED code of each identified annotation. Notice that it found history of OSA but not Asthma or CAD. It did find History of again for the procedure CABG because of the word POST.
>
> Here's another example
>
> "Cardiac transplant 15 years ago as stated above with chronic immunosuppressives , history of gout , hypertension , renal insufficiency."
>
> PROBLEMS
> [,,History of ,SNOMEDCT_US:90560007/C0018099] Gout gout 2018-01-01 15:00:00 +0100~
> ppressives , history of gout , hypertension , renal i
>
> [,,,SNOMEDCT_US:28119000/C0020544] Renal hypertension hypertension , renal 2018-01-01 15:00:00 +0100~
> ves , history of gout , hypertension , renal insufficiency.>--
>
> [,,,SNOMEDCT_US:236423003/C1565489] Renal Insufficiency renal insufficiency 2018-01-01 15:00:00 +0100~
> f gout , hypertension , renal insufficiency.>--
>
> =MEDICATIONS==
> [,,,SNOMEDCT_US:372823004/C0021081] Immunosuppressive Agents immunosuppressives 2018-01-01 15:00:00 +0100~
> ated above with chronic immunosuppressives , history of gout , hype
>
> ==SIGNS=
> [,,,SNOMEDCT_US:161451004/C0455492] H/O: gout history of gout 2018-01-01 15:00:00 +0100~
> ic immunosuppressives , history of gout , hypertension , renal i
>
> =PROCEDURES==
> [,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation Cardiac transplant 15 2018-01-01 15:00:00 +0100~
> --
>
> Again, it picked up the History Of in the first clause where "history of" preceded its predicate, but not subsequent ones, or after a time expression indicating the past.
>
> I have a mind to work on this one day, but I think I'll be doing it in my CAS post processor rather than the annotator itself as the problem really involves a whole new solution that looks at the semantics of the whole sentence and not just "history of (x)" For that we'd start looking at the conldep nodes, time annotations, and more.
>
> Peter
>
> On 1/5/18, 12:58 PM, "Miller, Timothy" wrote:
>
> > Uncertainty is when the text indicates some hedging about the concept: "possible asthma" should have asthma as an IdentifiedAnnotation with the uncertainty flag set to 1.
> > This is done by machine learning and it is not easy so it is not perfect.
> >
> > HistoryOf is for concepts that are explicitly in patient history, often in a history section.
> > "history of lymphoma as a child"
> > lymphoma should have its history flag set to 1.
> > This is done by machine learning and it is not easy so it is not perfect.
> >
> > Confidence is a field that I don't believe gets set by any current annotators, but in theory it is for methods that might use statistical methods that output a score to set the score there.
> > The cTAKES dictionary lookup either hits or doesn't, so it doesn't set that score.
> >
> > DiscoveryTechnique is a way to flag which entities were annotated by which annotator, since it's possible to have, e.g., multiple clinical concept taggers. We use it occasionally internally to separate gold standard entities from system-discovered entities (in a machine learning evaluation) but I don't know if any standard pipeline components set it.
> >
> > Tim
> >
> > From: Kumari,Puja
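The list-scope problem described in the quoted message (a "history of" that should cover a whole series of concepts) can be illustrated with a small post-processing heuristic. This is only a sketch under stated assumptions: it is not cTAKES code and not the CAS post processor mentioned above. `Mention`, `LIST_GAP` and `propagate_history` are hypothetical names, mentions are reduced to plain character spans with a boolean flag, and the rule simply copies the flag forward across items separated by a comma or "and"/"or". It deliberately ignores time expressions and dependency structure, which the message says a real solution would need.

```python
import re
from dataclasses import dataclass

# Text allowed between two members of one list: a comma, "and"/"or", or both.
LIST_GAP = re.compile(r"\s*(?:,\s*(?:(?:and|or)\s+)?|(?:and|or)\s+)")


@dataclass
class Mention:
    """Hypothetical stand-in for an annotation: a character span plus a flag."""
    begin: int
    end: int
    history_of: bool = False


def propagate_history(sentence, mentions):
    """Copy history_of from a flagged mention to the list items that follow it."""
    ordered = sorted(mentions, key=lambda m: m.begin)
    for prev, cur in zip(ordered, ordered[1:]):
        gap = sentence[prev.end:cur.begin]
        # Only propagate when nothing but list punctuation separates the two.
        if prev.history_of and LIST_GAP.fullmatch(gap):
            cur.history_of = True
    return ordered


sentence = "past medical history of OSA , asthma , CAD status post CABG"


def span(text, flagged=False):
    start = sentence.index(text)
    return Mention(start, start + len(text), flagged)


# Only OSA starts out flagged, as in the first example output above.
result = propagate_history(
    sentence, [span("OSA", True), span("asthma"), span("CAD"), span("CABG")]
)
print([m.history_of for m in result])  # [True, True, True, False]
```

The flag reaches asthma and CAD but stops at "status post", so CABG is left to whatever the annotator decided. Overlapping spans (such as "hypertension , renal" and "renal insufficiency" in the second example) produce an empty gap and are never propagated to, which is one of several reasons this stays a toy rule.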
Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]
Sorry for the very wordy contribution here

Following on Tim's answer, I've found the historyOf mechanism to be very weak in its ability to detect more than just a few possible permutations, and one frequent issue is span where history Of modifies a series of concepts.

Here's an example from real notes

"The patient is a 57-year-old woman with a past medical history of OSA , asthma , CAD status post CABG..."

Using my CAS post processor on the output I get this.

PROBLEMS
[,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive OSA
past medical history of OSA , asthma , CAD status po

[,,,SNOMEDCT_US:195967001/C0004096] Asthma asthma
edical history of OSA , asthma , CAD status post CABG.>

[,,,SNOMEDCT_US:414024009/C1956346] Coronary Artery Disease CAD
story of OSA , asthma , CAD status post CABG.>--

=MEDICATIONS==
==SIGNS=
=PROCEDURES==
[,,History of ,SNOMEDCT_US:90205004/C0010055] Coronary Artery Bypass Surgery CABG
sthma , CAD status post CABG.>--

I print the history of, confidence, and polarity flags in the [,,, ] section before the SNOMED code of each identified annotation. Notice that it found history of OSA but not Asthma or CAD. It did find History of again for the procedure CABG because of the word POST.

Here's another example

"Cardiac transplant 15 years ago as stated above with chronic immunosuppressives , history of gout , hypertension , renal insufficiency."

PROBLEMS
[,,History of ,SNOMEDCT_US:90560007/C0018099] Gout gout 2018-01-01 15:00:00 +0100~
ppressives , history of gout , hypertension , renal i

[,,,SNOMEDCT_US:28119000/C0020544] Renal hypertension hypertension , renal 2018-01-01 15:00:00 +0100~
ves , history of gout , hypertension , renal insufficiency.>--

[,,,SNOMEDCT_US:236423003/C1565489] Renal Insufficiency renal insufficiency 2018-01-01 15:00:00 +0100~
f gout , hypertension , renal insufficiency.>--

=MEDICATIONS==
[,,,SNOMEDCT_US:372823004/C0021081] Immunosuppressive Agents immunosuppressives 2018-01-01 15:00:00 +0100~
ated above with chronic immunosuppressives , history of gout , hype

==SIGNS=
[,,,SNOMEDCT_US:161451004/C0455492] H/O: gout history of gout 2018-01-01 15:00:00 +0100~
ic immunosuppressives , history of gout , hypertension , renal i

=PROCEDURES==
[,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation Cardiac transplant 15 2018-01-01 15:00:00 +0100~
--

Again, it picked up the History Of in the first clause where "history of" preceded its predicate, but not subsequent ones, or after a time expression indicating the past.

I have a mind to work on this one day, but I think I'll be doing it in my CAS post processor rather than the annotator itself as the problem really involves a whole new solution that looks at the semantics of the whole sentence and not just "history of (x)" For that we'd start looking at the conldep nodes, time annotations, and more.

Peter

On 1/5/18, 12:58 PM, "Miller, Timothy" wrote:

> Uncertainty is when the text indicates some hedging about the concept: "possible asthma" should have asthma as an IdentifiedAnnotation with the uncertainty flag set to 1.
> This is done by machine learning and it is not easy so it is not perfect.
>
> HistoryOf is for concepts that are explicitly in patient history, often in a history section.
> "history of lymphoma as a child"
> lymphoma should have its history flag set to 1.
> This is done by machine learning and it is not easy so it is not perfect.
>
> Confidence is a field that I don't believe gets set by any current annotators, but in theory it is for methods that might use statistical methods that output a score to set the score there.
> The cTAKES dictionary lookup either hits or doesn't, so it doesn't set that score.
>
> DiscoveryTechnique is a way to flag which entities were annotated by which annotator, since it's possible to have, e.g., multiple clinical concept taggers. We use it occasionally internally to separate gold standard entities from system-discovered entities (in a machine learning evaluation) but I don't know if any standard pipeline components set it.
>
> Tim
>
> From: Kumari,Puja
> Sent: Friday, January 5, 2018 12:03 AM
> To: dev@ctakes.apache.org
> Subject: Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]
>
> Hi,
>
> Thanks for the replies but I am still not able to understand the significance of the attributes such as Uncertainty, HistoryOf, Confidence, DiscoveryTechniques.
>
> Can anyone give some examples or any information which will help me to understand these concepts in more depth?
>
> Thanks.
>
> Puja Kumari
>
> On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan" wrote:
>
> Try out this link -
> "https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttps-253A-252F-252Fcwiki.apache.org-252Fconfluence-252Fdisplay-252FCTAKES-252FcTAKES-252B4.0-252B-2D-252BAssertion-26data-3D02-257C01-257CPuja.Kumari3-2540cerner.com-257C989437995db145fcbaa808d5536ac609-257Cfbc493a80d244454a815f4ca58e8c09d-257C0-257C0-257C636506640417