Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]
No, unfortunately there is no documentation, and that code is a total mess. It in fact combines evaluation with training: we just set aside the best-evaluated model and made that the default [1]. We really should have separate training code in the project, and the eval code could probably live outside it (since it usually evolves a lot during development and will be messy).

I can walk you through it if you're willing to put in some effort, but unfortunately it's not a trivial thing. You basically need to run this class:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/eval/AssertionEvaluation.java?view=markup

It reads the gold standard from a Knowtator scheme developed under the SHARP project [2]. But if you point it to already-generated xmi files (in train, dev, test sub-directories), it can be used for gold standards in other formats. You would probably need to write a class that reads whatever format your data is in and generates your own xmi. You can look here:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/pipelines/GoldEntityAndAttributeReaderPipelineForSeedCorpus.java?view=markup
to see a few methods for generating xmi from different data formats.

Hopefully that is enough information for you to figure out whether it's worth pursuing!

Tim

[1] There is some training-specific code, but I'm not sure it has kept up with the eval code.
[2] It can also read a few other formats, like i2b2 challenge and mipacq, in case your data looks like that.

From: Harish Kulkarni
Sent: Saturday, January 6, 2018 1:32 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

Is there any documentation or tutorial on how to train cTAKES for negation, history, etc.? I have some data with which to train the system.

Thanks
Harish

On Jan 5, 2018 18:19, "Abramowitsch, Peter" wrote:

> Sorry for the very wordy contribution here. Following on Tim's answer, I've found the historyOf mechanism to be very weak in its ability to detect more than just a few possible permutations [...]
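The xmi route mentioned in Tim's reply at the top of this message expects already-generated xmi files in train, dev, and test sub-directories. Below is a minimal sketch of a pre-flight check for that layout. It is not part of cTAKES and does not call AssertionEvaluation; the class name, the method name, and the assumption that the files end in ".xmi" are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES: it only inspects the directory layout.
final class XmiCorpusLayoutCheck {

    // Sub-directory names taken from the description in this thread.
    static final String[] SPLITS = {"train", "dev", "test"};

    /** Returns the splits that are missing or contain no *.xmi file (extension assumed). */
    static List<String> missingSplits(Path corpusRoot) throws IOException {
        List<String> missing = new ArrayList<>();
        for (String split : SPLITS) {
            Path dir = corpusRoot.resolve(split);
            if (!Files.isDirectory(dir) || !containsXmi(dir)) {
                missing.add(split);
            }
        }
        return missing;
    }

    private static boolean containsXmi(Path dir) throws IOException {
        // The glob filter keeps only entries whose file name ends in .xmi.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xmi")) {
            return files.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Splits without xmi files: " + missingSplits(root));
    }
}
```

Running it against the corpus root before starting a long evaluation run would at least catch an empty or misnamed split early; whether the evaluation class accepts the directory is a separate question that this sketch does not answer.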
Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]
Sorry for the very wordy contribution here.

Following on Tim's answer, I've found the historyOf mechanism to be very weak in its ability to detect more than just a few possible permutations. One frequent issue is span, where "history of" modifies a series of concepts.

Here's an example from real notes:

"The patient is a 57-year-old woman with a past medical history of OSA , asthma , CAD status post CABG..."

Using my CAS post processor on the output I get this:

== PROBLEMS ==
[,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive   OSA
    past medical history of OSA , asthma , CAD status po
[,,,SNOMEDCT_US:195967001/C0004096] Asthma   asthma
    edical history of OSA , asthma , CAD status post CABG.
[,,,SNOMEDCT_US:414024009/C1956346] Coronary Artery Disease   CAD
    story of OSA , asthma , CAD status post CABG.
== MEDICATIONS ==
== SIGNS ==
== PROCEDURES ==
[,,History of ,SNOMEDCT_US:90205004/C0010055] Coronary Artery Bypass Surgery   CABG
    sthma , CAD status post CABG.

I print the history of, confidence, and polarity flags in the [,,, ] section before the SNOMED code of each identified annotation. Notice that it found history of OSA but not asthma or CAD. It did find History of again for the procedure CABG because of the word "post".

Here's another example:

"Cardiac transplant 15 years ago as stated above with chronic immunosuppressives , history of gout , hypertension , renal insufficiency."

== PROBLEMS ==
[,,History of ,SNOMEDCT_US:90560007/C0018099] Gout   gout   2018-01-01 15:00:00 +0100~
    ppressives , history of gout , hypertension , renal i
[,,,SNOMEDCT_US:28119000/C0020544] Renal hypertension   hypertension , renal   2018-01-01 15:00:00 +0100~
    ves , history of gout , hypertension , renal insufficiency.
[,,,SNOMEDCT_US:236423003/C1565489] Renal Insufficiency   renal insufficiency   2018-01-01 15:00:00 +0100~
    f gout , hypertension , renal insufficiency.
== MEDICATIONS ==
[,,,SNOMEDCT_US:372823004/C0021081] Immunosuppressive Agents   immunosuppressives   2018-01-01 15:00:00 +0100~
    ated above with chronic immunosuppressives , history of gout , hype
== SIGNS ==
[,,,SNOMEDCT_US:161451004/C0455492] H/O: gout   history of gout   2018-01-01 15:00:00 +0100~
    ic immunosuppressives , history of gout , hypertension , renal i
== PROCEDURES ==
[,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation   Cardiac transplant 15   2018-01-01 15:00:00 +0100~
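The span issue in Peter's first example ("history of OSA , asthma , CAD") could be approached in a CAS post processor by copying the history flag along a comma-separated list. The sketch below is one possible heuristic under stated assumptions, not what cTAKES or Peter's post processor does: the Mention class is a hypothetical stand-in holding only offsets and a historyOf flag, and the separator pattern (comma, "and", "or") is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical post-processing heuristic; not part of cTAKES.
final class HistoryListPropagation {

    /** Hypothetical stand-in for a mention: character offsets plus a historyOf flag. */
    static final class Mention {
        final int begin;
        final int end;
        int historyOf; // 1 = flagged as history, 0 = not flagged

        Mention(int begin, int end, int historyOf) {
            this.begin = begin;
            this.end = end;
            this.historyOf = historyOf;
        }
    }

    // Text allowed between two members of one list: a comma and/or "and"/"or".
    private static final Pattern LIST_GAP =
            Pattern.compile("\\s*(?:,|\\band\\b|\\bor\\b)\\s*(?:(?:and|or)\\s+)?");

    /** Builds a Mention for the first occurrence of covered in text (illustration only). */
    static Mention mentionOf(String text, String covered, int historyOf) {
        int begin = text.indexOf(covered);
        if (begin < 0) {
            throw new IllegalArgumentException("not found: " + covered);
        }
        return new Mention(begin, begin + covered.length(), historyOf);
    }

    /** Copies historyOf = 1 to the next mention when only a list separator lies between the two. */
    static void propagate(String text, List<Mention> mentionsInTextOrder) {
        for (int i = 1; i < mentionsInTextOrder.size(); i++) {
            Mention previous = mentionsInTextOrder.get(i - 1);
            Mention current = mentionsInTextOrder.get(i);
            if (previous.historyOf != 1 || current.historyOf == 1 || current.begin < previous.end) {
                continue; // nothing to copy, already flagged, or overlapping spans
            }
            String gap = text.substring(previous.end, current.begin);
            if (LIST_GAP.matcher(gap).matches()) {
                current.historyOf = 1;
            }
        }
    }

    public static void main(String[] args) {
        String text = "past medical history of OSA , asthma , CAD status post CABG";
        List<Mention> mentions = new ArrayList<>();
        mentions.add(mentionOf(text, "OSA", 1));    // flags as reported in the listing above
        mentions.add(mentionOf(text, "asthma", 0));
        mentions.add(mentionOf(text, "CAD", 0));
        mentions.add(mentionOf(text, "CABG", 1));
        propagate(text, mentions);
        for (Mention m : mentions) {
            System.out.println(text.substring(m.begin, m.end) + " historyOf=" + m.historyOf);
        }
    }
}
```

Because the loop walks the mentions in order, a flag copied onto asthma is then copied onto CAD, while a gap such as " status post " stops the chain. A rule like this will over-propagate on some notes and does not handle overlapping mentions such as "hypertension , renal" and "renal insufficiency" in the second example, so it would need checking against real data.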
Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]
Uncertainty is set when the text indicates some hedging about the concept: in "possible asthma", asthma should be an IdentifiedAnnotation with the uncertainty flag set to 1. This is done by machine learning, and it is not easy, so it is not perfect.

HistoryOf is for concepts that are explicitly in the patient's history, often in a history section. In "history of lymphoma as a child", lymphoma should have its history flag set to 1. This is also done by machine learning, and it is not easy, so it is not perfect.

Confidence is a field that I don't believe gets set by any current annotators. In theory, it is where statistical methods that output a score would store that score. The cTAKES dictionary lookup either hits or doesn't, so it doesn't set that score.

DiscoveryTechnique is a way to flag which entities were annotated by which annotator, since it's possible to have, e.g., multiple clinical concept taggers. We use it occasionally internally to separate gold standard entities from system-discovered entities (in a machine learning evaluation), but I don't know if any standard pipeline components set it.

Tim

From: Kumari,Puja
Sent: Friday, January 5, 2018 12:03 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

Hi,

Thanks for the replies, but I am still not able to understand the significance of attributes such as Uncertainty, HistoryOf, Confidence, and DiscoveryTechnique.

Can anyone give some examples or any information that will help me understand these concepts in more depth?

Thanks.
Puja Kumari

On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan" wrote:

Try out this link -
"https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttps-253A-252F-252Fcwiki.apache.org-252Fconfluence-252Fdisplay-252FCTAKES-252FcTAKES-252B4.0-252B-2D-252BAssertion-26data-3D02-257C01-257CPuja.Kumari3-2540cerner.com-257C989437995db145fcbaa808d5536ac609-257Cfbc493a80d244454a815f4ca58e8c09d-257C0-257C0-257C636506640417310103-26sdata-3D8WN2HIq9RiCiZJiTtp0i6Sk7ZVDMgNGoUbJRW1Hevp4-253D-26reserved-3D0&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=muQ5_Uh4Q-5Uui87e9eAWy2afrJRgcg4FrOmy2VyFP8&s=0NlpH8OCzjaVbZq3yTy4pQcWgTYtUKJOD5orbrpKGro&e="

Regards,
Gandhi

-----Original Message-----
From: Kumari,Puja [mailto:puja.kuma...@cerner.com]
Sent: Thursday, January 04, 2018 3:11 PM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in IdentifiedAnnotations

Hi,

Thanks for your reply, Krishnareddy, but the link given says "page not found". Any other suggestions/links that you can share would be appreciated.

Thanks
Puja Kumari

On 1/4/18, 2:51 PM, "Krishnareddy" wrote:

Hi,

You can find related information about these attributes at the following link:
https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttps-253A-252F-252Fcwiki.apache.org-252Fconfluence-252Fdisplay-252FCTAKES-252FcTAKES-252B4.0-252B-2D-252BAssertion-2A-5F-26data-3D02-257C01-257CPuja.Kumari3-2540cerner.com-257C738752ad0ee24b8bae6208d553547f25-257Cfbc493a80d244454a815f4ca58e8c09d-257C0-257C0-257C636506544740520408-26sdata-3DTjBeskHtrWn8ycT16NaoDopB8bTX0SJTNfWMOG8-252B5fo-253D-26reserved-3D0&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=muQ5_Uh4Q-5Uui87e9eAWy2afrJRgcg4FrOmy2VyFP8&s=NmbWV7FVYRENHOhWtMCOu2UoOaw-esE6uyr0W8KKtpA&e=

Thank You
Krishna Reddy

On Thursday 04 January 2018 12:31 PM, Kumari,Puja wrote:
> Hi,
> I am working on IdentifiedAnnotations in Apache cTAKES and I am not able to interpret the meaning of the following attributes under IdentifiedAnnotations:
> 1. Uncertainty
> 2. History
> 3. Confidence
> 4. Discovery Techniques
>
> What is the importance of these attributes?
> How can we make use of these to make our work efficient?
> Any suggestion / link to understand more would be helpful.
>
> Thanks.
> Puja Kumari
> puja.kuma...@cerner.com
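As a rough illustration of how the four attributes Tim describes in this message might be used downstream, here is a minimal sketch. It does not use the cTAKES classes: Mention is a hypothetical stand-in, and the two discoveryTechnique codes are made up for the example. In a real pipeline the values would presumably be read from IdentifiedAnnotation through its generated getters (assumed here to be getUncertainty(), getHistoryOf(), getConfidence(), and getDiscoveryTechnique(); check this against your cTAKES version).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Mention mirrors the attributes discussed in this thread.
final class AttributeFlagsSketch {

    // Made-up discoveryTechnique codes, used only in this sketch.
    static final int GOLD_STANDARD = 1;
    static final int SYSTEM = 2;

    /** Hypothetical stand-in for an identified annotation; not the cTAKES class. */
    static final class Mention {
        final String text;
        final int uncertainty;        // 1 = hedged, e.g. "possible asthma"
        final int historyOf;          // 1 = explicitly in the patient's history
        final float confidence;       // per the thread, usually not set by current annotators
        final int discoveryTechnique; // which annotator produced the mention

        Mention(String text, int uncertainty, int historyOf, float confidence, int discoveryTechnique) {
            this.text = text;
            this.uncertainty = uncertainty;
            this.historyOf = historyOf;
            this.confidence = confidence;
            this.discoveryTechnique = discoveryTechnique;
        }
    }

    /** Renders the covered text followed by a tag for each flag that is set. */
    static String describe(Mention m) {
        StringBuilder label = new StringBuilder(m.text);
        if (m.uncertainty == 1) {
            label.append(" [uncertain]");
        }
        if (m.historyOf == 1) {
            label.append(" [history]");
        }
        return label.toString();
    }

    /** Keeps mentions from one annotator that are neither hedged nor historical. */
    static List<Mention> currentAndCertain(List<Mention> mentions, int technique) {
        List<Mention> kept = new ArrayList<>();
        for (Mention m : mentions) {
            if (m.discoveryTechnique == technique && m.uncertainty != 1 && m.historyOf != 1) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Mention> mentions = new ArrayList<>();
        mentions.add(new Mention("asthma", 1, 0, 0f, SYSTEM));   // "possible asthma"
        mentions.add(new Mention("lymphoma", 0, 1, 0f, SYSTEM)); // "history of lymphoma as a child"
        mentions.add(new Mention("gout", 0, 0, 0f, SYSTEM));
        for (Mention m : mentions) {
            System.out.println(describe(m));
        }
        System.out.println("Current and certain: " + currentAndCertain(mentions, SYSTEM).size());
    }
}
```

The sketch deliberately ignores confidence, since the thread suggests no current annotator sets it; a filter on that field would only make sense once a component actually writes a score.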