No, unfortunately there is no documentation, and that code is a total mess. It actually combines evaluation with training: we just set aside the best-evaluated model and made that the default [1]. We really should have separate training code in the project, and the eval code could probably live outside it (since it usually evolves a lot during development and will be messy). I can walk you through it if you're willing to put in some effort, but unfortunately it's not a trivial thing.

You basically need to run this class:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/eval/AssertionEvaluation.java?view=markup

It reads the gold standard from a Knowtator scheme developed under the SHARP project [2]. But if you point it to already-generated xmi files (in train, dev, and test sub-directories), it can be used for gold standards in other formats. You would probably need to write a class that reads whatever format your data is in and generates your own xmi. You can look here:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/pipelines/GoldEntityAndAttributeReaderPipelineForSeedCorpus.java?view=markup
to see a few methods for generating xmi from different data formats.

Hopefully that is enough information for you to figure out whether it's worth pursuing!
Tim

[1] There is some training-specific code, but I'm not sure it has kept up with the eval code.
[2] It can also read a few other formats, like the i2b2 challenge and MiPACQ, in case your data looks like that.

________________________________________
From: Harish Kulkarni <harish.m.kulka...@gmail.com>
Sent: Saturday, January 6, 2018 1:32 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

Is there any documentation or tutorial on how to train cTAKES for negation, history, etc.? I have some data with which to train the system.

Thanks
Harish

On Jan 5, 2018 18:19, "Abramowitsch, Peter" <pabramowit...@hearst.com> wrote:

> Sorry for the very wordy contribution here ....
>
> Following on Tim's answer, I've found the historyOf mechanism to be very weak in its ability to detect more than just a few possible permutations, and one frequent issue is span, where "history of" modifies a series of concepts.
>
> ----------------------------------------
>
> Here's an example from real notes:
>
> "The patient is a 57-year-old woman with a past medical history of OSA , asthma , CAD status post CABG..."
>
> Using my CAS post processor on the output, I get this:
>
> ================PROBLEMS====================
> [,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive OSA
> past medical history of OSA , asthma , CAD status po
>
> [,,,SNOMEDCT_US:195967001/C0004096] Asthma asthma
> edical history of OSA , asthma , CAD status post CABG.>
>
> [,,,SNOMEDCT_US:414024009/C1956346] Coronary Artery Disease CAD
> story of OSA , asthma , CAD status post CABG.>--
>
> =================MEDICATIONS==================
> ==============SIGNS=====================
> =================PROCEDURES==================
> [,,History of ,SNOMEDCT_US:90205004/C0010055] Coronary Artery Bypass Surgery CABG
> sthma , CAD status post CABG.>--
>
> I print the history of, confidence, and polarity flags in the [,,, ] section before the SNOMED code of each identified annotation. Notice that it found "history of" for OSA but not for asthma or CAD. It did find "History of" again for the procedure CABG because of the word POST.
>
> Here's another example:
>
> "Cardiac transplant 15 years ago as stated above with chronic immunosuppressives , history of gout , hypertension , renal insufficiency."
>
> ================PROBLEMS====================
> [,,History of ,SNOMEDCT_US:90560007/C0018099] Gout gout 2018-01-01 15:00:00 +0100~
> ppressives , history of gout , hypertension , renal i
>
> [,,,SNOMEDCT_US:28119000/C0020544] Renal hypertension hypertension , renal 2018-01-01 15:00:00 +0100~
> ves , history of gout , hypertension , renal insufficiency.>--
>
> [,,,SNOMEDCT_US:236423003/C1565489] Renal Insufficiency renal insufficiency 2018-01-01 15:00:00 +0100~
> f gout , hypertension , renal insufficiency.>--
>
> =================MEDICATIONS==================
> [,,,SNOMEDCT_US:372823004/C0021081] Immunosuppressive Agents immunosuppressives 2018-01-01 15:00:00 +0100~
> ated above with chronic immunosuppressives , history of gout , hype
>
> ==============SIGNS=====================
> [,,,SNOMEDCT_US:161451004/C0455492] H/O: gout history of gout 2018-01-01 15:00:00 +0100~
> ic immunosuppressives , history of gout , hypertension , renal i
>
> =================PROCEDURES==================
> [,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation Cardiac transplant 15 2018-01-01 15:00:00 +0100~
> --<Cardiac transplant 15 years ago as stated a
>
> Again, it picked up the "History Of" in the first clause, where "history of" preceded its predicate, but not in subsequent ones, or after a time expression indicating the past.
>
> I have a mind to work on this one day, but I think I'll be doing it in my CAS post processor rather than in the annotator itself, as the problem really involves a whole new solution that looks at the semantics of the whole sentence and not just "history of (x)". For that we'd start looking at the conldep nodes, time annotations, and more.
>
> Peter
>
> On 1/5/18, 12:58 PM, "Miller, Timothy" <timothy.mil...@childrens.harvard.edu> wrote:
>
> >Uncertainty is when the text indicates some hedging about the concept: "possible asthma" should have asthma as an IdentifiedAnnotation with the uncertainty flag set to 1.
> >This is done by machine learning, and it is not easy, so it is not perfect.
> >
> >HistoryOf is for concepts that are explicitly in the patient history, often in a history section. In "history of lymphoma as a child", lymphoma should have its history flag set to 1.
> >This is done by machine learning, and it is not easy, so it is not perfect.
> >
> >Confidence is a field that I don't believe gets set by any current annotators, but in theory it is there so that statistical methods that output a score can store that score.
> >The cTAKES dictionary lookup either hits or doesn't, so it doesn't set that score.
> >
> >DiscoveryTechnique is a way to flag which entities were annotated by which annotator, since it's possible to have, e.g., multiple clinical concept taggers. We use it occasionally internally to separate gold standard entities from system-discovered entities (in a machine learning evaluation), but I don't know if any standard pipeline components set it.
> >
> >Tim
> >
> >________________________________________
> >From: Kumari,Puja <puja.kuma...@cerner.com>
> >Sent: Friday, January 5, 2018 12:03 AM
> >To: dev@ctakes.apache.org
> >Subject: Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]
> >
> >Hi,
> >
> >Thanks for the replies, but I am still not able to understand the significance of attributes such as Uncertainty, HistoryOf, Confidence, and DiscoveryTechnique.
> >
> >Can anyone give some examples or any information that will help me understand these concepts in more depth?
> >
> >Thanks.
> >Puja Kumari
> >
> >On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan" <gandhi.natara...@arisglobal.com> wrote:
> >
> >    Try out this link - "https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Assertion"
> >
> >    Regards,
> >    Gandhi
> >
> >    -----Original Message-----
> >    From: Kumari,Puja [mailto:puja.kuma...@cerner.com]
> >    Sent: Thursday, January 04, 2018 3:11 PM
> >    To: dev@ctakes.apache.org
> >    Subject: Re: Unable to understand the importance of attributes in IdentifiedAnnotations
> >
> >    Hi,
> >
> >    Thanks for your reply Krishnareddy, but the link given says "page not found". Any other suggestions/links that you can share would be appreciated.
> >
> >    Thanks
> >    Puja Kumari
> >
> >    On 1/4/18, 2:51 PM, "Krishnareddy" <krishnared...@kpmd.biz> wrote:
> >
> >        Hi,
> >
> >        You can find related information about these attributes at the following link:
> >
> >        https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Assertion
> >
> >        Thank You
> >        Krishna Reddy
> >
> >        On Thursday 04 January 2018 12:31 PM, Kumari,Puja wrote:
> >        > Hi,
> >        > I am working on IdentifiedAnnotations in Apache cTAKES and I am not able to interpret the meaning of the following attributes under IdentifiedAnnotations:
> >        > 1. Uncertainty
> >        > 2. History
> >        > 3. Confidence
> >        > 4. Discovery Techniques
> >        >
> >        > What is the importance of these attributes?
> >        > How can we make use of these to make our work efficient?
> >        > Any suggestion / link to understand more would be helpful.
> >        >
> >        > Thanks.
> >        > Puja Kumari
> >        > puja.kuma...@cerner.com<mailto:puja.kuma...@cerner.com>
> >        >
> >        > CONFIDENTIALITY NOTICE This message and any included attachments are from Cerner Corporation and are intended only for the addressee. The information contained in this message is confidential and may constitute inside or non-public information under international, federal, or state securities laws. Unauthorized forwarding, printing, copying, distribution, or use of such information is strictly prohibited and may be unlawful. If you are not the addressee, please promptly delete this message and notify the sender of the delivery error by e-mail or you may call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1) (816)221-1024.
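
As a rough illustration of the post-processing idea Peter describes in the thread (carrying "History of" across a comma-separated series of concepts), here is a minimal Java sketch. It is only an assumption about how such a step might look: the `Mention` class and the `mention` and `propagateHistoryOf` helpers are hypothetical names invented for this example and are not part of cTAKES, and a real CAS post processor would work on the actual annotation objects instead. It looks only at the surface text between adjacent mentions, so it falls well short of the sentence-level semantics Peter says a proper fix would need.

```java
import java.util.ArrayList;
import java.util.List;

class HistoryOfPropagation {

    // Hypothetical stand-in for an identified annotation; NOT a cTAKES type.
    static final class Mention {
        final String text;
        final int begin;    // character offset where the mention starts
        final int end;      // character offset just past the mention
        boolean historyOf;  // true when the "history of" flag is set

        Mention(String text, int begin, int end, boolean historyOf) {
            this.text = text;
            this.begin = begin;
            this.end = end;
            this.historyOf = historyOf;
        }
    }

    // Convenience helper: builds a Mention from the first occurrence of text.
    static Mention mention(String sentence, String text, boolean historyOf) {
        int begin = sentence.indexOf(text);
        if (begin < 0) {
            throw new IllegalArgumentException("not found: " + text);
        }
        return new Mention(text, begin, begin + text.length(), historyOf);
    }

    // Copies the historyOf flag from a flagged mention to the next mention
    // when only a comma (optionally followed by "and"/"or") separates them.
    // Assumes the mentions are sorted by begin offset; overlapping mentions
    // are skipped rather than handled.
    static void propagateHistoryOf(String sentence, List<Mention> mentions) {
        for (int i = 1; i < mentions.size(); i++) {
            Mention prev = mentions.get(i - 1);
            Mention cur = mentions.get(i);
            if (!prev.historyOf || cur.historyOf || cur.begin < prev.end) {
                continue;
            }
            String gap = sentence.substring(prev.end, cur.begin);
            if (gap.matches("\\s*,\\s*((and|or)\\s+)?")) {
                cur.historyOf = true;
            }
        }
    }

    public static void main(String[] args) {
        // Fragment of the first example sentence quoted in the thread.
        String s = "past medical history of OSA , asthma , CAD status post CABG";
        List<Mention> mentions = new ArrayList<>();
        mentions.add(mention(s, "OSA", true));     // flag found by the annotator
        mentions.add(mention(s, "asthma", false)); // missed
        mentions.add(mention(s, "CAD", false));    // missed
        mentions.add(mention(s, "CABG", true));    // flag found by the annotator
        propagateHistoryOf(s, mentions);
        for (Mention m : mentions) {
            System.out.println(m.text + " historyOf=" + m.historyOf);
        }
    }
}
```

Because the flag is only copied across a comma-only gap, a mention in a later clause (for example after a full stop or a verb) is left untouched. That keeps the heuristic conservative, at the cost of missing lists joined in other ways.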