No, unfortunately there is no documentation, and that code is a total mess. It 
in fact combines evaluation with training; we just set aside the best 
evaluated model and made that the default [1]. We really should have separate 
training code in the project, and the eval code could probably live outside it 
(since it usually evolves a lot during development and will be messy). I can 
walk you through it if you're willing to put in some effort, but unfortunately 
it's not a trivial thing.

You basically need to run this class: 
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/eval/AssertionEvaluation.java?view=markup

It reads the gold standard from a Knowtator scheme developed under the SHARP 
project [2]. But if you point it to already-generated XMI files (in 
train, dev, and test sub-directories), it can be used for gold standards in 
other formats. You would probably need to write a class that reads whatever 
format your data is in and generates your own XMI. You can look here:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/pipelines/GoldEntityAndAttributeReaderPipelineForSeedCorpus.java?view=markup

to see a few methods for generating XMI from different data formats.
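
Before pointing the evaluation class at your own data, it may help to check that the corpus directory has the shape described above. The following is only a minimal sketch: the class name XmiCorpusLayout, the ".xmi" file extension, and the idea that the files sit directly inside each sub-directory are my assumptions for illustration, not something taken from the cTAKES code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical helper; not part of cTAKES. */
final class XmiCorpusLayout {
    // Sub-directory names as mentioned in the message above.
    static final String[] SPLITS = {"train", "dev", "test"};

    /**
     * Counts files ending in ".xmi" (an assumed extension) directly inside
     * each split directory. A missing directory is reported as 0.
     */
    static Map<String, Long> countXmiFiles(Path corpusRoot) throws IOException {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String split : SPLITS) {
            Path dir = corpusRoot.resolve(split);
            long n = 0;
            if (Files.isDirectory(dir)) {
                // Files.list must be closed, hence try-with-resources.
                try (Stream<Path> files = Files.list(dir)) {
                    n = files
                        .filter(p -> p.getFileName().toString().endsWith(".xmi"))
                        .count();
                }
            }
            counts.put(split, n);
        }
        return counts;
    }
}
```

A split that comes back with a count of 0 is a hint that the XMI generation step has not been run for it yet.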

Hopefully that is enough information for you to figure out whether it's worth 
pursuing!

Tim


[1] There is some training-specific code, but I'm not sure it has kept up with 
the eval code.
[2] It can also read a few other formats, such as the i2b2 challenge and MiPACQ 
formats, in case your data looks like that.

________________________________________
From: Harish Kulkarni <harish.m.kulka...@gmail.com>
Sent: Saturday, January 6, 2018 1:32 AM
To: dev@ctakes.apache.org
Subject: Re: Unable to understand the importance of attributes in 
IdentifiedAnnotations [EXTERNAL]

Is there any documentation or tutorial on how to train cTAKES for negation,
history, etc.?
I have some data with which to train the system.

Thanks
Harish

On Jan 5, 2018 18:19, "Abramowitsch, Peter" <pabramowit...@hearst.com>
wrote:

> Sorry for the very wordy contribution here. Following on Tim's
> answer, I've found the historyOf mechanism to be very weak in its ability
> to detect more than just a few possible permutations, and one frequent
> issue is span, where historyOf modifies a series of concepts.
>
> ----------------------------------------
>
> Here's an example from real notes
>
> "The patient is a 57-year-old woman with a past medical history of OSA ,
> asthma , CAD status post CABG..."
>
> Using my CAS post processor on the output I get this.
>
>
> ================PROBLEMS====================
>   [,,History of ,SNOMEDCT_US:78275009/C0520679] Sleep Apnea, Obstructive
> OSA
> past medical history of OSA , asthma , CAD status po
>
>   [,,,SNOMEDCT_US:195967001/C0004096] Asthma asthma
> edical history of OSA , asthma , CAD status post CABG.>
>
>   [,,,SNOMEDCT_US:414024009/C1956346] Coronary Artery Disease CAD
> story of OSA , asthma , CAD status post CABG.>--
>
> =================MEDICATIONS==================
> ==============SIGNS=====================
> =================PROCEDURES==================
>   [,,History of ,SNOMEDCT_US:90205004/C0010055] Coronary Artery Bypass
> Surgery CABG
> sthma , CAD status post CABG.>--
>
>
> I print the history of, confidence, and polarity flags in the [,,, ]
> section before the SNOMED code of each identified annotation.   Notice
> that it found history of OSA but not Asthma or CAD. It did find History of
> again for the procedure CABG because of the word POST.
>
>
> Here's another example
>
> "Cardiac transplant 15 years ago as stated above with chronic
> immunosuppressives , history of gout , hypertension , renal insufficiency."
>
>
> ================PROBLEMS====================
>   [,,History of ,SNOMEDCT_US:90560007/C0018099] Gout gout  2018-01-01
> 15:00:00 +0100~
> ppressives , history of gout , hypertension , renal i
>
>   [,,,SNOMEDCT_US:28119000/C0020544] Renal hypertension hypertension ,
> renal  2018-01-01 15:00:00 +0100~
> ves , history of gout , hypertension , renal insufficiency.>--
>
>   [,,,SNOMEDCT_US:236423003/C1565489] Renal Insufficiency renal
> insufficiency  2018-01-01 15:00:00 +0100~
> f gout , hypertension , renal insufficiency.>--
>
> =================MEDICATIONS==================
>   [,,,SNOMEDCT_US:372823004/C0021081] Immunosuppressive Agents
> immunosuppressives  2018-01-01 15:00:00 +0100~
> ated above with chronic immunosuppressives , history of gout , hype
>
> ==============SIGNS=====================
>   [,,,SNOMEDCT_US:161451004/C0455492] H/O: gout history of gout
> 2018-01-01 15:00:00 +0100~
> ic immunosuppressives , history of gout , hypertension , renal i
>
> =================PROCEDURES==================
>   [,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation Cardiac
> transplant 15 2018-01-01 15:00:00 +0100~
> --<Cardiac transplant 15 years ago as stated a
>
>
>
> Again, it picked up the History Of in the first clause, where "history of"
> directly preceded its predicate, but not for the subsequent ones, nor after
> a time expression indicating the past ("15 years ago").
>
> I have a mind to work on this one day, but I think I'll be doing it in my
> CAS post processor rather than the annotator itself, as the problem really
> involves a whole new solution that looks at the semantics of the whole
> sentence and not just "history of (x)". For that we'd start looking at the
> conldep nodes, time annotations, and more.
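
To make the span issue concrete, here is a deliberately naive post-processing sketch. It is not Peter's post processor and not the sentence-level solution he describes; it only copies a historyOf flag forward across mentions that are separated by nothing but a comma or "and"/"or". The Mention and HistoryListPropagator classes are hypothetical stand-ins and do not use the cTAKES type system.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for an annotation: span offsets plus the historyOf flag. */
final class Mention {
    final int begin;
    final int end;
    int historyOf; // 1 = history of, 0 = not set

    Mention(int begin, int end, int historyOf) {
        this.begin = begin;
        this.end = end;
        this.historyOf = historyOf;
    }
}

final class HistoryListPropagator {
    // Text allowed between two list items: a comma and/or "and"/"or",
    // surrounded by optional whitespace. Anything else breaks the list.
    private static final Pattern LIST_SEPARATOR =
            Pattern.compile("\\s*(,\\s*(and|or)?|and|or)\\s*");

    /**
     * Copies historyOf = 1 from a mention to the next one when only a list
     * separator lies between them. Mentions must be sorted by begin offset.
     */
    static void propagate(String sentence, List<Mention> mentions) {
        for (int i = 1; i < mentions.size(); i++) {
            Mention prev = mentions.get(i - 1);
            Mention cur = mentions.get(i);
            if (prev.historyOf != 1 || cur.begin < prev.end) {
                continue; // nothing to copy, or the spans overlap
            }
            String gap = sentence.substring(prev.end, cur.begin);
            if (LIST_SEPARATOR.matcher(gap).matches()) {
                cur.historyOf = 1;
            }
        }
    }
}
```

On the first example above, this would carry the flag from OSA to asthma and CAD, and stop at "status post". It would do nothing for the time-expression case ("15 years ago"), and it skips overlapping spans such as "hypertension , renal" and "renal insufficiency" in the second example, which is part of why a dependency-based approach is the better long-term answer.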
>
> Peter
>
>
>
>
>
> On 1/5/18, 12:58 PM, "Miller, Timothy"
> <timothy.mil...@childrens.harvard.edu> wrote:
>
> >Uncertainty is when the text indicates some hedging about the concept:
> >"possible asthma" should have asthma as an IdentifiedAnnotation with the
> >uncertainty flag set to 1.
> >This is done by machine learning and it is not easy so it is not perfect.
> >
> >HistoryOf is for concepts that are explicitly in patient history, often
> >in a history section.
> >"history of lymphoma as a child"
> >lymphoma should have its history flag set to 1.
> >This is done by machine learning and it is not easy so it is not perfect.
> >
> >Confidence is a field that I don't believe gets set by any current
> >annotators, but in theory it is where methods that output a statistical
> >score would store that score.
> >The cTAKES dictionary lookup either hits or doesn't, so it doesn't set
> >that score.
> >
> >DiscoveryTechnique is a way to flag which entities were annotated by
> >which annotator, since it's possible to have, e.g., multiple clinical
> >concept taggers. We use it occasionally internally
> >to separate gold standard entities from system-discovered entities (in a
> >machine learning evaluation) but I don't know if any standard pipeline
> >components set it.
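
As a rough illustration of how these flags might be used downstream, the sketch below keeps only mentions that are neither hedged nor historical. ConceptFlags and ConceptFilter are hypothetical stand-ins that mirror the four attributes as described in this message; they are not the cTAKES IdentifiedAnnotation class, and the 0/1 encoding follows the "flag set to 1" wording above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in carrying the four attributes discussed above. */
final class ConceptFlags {
    final String text;
    final int uncertainty;        // 1 = hedged, e.g. "possible asthma"
    final int historyOf;          // 1 = explicitly part of patient history
    final float confidence;       // usually unset, per the message above
    final int discoveryTechnique; // which annotator produced the mention

    ConceptFlags(String text, int uncertainty, int historyOf,
                 float confidence, int discoveryTechnique) {
        this.text = text;
        this.uncertainty = uncertainty;
        this.historyOf = historyOf;
        this.confidence = confidence;
        this.discoveryTechnique = discoveryTechnique;
    }
}

final class ConceptFilter {
    /** Keeps mentions that are neither uncertain nor flagged as history. */
    static List<ConceptFlags> currentAndCertain(List<ConceptFlags> mentions) {
        List<ConceptFlags> kept = new ArrayList<>();
        for (ConceptFlags m : mentions) {
            if (m.uncertainty != 1 && m.historyOf != 1) {
                kept.add(m);
            }
        }
        return kept;
    }
}
```

Given "possible asthma", "history of lymphoma", and a plain "hypertension", only the hypertension mention would survive this filter. Since both flags come from imperfect machine-learned classifiers, a filter like this will inherit their errors.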
> >
> >Tim
> >
> >________________________________________
> >From: Kumari,Puja <puja.kuma...@cerner.com>
> >Sent: Friday, January 5, 2018 12:03 AM
> >To: dev@ctakes.apache.org
> >Subject: Re: Unable to understand the importance of attributes in
> >IdentifiedAnnotations [EXTERNAL]
> >
> >Hi,
> >
> >
> >
> >Thanks for the replies but I am still not able to understand the
> >significance of the attributes such as Uncertainty, HistoryOf,
> >Confidence, DiscoveryTechniques.
> >
> >Can anyone give some examples or any information which will help me to
> >understand these concepts in more depth?
> >
> >
> >
> >Thanks.
> >
> >Puja Kumari
> >
> >
> >
> >On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan"
> ><gandhi.natara...@arisglobal.com> wrote:
> >
> >
> >
> >    Try out this link -
> >"https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Assertion"
> >
> >
> >
> >    Regards,
> >
> >    Gandhi
> >
> >
> >
> >
> >
> >    -----Original Message-----
> >
> >    From: Kumari,Puja [mailto:puja.kuma...@cerner.com]
> >
> >    Sent: Thursday, January 04, 2018 3:11 PM
> >
> >    To: dev@ctakes.apache.org
> >
> >    Subject: Re: Unable to understand the importance of attributes in
> >IdentifiedAnnotations
> >
> >
> >
> >    Hi,
> >
> >
> >
> >    Thanks for your reply Krishnareddy, but the link given says "page not
> >found". Any other suggestions/links that you can share would be
> >appreciated.
> >
> >
> >
> >    Thanks
> >
> >    Puja Kumari
> >
> >
> >
> >    On 1/4/18, 2:51 PM, "Krishnareddy" <krishnared...@kpmd.biz> wrote:
> >
> >
> >
> >        Hi,
> >
> >
> >
> >          You can find related information about these attributes in
> >following link
> >
> >
> >
> >
> >_*https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Assertion*_
> >
> >
> >
> >
> >
> >        Thank You
> >
> >
> >
> >        Krishna Reddy
> >
> >
> >
> >
> >
> >        On Thursday 04 January 2018 12:31 PM, Kumari,Puja wrote:
> >
> >        > Hi,
> >
> >        > I am working on IdentifiedAnnotations in Apache cTAKES and I am
> >not able to interpret the meaning of the following attributes under
> >IdentifiedAnnotations:
> >
> >        > 1.Uncertainty
> >
> >        > 2.History
> >
> >        > 3.Confidence
> >
> >        > 4.Discovery Techniques
> >
> >        >
> >
> >        > What is the importance of these attributes?
> >
> >        > How can we make use of these to make our work efficient?
> >
> >        > Any suggestion / link to understand more would be helpful.
> >
> >        >
> >
> >        >
> >
> >        > Thanks.
> >
> >        > Puja Kumari
> >
> >        > puja.kuma...@cerner.com
> >
