RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi James,

Thanks for the response. As you said, it's definitely not a showstopper. We encountered this measurement in the narratives we were testing and thought of fixing it; that's the whole idea. Also, as per the code, I feel that the 'fslashCondition' added before the 2nd token should avoid false positives. Anyway, I will let experts like you decide on this. Thanks again for the consideration.

Regards,
Gandhi

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com]
Sent: Tuesday, October 03, 2017 10:05 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

FWIW, I started taking a look at the patch. (It's in code that I'm not that familiar with, so a quick glance isn't sufficient for me.)

I did a search in UMLS for m2 in the terminologies commonly used by cTAKES to see if adding m2 could result in marking something as a measurement when it's not - and I did find many terms in the UMLS that contain m2. There are plenty of other measurement abbreviations that also appear within other terms, so it's not a showstopper - but is a consideration.

I haven't tested the patch yet to see if the way the patch is implemented - checking for 2 tokens - avoids that issue.

Not sure if I'll get a chance to look more this week. If you end up picking up looking at it Sean, at least you know what I've done.

-- James

On Tue, Oct 3, 2017 at 12:25 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Ctakes is a purely volunteer effort, so there are never any guarantees ... If nobody looks at the value and unit jira and patch this week then I will try to get to it asap.
>
> Thanks for letting us use your example note!
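A minimal sketch of the two-token idea discussed in this message. The patch itself is attached to CTAKES-459 and is not reproduced in this thread, so nothing below is taken from MeasurementFSM.java: the unit lists, the function names and the use of Python (rather than the Java that cTAKES is written in) are all assumptions made for illustration. The sketch accepts "m2" only as a second unit token that directly follows a forward slash, which is one way a check like the 'fslashCondition' mentioned above could keep a bare "M2" inside another term from being marked as a measurement. Both example sentences are invented.

```python
import re

# Hypothetical unit lists, for illustration only (not from MeasurementFSM.java).
BASE_UNITS = {"mg", "g", "kg", "mcg", "ml", "l"}
PER_UNITS = {"m2", "kg", "ml", "l", "day"}  # accepted only right after "/"

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+\d*|/")


def tokenize(text):
    """Split text into numbers, alphanumeric words such as 'm2', and slashes."""
    return TOKEN_RE.findall(text)


def find_measurements(text):
    """Return measurements such as '20 mg' or '20 mg/m2' found in text."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i + 1 < len(tokens):
        if NUMBER_RE.fullmatch(tokens[i]) and tokens[i + 1].lower() in BASE_UNITS:
            measurement = f"{tokens[i]} {tokens[i + 1]}"
            i += 2
            # Forward-slash condition: a second unit counts only when "/" precedes it.
            if (i + 1 < len(tokens) and tokens[i] == "/"
                    and tokens[i + 1].lower() in PER_UNITS):
                measurement += "/" + tokens[i + 1]
                i += 2
            found.append(measurement)
        else:
            i += 1
    return found


print(find_measurements("The dose was 20 mg/m2 on day 1."))
print(find_measurements("Binds the M2 muscarinic receptor."))
```

Because "m2" never appears in the first-unit list, a term that merely contains "M2" cannot start or extend a match on its own. Whether the real patch behaves this way would still need to be tested against UMLS terms, as James and Tim point out.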
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Sean,

Completely agree with you on this. Thanks for your support.

Regards,
Gandhi

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 9:56 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

Ctakes is a purely volunteer effort, so there are never any guarantees ... If nobody looks at the value and unit jira and patch this week then I will try to get to it asap.

Thanks for letting us use your example note!

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 12:21 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Sean,

Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be looked at by someone, as Tim mentioned?

The paragraph we sent earlier can be in the example notes provided the protocol number is masked/modified.

Regards,
Gandhi

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 7:27 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

Thank you for asking. There is no action item for you concerning the coreference output that you see. However, if you would like to help the community understand how the module works (input and output), maybe you could do something like run the pipeline on your original sentence, then that sentence plus another (before), then that sentence plus another (after) ... and see how the output changes with the input.
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Thanks for the update, Sean. Please keep us posted so that we can test the same once your fix is ready.

Regards,
Gandhi

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 10:04 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

I have one discovery pertaining to the coref items so far. Your first coreference (#1) is not appearing in the html because it consists only of a "generic" item: "this patient".

Coreference: This patient, This patient, This patient, this patient, this patient, this patient, this patient

This is a bug in the html writer that I will need to fix.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, could someone give us some valid inputs to test the same? Is someone else going to review this again?

Regards,
Gandhi

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a coreference patch. I don't know this FSM code that well, so I am not an expert. My biggest concern at a glance is this: these additions help find more true positives (as in your examples), but can we verify that they won't create false positives?

Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake from my side that I didn't send the complete text. Did you mean that with the text I sent, the co-reference superscript-1 will be lost?
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Excellent, thanks

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com]
Sent: Tuesday, October 03, 2017 12:35 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

FWIW, I started taking a look at the patch. (It's in code that I'm not that familiar with, so a quick glance isn't sufficient for me.)

I did a search in UMLS for m2 in the terminologies commonly used by cTAKES to see if adding m2 could result in marking something as a measurement when it's not - and I did find many terms in the UMLS that contain m2. There are plenty of other measurement abbreviations that also appear within other terms, so it's not a showstopper - but is a consideration.

I haven't tested the patch yet to see if the way the patch is implemented - checking for 2 tokens - avoids that issue.

Not sure if I'll get a chance to look more this week. If you end up picking up looking at it Sean, at least you know what I've done.

-- James

On Tue, Oct 3, 2017 at 12:25 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Ctakes is a purely volunteer effort, so there are never any guarantees ... If nobody looks at the value and unit jira and patch this week then I will try to get to it asap.
>
> Thanks for letting us use your example note!
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 12:21 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
>
> Hi Sean,
>
> Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be looked at by someone, as Tim mentioned?
>
> The paragraph we sent earlier can be in the example notes provided the protocol number is masked/modified.
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Tuesday, October 03, 2017 7:27 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> Thank you for asking. There is no action item for you concerning the coreference output that you see. However, if you would like to help the community understand how the module works (input and output), maybe you could do something like run the pipeline on your original sentence, then that sentence plus another (before), then that sentence plus another (after) ... and see how the output changes with the input. If you take screenshots or something then we could put them on the wiki. Also, would you mind if the paragraph you sent became one of the example notes in ctakes? That means that it would be redistributed with the code.
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 4:26 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
>
> Hi Tim/Sean,
>
> Is this an action item on us? If yes, could someone give us some valid inputs to test the same? Is someone else going to review this again?
Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
FWIW, I started taking a look at the patch. (It's in code that I'm not that familiar with, so a quick glance isn't sufficient for me.)

I did a search in UMLS for m2 in the terminologies commonly used by cTAKES to see if adding m2 could result in marking something as a measurement when it's not - and I did find many terms in the UMLS that contain m2. There are plenty of other measurement abbreviations that also appear within other terms, so it's not a showstopper - but is a consideration.

I haven't tested the patch yet to see if the way the patch is implemented - checking for 2 tokens - avoids that issue.

Not sure if I'll get a chance to look more this week. If you end up picking up looking at it Sean, at least you know what I've done.

-- James

On Tue, Oct 3, 2017 at 12:25 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Ctakes is a purely volunteer effort, so there are never any guarantees ... If nobody looks at the value and unit jira and patch this week then I will try to get to it asap.
>
> Thanks for letting us use your example note!
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 12:21 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
>
> Hi Sean,
>
> Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be looked at by someone, as Tim mentioned?
>
> The paragraph we sent earlier can be in the example notes provided the protocol number is masked/modified.
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Gandhi,

I have one discovery pertaining to the coref items so far. Your first coreference (#1) is not appearing in the html because it consists only of a "generic" item: "this patient".

Coreference: This patient, This patient, This patient, this patient, this patient, this patient, this patient

This is a bug in the html writer that I will need to fix.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, could someone give us some valid inputs to test the same? Is someone else going to review this again?

Regards,
Gandhi

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a coreference patch. I don't know this FSM code that well, so I am not an expert. My biggest concern at a glance is this: these additions help find more true positives (as in your examples), but can we verify that they won't create false positives?

Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake from my side that I didn't send the complete text. Did you mean that with the text I sent, the co-reference superscript-1 will be lost?
>
> Also as per your advice, we have created an issue - https://issues.apache.org/jira/browse/CTAKES-459 for measurement FSM changes and attached the modified file changes. Could someone have a look and let us know your thoughts please?
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text. I think that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data, change the result."
> Ctakes is a system with many moving parts. Things that precede or follow your original example sentence will change the evaluation of that sentence.
> With the pipeline you are using and the full note, you should see a number (mine is 4) next to the first "thalomid" in the original example sentence. If you click that number you should see (to the right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the links between "thalomid" as much higher than the rank between "study treatment of thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded the encapsulating treatment texts from markables? It is probably more complex than that.
>
> > we have also made some code changes in MeasurementFSM.java to identify certain measurements like '20 mg/m2' which was not identified out of the box. Should we send the code changes to you so that you can consider the same to be productized? Please advise."
>
> I don't know if you've noticed the recent emails on the dev list involving Alexandru Zbarcea. Alex has been creating or commenting on Jira items and attaching code for fixes and enhancements. This is a widely used process and is fairly easy to follow. I think that the following links are relevant:
> Working with issues: https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Gandhi,

Ctakes is a purely volunteer effort, so there are never any guarantees ... If nobody looks at the value and unit jira and patch this week then I will try to get to it asap.

Thanks for letting us use your example note!

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 12:21 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Sean,

Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be looked at by someone, as Tim mentioned?

The paragraph we sent earlier can be in the example notes provided the protocol number is masked/modified.

Regards,
Gandhi

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 7:27 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

Thank you for asking. There is no action item for you concerning the coreference output that you see. However, if you would like to help the community understand how the module works (input and output), maybe you could do something like run the pipeline on your original sentence, then that sentence plus another (before), then that sentence plus another (after) ... and see how the output changes with the input. If you take screenshots or something then we could put them on the wiki. Also, would you mind if the paragraph you sent became one of the example notes in ctakes? That means that it would be redistributed with the code.
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Sean,

Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be looked at by someone, as Tim mentioned?

The paragraph we sent earlier can be in the example notes provided the protocol number is masked/modified.

Regards,
Gandhi

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 7:27 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

Thank you for asking. There is no action item for you concerning the coreference output that you see. However, if you would like to help the community understand how the module works (input and output), maybe you could do something like run the pipeline on your original sentence, then that sentence plus another (before), then that sentence plus another (after) ... and see how the output changes with the input. If you take screenshots or something then we could put them on the wiki. Also, would you mind if the paragraph you sent became one of the example notes in ctakes? That means that it would be redistributed with the code.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, could someone give us some valid inputs to test the same? Is someone else going to review this again?

Regards,
Gandhi

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a coreference patch. I don't know this FSM code that well, so I am not an expert.
My biggest concern at a glance is that these additions help find more true positives (as in your examples), can we verify that they won't create false positives? Tim On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote: > Hi Sean, > > Thanks again for the response. I guess its mistake from my side that I > dint send the complete text. Did you mean that with the text I sent, > the co-reference superscript-1 will be lost? > > Also as per your advice, We have created an issue - > https://issues.apache.org/jira/browse/CTAKES-459 for > measurement FSM changes and attached the modified file changes. Could > someone have a look and know your thoughts please? > > Regards, > Gandhi > > > -Original Message- > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] > Sent: Thursday, September 28, 2017 8:21 PM > To: dev@ctakes.apache.org > Cc: Miller, Timothy > Subject: RE: Enabling drugner pipeline and identifying dates > [EXTERNAL] [SUSPICIOUS] > > Hi Gandhi, > > I don't recall you sending me that entire snippet of text. I think > that I only had your single example sentence. > You have discovered one of the quirks of software: "change the data, > change the result." > Ctakes is a system with many moving parts. Things that precede or > follow your original example sentence will change the evaluation of > that sentence. 
> With the pipeline you are using and the full note, you should see a > number (mine is 4) next to the first "thalomid" in the original > example sentence. If you click that number you should see (to the > right) 4 instances of "thalomid". > Tim can correct me here, but maybe the coreference module ranked the > links between "thalomid" as much higher than the rank between "study > treatment of thalomid 200mg" and "the treatment of hepatocellular > carcinoma" and discarded the encapsulating treatment texts from > markables? It is probably more complex than that. > > > > > we have also made some code changes in MeasurementFSM.java to > > identify certain measurements like '20 mg/m2' which was not > > identified out of the box. Should we send
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
Thanks Tim! I was looking for that one but couldn't find it. -Original Message- From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] Sent: Tuesday, October 03, 2017 10:03 AM To: dev@ctakes.apache.org Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] Here's the most recent publication, which describes the system in ctakes 4.0 and later: http://www.sciencedirect.com/science/article/pii/S1532046417300850 Tim On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote: > > > > With the changes in Input, the co-reference between all the > > entities should still be preserved right? > No. One of the experts can better explain this, but the coreference > module works with "best match" chains. With one sentence of text, > term (Markable) A may have a best match with term B. As soon as you > add more text, you introduce the possibility that term A will have a > better best match with C and/or D, and the previous match to B will > be deemed less accurate and dropped. > In your case the coreference A - B seems to be lost in favor of one > using internal term A', and that is a little strange. It could be > that overlapping markables are being discarded? I will try to look > into this really quickly. > > You can look at some publications on coref if you search the > web. The one that probably best applies to the current coref module > (Tim, Dima, is this true?) 
is > https://www.aclweb.org/anthology/W12-2409 > > Sean > > -Original Message- > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] > > Sent: Tuesday, October 03, 2017 4:18 AM > To: dev@ctakes.apache.org > Subject: RE: Enabling drugner pipeline and identifying dates > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] > > Hi Sean, I still have some doubts on this. If I run the piper file > with the complete text I sent earlier, I could see only superscript - > 4 for Thalomid and the co-reference of this to "treatment of > hepatocellular carcinoma" is still lost. Also I don’t see any > superscript with number-1 too. With the changes in Input, the co- > reference between all the entities should still be preserved right? > Do we have any more info or doc on this co-reference module to > understand its complexity better? > > Regards, > Gandhi > > > -Original Message- > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] > Sent: Monday, October 02, 2017 8:36 PM > To: dev@ctakes.apache.org > Subject: RE: Enabling drugner pipeline and identifying dates > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] > > Hi Tim, > > The coreference question (just a question) was for a different item > altogether. Sorry for any confusion. The reason that I CC:d you ... > > From Gandhi: > > > > Interestingly even I was able to generate [Sean's coref output] > > using piper GUI by having only that single line - " The patient > > started study treatment of Thalomid 200mg (days 1-21), and > > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the > > treatment of hepatocellular carcinoma. " in the input file. 
> > But when I change the input file content with the following > > lines: [Full paragraph (below), single-sentence in middle] The > > co-reference superscript is lost by then. > Sean's answer: > > > > Ctakes is a system with many moving parts. Things that precede or > > follow your original example sentence will change the evaluation of > > that sentence. > With the pipeline you are using and the full note, you should see a > number (mine is 4) next to the first "thalomid" in the original > example sentence. If you click that number you should see (to the > right) 4 instances of "thalomid". > > > > Tim can correct me here, but maybe the coreference module ranked > > the links between "thalomid" as much higher than the rank between > > "study treatment of thalomid 200mg" and "the treatme
Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
This is very informative. Thank you Tim Alex On Oct 3, 2017 10:06, "Miller, Timothy" < timothy.mil...@childrens.harvard.edu> wrote: > Here's the most recent publication, which describes the system in > ctakes 4.0 and later: > http://www.sciencedirect.com/science/article/pii/S1532046417300850 > Tim > > On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote: > > > > > > With the changes in Input, the co-reference between all the > > > entities should still be preserved right? > > No. One of the experts can better explain this, but the coreference > > module works with "best match" chains. With one sentence of text, > > term (Markable) A may have a best match with term B. As soon as you > > add more text, you introduce the possibility that term A will have a > > better best match with C and/or D, and the previous match to B will > > be deemed less accurate and dropped. > > In your case the coreference A - B seems to be lost in favor of one > > using internal term A', and that is a little strange. It could be > > that overlapping markables are being discarded? I will try to look > > into this really quickly. > > > > You can look at some publications on coref if you search the > > web. The one that probably best applies to the current coref module > > (Tim, Dima, is this true?) is > > https://www.aclweb.org/anthology/W12-2409 > > > > Sean > > > > -Original Message- > > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] > > > > Sent: Tuesday, October 03, 2017 4:18 AM > > To: dev@ctakes.apache.org > > Subject: RE: Enabling drugner pipeline and identifying dates > > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] > > > > Hi Sean, I still have some doubts on this. 
If I run the piper file > > with the complete text I sent earlier, I could see only superscript - > > 4 for Thalomid and the co-reference of this to "treatment of > > hepatocellular carcinoma" is still lost. Also I don’t see any > > superscript with number-1 too. With the changes in Input, the co- > > reference between all the entities should still be preserved right? > > Do we have any more info or doc on this co-reference module to > > understand its complexity better? > > > > Regards, > > Gandhi > > > > > > -Original Message- > > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] > > Sent: Monday, October 02, 2017 8:36 PM > > To: dev@ctakes.apache.org > > Subject: RE: Enabling drugner pipeline and identifying dates > > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] > > > > Hi Tim, > > > > The coreference question (just a question) was for a different item > > altogether. Sorry for any confusion. The reason that I CC:d you ... > > > > From Gandhi: > > > > > > Interestingly even I was able to generate [Sean's coref output] > > > using piper GUI by having only that single line - " The patient > > > started study treatment of Thalomid 200mg (days 1-21), and > > > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the > > > treatment of hepatocellular carcinoma. " in the input file. > > > But when I change the input file content with the following > > > lines: [Full paragraph (below), single-sentence in middle] The > > > co-reference superscript is lost by then. > > Sean's answer: > > > > > > Ctakes is a system with many moving parts. Things that precede or > > > follow your original example sentence will change the evaluation of > > > that sentence. > > With the pipeline you are using and the full note, you should see a > > number (mine is 4) next to the first "thalomid" in the original > > example sentence. If you click that number you should see (to the > > right) 4 instances of "thalomid". 
> > > > > > Tim can correct me here, but maybe the coreference module ranked > > > the links between "thalomid" as much higher than the rank between > > > "study treatment of thalomid 200mg" and "the treatment of > > > hepatocellular carcinoma" and discarded the encapsulating treatment > > > texts from markables? It is prob
Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
Here's the most recent publication, which describes the system in ctakes 4.0 and later: http://www.sciencedirect.com/science/article/pii/S1532046417300850 Tim On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote: > > > > With the changes in Input, the co-reference between all the > > entities should still be preserved right? > No. One of the experts can better explain this, but the coreference > module works with "best match" chains. With one sentence of text, > term (Markable) A may have a best match with term B. As soon as you > add more text, you introduce the possibility that term A will have a > better best match with C and/or D, and the previous match to B will > be deemed less accurate and dropped. > In your case the coreference A - B seems to be lost in favor of one > using internal term A', and that is a little strange. It could be > that overlapping markables are being discarded? I will try to look > into this really quickly. > > You can look at some publications on coref if you search the > web. The one that probably best applies to the current coref module > (Tim, Dima, is this true?) is > https://www.aclweb.org/anthology/W12-2409 > > Sean > > -Original Message- > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] > > Sent: Tuesday, October 03, 2017 4:18 AM > To: dev@ctakes.apache.org > Subject: RE: Enabling drugner pipeline and identifying dates > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] > > Hi Sean, I still have some doubts on this. If I run the piper file > with the complete text I sent earlier, I could see only superscript - > 4 for Thalomid and the co-reference of this to "treatment of > hepatocellular carcinoma" is still lost. 
Also I don’t see any > superscript with number-1 too. With the changes in Input, the co- > reference between all the entities should still be preserved right? > Do we have any more info or doc on this co-reference module to > understand its complexity better? > > Regards, > Gandhi > > > -Original Message- > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] > Sent: Monday, October 02, 2017 8:36 PM > To: dev@ctakes.apache.org > Subject: RE: Enabling drugner pipeline and identifying dates > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] > > Hi Tim, > > The coreference question (just a question) was for a different item > altogether. Sorry for any confusion. The reason that I CC:d you ... > > From Gandhi: > > > > Interestingly even I was able to generate [Sean's coref output] > > using piper GUI by having only that single line - " The patient > > started study treatment of Thalomid 200mg (days 1-21), and > > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the > > treatment of hepatocellular carcinoma. " in the input file. > > But when I change the input file content with the following > > lines: [Full paragraph (below), single-sentence in middle] The > > co-reference superscript is lost by then. > Sean's answer: > > > > Ctakes is a system with many moving parts. Things that precede or > > follow your original example sentence will change the evaluation of > > that sentence. > With the pipeline you are using and the full note, you should see a > number (mine is 4) next to the first "thalomid" in the original > example sentence. If you click that number you should see (to the > right) 4 instances of "thalomid". > > > > Tim can correct me here, but maybe the coreference module ranked > > the links between "thalomid" as much higher than the rank between > > "study treatment of thalomid 200mg" and "the treatment of > > hepatocellular carcinoma" and discarded the encapsulating treatment > > texts from markables? It is probably more complex than that. 
> Sean > > "This patient is participating in a Non-IND study; Protocol CG- > 000424: "Phase I/II of Thalidomide and Epirubicin in Patients with > Unresectable or Metastatic Hepatocellular Carcinoma".Information has > been received from the investigator regarding an 82 year-old male > patient who had gastrointestinal bleeding while on Thalomid, > Epirubicin, and Coumadin. He had a past medical history of > diverticulosis in 03/02 and a right atrial clot from intraventricular > catheter (IVC) fo
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Gandhi, Thank you for asking. There is no action item for you concerning the coreference output that you see. However, if you would like to help the community understand how the module works (input and output), maybe you could do something like run the pipeline on your original sentence, then that sentence plus another (before), then that sentence plus another (after) ... and see how the output changes with the input. If you take screenshots or something then we could put them on the wiki. Also, would you mind if the paragraph you sent became one of the example notes in ctakes? That means that it would be redistributed with the code. Sean -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Tuesday, October 03, 2017 4:26 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] Hi Tim/Sean, Is this an action item on us? If yes, Could someone give us some valid inputs to test the same? Is someone else going to review this again? Regards, Gandhi -Original Message- From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] Sent: Monday, October 02, 2017 8:06 PM To: dev@ctakes.apache.org Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] My bad, I didn't read too closely and thought this was going to be a coreference patch. I don't know this FSM code that well, so I am not an expert. My biggest concern at a glance is that these additions help find more true positives (as in your examples), can we verify that they won't create false positives? Tim On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote: > Hi Sean, > > Thanks again for the response. I guess its mistake from my side that I > dint send the complete text. Did you mean that with the text I sent, > the co-reference superscript-1 will be lost? 
> > Also as per your advice, We have created an issue - > https://issues.apache.org/jira/browse/CTAKES-459 for > measurement FSM changes and attached the modified file changes. Could > someone have a look and know your thoughts please? > > Regards, > Gandhi > > > -Original Message- > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] > Sent: Thursday, September 28, 2017 8:21 PM > To: dev@ctakes.apache.org > Cc: Miller, Timothy > Subject: RE: Enabling drugner pipeline and identifying dates > [EXTERNAL] [SUSPICIOUS] > > Hi Gandhi, > > I don't recall you sending me that entire snippet of text. I think > that I only had your single example sentence. > You have discovered one of the quirks of software: "change the data, > change the result." > Ctakes is a system with many moving parts. Things that precede or > follow your original example sentence will change the evaluation of > that sentence. > With the pipeline you are using and the full note, you should see a > number (mine is 4) next to the first "thalomid" in the original > example sentence. If you click that number you should see (to the > right) 4 instances of "thalomid". > Tim can correct me here, but maybe the coreference module ranked the > links between "thalomid" as much higher than the rank between "study > treatment of thalomid 200mg" and "the treatment of hepatocellular > carcinoma" and discarded the encapsulating treatment texts from > markables? 
It is probably more complex than that. > > > > > we have also made some code changes in MeasurementFSM.java to > > identify certain measurements like '20 mg/m2' which was not > > identified out of the box. Should we send the code changes to you > > so that you can consider the same to be productized ? Please > > advise." > I don't know if you've noticed the recent emails on the dev list > involving Alexandru Zbarcea. Alex has been creating or commenting on > Jira items and attaching code for fixes and enhancements. This is a > widely used process and is fairly easy to follow. I think that the > following links are relevant: > W
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> With the changes in Input, the co-reference between all the entities should > still be preserved right? No. One of the experts can better explain this, but the coreference module works with "best match" chains. With one sentence of text, term (Markable) A may have a best match with term B. As soon as you add more text, you introduce the possibility that term A will have a better best match with C and/or D, and the previous match to B will be deemed less accurate and dropped. In your case the coreference A - B seems to be lost in favor of one using internal term A', and that is a little strange. It could be that overlapping markables are being discarded? I will try to look into this really quickly. You can look at some publications on coref if you search the web. The one that probably best applies to the current coref module (Tim, Dima, is this true?) is https://www.aclweb.org/anthology/W12-2409 Sean -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Tuesday, October 03, 2017 4:18 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] Hi Sean, I still have some doubts on this. If I run the piper file with the complete text I sent earlier, I could see only superscript - 4 for Thalomid and the co-reference of this to "treatment of hepatocellular carcinoma" is still lost. Also I don’t see any superscript with number-1 too. With the changes in Input, the co-reference between all the entities should still be preserved right? Do we have any more info or doc on this co-reference module to understand its complexity better? 
Regards, Gandhi -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Monday, October 02, 2017 8:36 PM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] Hi Tim, The coreference question (just a question) was for a different item altogether. Sorry for any confusion. The reason that I CC:d you ... From Gandhi: > Interestingly even I was able to generate [Sean's coref output] using piper > GUI by having only that single line - " The patient started study treatment > of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) > on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input > file. >But when I change the input file content with the following lines: [Full >paragraph (below), single-sentence in middle] The co-reference superscript is >lost by then. Sean's answer: > Ctakes is a system with many moving parts. Things that precede or follow > your original example sentence will change the evaluation of that sentence. With the pipeline you are using and the full note, you should see a number (mine is 4) next to the first "thalomid" in the original example sentence. If you click that number you should see (to the right) 4 instances of "thalomid". >Tim can correct me here, but maybe the coreference module ranked the links >between "thalomid" as much higher than the rank between "study treatment of >thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded >the encapsulating treatment texts from markables? It is probably more complex >than that. Sean "This patient is participating in a Non-IND study; Protocol CG-000424: "Phase I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic Hepatocellular Carcinoma".Information has been received from the investigator regarding an 82 year-old male patient who had gastrointestinal bleeding while on Thalomid, Epirubicin, and Coumadin. 
He had a past medical history of diverticulosis in 03/02 and a right atrial clot from intraventricular catheter (IVC) for which he was started on Coumadin. During the hospitalization for a right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was referred to an oncologist. The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. He was concomitantly receiving Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient presented to the emergency room with the chief complaint of hematochezia. He reported noticing bright red blood and small clots mixed in with his stool. On 07/13/02, he was admitted due to gastrointestinal bleed. The physician ordered 2 large bore intravenous lines and planned to transfuse for hematocrit less than 30%. Due to the INR (international normalized ratio) level of 3.0, Coumadin was held. He was also noted to have bilateral lower extremity edema with dyspnea on exertion. On 07/13/02, he had a chest X-r
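The "best match" behavior Sean describes above can be illustrated with a small toy sketch in Python. This is not the cTAKES coreference module: the function name `best_antecedent`, the numeric scores, and the threshold are all hypothetical and chosen only to show why adding text can change which link survives.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pairwise scores: (mention, candidate antecedent) -> score.
# The real module computes its scores very differently; these numbers are
# invented for illustration.
Scores = Dict[Tuple[str, str], float]


def best_antecedent(mention: str, scores: Scores,
                    threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring candidate for `mention`, or None."""
    candidates = {cand: s for (m, cand), s in scores.items()
                  if m == mention and s >= threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# With a single sentence, B is the only candidate for markable A.
single_sentence = {("A", "B"): 0.6}

# With the full note, C and D also compete, and C outranks B.
full_note = {("A", "B"): 0.6, ("A", "C"): 0.9, ("A", "D"): 0.7}

print(best_antecedent("A", single_sentence))  # B
print(best_antecedent("A", full_note))        # C
```

Under these assumed scores, the A - B link found in the single sentence is replaced by A - C once more text is present, which matches the kind of change in output described in the thread.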
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Tim/Sean, Is this an action item on us? If yes, Could someone give us some valid inputs to test the same? Is someone else going to review this again? Regards, Gandhi -Original Message- From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] Sent: Monday, October 02, 2017 8:06 PM To: dev@ctakes.apache.org Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] My bad, I didn't read too closely and thought this was going to be a coreference patch. I don't know this FSM code that well, so I am not an expert. My biggest concern at a glance is that these additions help find more true positives (as in your examples), can we verify that they won't create false positives? Tim On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote: > Hi Sean, > > Thanks again for the response. I guess its mistake from my side that I > dint send the complete text. Did you mean that with the text I sent, > the co-reference superscript-1 will be lost? > > Also as per your advice, We have created an issue - > https://issues.apache.org/jira/browse/CTAKES-459 for > measurement FSM changes and attached the modified file changes. Could > someone have a look and know your thoughts please? > > Regards, > Gandhi > > > -Original Message- > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] > Sent: Thursday, September 28, 2017 8:21 PM > To: dev@ctakes.apache.org > Cc: Miller, Timothy > Subject: RE: Enabling drugner pipeline and identifying dates > [EXTERNAL] [SUSPICIOUS] > > Hi Gandhi, > > I don't recall you sending me that entire snippet of text. I think > that I only had your single example sentence. 
> You have discovered one of the quirks of software: "change the data, > change the result." > Ctakes is a system with many moving parts. Things that precede or > follow your original example sentence will change the evaluation of > that sentence. > With the pipeline you are using and the full note, you should see a > number (mine is 4) next to the first "thalomid" in the original > example sentence. If you click that number you should see (to the > right) 4 instances of "thalomid". > Tim can correct me here, but maybe the coreference module ranked the > links between "thalomid" as much higher than the rank between "study > treatment of thalomid 200mg" and "the treatment of hepatocellular > carcinoma" and discarded the encapsulating treatment texts from > markables? It is probably more complex than that. > > > > > we have also made some code changes in MeasurementFSM.java to > > identify certain measurements like '20 mg/m2' which was not > > identified out of the box. Should we send the code changes to you > > so that you can consider the same to be productized ? Please > > advise." > I don't know if you've noticed the recent emails on the dev list > involving Alexandru Zbarcea. Alex has been creating or commenting on > Jira items and attaching code for fixes and enhancements. This is a > widely used process and is fairly easy to follow. 
I think that the > following links are relevant: > Working with issues: https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html > Creating patches: https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html > Attaching files: https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html
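The measurement question raised in this message ('20 mg/m2' not being recognized, and Tim's concern about false positives) can be illustrated with a minimal Python sketch. It is not the MeasurementFSM.java code or the CTAKES-459 patch, and it uses a regular expression rather than a finite state machine. The unit list and the names `UNITS`, `MEASUREMENT` and `find_measurements` are assumptions made for the example.

```python
import re

# Hypothetical, deliberately small unit list; the real cTAKES unit sets differ.
UNITS = ("mg", "g", "kg", "ml", "l", "m2")

# Try longer unit strings first so that "mg" is preferred over "g".
_unit = "|".join(sorted(UNITS, key=len, reverse=True))

# A number, optional whitespace, a unit, and optionally "/unit" (e.g. mg/m2).
MEASUREMENT = re.compile(
    rf"\b(\d+(?:\.\d+)?)\s*((?:{_unit})(?:/(?:{_unit}))?)\b",
    re.IGNORECASE,
)


def find_measurements(text):
    """Return (value, unit) pairs for number-plus-unit spans in `text`."""
    return [(m.group(1), m.group(2)) for m in MEASUREMENT.finditer(text)]


sentence = ("The patient started study treatment of Thalomid 200mg "
            "(days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15)")
print(find_measurements(sentence))
# [('200', 'mg'), ('20', 'mg/m2')]

# "M2" inside a longer term, with no preceding number, is not matched.
print(find_measurements("Acute myeloid leukemia, FAB M2 subtype"))
# []
```

Requiring a number immediately before the unit is one simple way to keep a bare "m2" inside another term from being marked as a measurement. Whether the two-token check in the actual patch gives the same protection would still need to be tested against real notes.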
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
Hi Sean,

I still have some doubts on this. If I run the piper file with the complete text I sent earlier, I see only superscript 4 for Thalomid, and its co-reference to "treatment of hepatocellular carcinoma" is still lost. I also don't see any superscript 1.

Even with the change in input, shouldn't the co-reference between all the entities still be preserved? Is there any more information or documentation on the co-reference module that would help us understand its complexity better?

Regards,
Gandhi

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:36 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
Hi Tim,

The coreference question (just a question) was for a different item altogether. Sorry for any confusion. The reason that I CC:d you ...

From Gandhi:
> Interestingly even I was able to generate [Sean's coref output] using piper
> GUI by having only that single line - " The patient started study treatment
> of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15)
> on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input
> file.
> But when I change the input file content with the following lines: [Full
> paragraph (below), single-sentence in middle] The co-reference superscript is
> lost by then.

Sean's answer:
> Ctakes is a system with many moving parts. Things that precede or follow
> your original example sentence will change the evaluation of that sentence.
> With the pipeline you are using and the full note, you should see a number
> (mine is 4) next to the first "thalomid" in the original example sentence.
> If you click that number you should see (to the right) 4 instances of
> "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the links
> between "thalomid" as much higher than the rank between "study treatment of
> thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded
> the encapsulating treatment texts from markables? It is probably more complex
> than that.

Sean

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic Hepatocellular Carcinoma".Information has been received from the investigator regarding an 82 year-old male patient who had gastrointestinal bleeding while on Thalomid, Epirubicin, and Coumadin. He had a past medical history of diverticulosis in 03/02 and a right atrial clot from intraventricular catheter (IVC) for which he was started on Coumadin.
During the hospitalization for a right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was referred to an oncologist. The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. He was concomitantly receiving Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient presented to the emergency room with the chief complaint of hematochezia. He reported noticing bright red blood and small clots mixed in with his stool. On 07/13/02, he was admitted due to gastrointestinal bleed. The physician ordered 2 large bore intravenous lines and planned to transfuse for hematocrit less than 30%. Due to the INR (international normalized ratio) level of 3.0, Coumadin was held. He was also noted to have bilateral lower extremity edema with dyspnea on exertion. On 07/13/02, he had a chest X-ray PA and lateral done that showed no evidence of acute pneumonia or congestive heart failure. On 07/14/02, he underwent an ultrasound which was negative for deep vein thrombosis. This patient did not take Thalomid on the day of his admittance to the hospital, but resumed treatment shortly after with no return of symptoms. On 07/15/02, he was discharged in stable condition. There have been no further reports of bleeding at this time. Thedoctor has assessed the hematochezia as related to Coumadin treatment and previously diagnosed diverticulosis, and not to protocol therapy with Thalomid and Epirubicin.Additional information received from the investigator on 27Aug02 reveals that this male patient began on 07Jun02 two cycles of therapy with Thalidomide and Epirubicin. His post cycle two computed tomography scans revealed increase in size of liver lesion with development of multiple new satellite nodules. On 29Jul02, the investigator removed this patient from protocol for progressive disease and recommended hospice care. 
After seeking a second opinion from two other institutions, this patient was admitted to hospice on 05Aug02. On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair. On 25Aug02, this patient died due to disease progression. The investigator assessed the death as not related to study treatment and expected"

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 10:36 AM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
My bad, I didn't read too closely and thought this was going to be a coreference patch.

I don't know this FSM code that well, so I am not an expert. My biggest concern at a glance: these additions help find more true positives (as in your examples), but can we verify that they won't create false positives?

Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake on my side that
> I didn't send the complete text. Did you mean that with the text I
> sent, the co-reference superscript-1 will be lost?
>
> Also, as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 - for the
> measurement FSM changes and attached the modified file. Could
> someone have a look and let us know your thoughts, please?
>
> Regards,
> Gandhi
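One way to make the false-positive question concrete is a toy matcher run over both positive and negative examples. The sketch below is not the cTAKES MeasurementFSM and not the CTAKES-459 patch; the unit list and the names `tokenize` and `find_measurements` are hypothetical, chosen only for illustration. It assumes a unit is accepted only when a number token comes immediately before it, and that a "/unit" tail is accepted only after such a number-unit pair.

```python
import re

# Hypothetical unit vocabulary for illustration only; not the cTAKES unit list.
UNITS = {"mg", "g", "kg", "ml", "m2"}
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def tokenize(text):
    """Split text into numbers, alphanumeric words, and single punctuation marks."""
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9]*|\S", text)


def find_measurements(text):
    """Return 'value unit' or 'value unit/unit' strings found in text."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        has_value = NUMBER.fullmatch(tokens[i]) is not None
        has_unit = i + 1 < len(tokens) and tokens[i + 1].lower() in UNITS
        if has_value and has_unit:
            end = i + 2
            # Optional tail: a forward slash followed by a second unit, as in mg/m2.
            if (end + 1 < len(tokens) and tokens[end] == "/"
                    and tokens[end + 1].lower() in UNITS):
                end += 2
            found.append(tokens[i] + " " + "".join(tokens[i + 1:end]))
            i = end
        else:
            i += 1
    return found


sentence = ("The patient started study treatment of Thalomid 200mg (days 1-21), "
            "and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the "
            "treatment of hepatocellular carcinoma.")
print(find_measurements(sentence))
print(find_measurements("muscarinic receptor M2"))
```

With this toy vocabulary the example sentence yields `200 mg` and `20 mg/m2`, and a bare `M2` with no preceding number yields nothing. A number that happens to sit directly before a unit-like token would still match, so a check like this narrows the false-positive question rather than settling it; running the actual patch over real notes would still be needed.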
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
Thanks Sean and Tim. I will ping back if I don't hear from you in a week's time. Thanks for all the responses.

Regards,
Gandhi

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Saturday, September 30, 2017 12:45 AM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
It is a very busy time for me but this is on my todo list. Don't be afraid to ping in a week or so if you don't hear anything.

Tim

On Fri, 2017-09-29 at 14:04 +, Finan, Sean wrote:
> Hi Gandhi,
>
> > Did you mean that with the text I sent, the co-reference
> > superscript-1 will be lost?
> Yes. Well, to be more clear, the coreference that was resolved as #1
> in your original sentence alone will be lost. However, there are
> eight or nine coreference chains discovered in your full paragraph,
> and one of those will have superscript 1s.
>
> > Could someone have a look and let us know your thoughts, please?
> Thank you for creating the jira and the patch. I am sure that
> somebody will take a look.
>
> Thanks,
> Sean
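Sean's quoted explanation says that the chain shown as #1 for the single sentence need not be #1 once the full paragraph is processed. The following is a minimal sketch of why a label can move, assuming labels are handed out in the order in which chains first appear in the text. The thread does not confirm that this is how cTAKES numbers chains, and `label_chains`, the offsets, and the chain contents are all hypothetical.

```python
def label_chains(chains):
    """Label chains 1, 2, 3, ... by the offset of each chain's earliest mention.

    Each chain is a list of (offset, text) mentions. The ordering rule is an
    assumption made for this sketch, not documented cTAKES behavior.
    """
    ordered = sorted(chains, key=lambda chain: min(offset for offset, _ in chain))
    return {label: [text for _, text in sorted(chain)]
            for label, chain in enumerate(ordered, start=1)}


# Hypothetical chain for the single example sentence on its own.
sentence_only = [
    [(20, "study treatment of Thalomid 200mg"),
     (120, "the treatment of hepatocellular carcinoma")],
]

# Hypothetical chains for a longer note: earlier text contributes its own chains,
# so a chain that starts later in the note receives a higher label.
full_note = [
    [(500, "Thalomid"), (900, "Thalomid")],
    [(5, "This patient"), (300, "He")],
    [(250, "Coumadin"), (700, "Coumadin")],
]

print(label_chains(sentence_only))
print(label_chains(full_note))
```

Under this assumption, the only chain in the single-sentence input is labeled 1, while in the longer input the Thalomid chain is labeled 3 because two other chains start earlier. Which mentions end up linked at all is a separate question that this sketch does not model.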
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Gandhi,

> Did you mean that with the text I sent, the co-reference superscript-1 will
> be lost?

Yes. Well, to be more clear, the coreference that was resolved as #1 in your original sentence alone will be lost. However, there are eight or nine coreference chains discovered in your full paragraph, and one of those will have superscript 1s.

> Could someone have a look and let us know your thoughts, please?

Thank you for creating the jira and the patch. I am sure that somebody will take a look.

Thanks,
Sean

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Friday, September 29, 2017 2:25 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Sean,

Thanks again for the response. I guess it's a mistake on my side that I didn't send the complete text. Did you mean that with the text I sent, the co-reference superscript-1 will be lost?

Also, as per your advice, we have created an issue - https://issues.apache.org/jira/browse/CTAKES-459 - for the measurement FSM changes and attached the modified file. Could someone have a look and let us know your thoughts, please?

Regards,
Gandhi

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 28, 2017 8:21 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Gandhi,

I don't recall you sending me that entire snippet of text. I think that I only had your single example sentence.

You have discovered one of the quirks of software: "change the data, change the result." Ctakes is a system with many moving parts. Things that precede or follow your original example sentence will change the evaluation of that sentence.

With the pipeline you are using and the full note, you should see a number (mine is 4) next to the first "thalomid" in the original example sentence. If you click that number you should see (to the right) 4 instances of "thalomid".

Tim can correct me here, but maybe the coreference module ranked the links between the "thalomid" mentions much higher than the link between "study treatment of thalomid 200mg" and "the treatment of hepatocellular carcinoma", and discarded the encapsulating treatment texts from markables? It is probably more complex than that.

> we have also made some code changes in MeasurementFSM.java to identify
> certain measurements like '20 mg/m2' which was not identified out of the box.
> Should we send the code changes to you so that you can consider the same to
> be productized? Please advise."

I don't know if you've noticed the recent emails on the dev list involving Alexandru Zbarcea. Alex has been creating or commenting on Jira items and attaching code for fixes and enhancements. This is a widely used process and is fairly easy to follow. I think that the following links are relevant:

Working with issues: https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
Creating patches: https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
Attaching files: https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

I don't know if you have a jira account and permissions for the ctakes project. An administrator may need to set that up for you.

Thanks,
Sean

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Thursday, September 28, 2017 4:09 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Sean, Thanks for the response. I was able to see the co-reference superscript using the html file that you sent. Interestingly even I was able to generate the sample HTML using piper GUI by having only that single line - " The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input file. But when I change the input file content with the following lines: "This patient is participating in a Non-IND study; Protocol CG-000424: "Phase I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic Hepatocellular Carcinoma".Information has been received from the investigator regarding an 82 year-old male patient who had gastrointestinal bleeding while on Thalomid, Epirubicin, and Coumadin. He had a past medical history of diverticulosis in 03/02 and a right atrial clot from intraventricular catheter (IVC) for which he was started on Coumadin. During the hospitalization for a right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was referred to an oncologist. The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. He was concomitantly receiving Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient presented to the emergency room with the chief complaint of hematochezia. He reported noticing bright red blood and small clots mixed in with his stool. On 07/13/02, he was admitted due to gastrointestinal bleed. The physician ordered 2 large bore intravenous lines and planned to transfuse for hematocrit less than 30%. Due to the INR (international normalized ratio) level of 3.0, Coumadin was held. He was also noted to have bilateral lower extremity edema with dyspnea on exertion. 
On 07/13/02, he had a chest X-ray PA and lateral done that showed no evidence of acute pneumonia or congestive heart failure. On 07/14/02, he underwent an ultrasound which was negative for deep vein thrombosis. This patient did not take Thalomid on the day of his admittance to the hospital, but resumed treatment shortly after with no return of symptoms. On 07/15/02, he was discharged in stable condition. There have been no further reports of bleeding at this time. Thedoctor has assessed the hematochezia as related to Coumadin treatment and previously diagnosed diverticulosis, and not to protocol therapy with Thalomid and Epirubicin.Additional information received from the investigator on 27Aug02 reveals that this male patient began on 07Jun02 two cycles of therapy with Thalidomide and Epirubicin. His post cycle two computed tomography scans revealed increase in size of liver lesion with development of multiple new satellite nodules. On 29Jul02, the investigator removed this patient from protocol for progressive disease and recommended hospice care. After seeking a second opinion from two other institutions, this patient was admitted to hospice on 05Aug02. On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair. On 25Aug02, this patient died due to disease progression. The investigator assessed the death as not related to study treatment and expected"

With this input, the co-reference superscript is lost. Did you try the complete text above in your piper GUI, by any chance?

Also, I guess you did not notice the question in my last post:

"Sean, we have also made some code changes in MeasurementFSM.java to identify certain measurements like '20 mg/m2', which were not identified out of the box. Should we send the code changes to you so that you can consider them for inclusion in the product? Please advise."
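For readers following the '20 mg/m2' question, here is a minimal, standalone sketch of the general idea: accept a second unit such as "m2" only when it directly follows a number, a unit and a slash. It is an assumption-laden illustration and not the MeasurementFSM code or the proposed change; the class name, the unit lists and the tokenizing regex are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the cTAKES MeasurementFSM or the patch.
final class MeasurementSketch {

    // Small made-up unit lists (lower case), for illustration.
    private static final Set<String> UNITS =
            new HashSet<>(Arrays.asList("mg", "mcg", "g", "ml"));
    private static final Set<String> PER_UNITS =
            new HashSet<>(Arrays.asList("m2", "kg", "ml", "dl"));

    // Tokens: a number, a word starting with a letter, or a single slash.
    private static final Pattern TOKEN =
            Pattern.compile("\\d+(?:\\.\\d+)?|[A-Za-z][A-Za-z0-9]*|/");

    static List<String> findMeasurements(String text) {
        List<int[]> spans = new ArrayList<>();   // {start, end} offsets of each token
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            spans.add(new int[] {matcher.start(), matcher.end()});
        }
        List<String> found = new ArrayList<>();
        for (int i = 0; i + 1 < spans.size(); i++) {
            boolean numberThenUnit = Character.isDigit(text.charAt(spans.get(i)[0]))
                    && UNITS.contains(token(text, spans, i + 1))
                    && adjacent(text, spans, i);
            if (!numberThenUnit) {
                continue;
            }
            int end = spans.get(i + 1)[1];
            // Extend over "/ m2" only when both tokens directly follow the first unit.
            if (i + 3 < spans.size()
                    && token(text, spans, i + 2).equals("/")
                    && PER_UNITS.contains(token(text, spans, i + 3))
                    && adjacent(text, spans, i + 1)
                    && adjacent(text, spans, i + 2)) {
                end = spans.get(i + 3)[1];
            }
            found.add(text.substring(spans.get(i)[0], end));
        }
        return found;
    }

    // Lower-cased text of token i.
    private static String token(String text, List<int[]> spans, int i) {
        return text.substring(spans.get(i)[0], spans.get(i)[1]).toLowerCase(Locale.ROOT);
    }

    // True when only whitespace (or nothing) separates token i from token i + 1.
    private static boolean adjacent(String text, List<int[]> spans, int i) {
        return text.substring(spans.get(i)[1], spans.get(i + 1)[0]).trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(findMeasurements(
                "Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15)"));
    }
}
```

Because "m2" is only accepted after a number, a unit and a slash, a bare "m2" inside some other term is not picked up by this sketch. A real change would have to work on the token types that the cTAKES state machines actually use.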
Regards, Gandhi -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Wednesday, September 27, 2017 5:53 PM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] Hi Gandhi, I am glad that you are feeling better. I don't understand why you aren't getting the same output as me. I just ran your example sentence with your piper with a fresh checkout and get the html below. The css follows. Copy and paste into a file and see if you see the corefs. / html, copy into file / OneLiner Output OneLiner Text processing finished on: 9 27 2017, 08:15:31 The patient started study treatment• of Thalomid• 200mg1 ( days 1 - 21 ) , and Epirubicin• , 20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment• of hepatocellular carcinoma1 . Annotation Information function iaf(txt) { var aff=txt.replace( /AFF_/g,"<br><
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Gandhi, I am glad that you are feeling better. I don't understand why you aren't getting the same output as me. I just ran your example sentence with your piper with a fresh checkout and get the html below. The css follows. Copy and paste into a file and see if you see the corefs. / html, copy into file / OneLiner Output OneLiner Text processing finished on: 9 27 2017, 08:15:31 The patient started study treatment• of Thalomid• 200mg1 ( days 1 - 21 ) , and Epirubicin• , 20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment• of hepatocellular carcinoma1 . Annotation Information function iaf(txt) { var aff=txt.replace( /AFF_/g,"<br><h3>Affirmed</h3>" ); var neg=aff.replace( /NEG_/g,"<br><h3>Negated</h3>" ); var unc=neg.replace( /UNC_/g,"<br><h3>Uncertain</h3>" ); var unn=unc.replace( /UNN_/g,"<br><h3>Uncertain, Negated</h3>" ); var ant=unn.replace( /ANT/g,"<b>Anatomical Site</b>" ); var dis=ant.replace( /DIS/g,"<b>Disease/ Disorder</b>" ); var fnd=dis.replace( /FND/g,"<b>Sign/ Symptom</b>" ); var prc=fnd.replace( /PRC/g,"<b>Procedure</b>" ); var drg=prc.replace( /DRG/g,"<b>Medication</b>" ); var evt=drg.replace( /EVT/g,"<b>Event</b>" ); var tmx=evt.replace( /TMX/g,"<b>Time</b>" ); var unk=tmx.replace( /UNK/g,"<b>Unknown</b>" ); var spc=unk.replace( /SPC_/g," " ); var prf1=spc.replace( /\[/g,"<i>" ); var prf2=prf1.replace( /\]/g,"</i>" ); var nl=prf2.replace( /NL_/g,"<br>" ); document.getElementById("ia").innerHTML = nl; } function crf1() { document.getElementById("ia").innerHTML = "<br><h3>Coreference Chain</h3>study treatment of Thalomid 200mg<br>the treatment of hepatocellular carcinoma"; } / css, copy into file named ctakes.pretty.css in same directory as html / .GNR_ { position: relative; display: inline-block gray; border-bottom: 0.10em solid gray; } .AFF_ { position: relative; display: inline-block green; border-bottom: 0.15em solid green; } .UNC_ { position: relative; display: inline-block gold; border-bottom: 0.16em dotted gold; } .NEG_ { 
position: relative; display: inline-block red; border-bottom: 0.16em dashed red; } .UNN_ { position: relative; display: inline-block orange; border-bottom: 0.16em dashed orange; } .FND { color: magenta; } .DIS { color: black; } .DRG { color: red; } .PRC { color: blue; } .ANT { color: gray; } .UNK { color: gray; } [TIP] { position: relative; z-index: 2; cursor: pointer; } [TIP]::before, [TIP]::after { visibility: hidden; -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=0)"; filter: progid: DXImageTransform.Microsoft.Alpha(Opacity=0); opacity: 0; pointer-events: none; } [TIP]::before { position: absolute; bottom: 0%; left: 100%; margin-bottom: 5px; padding: 7px; -webkit-border-radius: 3px; -moz-border-radius: 3px; border-radius: 3px; background-color: #000; background-color: hsla(0, 0%, 20%, 0.9); color: #fff; content: attr(TIP); text-align: center; font-size: 14px; line-height: 1.2; } [TIP]:hover::before, [TIP]:hover::after { visibility: visible; -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=100)"; filter: progid: DXImageTransform.Microsoft.Alpha(Opacity=100); opacity: 1; } div#ia { position: fixed; top: 0; right: 0; width: 20%; height: 100%; padding: 10px; overflow: auto; background-color: lightgray; } div#content { width: 79%; height: 100%; padding: 10px; overflow: auto; } -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Wednesday, September 27, 2017 4:40 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] Hi Sean, Sorry for the delayed response as I was out of office due to illness. If I don't add BackwardsTimeAnnotator, I don't see any error related to isTraining param. But still couldn't get the superscript co-reference working. Please note that I am using the latest 4.0.1 jars. The piper file and console log messages are as follows: PIPER FILE: // Advanced Tokenization: Regex sectionization, BIO Sentence De
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
entify certain measurements like '20 mg/m2' which was not identified out of the box. Should we send the code changes to you so that you can consider the same to be productized ? Please advise. Regards, Gandhi -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Friday, September 22, 2017 6:54 PM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] Hi Gandhi, You don't need to add BackwardsTimeAnnotator to your piper. It is added by the TemporalSubPipe.piper. The error that you are seeing regarding training is very strange, but you can try adding this line to the top of the file: set isTraining=false Can you run a sample file with your piper and send me the log statements? It might help me figure out what is going on. > is there any doc or guide on how to start writing our own annotator. There are two example annotators in the ctakes-examples project under the ae/ directory. You can look at those, but I recommend that you look at some information on Uimafit, which can be used to create new annotators: https://uima.apache.org/d/uimafit-2.1.0/tools.uimafit.book.pdf An introduction to creating Analysis Engines (Annotators) is on page 5. 
Coding style is individualistic, but below is a rubberstamp that I use to get started:

import org.apache.ctakes.core.pipeline.PipeBitInfo;
import org.apache.log4j.Logger;
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

/**
 * @author SPF , chip-nlp
 * @version %I%
 * @since 9/22/2017
 */
@PipeBitInfo(
      name = "Template",
      description = "For Example.",
      role = PipeBitInfo.Role.ANNOTATOR
)
final public class Template extends JCasAnnotator_ImplBase {

   static private final Logger LOGGER = Logger.getLogger( "Template" );

   /**
    * {@inheritDoc}
    */
   @Override
   public void initialize( final UimaContext context ) throws ResourceInitializationException {
      // Always call the super first
      super.initialize( context );
      // place AE initialization code here
   }

   /**
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jCas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Processing ..." );
      // Place AE processing code here
      LOGGER.info( "Finished." );
   }

}

If you use IntelliJ as your ide you can create a file template with these parameters:

#if (${PACKAGE_NAME} && ${PACKAGE_NAME} != "")package ${PACKAGE_NAME};#end

import org.apache.ctakes.core.pipeline.PipeBitInfo;
import org.apache.log4j.Logger;
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

#parse("File Header.java")
@PipeBitInfo(
      name = "${NAME}",
      #if ( ${PROJECT_NAME} != "")description = "For ${PROJECT_NAME}.",#end
      role = PipeBitInfo.Role.ANNOTATOR
)
final public class ${NAME} extends JCasAnnotator_ImplBase {

   static private final Logger LOGGER = Logger.getLogger( "${NAME}" );

   /**
    * {@inheritDoc}
    */
   @Override
   public void initialize( final UimaContext context ) throws ResourceInitializationException {
      // Always call the super first
      super.initialize( context );
      // place AE initialization code here
   }

   /**
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jCas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Processing ..." );
      // Place AE processing code here
      LOGGER.info( "Finished." );
   }

}

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Friday, September 22, 2017 2:23 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Sean,

Thanks again for the detailed response. I still couldn't manage to get superscript-1 co-reference in piper GUI. Also I'm not able to use "BackwardsTimeAnnotator" in piper GUI as it gives me the below error:

org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator" failed.
(Descriptor: ) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.jav
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Sean,

Thanks again for the detailed response. I still couldn't manage to get the superscript-1 co-reference in the piper GUI. Also, I'm not able to use "BackwardsTimeAnnotator" in the piper GUI, as it gives me the error below:

org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator" failed. (Descriptor: )
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
Caused by: java.lang.IllegalArgumentException: Please specify PARAM_IS_TRAINING - unable to infer it from context
    at org.cleartk.ml.CleartkAnnotator.initialize(CleartkAnnotator.java:109)

Somewhere in old mails it's mentioned that this is caused by missing dependencies, so I tried adding ClearTkAnnotator, with no luck yet. My piper file is as follows:

load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
load ChunkerSubPipe.piper
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
load AttributeCleartkSubPipe.piper
load RelationSubPipe.piper
load TemporalSubPipe.piper
load CorefSubPipe.piper
add org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator
add pretty.html.HtmlTextWriter
add FileTreeXmiWriter

Any suggestions on this? Also, I'm using all the latest 4.0.1 cTAKES jars.

Regarding the identification of names, I will dig deeper into what you have mentioned.

Sorry to ask this, as you already mentioned that there are no detailed docs for cTAKES, but is there any doc or guide on how to start writing our own annotator if required? If not, is there any simple annotator that you would suggest we look into to get a better understanding of annotators before we proceed further? Thanks in advance.
Regards, Gandhi -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Thursday, September 21, 2017 7:59 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] Hi Gandhi, > We guess we are missing out on something as we could not find co-references > for "200mg". Should we add anymore piper for this? The piper commands that I sent has everything to obtain coreferences. I use it regularly - it is what I used on your example sentence to get the coreferences that I mentioned. > Also the change mentioned in the thread ... That is a very old thread and I don't think that it applies to what you are trying to do. > We also have a requirement to identify the patient names and sex As James said, ctakes isn't really meant to do this. Ctakes is catered toward extracting clinical data, and to this point names have not fallen into that category. It is more a task for general nlp. There is an opennlp model that can identify names and a few others (I used to see names using GATE). ctakes has wrapped opennlp for other tasks and you should be able to do the same to adapt an engine for names into ctakes. > cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 / > 02 or 27Aug2002 As Chen mentioned, the BackwardTimeAnnotator module uses an ML model trained on gold data. It isn't perfect. You can add another time annotator on top of this to get some of the more simply formatted date mentions - there are a lot of them out there. Personally I have used jchronic as it can be easily tweaked to recognize medically-relevant temporal expressions relating to surgery, pharmacology, etc. 
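As a rough illustration of "another time annotator on top" for the simply formatted dates in question (20Aug02, 20/Aug/02, 06 / 07 / 02, 27Aug2002), the sketch below uses a plain regular expression from the Java standard library. It is a minimal sketch under assumptions, not cTAKES or jchronic code: the class name and patterns are hypothetical, nothing is validated (day and month ranges are not checked), and it does not decide whether 06/07/02 is day-first or month-first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cTAKES: finds simply formatted dates with a regex.
final class SimpleDateSketch {

    // English month abbreviations; the pattern is compiled case-insensitively.
    private static final String MONTH =
            "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";

    // Two- or four-digit year; the four-digit form is tried first.
    private static final String YEAR = "(?:\\d{4}|\\d{2})";

    // Day, month abbreviation, year, with optional slashes or spaces:
    // 20Aug02, 27Aug2002, 20/Aug/02
    private static final String DAY_MONTH_YEAR =
            "\\d{1,2}\\s*/?\\s*" + MONTH + "\\s*/?\\s*" + YEAR;

    // All-numeric date with slashes and optional spaces: 06/07/02, 06 / 07 / 02
    private static final String NUMERIC =
            "\\d{1,2}\\s*/\\s*\\d{1,2}\\s*/\\s*" + YEAR;

    private static final Pattern DATE = Pattern.compile(
            "\\b(?:" + DAY_MONTH_YEAR + "|" + NUMERIC + ")\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns every date-like substring in the order it appears in the text.
    static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            dates.add(matcher.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String note = "On 20Aug02, the investigator noted worsening fatigue.";
        System.out.println(findDates(note));
    }
}
```

Inside a pipeline, the same pattern could be run over the document text in an annotator's process method, using the match offsets to create time annotations. Which annotation type to create, and how to avoid duplicating mentions already found by the temporal module, is left open here.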
Sean -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Wednesday, September 20, 2017 8:50 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] Hi Gandhi, I don't have time to go through all of this right now, but I will try to get to it soon. Make sure that you are running the latest version in trunk. Sean -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Wednesday, September 20, 2017 7:03 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi, Could someone help me out on the below queries please? Regards, Gandhi -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Tuesday, September 19, 2017 8:51 PM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi Sean, Thanks again for the detailed and prompt response. We were able to run the piper GUI as per your advice. But in the output (The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( day
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
Hi Gandhi, > We guess we are missing out on something as we could not find co-references > for "200mg". Should we add anymore piper for this? The piper commands that I sent has everything to obtain coreferences. I use it regularly - it is what I used on your example sentence to get the coreferences that I mentioned. > Also the change mentioned in the thread ... That is a very old thread and I don't think that it applies to what you are trying to do. > We also have a requirement to identify the patient names and sex As James said, ctakes isn't really meant to do this. Ctakes is catered toward extracting clinical data, and to this point names have not fallen into that category. It is more a task for general nlp. There is an opennlp model that can identify names and a few others (I used to see names using GATE). ctakes has wrapped opennlp for other tasks and you should be able to do the same to adapt an engine for names into ctakes. > cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 / > 02 or 27Aug2002 As Chen mentioned, the BackwardTimeAnnotator module uses an ML model trained on gold data. It isn't perfect. You can add another time annotator on top of this to get some of the more simply formatted date mentions - there are a lot of them out there. Personally I have used jchronic as it can be easily tweaked to recognize medically-relevant temporal expressions relating to surgery, pharmacology, etc. Sean -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Wednesday, September 20, 2017 8:50 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] Hi Gandhi, I don't have time to go through all of this right now, but I will try to get to it soon. Make sure that you are running the latest version in trunk. 
Sean -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Wednesday, September 20, 2017 7:03 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi, Could someone help me out on the below queries please? Regards, Gandhi -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Tuesday, September 19, 2017 8:51 PM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi Sean, Thanks again for the detailed and prompt response. We were able to run the piper GUI as per your advice. But in the output (The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular carcinoma.), we were not able to find superscript-1 as you mentioned earlier but could find superscript-2, 3 etc. We guess we are missing out on something as we could not find co-references for "200mg". Should we add anymore piper for this? Also the change mentioned in the thread - https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Duser_201403.mbox_-253CCAL6WimrJ-5Fmm1-2BXyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg-40mail.gmail.com-253E&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0&s=GzhvIkBu4cgyzYN9n6VLe2rz4sJhJzMxDcWyB0BkqAc&e= is required for the drug-ner module to identify drug-ner annotations. 1) We also have a requirement to identify the patient names and sex available in narrative texts. Please let us know how to achieve the same as its not identifying the proper nouns and the relationship with the patient? Eg. 
"This male patient named Tom Hardy aged 35 years is participating in a Non-IND study" 2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 / 02 or 27Aug2002 as in the below example. Please let us know how to enhance the system to identify such date patterns. E.g " On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair" Regards, Gandhi -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Monday, September 18, 2017 10:02 PM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi Gandhi, > So in this case will be able to see drug attributes in the output XML? As long as you have the DrugMentionAnnotator in your pipeline you should be able to find drug attributes in the xml output file. > we also saw some code changes needs to be done to use drug-ner module. Is it > still valid? As far as I know there aren't any necessary code changes to get drug ner running. However, I do not normally use drugner so I can't say for certa
RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]
That is very informative - Thanks Chen!

-----Original Message-----
From: Lin, Chen [mailto:chen@childrens.harvard.edu]
Sent: Wednesday, September 20, 2017 3:37 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

As for the error in EventTimeRelationAnnotator, the reason is that the time-class attribute value for a temporal expression mention is missing. When we developed this annotator, we used time-class in the gold annotation as a feature to help the classifier. If this feature is missing, the system can still predict event-time relations, but the performance will drop a little. Our test on SemEval 2015 data shows that if the temporal attributes are missing, the system performance will drop 0.012 in F-score (Table 4 of https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009920/).

If you really want this time-class feature, please add "BackwardsTimeAnnotator" to your processing pipeline, which will annotate temporal expressions and predict their time classes. Please keep in mind that this annotator is not 100% accurate either.

Best,
Chen

On 9/20/17, 2:43 PM, "Gandhi Rajan Natarajan" wrote:

>Hi James & Sean, Thanks for your support.
>
>Regarding point-1, We don't have any database or metadata to get the
>name or sex information. Is it not possible to achieve in cTAKES by any
>other names? If yes, what other approach will be feasible to implement
>this along with cTAKES as we need this info very much for our requirement.
>
>Regarding point-2, I will have a check on what you have suggested. But
>dates analysis is not part of temporal module?
Do you mean to say that >if we use drug ner module, ContextDependentTokenizerAnnotator will be >overwritten for date identifications? Also while using piper GUI to run >the analysis, we could see the following message in the console: > >21 Sep 2017 00:08:04 INFO EventTimeRelationAnnotator - Starting >processing ... > >Null value found in Feature(, ) > > > >Could someone brief on this error and how to overcome it? > > > > > >Regards, > >Gandhi > > > > > >-Original Message- > >From: James Masanz [mailto:masanz.ja...@gmail.com] > >Sent: Wednesday, September 20, 2017 8:41 PM > >To: dev@ctakes.apache.org > >Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] > > > >1) I would typically not use cTAKES for extracting patient names or sex. >is there any database or metadata that you can get that information from? > > > >2) Dates are found by the ContextDependentTokenizerAnnotator, which uses >DateFSM.java in package org.apache.ctakes.core.fsm.machine. > >I believe drug ner uses DateParser in org.apache.ctakes.core.util to >interpret the date annotations. So you might need to modify both DateFSM >and DateParser. > > > > > > > >On Tue, Sep 19, 2017 at 11:20 AM, Gandhi Rajan Natarajan < >gandhi.natara...@arisglobal.com> wrote: > > > >> Hi Sean, > >> > >> Thanks again for the detailed and prompt response. We were able to run > >> the piper GUI as per your advice. But in the output (The patient > >> started study treatment of Thalomid 200mg ( days 1 - 21 ) , and > >> Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the > >> treatment of hepatocellular carcinoma.), we were not able to find > >> superscript-1 as you mentioned earlier but could find superscript-2, 3 > >> etc. We guess we are missing out on something as we could not find > >> co-references for "200mg". Should we add anymore piper for this? 
> >> > >> Also the change mentioned in the thread - >>https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apach >>e&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=PZ241CwYZ3Asza >>TEBtM2wl3EcIjNNNeKX8q7N_mt-aI&m=dcOOtQZqb8EmJvtHt6ZTmNCVTatQDcVv8Pta43hSd >>0s&s=xElCOx2UASgWtuWUmL3KouME2Jivc5P_7UaHxzdROBw&e= . > >> org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+ > >> xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E is required > >> for the drug-ner module to identify drug-ner annotations. > >> > >> 1) We also have a requirement to identify the patient names and sex > >> available in narrative texts. Please let us know how to achieve the > >> same as its not identifying the proper nouns and
Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi Gandhi,

As for the error in EventTimeRelationAnnotator, the reason is that the time-class attribute value for a temporal expression mention is missing. When we developed this annotator, we used the time-class from the gold annotation as a feature to help the classifier. If this feature is missing, the system can still predict event-time relations, but performance will drop a little. Our test on SemEval 2015 data shows that if the temporal attributes are missing, system performance drops by 0.012 in F-score (Table 4 of https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009920/).

If you really want this time-class feature, please add "BackwardTimeAnnotator" to your processing pipeline, which will annotate temporal expressions and predict their time classes. Please keep in mind that this annotator is not 100% accurate either.

Best,
Chen

On 9/20/17, 2:43 PM, "Gandhi Rajan Natarajan" wrote:

>Hi James & Sean,
>
>Thanks for your support.
>
>Regarding point 1, we don't have any database or metadata from which to get the name or sex information. Is it not possible to achieve this in cTAKES by any other means? If it is not, what other approach would be feasible to implement alongside cTAKES? We need this information very much for our requirement.
>
>Regarding point 2, I will check what you have suggested. But isn't date analysis part of the temporal module? Do you mean that if we use the drug ner module, ContextDependentTokenizerAnnotator will be overridden for date identification? Also, while using the piper GUI to run the analysis, we saw the following message in the console:
>
>21 Sep 2017 00:08:04 INFO EventTimeRelationAnnotator - Starting processing ...
>Null value found in Feature(, )
>
>Could someone explain this error and how to overcome it?
>
>Regards,
>Gandhi
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi James & Sean,

Thanks for your support.

Regarding point 1, we don't have any database or metadata from which to get the name or sex information. Is it not possible to achieve this in cTAKES by any other means? If it is not, what other approach would be feasible to implement alongside cTAKES? We need this information very much for our requirement.

Regarding point 2, I will check what you have suggested. But isn't date analysis part of the temporal module? Do you mean that if we use the drug ner module, ContextDependentTokenizerAnnotator will be overridden for date identification? Also, while using the piper GUI to run the analysis, we saw the following message in the console:

21 Sep 2017 00:08:04 INFO EventTimeRelationAnnotator - Starting processing ...
Null value found in Feature(, )

Could someone explain this error and how to overcome it?

Regards,
Gandhi

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com]
Sent: Wednesday, September 20, 2017 8:41 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]

1) I would typically not use cTAKES for extracting patient names or sex. Is there any database or metadata that you can get that information from?

2) Dates are found by the ContextDependentTokenizerAnnotator, which uses DateFSM.java in package org.apache.ctakes.core.fsm.machine.
I believe drug ner uses DateParser in org.apache.ctakes.core.util to interpret the date annotations. So you might need to modify both DateFSM and DateParser.
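On the question of what other approach might sit alongside cTAKES for sex, name and age: one option is a small rule-based pass over the narrative text before or after the pipeline runs. The following is only a minimal sketch under stated assumptions. The class name DemographicsSketch and its methods are hypothetical and not part of cTAKES, and the three patterns are tuned to the single example sentence from this thread ("This male patient named Tom Hardy aged 35 years ..."), so they would miss most other phrasings.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule-based helper for illustration only; not part of cTAKES. */
final class DemographicsSketch {
    // "male" or "female" as a whole word, in any letter case.
    private static final Pattern SEX =
            Pattern.compile("\\b(male|female)\\b", Pattern.CASE_INSENSITIVE);
    // One or more capitalized words directly after "named", e.g. "named Tom Hardy".
    private static final Pattern NAME =
            Pattern.compile("\\bnamed\\s+([A-Z][a-z]+(?:\\s+[A-Z][a-z]+)*)");
    // A number of up to three digits between "aged" and "year(s)", e.g. "aged 35 years".
    private static final Pattern AGE =
            Pattern.compile("\\baged\\s+(\\d{1,3})\\s+years?\\b");

    /** Returns capture group 1 of the first match, or empty if there is no match. */
    private static Optional<String> first(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static Optional<String> sex(String text) {
        return first(SEX, text).map(String::toLowerCase);
    }

    static Optional<String> name(String text) {
        return first(NAME, text);
    }

    static Optional<String> age(String text) {
        return first(AGE, text);
    }

    public static void main(String[] args) {
        String s = "This male patient named Tom Hardy aged 35 years "
                + "is participating in a Non-IND study";
        // prints: male | Tom Hardy | 35
        System.out.println(sex(s).orElse("?") + " | "
                + name(s).orElse("?") + " | " + age(s).orElse("?"));
    }
}
```

Rules like these are brittle compared with structured metadata, which is why pulling name and sex from a database, where one exists, remains the safer route.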
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Thanks James!

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com]
Sent: Wednesday, September 20, 2017 11:11 AM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]

1) I would typically not use cTAKES for extracting patient names or sex. Is there any database or metadata that you can get that information from?

2) Dates are found by the ContextDependentTokenizerAnnotator, which uses DateFSM.java in package org.apache.ctakes.core.fsm.machine.
I believe drug ner uses DateParser in org.apache.ctakes.core.util to interpret the date annotations. So you might need to modify both DateFSM and DateParser.
Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
1) I would typically not use cTAKES for extracting patient names or sex. Is there any database or metadata that you can get that information from?

2) Dates are found by the ContextDependentTokenizerAnnotator, which uses DateFSM.java in package org.apache.ctakes.core.fsm.machine.
I believe drug ner uses DateParser in org.apache.ctakes.core.util to interpret the date annotations. So you might need to modify both DateFSM and DateParser.

On Tue, Sep 19, 2017 at 11:20 AM, Gandhi Rajan Natarajan <gandhi.natara...@arisglobal.com> wrote:

> Hi Sean,
>
> Thanks again for the detailed and prompt response. We were able to run the piper GUI as per your advice. But in the output (The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular carcinoma.), we were not able to find superscript-1 as you mentioned earlier but could find superscript-2, 3 etc. We guess we are missing out on something as we could not find co-references for "200mg". Should we add any more piper files for this?
>
> Also the change mentioned in the thread - http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E is required for the drug-ner module to identify drug-ner annotations.
>
> 1) We also have a requirement to identify the patient names and sex available in narrative texts. Please let us know how to achieve this, as it is not identifying the proper nouns and the relationship with the patient.
> Eg. "This male patient named Tom Hardy aged 35 years is participating in a Non-IND study"
>
> 2) cTAKES is unable to identify dates like 20Aug02 or 20/Aug/02 or 06 / 07 / 02 or 27Aug2002 as in the example below. Please let us know how to enhance the system to identify such date patterns.
> E.g " On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair"
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, September 18, 2017 10:02 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
>
> Hi Gandhi,
>
> > So in this case will be able to see drug attributes in the output XML?
> As long as you have the DrugMentionAnnotator in your pipeline you should be able to find drug attributes in the xml output file.
>
> > we also saw some code changes needs to be done to use drug-ner module. Is it still valid?
> As far as I know there aren't any necessary code changes to get drug ner running. However, I do not normally use drugner so I can't say for certain.
>
> > Also you mentioned that the drug-ner module is out of date
> It can still be used and will produce annotations. All that I meant was that there may not be many people out there using it. It is not part of the default pipeline.
>
> > You also mentioned that when you run the sentence, the date was identified. Where and how exactly did you ran it so that we can check the same?
> I run the following in a piper file because I am interested in a lot of modules (I added drugner just for you):
>
> // Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
> load AdvancedTokenizerPipeline.piper
> add ContextDependentTokenizerAnnotator
> add POSTagger
> // Chunkers
> load ChunkerSubPipe.piper
> // Default fast dictionary lookup
> load DictionarySubPipe.piper
> add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
> // Cleartk Entity Attributes
> load AttributeCleartkSubPipe.piper
> // Relations
> load RelationSubPipe.piper
> // Temporal
> load TemporalSubPipe.piper
> // Coreferences
> load CorefSubPipe.piper
> // Html output
> add pretty.html.HtmlTextWriter
>
> For information on piper files, see https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> I run it in my IDE with:
> org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper -i org/apache/ctakes/examples/notes -o --user --pass
> You can run it by command line by substituting "org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with "bin/runPiperFile".
> You can also run it through a ctakes 4.01 (trunk) gui. See https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
>
> > I'm not able to see any clickable option in HTML output
> You must have the HtmlTextWriter at the end of your pipeline to produce html files. To keep the xml file output, place "
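Before touching DateFSM or DateParser, it can help to pin down exactly which surface forms need to be accepted. The sketch below does that with a plain regular expression over the four formats from this thread (20Aug02, 20/Aug/02, 06 / 07 / 02, 27Aug2002). It is an assumption-laden illustration: it does not use or mirror the DateFSM or DateParser APIs, a regex is swapped in where cTAKES uses a finite state machine, and the class DatePatternSketch and method findDates are hypothetical names. It also makes no attempt to validate day or month ranges, or to decide between day-first and month-first order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of cTAKES. */
final class DatePatternSketch {
    // Three-letter English month abbreviations, case-sensitive.
    private static final String MONTH =
            "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";
    // A four-digit year is tried before a two-digit year.
    private static final String YEAR = "(?:\\d{4}|\\d{2})";
    // Optional slash with optional spaces around it.
    private static final String OPT_SLASH = "\\s*/?\\s*";
    // Mandatory slash with optional spaces around it.
    private static final String SLASH = "\\s*/\\s*";

    private static final Pattern DATE = Pattern.compile(
            "\\b(?:"
            // day, month abbreviation, year: 20Aug02, 20/Aug/02, 27Aug2002
            + "\\d{1,2}" + OPT_SLASH + MONTH + OPT_SLASH + YEAR
            + "|"
            // three numeric fields separated by slashes: 06 / 07 / 02
            + "\\d{1,2}" + SLASH + "\\d{1,2}" + SLASH + YEAR
            + ")\\b");

    /** Returns every substring of the text that matches one of the date shapes. */
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints: [20Aug02]
        System.out.println(findDates(
                "On 20Aug02, the investigator noted that this patient was suffering"));
        // prints: [06 / 07 / 02]
        System.out.println(findDates(
                "Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment"));
    }
}
```

The second sentence is a useful regression case: the dose "20 mg / m2" and the day list "1 , 8 , and 15" contain digits and a slash but should not be reported as dates, and with these patterns they are not.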
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi Gandhi,

No problem. Can anybody else out there field at least some of this today? I may not get to it until tomorrow.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Wednesday, September 20, 2017 9:53 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Thanks for the response Sean. Your help is really appreciated.

Regards,
Gandhi
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Thanks for the response Sean. Your help is really appreciated.

Regards,
Gandhi

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 20, 2017 6:20 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

I don't have time to go through all of this right now, but I will try to get to it soon. Make sure that you are running the latest version in trunk.

Sean
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi Gandhi,

I don't have time to go through all of this right now, but I will try to get to it soon. Make sure that you are running the latest version in trunk.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Wednesday, September 20, 2017 7:03 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi,

Could someone help me out on the below queries please?

Regards,
Gandhi
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi, Could someone help me out on the below queries please? Regards, Gandhi -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Tuesday, September 19, 2017 8:51 PM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi Sean, Thanks again for the detailed and prompt response. We were able to run the piper GUI as per your advice. But in the output (The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular carcinoma.), we were not able to find superscript-1 as you mentioned earlier but could find superscript-2, 3 etc. We guess we are missing out on something as we could not find co-references for "200mg". Should we add anymore piper for this? Also the change mentioned in the thread - http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3ccal6wimrj_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E is required for the drug-ner module to identify drug-ner annotations. 1) We also have a requirement to identify the patient names and sex available in narrative texts. Please let us know how to achieve the same as its not identifying the proper nouns and the relationship with the patient? Eg. "This male patient named Tom Hardy aged 35 years is participating in a Non-IND study" 2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 / 02 or 27Aug2002 as in the below example. Please let us know how to enhance the system to identify such date patterns. 
E.g " On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair" Regards, Gandhi -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Monday, September 18, 2017 10:02 PM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi Gandhi, > So in this case will be able to see drug attributes in the output XML? As long as you have the DrugMentionAnnotator in your pipeline you should be able to find drug attributes in the xml output file. > we also saw some code changes needs to be done to use drug-ner module. Is it > still valid? As far as I know there aren't any necessary code changes to get drug ner running. However, I do not normally use drugner so I can't say for certain. > Also you mentioned that the drun-ner module is out of date It can still be used and will produce annotations. All that I meant was that there may not be many people out there using it. It is not part of the default pipeline. > You also mentioned that when you run the sentence, the date was identified. Where and how exactly did you ran it so that we can check the same? 
I run the following in a piper file because I am interested in a lot of modules (I added drugner just for you): // Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists load AdvancedTokenizerPipeline.piper add ContextDependentTokenizerAnnotator add POSTagger // Chunkers load ChunkerSubPipe.piper // Default fast dictionary lookup load DictionarySubPipe.piper add org.apache.ctakes.drugner.ae.DrugMentionAnnotator // Cleartk Entity Attributes load AttributeCleartkSubPipe.piper // Relations load RelationSubPipe.piper // Temporal load TemporalSubPipe.piper // Coreferences load CorefSubPipe.piper // Html output add pretty.html.HtmlTextWriter For information on piper files, see https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files I run it in my IDE with: org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper -i org/apache/ctakes/examples/notes -o --user --pass You can run it by command line by substituting "org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with "bin/runPiperFile". You can also run it through a ctakes 4.01 (trunk) gui. See https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI > I'm not able to see any clickable option in HTML output You must have the HtmlTextWriter at the end of your pipeline to produce html files. To keep the xml file output, place "add FileTreeXmiWriter" at the end of the piper. > Apologizes for too many No worries, we are happy to have your interest! Sean -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Saturday, September 16, 2017 7:01 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi Sean, Thanks again for the prompt response. Appreciate your input on adding DrugMentionAnnotator. Actually, we are relying on pretty printer output just to understand the analysis. Our logic to extract disorders and findings are based on the XML file generated by https
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the piper GUI as per your advice. But in the output for the sentence

"The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular carcinoma."

we were not able to find superscript-1 as you mentioned earlier, but could find superscript-2, 3, etc. We guess we are missing something, as we could not find coreferences for "200mg". Should we add anything more to the piper file for this?

Also, the change mentioned in the thread - http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3ccal6wimrj_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E is required for the drug-ner module to identify drug-ner annotations.

1) We also have a requirement to identify the patient names and sex available in narrative texts. Please let us know how to achieve this, as cTAKES is not identifying the proper nouns or their relationship with the patient. E.g. "This male patient named Tom Hardy aged 35 years is participating in a Non-IND study"

2) cTAKES is unable to identify dates like 20Aug02, 20/Aug/02, 06 / 07 / 02, or 27Aug2002, as in the example below. Please let us know how to enhance the system to identify such date patterns. E.g. "On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair"

Regards,
Gandhi

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 18, 2017 10:02 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi, > So in this case will be able to see drug attributes in the output XML? As long as you have the DrugMentionAnnotator in your pipeline you should be able to find drug attributes in the xml output file. > we also saw some code changes needs to be done to use drug-ner module. Is it > still valid?
As far as I know there aren't any necessary code changes to get drug ner running. However, I do not normally use drugner so I can't say for certain. > Also you mentioned that the drun-ner module is out of date It can still be used and will produce annotations. All that I meant was that there may not be many people out there using it. It is not part of the default pipeline. > You also mentioned that when you run the sentence, the date was identified. Where and how exactly did you ran it so that we can check the same? I run the following in a piper file because I am interested in a lot of modules (I added drugner just for you): // Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists load AdvancedTokenizerPipeline.piper add ContextDependentTokenizerAnnotator add POSTagger // Chunkers load ChunkerSubPipe.piper // Default fast dictionary lookup load DictionarySubPipe.piper add org.apache.ctakes.drugner.ae.DrugMentionAnnotator // Cleartk Entity Attributes load AttributeCleartkSubPipe.piper // Relations load RelationSubPipe.piper // Temporal load TemporalSubPipe.piper // Coreferences load CorefSubPipe.piper // Html output add pretty.html.HtmlTextWriter For information on piper files, see https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files I run it in my IDE with: org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper -i org/apache/ctakes/examples/notes -o --user --pass You can run it by command line by substituting "org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with "bin/runPiperFile". You can also run it through a ctakes 4.01 (trunk) gui. See https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI > I'm not able to see any clickable option in HTML output You must have the HtmlTextWriter at the end of your pipeline to produce html files. To keep the xml file output, place "add FileTreeXmiWriter" at the end of the piper. 
> Apologizes for too many No worries, we are happy to have your interest! Sean -Original Message- From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] Sent: Saturday, September 16, 2017 7:01 AM To: dev@ctakes.apache.org Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] Hi Sean, Thanks again for the prompt response. Appreciate your input on adding DrugMentionAnnotator. Actually, we are relying on pretty printer output just to understand the analysis. Our logic to extract disorders and findings are based on the XML file generated by https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_healthnlp_examples_blob_master_ctakes-2Dtemporal-2Ddemo_src_main_java_org_apache_ctakes_web_client_servlet_DemoServlet.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=_MJKBj93YJdd5aa
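For the date formats asked about in this message (20Aug02, 20/Aug/02, 06 / 07 / 02, 27Aug2002), one possible starting point outside cTAKES is a plain regular expression. The following is a minimal Java sketch, not cTAKES code and not a description of how its temporal module works. The class name DateSpanSketch and the method findDates are hypothetical. The pattern only locates candidate spans: it does not decide whether 06 / 07 / 02 is day-first or month-first, and it can produce false positives on text such as a number followed by a word that starts like a month name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cTAKES: finds candidate date spans in text.
class DateSpanSketch {
    // Month abbreviation, optionally followed by the rest of the month name.
    private static final String MONTH =
            "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*";
    // Four-digit or two-digit year; the four-digit form is tried first.
    private static final String YEAR = "(?:\\d{4}|\\d{2})";
    // A slash with optional surrounding whitespace, as in "06 / 07 / 02".
    private static final String SLASH = "\\s*/\\s*";

    private static final Pattern DATE = Pattern.compile(
            "\\b\\d{1,2}(?:"
                    + "\\s*/?\\s*" + MONTH + "\\s*/?\\s*"  // 20Aug02, 20/Aug/02, 27Aug2002
                    + "|" + SLASH + "\\d{1,2}" + SLASH     // 06 / 07 / 02, 06/07/02
                    + ")" + YEAR + "\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns every substring of the text that matches the date pattern, in order.
    static List<String> findDates(String text) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            spans.add(matcher.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(findDates(
                "On 20Aug02, the investigator noted that this patient was suffering worsening fatigue"));
        System.out.println(findDates("Seen on 20/Aug/02 and again on 27Aug2002"));
    }
}
```

Spans found this way would still need to be turned into annotations inside a pipeline, and that step is not shown here.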
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi Gandhi,

> So in this case will be able to see drug attributes in the output XML?

As long as you have the DrugMentionAnnotator in your pipeline you should be able to find drug attributes in the xml output file.

> we also saw some code changes needs to be done to use drug-ner module. Is it still valid?

As far as I know there aren't any necessary code changes to get drug ner running. However, I do not normally use drugner so I can't say for certain.

> Also you mentioned that the drun-ner module is out of date

It can still be used and will produce annotations. All that I meant was that there may not be many people out there using it. It is not part of the default pipeline.

> You also mentioned that when you run the sentence, the date was identified. Where and how exactly did you ran it so that we can check the same?

I run the following in a piper file because I am interested in a lot of modules (I added drugner just for you):

// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
// Html output
add pretty.html.HtmlTextWriter

For information on piper files, see https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

I run it in my IDE with:
org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper -i org/apache/ctakes/examples/notes -o --user --pass

You can run it by command line by substituting "org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with "bin/runPiperFile".

You can also run it through a ctakes 4.01 (trunk) gui. See https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI

> I'm not able to see any clickable option in HTML output

You must have the HtmlTextWriter at the end of your pipeline to produce html files. To keep the xml file output, place "add FileTreeXmiWriter" at the end of the piper.

> Apologizes for too many

No worries, we are happy to have your interest!

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Saturday, September 16, 2017 7:01 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean, Thanks again for the prompt response. Appreciate your input on adding DrugMentionAnnotator. Actually, we are relying on pretty printer output just to understand the analysis. Our logic to extract disorders and findings are based on the XML file generated by https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/DemoServlet.java So in this case will be able to see drug attributes in the output XML? In one of the old post (http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3ccal6wimrj_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E ) we also saw some code changes needs to be done to use drug-ner module. Is it still valid? Also you mentioned that the drun-ner module is out of date which means it cannot be used or it may not provide accurate analysis?
Also what changes needs to be done to bring it up to date so that we can try the same if you can assist? You also mentioned that when you run the sentence, the date was identified. Where and how exactly did you ran it so that we can check the same? Also regarding you explanation on corefernce, I'm not able to see any clickable option in HTML output. So wanted to understand how can we run and check that too. Apologizes for too many questions as we are just a week old in NLP and cTAKES. Thanks in advance. Regards, Gandhi This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender or system manager by email immediately if you have received this e-mail by mistake and
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi Sean,

Thanks again for the prompt response. Appreciate your input on adding DrugMentionAnnotator.

Actually, we are relying on the pretty printer output just to understand the analysis. Our logic to extract disorders and findings is based on the XML file generated by https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/DemoServlet.java

So in this case, will we be able to see drug attributes in the output XML?

In one of the old posts (http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3ccal6wimrj_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E ) we also saw that some code changes need to be made to use the drug-ner module. Is that still valid?

Also, you mentioned that the drug-ner module is out of date. Does that mean it cannot be used, or that it may not provide accurate analysis? What changes need to be made to bring it up to date? We can try them if you can assist.

You also mentioned that when you ran the sentence, the date was identified. Where and how exactly did you run it, so that we can check the same?

Also, regarding your explanation of coreference, I'm not able to see any clickable option in the HTML output, so we wanted to understand how we can run and check that too.

Apologies for so many questions; we are just a week into NLP and cTAKES. Thanks in advance.

Regards,
Gandhi

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender or system manager by email immediately if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited and against the law.
RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
Hi Gandhi, (Hi Tim, find below the best coref chain I have ever seen),

Unfortunately, it looks like the drug-ner module has not been kept up-to-date. I just checked the cpe xml files and they contain invalid pointers. Anyway, you should be able to add the DrugMentionAnnotator by using:

AggregateBuilder (code):
aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription( DrugMentionAnnotator.class ) );

Piper file:
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator

Unfortunately, the drug attribute types all extend the type Annotation. The PrettyTextWriter that you are using only marks IdentifiedAnnotation subtypes, so you will not see the drug attributes without writing some extra code.

On that matter, I recommend that you use HtmlTextWriter for output as it provides more information in a nicer format - though still not drug ner attributes. One nice feature is the markup of coreferences. Using your example sentence:

"The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin , 20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular carcinoma."

It marks a superscript '1' (coreference chain #1) after "200mg" and "carcinoma" because Tim's excellent coreference model connected: "study treatment of Thalomid 200mg" with "the treatment of hepatocellular carcinoma"! If you click one of the superscript "4"s it will display the coreference chain in the margin. I am still working on that writer in my spare time, so if you have suggestions please let me know.

As for the missing times, I don't know what you are witnessing. When I run your sentence I get the times:
"days"
"days 1,8"
"06/07/02" (contains treatment)

The "days" aren't perfect, but the "06/07/02" date and its "contains treatment" relation are pretty good.
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Friday, September 15, 2017 12:40 PM
To: dev@ctakes.apache.org
Subject: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi All,

We are using the pipeline code as mentioned in https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/Pipeline.java for the cTAKES web application we are building. But in our case, the measurements and quantities are identified as events as shown below:

SENTENCE: The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg /m2 (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma.
DTNN VBDNN NN INNN NNS NNS CC NNP NNS NNS NNSCC IN IN DT NN IN JJ NN
|=| |===||==| |===| || |===| |===|
EventProcedure Drug Event DrugProcedure Disorder
C0087111 C0723668 C0014582 C0087111 C0007097
|==|
Disorder
C2239176

From googling what we have found out is that we need to use DrugMentionAnnotator to identify measurements and quantities. Are we right? If so, how do we enable DrugMentionAnnotator in our code. Could someone provide a sample code snippet and help us out on this?

Also the dates are not getting identified in our case as we get the following error in our console even after using latest temporal resources (model.jar) as per Sean's suggestion :

"Null value found in Feature(, ) from [Feature(, ), Feature(, )"

Could someone throw some light on this as well? Thanks in advance.

Regards,
Gandhi
Enabling drugner pipeline and identifying dates
Hi All,

We are using the pipeline code as mentioned in https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/Pipeline.java for the cTAKES web application we are building. But in our case, the measurements and quantities are identified as events, as shown below:

SENTENCE: The patient started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg /m2 (days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma.
DTNN VBDNN NN INNN NNS NNS CC NNP NNS NNS NNSCC IN IN DT NN IN JJ NN
|=| |===||==| |===| || |===| |===|
EventProcedure Drug Event DrugProcedure Disorder
C0087111 C0723668 C0014582 C0087111 C0007097
|==|
Disorder
C2239176

From googling, what we have found is that we need to use DrugMentionAnnotator to identify measurements and quantities. Are we right? If so, how do we enable DrugMentionAnnotator in our code? Could someone provide a sample code snippet and help us out on this?

Also, the dates are not getting identified in our case, as we get the following error in our console even after using the latest temporal resources (model.jar) as per Sean's suggestion:

"Null value found in Feature(, ) from [Feature(, ), Feature(, )"

Could someone shed some light on this as well? Thanks in advance.

Regards,
Gandhi
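To make concrete which spans in the example sentence count as "measurements and quantities", here is a minimal Java sketch that pulls out dose expressions such as "200mg" and "20 mg /m2" with a regular expression. This is an illustration only, under stated assumptions: it is not how DrugMentionAnnotator works, the class name DoseSpanSketch and the method findDoses are hypothetical, and the short unit list (mcg, mg, ml, g, with an optional "/ m2", "/ kg" or "/ ml") is made up for this example. Requiring a leading number, and accepting m2 only after a slash, is one way to avoid matching "m2" where it appears inside unrelated terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cTAKES: finds candidate dose spans in text.
class DoseSpanSketch {
    // number, optional space, unit, then an optional "/ denominator" such as "/m2".
    // The unit list is an assumption for illustration, not a complete vocabulary.
    private static final Pattern DOSE = Pattern.compile(
            "\\b\\d+(?:\\.\\d+)?\\s*(?:mcg|mg|ml|g)(?:\\s*/\\s*(?:m2|kg|ml))?\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns every substring of the text that matches the dose pattern, in order.
    static List<String> findDoses(String text) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = DOSE.matcher(text);
        while (matcher.find()) {
            spans.add(matcher.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(findDoses(
                "The patient started study treatment of Thalomid 200mg (days 1-21), "
                        + "and Epirubicin, 20 mg /m2 (days 1, 8, and 15) on 06/07/02 "
                        + "for the treatment of hepatocellular carcinoma."));
    }
}
```

A pattern like this only finds text spans. Linking a dose to its drug mention is a separate step that this sketch does not attempt.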