RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-04 Thread Gandhi Rajan Natarajan
Hi James,

Thanks for the response. As you said, it's definitely not a showstopper. We
encountered this measurement in the narratives we were testing and thought of
fixing it. That’s the whole idea. Also, as per the code, I feel that the
'fslashCondition' added before the 2nd token should avoid false positives.
Anyway, I will let experts like you decide on this. Thanks again for the
consideration.
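
To make the slash-condition argument concrete, here is a minimal sketch. It is
written in Python for brevity and is not the code in MeasurementFSM.java or the
CTAKES-459 patch. The function names, the tokenizer, and the unit sets are all
hypothetical. It only illustrates the idea that "m2" is accepted as a unit
solely when it directly follows a base unit and a forward slash, so a bare "m2"
inside some other term is not marked as a measurement.

```python
import re

# Hypothetical unit vocabularies for illustration only; not taken from cTAKES.
BASE_UNITS = {"mg", "g", "mcg", "ml"}
AREA_UNITS = {"m2"}

# Numbers, words with optional trailing digits (e.g. "m2"), or single symbols.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+\d*|\S")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def tokenize(text):
    """Split text into numbers, alphanumeric words, and single symbols."""
    return TOKEN_RE.findall(text)


def find_area_dose_units(text):
    """Return measurements shaped like '<number> <unit>/<area unit>'.

    The area unit is accepted only when the token before it is a slash,
    which is the role this thread attributes to 'fslashCondition'.
    """
    tokens = tokenize(text)
    found = []
    for i in range(len(tokens) - 3):
        number, unit, slash, area = tokens[i:i + 4]
        if (NUMBER_RE.fullmatch(number)
                and unit.lower() in BASE_UNITS
                and slash == "/"
                and area.lower() in AREA_UNITS):
            found.append(f"{number} {unit}/{area}")
    return found
```

With this sketch, "Dose was 20 mg/m2 daily" yields one measurement, while a
made-up phrase such as "anti-M2 antibody titre 20 mg" yields none, because no
slash precedes "M2".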

Regards,
Gandhi


-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com]
Sent: Tuesday, October 03, 2017 10:05 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

FWIW, I started taking a look at the patch. (It's in code that I'm not that
familiar with, so a quick glance isn't sufficient for me.)
I did a search in UMLS for m2 in the terminologies commonly used by cTAKES
to see if adding m2 could result in marking something as a measurement when
it's not - and I did find many terms in the UMLS that contain m2. There are
plenty of other measurement abbreviations that also appear within other
terms, so it's not a showstopper - but is a consideration.

I haven't tested the patch yet to see if the way the patch is implemented -
checking for 2 tokens - avoids that issue.  Not sure if I'll get a chance
to look more this week. If you end up picking this up, Sean, at
least you know what I've done.

-- James


On Tue, Oct 3, 2017 at 12:25 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Ctakes is a purely volunteer effort, so there are never any guarantees ...
> If nobody looks at the value and unit jira and patch this week then I will
> try to get to it asap.
>
> Thanks for letting us use your example note!
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 12:21 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Sean,
>
>
>
> Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be
> looked up by someone as Tim mentioned?
>
>
>
> The paragraph we sent earlier can be in the example notes provided the
> protocol number is masked/modified.
>
>
>
> Regards,
>
> Gandhi
>
>
>
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Tuesday, October 03, 2017 7:27 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Gandhi,
>
> Thank you for asking.  There is no action item for you concerning the
> coreference output that you see.   However, if you would like to help the
> community understand how the module works (input and output), maybe you
> could do something like run the pipeline on your original sentence, then
> that sentence plus another (before), then that sentence plus another
> (after) ... and see how the output changes with the input.  If you take
> screenshots or something then we could put them on the wiki.  Also, would
> you mind if the paragraph you sent became one of the example notes in
> ctakes?  That means that it would be redistributed with the code.
>
> Sean
>
> -----Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 4:26 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Tim/Sean,
>
> Is this an action item on us? If yes, Could someone give us some valid
> inputs to test the same? Is someone else going to review this again?
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 8:06 PM
> To: dev@ctakes.apache.org
> Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> My bad, I didn't read too closely and thought this was going to be a
> coreference patch. I don't know this FSM code that well, so I am not an
> expert. My biggest concern at a glance is that these addition

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-04 Thread Gandhi Rajan Natarajan
Hi Sean,

Completely agree with you on this. Thanks for your support.

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 9:56 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

Ctakes is a purely volunteer effort, so there are never any guarantees ...
If nobody looks at the value and unit jira and patch this week then I will try 
to get to it asap.

Thanks for letting us use your example note!

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 12:21 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,



Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be
looked up by someone as Tim mentioned?



The paragraph we sent earlier can be in the example notes provided the protocol 
number is masked/modified.



Regards,

Gandhi





-Original Message-

From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]

Sent: Tuesday, October 03, 2017 7:27 PM

To: dev@ctakes.apache.org

Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]



Hi Gandhi,



Thank you for asking.  There is no action item for you concerning the 
coreference output that you see.   However, if you would like to help the 
community understand how the module works (input and output), maybe you could 
do something like run the pipeline on your original sentence, then that 
sentence plus another (before), then that sentence plus another (after) ... and 
see how the output changes with the input.  If you take screenshots or 
something then we could put them on the wiki.  Also, would you mind if the 
paragraph you sent became one of the example notes in ctakes?  That means that 
it would be redistributed with the code.
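
The experiment described above can be prepared with a small script. The
following is a minimal sketch in Python; the function name
`build_context_variants` and its labels are made up for illustration. Running
each variant through a cTAKES pipeline is deliberately left out, because the
thread does not say how the pipeline is invoked.

```python
def build_context_variants(target, before=None, after=None):
    """Return labelled inputs: the target sentence alone, then with context.

    Each entry is a (label, text) pair so that the pipeline output for one
    variant can be compared against the others.
    """
    variants = [("target only", target)]
    if before:
        variants.append(("before + target", f"{before} {target}"))
    if after:
        variants.append(("target + after", f"{target} {after}"))
    if before and after:
        variants.append(
            ("before + target + after", f"{before} {target} {after}")
        )
    return variants
```

Each text could then be saved to its own file and processed separately, so
that any change in the coreference output can be traced to the sentence that
was added.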



Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, Could someone give us some valid inputs
to test the same? Is someone else going to review this again?

Regards,
Gandhi

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a 
coreference patch. I don't know this FSM code that well, so I am not an expert. 
My biggest concern at a glance is that these additions help find more true 
positives (as in your examples), can we verify that they won't create false 
positives?
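
One way to approach the false-positive question is to run the old and the new
matcher over texts that are known to contain no measurements and list anything
only the new one flags. The sketch below is in Python and is an assumption
about how such a check could look, not an existing cTAKES test; the function
name `new_false_positives` and the idea of passing matchers as callables are
hypothetical.

```python
def new_false_positives(old_matcher, new_matcher, negative_texts):
    """Return the texts that only the new matcher flags.

    Each matcher is a callable that takes a string and returns a list of
    matches. negative_texts should contain no true measurements, so any
    hit reported here is a false positive introduced by the change.
    """
    regressions = []
    for text in negative_texts:
        if new_matcher(text) and not old_matcher(text):
            regressions.append(text)
    return regressions
```

An empty result on a reasonably large set of negative examples, for instance
terms that merely contain "m2", would be some evidence that the additions do
not create new false positives. It would not be proof.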



Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake from my side that I
> didn't send the complete text. Did you mean that with the text I sent,
> the co-reference superscript-1 will be lost?
>
> Also as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 for measurement FSM
> changes and attached the modified file changes. Could someone have a
> look and let us know your thoughts please?
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-04 Thread Gandhi Rajan Natarajan
Thanks for the update Sean. Please keep us posted so that we can test the same 
once your fix is ready.

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 10:04 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,
I have one discovery pertaining to the coref items so far.
Your first coreference (#1) is not appearing in the html because it consists 
only of a "generic" item: "this patient".
Coreference: This patient , This patient , This patient , this patient , this 
patient , this patient , this patient
This is a bug in the html writer that I will need to fix.
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Tim/Sean,



Is this an action item on us? If yes, Could someone give us some valid inputs 
to test the same? Is someone else going to review this again?



Regards,

Gandhi





-Original Message-

From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]

Sent: Monday, October 02, 2017 8:06 PM

To: dev@ctakes.apache.org

Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]



My bad, I didn't read too closely and thought this was going to be a 
coreference patch. I don't know this FSM code that well, so I am not an expert. 
My biggest concern at a glance is that these additions help find more true 
positives (as in your examples), can we verify that they won't create false 
positives?

Tim





On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake from my side that I
> didn't send the complete text. Did you mean that with the text I sent,
> the co-reference superscript-1 will be lost?
>
> Also as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 for measurement FSM
> changes and attached the modified file changes. Could someone have a
> look and let us know your thoughts please?
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
>
> >
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the

> 

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Excellent, thanks

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com] 
Sent: Tuesday, October 03, 2017 12:35 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

FWIW, I started taking a look at the patch. (It's in code that I'm not that
familiar with, so a quick glance isn't sufficient for me.)
I did a search in UMLS for m2 in the terminologies commonly used by cTAKES
to see if adding m2 could result in marking something as a measurement when
it's not - and I did find many terms in the UMLS that contain m2. There are
plenty of other measurement abbreviations that also appear within other
terms, so it's not a showstopper - but is a consideration.

I haven't tested the patch yet to see if the way the patch is implemented -
checking for 2 tokens - avoids that issue.  Not sure if I'll get a chance
to look more this week. If you end up picking this up, Sean, at
least you know what I've done.

-- James


On Tue, Oct 3, 2017 at 12:25 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Ctakes is a purely volunteer effort, so there are never any guarantees ...
> If nobody looks at the value and unit jira and patch this week then I will
> try to get to it asap.
>
> Thanks for letting us use your example note!
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 12:21 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Sean,
>
>
>
> Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be
> looked up by someone as Tim mentioned?
>
>
>
> The paragraph we sent earlier can be in the example notes provided the
> protocol number is masked/modified.
>
>
>
> Regards,
>
> Gandhi
>
>
>
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Tuesday, October 03, 2017 7:27 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Gandhi,
>
> Thank you for asking.  There is no action item for you concerning the
> coreference output that you see.   However, if you would like to help the
> community understand how the module works (input and output), maybe you
> could do something like run the pipeline on your original sentence, then
> that sentence plus another (before), then that sentence plus another
> (after) ... and see how the output changes with the input.  If you take
> screenshots or something then we could put them on the wiki.  Also, would
> you mind if the paragraph you sent became one of the example notes in
> ctakes?  That means that it would be redistributed with the code.
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 4:26 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Tim/Sean,
>
> Is this an action item on us? If yes, Could someone give us some valid
> inputs to test the same? Is someone else going to review this again?
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 8:06 PM
> To: dev@ctakes.apache.org
> Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> My bad, I didn't read too closely and thought this was going to be a
> coreference patch. I don't know this FSM code that well, so I am not an
> expert. My biggest concern at a glance is that these additions help find
> more true positives (as in your examples), can we verify that they won't
> create false positives?
>
>
>
> 

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread James Masanz
FWIW, I started taking a look at the patch. (It's in code that I'm not that
familiar with, so a quick glance isn't sufficient for me.)
I did a search in UMLS for m2 in the terminologies commonly used by cTAKES
to see if adding m2 could result in marking something as a measurement when
it's not - and I did find many terms in the UMLS that contain m2. There are
plenty of other measurement abbreviations that also appear within other
terms, so it's not a showstopper - but is a consideration.

I haven't tested the patch yet to see if the way the patch is implemented -
checking for 2 tokens - avoids that issue.  Not sure if I'll get a chance
to look more this week. If you end up picking this up, Sean, at
least you know what I've done.

-- James


On Tue, Oct 3, 2017 at 12:25 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Ctakes is a purely volunteer effort, so there are never any guarantees ...
> If nobody looks at the value and unit jira and patch this week then I will
> try to get to it asap.
>
> Thanks for letting us use your example note!
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 12:21 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Sean,
>
>
>
> Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be
> looked up by someone as Tim mentioned?
>
>
>
> The paragraph we sent earlier can be in the example notes provided the
> protocol number is masked/modified.
>
>
>
> Regards,
>
> Gandhi
>
>
>
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Tuesday, October 03, 2017 7:27 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Gandhi,
>
> Thank you for asking.  There is no action item for you concerning the
> coreference output that you see.   However, if you would like to help the
> community understand how the module works (input and output), maybe you
> could do something like run the pipeline on your original sentence, then
> that sentence plus another (before), then that sentence plus another
> (after) ... and see how the output changes with the input.  If you take
> screenshots or something then we could put them on the wiki.  Also, would
> you mind if the paragraph you sent became one of the example notes in
> ctakes?  That means that it would be redistributed with the code.
>
> Sean
>
> -----Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 4:26 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> Hi Tim/Sean,
>
> Is this an action item on us? If yes, Could someone give us some valid
> inputs to test the same? Is someone else going to review this again?
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 8:06 PM
> To: dev@ctakes.apache.org
> Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
> [SUSPICIOUS]
>
> My bad, I didn't read too closely and thought this was going to be a
> coreference patch. I don't know this FSM code that well, so I am not an
> expert. My biggest concern at a glance is that these additions help find
> more true positives (as in your examples), can we verify that they won't
> create false positives?
>
> Tim
>
> On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
>
> > Hi Sean,
> >
> > Thanks again for the response. I guess it's a mistake from my side that I
> > didn't send the complete text. Did you mean that with the text I sent,
> > the co-reference superscript-1 will be lost?
> >
> > Al

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Hi Gandhi, 
I have one discovery pertaining to the coref items so far.
Your first coreference (#1) is not appearing in the html because it consists 
only of a "generic" item: "this patient".
Coreference: This patient , This patient , This patient , this patient , this 
patient , this patient , this patient
This is a bug in the html writer that I will need to fix.
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Tim/Sean,



Is this an action item on us? If yes, Could someone give us some valid inputs 
to test the same? Is someone else going to review this again?



Regards,

Gandhi





-Original Message-

From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]

Sent: Monday, October 02, 2017 8:06 PM

To: dev@ctakes.apache.org

Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]



My bad, I didn't read too closely and thought this was going to be a 
coreference patch. I don't know this FSM code that well, so I am not an expert. 
My biggest concern at a glance is that these additions help find more true 
positives (as in your examples), can we verify that they won't create false 
positives?

Tim





On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake from my side that I
> didn't send the complete text. Did you mean that with the text I sent,
> the co-reference superscript-1 will be lost?
>
> Also as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 for measurement FSM
> changes and attached the modified file changes. Could someone have a
> look and let us know your thoughts please?
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
>
> >
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:  https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Hi Gandhi,

Ctakes is a purely volunteer effort, so there are never any guarantees ...
If nobody looks at the value and unit jira and patch this week then I will try 
to get to it asap.

Thanks for letting us use your example note!

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Tuesday, October 03, 2017 12:21 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,



Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 be
looked up by someone as Tim mentioned?



The paragraph we sent earlier can be in the example notes provided the protocol 
number is masked/modified.



Regards,

Gandhi





-Original Message-

From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]

Sent: Tuesday, October 03, 2017 7:27 PM

To: dev@ctakes.apache.org

Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]



Hi Gandhi,



Thank you for asking.  There is no action item for you concerning the 
coreference output that you see.   However, if you would like to help the 
community understand how the module works (input and output), maybe you could 
do something like run the pipeline on your original sentence, then that 
sentence plus another (before), then that sentence plus another (after) ... and 
see how the output changes with the input.  If you take screenshots or 
something then we could put them on the wiki.  Also, would you mind if the 
paragraph you sent became one of the example notes in ctakes?  That means that 
it would be redistributed with the code.



Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, Could someone give us some valid inputs
to test the same? Is someone else going to review this again?

Regards,
Gandhi

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a 
coreference patch. I don't know this FSM code that well, so I am not an expert. 
My biggest concern at a glance is that these additions help find more true 
positives (as in your examples), can we verify that they won't create false 
positives?



Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake from my side that I
> didn't send the complete text. Did you mean that with the text I sent,
> the co-reference superscript-1 will be lost?
>
> Also as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 for measurement FSM
> changes and attached the modified file changes. Could someone have a
> look and let us know your thoughts please?
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you s

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Gandhi Rajan Natarajan
Hi Sean,

Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459 - be
looked at by someone, as Tim mentioned?

The paragraph we sent earlier can be in the example notes provided the protocol 
number is masked/modified.

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 7:27 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

Thank you for asking.  There is no action item for you concerning the 
coreference output that you see.   However, if you would like to help the 
community understand how the module works (input and output), maybe you could 
do something like run the pipeline on your original sentence, then that 
sentence plus another (before), then that sentence plus another (after) ... and 
see how the output changes with the input.  If you take screenshots or 
something then we could put them on the wiki.  Also, would you mind if the 
paragraph you sent became one of the example notes in ctakes?  That means that 
it would be redistributed with the code.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, could someone give us some valid inputs
to test the same? Is someone else going to review this again?

Regards,
Gandhi

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a
coreference patch. I don't know this FSM code that well, so I am not an expert.
My biggest concern at a glance is that these additions help find more true
positives (as in your examples), can we verify that they won't create false
positives?

Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it was a mistake on my side that I
> didn't send the complete text. Did you mean that with the text I sent,
> the co-reference superscript-1 will be lost?
>
> Also, as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 - for the
> measurement FSM changes and attached the modified file. Could
> someone have a look and let us know your thoughts, please?
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
>
> >
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which were not
> > identified out of the box.  Should we send the code changes to you
> > so that you can con

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Thanks Tim!  I was looking for that one but couldn't find it.

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Tuesday, October 03, 2017 10:03 AM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

Here's the most recent publication, which describes the system in
ctakes 4.0 and later:
http://www.sciencedirect.com/science/article/pii/S1532046417300850

Tim

On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote:
> >
> > With the changes in Input, the co-reference between all the
> > entities should still be preserved right?
> No.  One of the experts can better explain this, but the coreference
> module works with "best match" chains.  With one sentence of text,
> term (Markable) A may have a best match with term B.  As soon as you
> add more text, you introduce the possibility that term A will have a
> better best match with C and/or D, and the previous match to B will
> be deemed less accurate and dropped.
> In your case the coreference A - B seems to be lost in favor of one
> using internal term A', and that is a little strange.  It could be
> that overlapping markables are being discarded?  I will try to look
> into this really quickly.
>
> You can look at some publications on coref if you search the
> web.  The one that probably best applies to the current coref module
> (Tim, Dima, is this true?) is
> https://www.aclweb.org/anthology/W12-2409
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 4:18 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
>
> Hi Sean, I still have some doubts on this. If I run the piper file
> with the complete text I sent earlier, I could see only superscript -
> 4 for Thalomid and the co-reference of this to  "treatment of
> hepatocellular carcinoma" is still lost. Also I don’t see any
> superscript with number-1 too. With the changes in Input, the co-
> reference between all the entities should still be preserved right?
> Do we have any more info or doc on this co-reference module to
> understand its complexity better?
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 8:36 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
>
> Hi Tim,
>
> The coreference question (just a question) was for a different item
> altogether.  Sorry for any confusion.  The reason that I CC:d you ...
>
> From Gandhi:
> >
> > Interestingly even I was able to generate [Sean's coref output]
> > using  piper GUI by  having only that single line - " The patient
> > started study treatment of Thalomid 200mg (days 1-21), and
> > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> > treatment of hepatocellular carcinoma. " in the input file.
> > But when I change the input file content with the following
> > lines:   [Full paragraph (below), single-sentence in middle]  The
> > co-reference superscript is lost by then.
> Sean's answer:
> >
> > Ctakes is a system with many moving parts.  Things that precede or
> > follow your original example sentence will change the evaluation of
> > that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> >
> > Tim can correct me here, but maybe the coreference module ranked
> > the links between "thalomid" as much higher than the rank between
> > "study treatment of thalomid 200mg" and "the treatment of
> > hepatocellular carcinoma" and discarded the encapsulating treatment
> >

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Alexandru Zbarcea
This is very informative. Thank you Tim

Alex

On Oct 3, 2017 10:06, "Miller, Timothy" <
timothy.mil...@childrens.harvard.edu> wrote:

> Here's the most recent publication, which describes the system in
> ctakes 4.0 and later:
> http://www.sciencedirect.com/science/article/pii/S1532046417300850
> Tim
>
> On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote:
> > >
> > > With the changes in Input, the co-reference between all the
> > > entities should still be preserved right?
> > No.  One of the experts can better explain this, but the coreference
> > module works with "best match" chains.  With one sentence of text,
> > term (Markable) A may have a best match with term B.  As soon as you
> > add more text, you introduce the possibility that term A will have a
> > better best match with C and/or D, and the previous match to B will
> > be deemed less accurate and dropped.
> > In your case the coreference A - B seems to be lost in favor of one
> > using internal term A', and that is a little strange.  It could be
> > that overlapping markables are being discarded?  I will try to look
> > into this really quickly.
> >
> > You can look at some publications on coref if you search the
> > web.  The one that probably best applies to the current coref module
> > (Tim, Dima, is this true?) is
> > https://www.aclweb.org/anthology/W12-2409
> >
> > Sean
> >
> > -Original Message-
> > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> >
> > Sent: Tuesday, October 03, 2017 4:18 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Enabling drugner pipeline and identifying dates
> > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> >
> > Hi Sean, I still have some doubts on this. If I run the piper file
> > with the complete text I sent earlier, I could see only superscript -
> > 4 for Thalomid and the co-reference of this to  "treatment of
> > hepatocellular carcinoma" is still lost. Also I don’t see any
> > superscript with number-1 too. With the changes in Input, the co-
> > reference between all the entities should still be preserved right?
> > Do we have any more info or doc on this co-reference module to
> > understand its complexity better?
> >
> > Regards,
> > Gandhi
> >
> >
> > -Original Message-
> > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> > Sent: Monday, October 02, 2017 8:36 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: Enabling drugner pipeline and identifying dates
> > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> >
> > Hi Tim,
> >
> > The coreference question (just a question) was for a different item
> > altogether.  Sorry for any confusion.  The reason that I CC:d you ...
> >
> > From Gandhi:
> > >
> > > Interestingly even I was able to generate [Sean's coref output]
> > > using  piper GUI by  having only that single line - " The patient
> > > started study treatment of Thalomid 200mg (days 1-21), and
> > > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> > > treatment of hepatocellular carcinoma. " in the input file.
> > > But when I change the input file content with the following
> > > lines:   [Full paragraph (below), single-sentence in middle]  The
> > > co-reference superscript is lost by then.
> > Sean's answer:
> > >
> > > Ctakes is a system with many moving parts.  Things that precede or
> > > follow your original example sentence will change the evaluation of
> > > that sentence.
> > With the pipeline you are using and the full note, you should see a
> > number (mine is 4) next to the first "thalomid" in the original
> > example sentence.  If you click that number you should see (to the
> > right) 4 instances of "thalomid".
> > >
> > > Tim can correct me here, but maybe the coreference module ranked
> > > the links between "thalomid" as much higher than the rank between
> > > "study treatment of thalomid 200mg" and "the treatment of
> > > hepatocellular carcinoma" and discarded the encapsulating treatment
> > > texts from markables?  It is probably more complex than that.
> > Sean

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Miller, Timothy
Here's the most recent publication, which describes the system in
ctakes 4.0 and later:
http://www.sciencedirect.com/science/article/pii/S1532046417300850
Tim

On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote:
> > 
> > With the changes in Input, the co-reference between all the
> > entities should still be preserved right?
> No.  One of the experts can better explain this, but the coreference
> module works with "best match" chains.  With one sentence of text,
> term (Markable) A may have a best match with term B.  As soon as you
> add more text, you introduce the possibility that term A will have a
> better best match with C and/or D, and the previous match to B will
> be deemed less accurate and dropped.  
> In your case the coreference A - B seems to be lost in favor of one
> using internal term A', and that is a little strange.  It could be
> that overlapping markables are being discarded?  I will try to look
> into this really quickly.
> 
> You can look at some publications on coref if you search the
> web.  The one that probably best applies to the current coref module
> (Tim, Dima, is this true?) is
> https://www.aclweb.org/anthology/W12-2409
> 
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
>  
> Sent: Tuesday, October 03, 2017 4:18 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> Hi Sean, I still have some doubts on this. If I run the piper file
> with the complete text I sent earlier, I could see only superscript -
> 4 for Thalomid and the co-reference of this to  "treatment of
> hepatocellular carcinoma" is still lost. Also I don’t see any
> superscript with number-1 too. With the changes in Input, the co-
> reference between all the entities should still be preserved right?
> Do we have any more info or doc on this co-reference module to
> understand its complexity better?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 8:36 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> Hi Tim,
> 
> The coreference question (just a question) was for a different item
> altogether.  Sorry for any confusion.  The reason that I CC:d you ...
> 
> From Gandhi:
> > 
> > Interestingly even I was able to generate [Sean's coref output]
> > using  piper GUI by  having only that single line - " The patient
> > started study treatment of Thalomid 200mg (days 1-21), and
> > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> > treatment of hepatocellular carcinoma. " in the input file.
> > But when I change the input file content with the following
> > lines:   [Full paragraph (below), single-sentence in middle]  The
> > co-reference superscript is lost by then.
> Sean's answer:
> > 
> > Ctakes is a system with many moving parts.  Things that precede or
> > follow your original example sentence will change the evaluation of
> > that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> > 
> > Tim can correct me here, but maybe the coreference module ranked
> > the links between "thalomid" as much higher than the rank between
> > "study treatment of thalomid 200mg" and "the treatment of
> > hepatocellular carcinoma" and discarded the encapsulating treatment
> > texts from markables?  It is probably more complex than that.
> Sean
> 
> "This patient is participating in a Non-IND study; Protocol CG-
> 000424: "Phase I/II of Thalidomide and Epirubicin in Patients with
> Unresectable or Metastatic Hepatocellular Carcinoma".Information has
> been received from the investigator regarding an 82 year-old male
> patient who had gastrointestinal bleeding while on Thalomid,
> Epirubicin, and Coumadin. He had a past medical history of
> diverticulosis in 03/02 and a right atrial clot from intraventricular
> catheter (IVC) for which he was started on Coumadin. During the
> hospita

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Hi Gandhi,

Thank you for asking.  There is no action item for you concerning the 
coreference output that you see.   However, if you would like to help the 
community understand how the module works (input and output), maybe you could 
do something like run the pipeline on your original sentence, then that 
sentence plus another (before), then that sentence plus another (after) ... and 
see how the output changes with the input.  If you take screenshots or 
something then we could put them on the wiki.  Also, would you mind if the 
paragraph you sent became one of the example notes in ctakes?  That means that 
it would be redistributed with the code.
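
One way to set up the comparison described above is to build the three inputs from the note itself. The following is only a sketch: the function name build_variants and the dictionary keys are made up for illustration, and the three sentences are copied from the paragraph posted in this thread.

```python
def build_variants(sentence, before, after):
    """Return three inputs: the sentence alone, with one sentence before, with one after."""
    return {
        "sentence_only": sentence,
        "with_before": f"{before} {sentence}",
        "with_after": f"{sentence} {after}",
    }


# Sentences taken from the example paragraph in this thread.
SENTENCE = (
    "The patient started study treatment of Thalomid 200mg (days 1-21), "
    "and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the "
    "treatment of hepatocellular carcinoma."
)
BEFORE = (
    "During the hospitalization for a right atrial clot in 03/02 "
    "hepatocellular carcinoma was first noted and he was referred to an oncologist."
)
AFTER = (
    "He was concomitantly receiving Cardura, Ambien (for insomnia), "
    "Megace, Coumadin, and Oxycodone."
)

variants = build_variants(SENTENCE, BEFORE, AFTER)
for name, text in variants.items():
    # Print each variant name with its length as a quick sanity check.
    print(name, len(text))
```

Each value could then be saved to its own text file and run through the same piper file, so that the outputs can be compared side by side.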

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, could someone give us some valid inputs
to test the same? Is someone else going to review this again?

Regards,
Gandhi

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]


My bad, I didn't read too closely and thought this was going to be a 
coreference patch. I don't know this FSM code that well, so I am not an expert. 
My biggest concern at a glance is that these additions help find more true 
positives (as in your examples), can we verify that they won't create false 
positives?
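
As a way to reason about that question, here is a small Python sketch. It is not the MeasurementFSM code and not the patch attached to the Jira issue; the unit list, the regular expression, and the name find_measurements are assumptions made up for illustration. The idea it shows: accept a unit such as mg/m2 only when a number comes right before it, so that a bare "m2" inside some other term is not reported.

```python
import re

# Hypothetical unit list, for illustration only; longer units come first.
UNITS = r"(?:mg/m2|mg/kg|mg|ml|g)"

# A number that does not start inside a word, optional whitespace, then a unit
# that is not followed by more word characters or a slash.
MEASUREMENT = re.compile(
    rf"(?<![\w.])(\d+(?:\.\d+)?)\s*({UNITS})(?![\w/])",
    re.IGNORECASE,
)


def find_measurements(text):
    """Return 'value unit' strings for every number-plus-unit pair in text."""
    return [f"{m.group(1)} {m.group(2)}" for m in MEASUREMENT.finditer(text)]


print(find_measurements("Epirubicin, 20 mg/m2 (days 1, 8, and 15)"))
print(find_measurements("a term that merely contains m2"))
```

Checking candidate terms from the dictionary against a filter like this is one cheap way to see whether a unit abbreviation would fire on text that is not a measurement.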

Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it was a mistake on my side that I
> didn't send the complete text. Did you mean that with the text I sent,
> the co-reference superscript-1 will be lost?
>
> Also, as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 - for the
> measurement FSM changes and attached the modified file. Could
> someone have a look and let us know your thoughts, please?
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
>
> >
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which were not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.  I think that the
> following links are relevant:
> Working with issues:  https://urldefense.proofpoint.com/v2/url?u=http
>

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
> With the changes in Input, the co-reference between all the entities should 
> still be preserved right?
No.  One of the experts can better explain this, but the coreference module 
works with "best match" chains.  With one sentence of text, term (Markable) A 
may have a best match with term B.  As soon as you add more text, you introduce 
the possibility that term A will have a better best match with C and/or D, and 
the previous match to B will be deemed less accurate and dropped.  
In your case the coreference A - B seems to be lost in favor of one using 
internal term A', and that is a little strange.  It could be that overlapping 
markables are being discarded?  I will try to look into this really quickly.
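
To make the "best match" idea concrete, here is a minimal Python sketch. It is a toy and not the cTAKES coreference code: the names best_match, SCORES, and score, and all of the numbers, are made up for illustration. It only shows how adding candidates can replace a link that looked fine when less text was available.

```python
def best_match(term, candidates, score):
    """Return the candidate with the highest score for term, or None if there are none."""
    return max(candidates, key=lambda c: score(term, c), default=None)


# Hypothetical pairwise scores; a real model would compute these from the text.
SCORES = {("A", "B"): 0.6, ("A", "C"): 0.9, ("A", "D"): 0.4}


def score(a, b):
    """Look up the made-up score for a pair, defaulting to 0.0."""
    return SCORES.get((a, b), 0.0)


# With one sentence, B is the only candidate, so A links to B.
short_text = best_match("A", ["B"], score)

# With more text, C and D become candidates and C outranks B.
longer_text = best_match("A", ["B", "C", "D"], score)

print(short_text, longer_text)
```

In this toy the A - B link is not wrong in the short text; it simply stops being the best available link once C exists.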

You can look at some publications on coref if you search the web.  The one that 
probably best applies to the current coref module (Tim, Dima, is this true?) is
https://www.aclweb.org/anthology/W12-2409

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Tuesday, October 03, 2017 4:18 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

Hi Sean, I still have some doubts about this. If I run the piper file with the
complete text I sent earlier, I see only superscript 4 for Thalomid, and
the co-reference from it to "treatment of hepatocellular carcinoma" is still
lost. I also don’t see any superscript with number 1. With the changes in the
input, the co-reference between all the entities should still be preserved,
right? Do we have any more info or docs on this co-reference module to
understand its complexity better?

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:36 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

Hi Tim,

The coreference question (just a question) was for a different item altogether. 
 Sorry for any confusion.  The reason that I CC:d you ...

From Gandhi:
> Interestingly even I was able to generate [Sean's coref output] using  piper 
> GUI by  having only that single line - " The patient started study treatment 
> of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) 
> on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input 
> file.
>But when I change the input file content with the following lines:   [Full 
>paragraph (below), single-sentence in middle]  The co-reference superscript is 
>lost by then.

Sean's answer:
> Ctakes is a system with many moving parts.  Things that precede or follow 
> your original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".
>Tim can correct me here, but maybe the coreference module ranked the links 
>between "thalomid" as much higher than the rank between "study treatment of 
>thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
>the encapsulating treatment texts from markables?  It is probably more complex 
>than that.

Sean

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilateral lower extremity edema 
with dyspnea on exertion.  On 07/13/02, he had a chest X-ray PA and l

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Gandhi Rajan Natarajan
Hi Sean, I still have some doubts about this. If I run the piper file with the
complete text I sent earlier, I see only superscript 4 for Thalomid, and
the co-reference from it to "treatment of hepatocellular carcinoma" is still
lost. I also don’t see any superscript with number 1. With the changes in the
input, the co-reference between all the entities should still be preserved,
right? Do we have any more info or docs on this co-reference module to
understand its complexity better?

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:36 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

Hi Tim,

The coreference question (just a question) was for a different item altogether. 
 Sorry for any confusion.  The reason that I CC:d you ...

From Gandhi:
> Interestingly even I was able to generate [Sean's coref output] using  piper 
> GUI by  having only that single line - " The patient started study treatment 
> of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) 
> on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input 
> file.
>But when I change the input file content with the following lines:   [Full 
>paragraph (below), single-sentence in middle]  The co-reference superscript is 
>lost by then.

Sean's answer:
> Ctakes is a system with many moving parts.  Things that precede or follow 
> your original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".
>Tim can correct me here, but maybe the coreference module ranked the links 
>between "thalomid" as much higher than the rank between "study treatment of 
>thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
>the encapsulating treatment texts from markables?  It is probably more complex 
>than that.

Sean

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilateral lower extremity edema 
with dyspnea on exertion.  On 07/13/02, he had a chest X-ray PA and lateral 
done that showed no evidence of acute pneumonia or congestive heart failure.  
On 07/14/02, he underwent  an ultrasound which was negative for deep vein 
thrombosis. This patient did not take Thalomid on the day of his admittance to 
the hospital, but resumed treatment shortly after with no return of symptoms. 
On 07/15/02, he was discharged in stable condition. There have been no further 
reports of bleeding at this time. Thedoctor has assessed the hematochezia as 
related to Coumadin treatment and previously diagnosed diverticulosis, and not 
to protocol therapy with Thalomid and Epirubicin.Additional information 
received from the investigator on 27Aug02 reveals that this male patient began 
on 07Jun02 two cycles of therapy with Thalidomide and Epirubicin.  His post 
cycle two computed tomography scans revealed increase in size of liver lesion 
with development of multiple new satellite nodules.  On 29Jul02, the 
investigator removed this patient from protocol for progressive disease and 
recommended hospice care.  After seeking a second opinion from two other 
institutions, this patient was admitted to hospice on 05Aug02.  On 20Aug02, the 
investigator noted that this patient w

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-10-02 Thread Finan, Sean
Hi Tim,

The coreference question (just a question) was for a different item altogether. 
 Sorry for any confusion.  The reason that I CC:d you ...

From Gandhi:
> Interestingly even I was able to generate [Sean's coref output] using  piper 
> GUI by  having only that single line - " The patient started study treatment 
> of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) 
> on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input 
> file.
>But when I change the input file content with the following lines:   [Full 
>paragraph (below), single-sentence in middle]  The co-reference superscript is 
>lost by then.

Sean's answer:
> Ctakes is a system with many moving parts.  Things that precede or follow
> your original example sentence will change the evaluation of that sentence.
> With the pipeline you are using and the full note, you should see a number
> (mine is 4) next to the first "thalomid" in the original example sentence.  If
> you click that number you should see (to the right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the links
> between "thalomid" as much higher than the rank between "study treatment of
> thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded
> the encapsulating treatment texts from markables?  It is probably more complex
> than that.

Sean

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilateral lower extremity edema 
with dyspnea on exertion.  On 07/13/02, he had a chest X-ray PA and lateral 
done that showed no evidence of acute pneumonia or congestive heart failure.  
On 07/14/02, he underwent  an ultrasound which was negative for deep vein 
thrombosis. This patient did not take Thalomid on the day of his admittance to 
the hospital, but resumed treatment shortly after with no return of symptoms. 
On 07/15/02, he was discharged in stable condition. There have been no further 
reports of bleeding at this time. Thedoctor has assessed the hematochezia as 
related to Coumadin treatment and previously diagnosed diverticulosis, and not 
to protocol therapy with Thalomid and Epirubicin.Additional information 
received from the investigator on 27Aug02 reveals that this male patient began 
on 07Jun02 two cycles of therapy with Thalidomide and Epirubicin.  His post 
cycle two computed tomography scans revealed increase in size of liver lesion 
with development of multiple new satellite nodules.  On 29Jul02, the 
investigator removed this patient from protocol for progressive disease and 
recommended hospice care.  After seeking a second opinion from two other 
institutions, this patient was admitted to hospice on 05Aug02.  On 20Aug02, the 
investigator noted that this patient was suffering worsening fatigue and got 
tired getting out of his chair.  On 25Aug02, this patient died due to disease 
progression.  The investigator assessed the death as not related to study 
treatment and expected"




-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Monday, October 02, 2017 10:36 AM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a
coreference patch. I don't know this FSM code that well, so I am not an
expert. My biggest concern at a glance is that these additions help
find more true positives (as in your examples), can we verify that they
won't creat

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-02 Thread Miller, Timothy
My bad, I didn't read too closely and thought this was going to be a
coreference patch. I don't know this FSM code that well, so I am not an
expert. My biggest concern at a glance: these additions help
find more true positives (as in your examples), but can we verify that
they won't create false positives?
Tim
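
The false-positive question above can be explored with a toy check. The sketch below is not the cTAKES MeasurementFSM and does not reproduce the CTAKES-459 patch; the class name, the unit lists, and the pre-split token input are all assumptions made for illustration. It accepts a compound unit such as "mg / m2" only when a number and a known unit come directly before the slash, so a bare "m2" inside some other term is not marked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration only; not the cTAKES MeasurementFSM or the CTAKES-459 patch. */
final class MeasurementSketch {
    // Assumed unit lists, chosen only for this example.
    private static final Set<String> UNITS = new HashSet<>(Arrays.asList("mg", "g", "ml"));
    private static final Set<String> PER_UNITS = new HashSet<>(Arrays.asList("m2", "kg", "ml"));

    private static boolean isNumber(String token) {
        return token.matches("\\d+(\\.\\d+)?");
    }

    /** Returns spans such as "20 mg / m2" found in already-split tokens. */
    static List<String> findMeasurements(List<String> tokens) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i + 1 < tokens.size()) {
            if (isNumber(tokens.get(i)) && UNITS.contains(tokens.get(i + 1).toLowerCase())) {
                int end = i + 2; // exclusive end of "number unit"
                // Extend over "/ perUnit" only when the slash directly follows the unit.
                if (end + 1 < tokens.size()
                        && tokens.get(end).equals("/")
                        && PER_UNITS.contains(tokens.get(end + 1).toLowerCase())) {
                    end += 2;
                }
                found.add(String.join(" ", tokens.subList(i, end)));
                i = end;
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> dose = Arrays.asList(
                "Epirubicin", ",", "20", "mg", "/", "m2", "(", "days", "1", ")");
        System.out.println(findMeasurements(dose));     // prints [20 mg / m2]

        // No number and no leading unit: "m2" alone is not marked.
        List<String> noNumber = Arrays.asList("protocol", "m2", "/", "mg", "review");
        System.out.println(findMeasurements(noNumber)); // prints []
    }
}
```

A check like this only shows the idea of requiring context before the slash; whether the real patch avoids false positives would still need to be verified against UMLS terms that contain "m2".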


On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
> 
> Thanks again for the response. I guess it was a mistake on my side that
> I didn't send the complete text. Did you mean that with the text I
> sent, the co-reference superscript-1 will be lost?
> 
> Also, as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 - for the measurement
> FSM changes and attached the modified file. Could someone have a look
> and let us know your thoughts, please?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Gandhi,
> 
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
> 
> > 
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:
> https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> Creating patches:
> https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
> Attaching files:
> https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html
> 
> I don't know if you have a jira account and permissions for the
> ctakes project.  An administrator may need to set that up for you.
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Thursday, September 28, 2017 4:09 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-09-29 Thread Miller, Timothy
It is a very busy time for me but this is on my todo list. Don't be
afraid to ping in a week or so if you don't hear anything.

Tim

On Fri, 2017-09-29 at 14:04 +, Finan, Sean wrote:
> Hi Gandhi,
> > 
> > Did you mean that with the text I sent, the co-reference
> > superscript-1 will be lost?
> Yes.  Well, to be more clear, the coreference that was resolved as #1
> in your original sentence alone will be lost.  However, there are
> eight or nine coreference chains discovered in your full paragraph,
> and one of those will have superscript 1s.
> 
> > 
> > Could someone have a look and let us know your thoughts, please?
> Thank you for creating the jira and the patch.  I am sure that
> somebody will take a look.
> 
> Thanks,
> Sean
> 
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
>  
> Sent: Friday, September 29, 2017 2:25 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Sean,
> 
> Thanks again for the response. I guess it was a mistake on my side that
> I didn't send the complete text. Did you mean that with the text I
> sent, the co-reference superscript-1 will be lost?
> 
> Also, as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 - for the measurement
> FSM changes and attached the modified file. Could someone have a look
> and let us know your thoughts, please?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Gandhi,
> 
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
> 
> > 
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:
> https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> Creating patches:
> https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
> Attaching files:
> https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-29 Thread Finan, Sean
Hi Gandhi,
> Did you mean that with the text I sent, the co-reference superscript-1 will 
> be lost?
Yes.  Well, to be more clear, the coreference that was resolved as #1 in your
original sentence alone will be lost.  However, there are eight or nine
coreference chains discovered in your full paragraph, and one of those will
have superscript 1s.

> Could someone have a look and let us know your thoughts, please?
Thank you for creating the jira and the patch.  I am sure that somebody will 
take a look.

Thanks,
Sean


-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Friday, September 29, 2017 2:25 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks again for the response. I guess it was a mistake on my side that I didn't
send the complete text. Did you mean that with the text I sent, the
co-reference superscript-1 will be lost?

Also, as per your advice, we have created an issue -
https://issues.apache.org/jira/browse/CTAKES-459 - for the measurement FSM
changes and attached the modified file. Could someone have a look and let us
know your thoughts, please?

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 28, 2017 8:21 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

I don't recall you sending me that entire snippet of text.  I think that I only 
had your single example sentence.
You have discovered one of the quirks of software: "change the data, change the 
result."
Ctakes is a system with many moving parts.  Things that precede or follow your 
original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".
Tim can correct me here, but maybe the coreference module ranked the links 
between "thalomid" as much higher than the rank between "study treatment of 
thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
the encapsulating treatment texts from markables?  It is probably more complex 
than that.

> we have also made some code changes in MeasurementFSM.java to identify 
> certain measurements like '20 mg/m2' which was not identified out of the box. 
>  Should we send the code changes to you so that you can consider the same to 
> be productized ? Please advise."

I don't know if you've noticed the recent emails on the dev list involving 
Alexandru Zbarcea.  Alex has been creating or commenting on Jira items and 
attaching code for  fixes and enhancements.  This is a widely used process and 
is fairly easy to follow.   I think that the following links are relevant:
Working with issues:
https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
Creating patches:
https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
Attaching files:
https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

I don't know if you have a jira account and permissions for the ctakes project. 
 An administrator may need to set that up for you.

Thanks,
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Thursday, September 28, 2017 4:09 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks for the response. I was able to see the co-reference superscript using 
the html file that you sent. Interestingly even I was able to generate the 

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-29 Thread Gandhi Rajan Natarajan
Hi Sean,

Thanks again for the response. I guess it was a mistake on my side that I didn't
send the complete text. Did you mean that with the text I sent, the
co-reference superscript-1 will be lost?

Also, as per your advice, we have created an issue -
https://issues.apache.org/jira/browse/CTAKES-459 - for the measurement FSM
changes and attached the modified file. Could someone have a look and let us
know your thoughts, please?

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 28, 2017 8:21 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

I don't recall you sending me that entire snippet of text.  I think that I only 
had your single example sentence.
You have discovered one of the quirks of software: "change the data, change the 
result."
Ctakes is a system with many moving parts.  Things that precede or follow your 
original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".
Tim can correct me here, but maybe the coreference module ranked the links 
between "thalomid" as much higher than the rank between "study treatment of 
thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
the encapsulating treatment texts from markables?  It is probably more complex 
than that.

> we have also made some code changes in MeasurementFSM.java to identify 
> certain measurements like '20 mg/m2' which was not identified out of the box. 
>  Should we send the code changes to you so that you can consider the same to 
> be productized ? Please advise."

I don't know if you've noticed the recent emails on the dev list involving 
Alexandru Zbarcea.  Alex has been creating or commenting on Jira items and 
attaching code for  fixes and enhancements.  This is a widely used process and 
is fairly easy to follow.   I think that the following links are relevant:
Working with issues:  
https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
Creating patches:   
https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
Attaching files:   
https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

I don't know if you have a jira account and permissions for the ctakes project. 
 An administrator may need to set that up for you.

Thanks,
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Thursday, September 28, 2017 4:09 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks for the response. I was able to see the co-reference superscript using 
the html file that you sent. Interestingly even I was able to generate the 
sample HTML using  piper GUI by  having only that single line - " The patient 
started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 
(days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. 
" in the input file.

But when I change the input file content with the following lines:

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilate

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-28 Thread Finan, Sean
Hi Gandhi,

I don't recall you sending me that entire snippet of text.  I think that I only 
had your single example sentence.
You have discovered one of the quirks of software: "change the data, change the 
result."
Ctakes is a system with many moving parts.  Things that precede or follow your 
original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".  
Tim can correct me here, but maybe the coreference module ranked the links 
between "thalomid" as much higher than the rank between "study treatment of 
thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
the encapsulating treatment texts from markables?  It is probably more complex 
than that.

> we have also made some code changes in MeasurementFSM.java to identify 
> certain measurements like '20 mg/m2' which was not identified out of the box. 
>  Should we send the code changes to you so that you can consider the same to 
> be productized ? Please advise."

I don't know if you've noticed the recent emails on the dev list involving 
Alexandru Zbarcea.  Alex has been creating or commenting on Jira items and 
attaching code for  fixes and enhancements.  This is a widely used process and 
is fairly easy to follow.   I think that the following links are relevant:
Working with issues:  
https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
Creating patches:   
https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
Attaching files:   
https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

I don't know if you have a jira account and permissions for the ctakes project. 
 An administrator may need to set that up for you.

Thanks,
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Thursday, September 28, 2017 4:09 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks for the response. I was able to see the co-reference superscript using 
the html file that you sent. Interestingly even I was able to generate the 
sample HTML using  piper GUI by  having only that single line - " The patient 
started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 
(days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. 
" in the input file.

But when I change the input file content with the following lines:

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilateral lower extremity edema 
with dyspnea on exertion.  On 07/13/02, he had a chest X-ray PA and lateral 
done that showed no evidence of acute pneumonia or congestive heart failure.  
On 07/14/02, he underwent  an ultrasound which was negative for deep vein 
thrombosis. This patient did not take Thalomid on the day of his admittance to 
the hospital, but resumed treatment shortly after with no return of symptoms. 
On 07/15/02, he was discharged in stable condition. There have been no further 
reports of bleeding at this time. Thedoctor has assessed the hematochezia as 
related to Coumadin treatment and previously diagnosed diverticulosis, and not 
to protocol therapy with Thalomid and Epirubicin.Additional information 
received from the investiga

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-27 Thread Finan, Sean
Hi Gandhi,

I am glad that you are feeling better.
I don't understand why you aren't getting the same output as me.  I just ran 
your example sentence with your piper with a fresh checkout and get the html 
below.  The css follows.  Copy and paste into a file and see if you see the 
corefs.

/* html, copy into file */




  OneLiner Output



OneLiner
 Text processing finished on: 9 27 2017, 08:15:31





The patient started study treatment of Thalomid 200mg1 ( days 1 
- 21 ) , and Epirubicin , 20 mg / m2 ( days 1 , 8 , 
and 15 ) on 06 / 07 / 02 for the 
treatment of hepatocellular carcinoma1 . 






 Annotation Information 

  function iaf(txt) {
    var aff=txt.replace( /AFF_/g,"<br><h3>Affirmed</h3>" );
    var neg=aff.replace( /NEG_/g,"<br><h3>Negated</h3>" );
    var unc=neg.replace( /UNC_/g,"<br><h3>Uncertain</h3>" );
    var unn=unc.replace( /UNN_/g,"<br><h3>Uncertain, Negated</h3>" );
    var ant=unn.replace( /ANT/g,"<b>Anatomical Site</b>" );
    var dis=ant.replace( /DIS/g,"<b>Disease/ Disorder</b>" );
    var fnd=dis.replace( /FND/g,"<b>Sign/ Symptom</b>" );
    var prc=fnd.replace( /PRC/g,"<b>Procedure</b>" );
    var drg=prc.replace( /DRG/g,"<b>Medication</b>" );
    var evt=drg.replace( /EVT/g,"<b>Event</b>" );
    var tmx=evt.replace( /TMX/g,"<b>Time</b>" );
    var unk=tmx.replace( /UNK/g,"<b>Unknown</b>" );
    var spc=unk.replace( /SPC_/g,"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;" );
    var prf1=spc.replace( /\[/g,"<i>" );
    var prf2=prf1.replace( /\]/g,"</i>" );
    var nl=prf2.replace( /NL_/g,"<br>" );
    document.getElementById("ia").innerHTML = nl;
  }
  function crf1() {
    document.getElementById("ia").innerHTML = "<br><h3>Coreference Chain</h3>study treatment of Thalomid 200mg<br>the treatment of hepatocellular carcinoma";
  }





/* css, copy into file named ctakes.pretty.css in same directory as html */



.GNR_ {
  position: relative;
  display: inline-block gray;
  border-bottom: 0.10em solid gray;
}

.AFF_ {
  position: relative;
  display: inline-block green;
  border-bottom: 0.15em solid green;
}

.UNC_ {
  position: relative;
  display: inline-block gold;
  border-bottom: 0.16em dotted gold;
}

.NEG_ {
  position: relative;
  display: inline-block red;
  border-bottom: 0.16em dashed red;
}

.UNN_ {
  position: relative;
  display: inline-block orange;
  border-bottom: 0.16em dashed orange;
}

.FND {
  color: magenta;
}

.DIS {
  color: black;
}

.DRG {
  color: red;
}

.PRC {
  color: blue;
}

.ANT {
  color: gray;
}

.UNK {
  color: gray;
}

[TIP] {
  position: relative;
  z-index: 2;
  cursor: pointer;
}
[TIP]::before,
[TIP]::after {
  visibility: hidden;
  -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=0)";
  filter: progid: DXImageTransform.Microsoft.Alpha(Opacity=0);
  opacity: 0;
  pointer-events: none;
}
[TIP]::before {
  position: absolute;
  bottom: 0%;
  left: 100%;
  margin-bottom: 5px;
  padding: 7px;
  -webkit-border-radius: 3px;
  -moz-border-radius: 3px;
  border-radius: 3px;
  background-color: #000;
  background-color: hsla(0, 0%, 20%, 0.9);
  color: #fff;
  content: attr(TIP);
  text-align: center;
  font-size: 14px;
  line-height: 1.2;
}
[TIP]:hover::before,
[TIP]:hover::after {
  visibility: visible;
  -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=100)";
  filter: progid: DXImageTransform.Microsoft.Alpha(Opacity=100);
  opacity: 1;
}

div#ia {
  position: fixed;
  top: 0;
  right: 0;
  width: 20%;
  height: 100%;
  padding: 10px;
  overflow: auto;
  background-color: lightgray;
}

div#content {
  width: 79%;
  height: 100%;
  padding: 10px;
  overflow: auto;
}









-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Wednesday, September 27, 2017 4:40 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Sorry for the delayed response as I was out of office due to illness. If I 
don't add BackwardsTimeAnnotator, I don't see any error related to isTraining 
param. But still couldn't get the superscript co-reference working. Please note 
that I am using the latest 4.0.1 jars. The piper file and console log messages 
are as follows:

PIPER FILE:
// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), 
Paragra

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-27 Thread Gandhi Rajan Natarajan
 measurements like '20 mg/m2', which were not identified out of the box.  
Should we send you the code changes so that you can consider including them in 
the product? Please advise.
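
For readers following the '20 mg/m2' discussion, here is a minimal sketch of one way a compound unit could be recognized, assuming the text has already been split into space-separated tokens as in the tokenized output quoted in this thread ("20 mg / m2"). It is not the actual patch and does not use any cTAKES class: the class name, the method name and both unit lists are hypothetical. The idea it illustrates is that "m2" is accepted only when a forward slash sits between it and a preceding unit, so a bare "m2" inside some other term is not picked up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CompoundUnitSketch {

   // Hypothetical unit lists, for illustration only. cTAKES keeps its own.
   private static final Set<String> UNITS = new HashSet<>( Arrays.asList( "mg", "g", "ml" ) );
   private static final Set<String> PER_UNITS = new HashSet<>( Arrays.asList( "m2", "kg" ) );

   // Expects whitespace-separated tokens, e.g. "20 mg / m2".
   static List<String> findMeasurements( final String text ) {
      final String[] tokens = text.trim().split( "\\s+" );
      final List<String> found = new ArrayList<>();
      for ( int i = 0; i + 1 < tokens.length; i++ ) {
         final boolean isNumber = tokens[ i ].matches( "\\d+(\\.\\d+)?" );
         if ( !isNumber || !UNITS.contains( tokens[ i + 1 ].toLowerCase() ) ) {
            continue;
         }
         String measurement = tokens[ i ] + " " + tokens[ i + 1 ];
         // The denominator unit only counts when a forward slash precedes it.
         if ( i + 3 < tokens.length
              && tokens[ i + 2 ].equals( "/" )
              && PER_UNITS.contains( tokens[ i + 3 ].toLowerCase() ) ) {
            measurement += "/" + tokens[ i + 3 ];
            i += 3;   // skip unit, slash and denominator
         } else {
            i += 1;   // skip the unit
         }
         found.add( measurement );
      }
      return found;
   }

   public static void main( final String[] args ) {
      System.out.println( findMeasurements( "Epirubicin , 20 mg / m2 ( days 1 , 8 , and 15 )" ) );
   }
}
```

Whether a slash check like this is enough to avoid false positives against dictionary terms that contain "m2" would still need to be tested on real notes.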

Regards,
Gandhi



RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-22 Thread Finan, Sean
Hi Gandhi,

You don't need to add BackwardsTimeAnnotator to your piper.  It is added by the 
TemporalSubPipe.piper.  The  error that you are seeing regarding training is 
very strange, but you can try adding this line to the top of the file:
set isTraining=false

Can you run a sample file with your piper and send me the log statements?  It 
might help me figure out what is going on.

> is there any doc or guide on how to start writing our own annotator.
There are two example annotators in the ctakes-examples project under the ae/ 
directory.  You can look at those, but I recommend that you look at some 
information on Uimafit, which can be used to create new annotators:
https://uima.apache.org/d/uimafit-2.1.0/tools.uimafit.book.pdf
An introduction to creating Analysis Engines (Annotators) is on page 5.

Coding style is individualistic, but below is a rubberstamp that I use to get 
started:

import org.apache.ctakes.core.pipeline.PipeBitInfo;
import org.apache.log4j.Logger;
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

/**
 * @author SPF , chip-nlp
 * @version %I%
 * @since 9/22/2017
 */
@PipeBitInfo(
      name = "Template",
      description = "For Example.",
      role = PipeBitInfo.Role.ANNOTATOR
)
final public class Template extends JCasAnnotator_ImplBase {

   static private final Logger LOGGER = Logger.getLogger( "Template" );

   /**
    * {@inheritDoc}
    */
   @Override
   public void initialize( final UimaContext context ) throws ResourceInitializationException {
      // Always call the super first
      super.initialize( context );
      // place AE initialization code here
   }

   /**
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jCas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Processing ..." );
      // Place AE processing code here
      LOGGER.info( "Finished." );
   }
}



If you use IntelliJ as your ide you can create a file template with these 
parameters:

#if (${PACKAGE_NAME} && ${PACKAGE_NAME} != "")package ${PACKAGE_NAME};#end

import org.apache.ctakes.core.pipeline.PipeBitInfo;
import org.apache.log4j.Logger;
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

#parse("File Header.java")
@PipeBitInfo(
  name = "${NAME}",
  #if ( ${PROJECT_NAME} != "")description = "For ${PROJECT_NAME}.",#end
  role = PipeBitInfo.Role.ANNOTATOR
)
final public class ${NAME} extends JCasAnnotator_ImplBase {

   static private final Logger LOGGER = Logger.getLogger( "${NAME}" );

   /**
    * {@inheritDoc}
    */
   @Override
   public void initialize( final UimaContext context ) throws ResourceInitializationException {
      // Always call the super first
      super.initialize( context );
      // place AE initialization code here
   }

   /**
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jCas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Processing ..." );
      // Place AE processing code here
      LOGGER.info( "Finished." );
   }
}






RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-22 Thread Gandhi Rajan Natarajan
Hi Sean,

Thanks again for the detailed response.

I still couldn't manage to get superscript-1 co-reference in piper GUI.  Also 
I'm not able to use "BackwardsTimeAnnotator" in piper GUI as it gives me the 
below error:

org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator" failed.  (Descriptor: )
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
Caused by: java.lang.IllegalArgumentException: Please specify PARAM_IS_TRAINING - unable to infer it from context
    at org.cleartk.ml.CleartkAnnotator.initialize(CleartkAnnotator.java:109)

Somewhere in old mails it's mentioned that it's because of missing dependencies 
so I tried adding ClearTkAnnotator with no luck yet. My piper file is as 
follows:

load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
load ChunkerSubPipe.piper
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
load AttributeCleartkSubPipe.piper
load RelationSubPipe.piper
load TemporalSubPipe.piper
load CorefSubPipe.piper
add org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator
add pretty.html.HtmlTextWriter
add FileTreeXmiWriter
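
Sean's reply elsewhere in this thread says that TemporalSubPipe.piper already adds BackwardsTimeAnnotator, so the explicit add line in a piper like the one above is not needed. A small sketch of how such a redundant line could be spotted before running the pipeline follows. It is plain Java with no cTAKES dependency; the class and method names are hypothetical, and it assumes the simple piper shape seen in this thread: one command per line, with "load", "add" or "set" as the first word and "//" starting a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PiperRedundancySketch {

   // Hypothetical helper, not part of cTAKES: returns the explicit
   // "add ...BackwardsTimeAnnotator" lines of a piper that also loads
   // TemporalSubPipe.piper, which (per this thread) adds that annotator itself.
   static List<String> redundantTimeAnnotatorAdds( final List<String> piperLines ) {
      boolean loadsTemporalSubPipe = false;
      final List<String> explicitAdds = new ArrayList<>();
      for ( String rawLine : piperLines ) {
         final String line = rawLine.trim();
         if ( line.isEmpty() || line.startsWith( "//" ) ) {
            continue;   // skip blank lines and comments
         }
         final String[] words = line.split( "\\s+" );
         if ( words.length < 2 ) {
            continue;
         }
         if ( words[ 0 ].equals( "load" ) && words[ 1 ].endsWith( "TemporalSubPipe.piper" ) ) {
            loadsTemporalSubPipe = true;
         } else if ( words[ 0 ].equals( "add" ) && words[ 1 ].endsWith( "BackwardsTimeAnnotator" ) ) {
            explicitAdds.add( line );
         }
      }
      return loadsTemporalSubPipe ? explicitAdds : Collections.<String>emptyList();
   }

   public static void main( final String[] args ) {
      final List<String> piper = Arrays.asList(
            "load TemporalSubPipe.piper",
            "load CorefSubPipe.piper",
            "add org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator",
            "add pretty.html.HtmlTextWriter" );
      System.out.println( redundantTimeAnnotatorAdds( piper ) );
   }
}
```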

Any suggestion on this? Also I'm using all the latest 4.0.1 cTAKES Jars. 
Regarding the identification of Names, I will dig deeper into what you have 
mentioned.

Sorry to ask this as you already mentioned that there are no detailed docs for 
cTAKES. But is there any doc or guide on how to start writing our own annotator 
if required? If not, is there any simple annotator that you would suggest we 
look into to get a better understanding of annotators before we proceed further?  
Thanks in advance.

Regards,
Gandhi



RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-20 Thread Finan, Sean
Hi Gandhi,

> We guess we are missing out on something as we could not find co-references 
> for "200mg". Should we add anymore piper for this?
The piper commands that I sent have everything needed to obtain coreferences.  I 
use them regularly - they are what I used on your example sentence to get the 
coreferences that I mentioned.

> Also the change mentioned in the thread ...
That is a very old thread and I don't think that it applies to what you are 
trying to do.

> We also have a requirement to identify the patient names and sex
As James said, ctakes isn't really meant to do this.  Ctakes is geared toward 
extracting clinical data, and to this point names have not fallen into that 
category.  It is more a task for general nlp.  There is an opennlp model that 
can identify names and a few others (I used to see names using GATE).  ctakes 
has wrapped opennlp for other tasks and you should be able to do the same to 
adapt an engine for names into ctakes.

> cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 / 
> 02 or 27Aug2002
As Chen mentioned, the BackwardsTimeAnnotator module uses an ML model trained on 
gold data.  It isn't perfect.  You can add another time annotator on top of 
this to get some of the more simply formatted date mentions - there are a lot 
of them out there.  Personally I have used jchronic as it can be easily tweaked 
to recognize medically-relevant temporal expressions relating to surgery, 
pharmacology, etc.
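
As a rough illustration of what an extra pass for "simply formatted" dates could look like, the sketch below uses only java.util.regex to pick up the four shapes raised in this thread (20Aug02, 20/Aug/02, 06 / 07 / 02, 27Aug2002). It is neither jchronic nor any cTAKES class; the class name, method name and pattern are assumptions made for this example, and the pattern does not validate day or month ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleDateSketch {

   // Hypothetical pattern, not taken from cTAKES: a 1-2 digit day, a month given
   // as a number or an English three-letter abbreviation, and a 2- or 4-digit year.
   private static final String MONTH = "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";
   private static final String YEAR = "(?:\\d{4}|\\d{2})";
   private static final Pattern DATE = Pattern.compile(
         // numeric form with slashes and optional spaces: 06 / 07 / 02
         "\\b\\d{1,2}\\s*/\\s*\\d{1,2}\\s*/\\s*" + YEAR + "\\b"
         // month-name form with optional slashes: 20Aug02, 20/Aug/02, 27Aug2002
         + "|\\b\\d{1,2}\\s*/?\\s*" + MONTH + "\\s*/?\\s*" + YEAR + "\\b",
         Pattern.CASE_INSENSITIVE );

   // Returns every substring of the text that matches one of the date shapes.
   static List<String> findDates( final String text ) {
      final List<String> dates = new ArrayList<>();
      final Matcher matcher = DATE.matcher( text );
      while ( matcher.find() ) {
         dates.add( matcher.group() );
      }
      return dates;
   }

   public static void main( final String[] args ) {
      System.out.println( findDates( "On 20Aug02, the investigator noted that this patient was suffering" ) );
   }
}
```

In a real pipeline the matched spans would still have to be turned into time annotations by an annotator; this only shows the matching step.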

Sean


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Wednesday, September 20, 2017 8:50 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

I don't have time to go through all of this right now, but I will try to get to 
it soon.  

Make sure that you are running the latest version in trunk.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Wednesday, September 20, 2017 7:03 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi, Could someone help me out on the below queries please?

Regards,
Gandhi

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, September 19, 2017 8:51 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the 
piper GUI as per your advice. But in the output (The patient started study 
treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 
1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular 
carcinoma.), we were not able to find superscript-1 as you mentioned earlier 
but could find superscript-2, 3 etc.  We guess we are missing out on something 
as we could not find co-references for "200mg". Should we add anymore piper for 
this?

Also the change mentioned in the thread - 
http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+XyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg@mail.gmail.com%3E
  is required for the drug-ner module to identify drug-ner annotations.

1) We also have a requirement to identify the patient names and sex available 
in narrative texts. Please let us know how to achieve this, as cTAKES is not 
identifying the proper nouns or their relationship with the patient.
E.g. "This male patient named Tom Hardy aged 35 years is participating in a 
Non-IND study"

2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 
/ 02 or 27Aug2002 as in the below example. Please let us know how to enhance 
the system to identify such date patterns.
E.g. "On 20Aug02, the investigator noted that this patient was suffering 
worsening fatigue and got tired getting out of his chair"

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 18, 2017 10:02 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

> So in this case will be able to see drug attributes in the output XML?
As long as you have the DrugMentionAnnotator in your pipeline you should be 
able to find drug attributes in the xml output file.

> we also saw some code changes needs to be done to use drug-ner module. Is it 
> still valid?
As far as I know there aren't any necessary code changes to get drug ner 
running.  However, I do not normally use drugner so I can't say for certain.

> Also you mentioned that the drun-ner module is out of dat

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-20 Thread Finan, Sean
That is very informative - Thanks Chen!  


Re: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-20 Thread Lin, Chen
Hi Gandhi,

As for the error in EventTimeRelationAnnotator, the reason is that the
time-class attribute value for a temporal expression mention is missing.
When we developed this annotator, we used time-class in the gold annotation
as a feature to help the classifier. If this feature is missing, the
system can still predict event-time relations, but the performance will
drop a little. Our test on SemEval 2015 data shows that if the temporal
attributes are missing, the system performance will drop 0.012 in F-score
(Table 4 of https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009920/).

If you really want this time-class feature, please add
"BackwardsTimeAnnotator" into your processing pipeline, which will annotate
temporal expressions and predict their time classes. Please keep in mind
that this annotator is not 100% accurate either.

Best,
Chen 

On 9/20/17, 2:43 PM, "Gandhi Rajan Natarajan"
<gandhi.natara...@arisglobal.com> wrote:

>Hi James & Sean, Thanks for your support.
>
>
>
>Regarding  point-1,  We don¹t have any database or metadata to get the
>name or sex information. Is it not possible to achieve in cTAKES by any
>other names?  If yes, what other approach will be feasible to implement
>this along with cTAKES as we need this info very much for our requirement.
>
>
>
>Regarding  point-2, I will have a check on what you have suggested. But
>dates analysis is not part of temporal module?  Do you mean to say that
>if we use drug ner module, ContextDependentTokenizerAnnotator will be
>overwritten for date identifications?  Also while using piper GUI to run
>the analysis, we could see the following message in the console:
>
>21 Sep 2017 00:08:04  INFO EventTimeRelationAnnotator - Starting
>processing ...
>
>Null value found in Feature(, )
>
>
>
>Could someone brief on this error and how to overcome it?
>
>
>
>
>
>Regards,
>
>Gandhi
>
>
>
>
>
>-Original Message-
>
>From: James Masanz [mailto:masanz.ja...@gmail.com]
>
>Sent: Wednesday, September 20, 2017 8:41 PM
>
>To: dev@ctakes.apache.org
>
>Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
>
>
>
>1) I would typically not use cTAKES for extracting patient names or sex.
>is there any database or metadata that you can get that information from?
>
>
>
>2) Dates are found by the ContextDependentTokenizerAnnotator, which uses
>DateFSM.java in package org.apache.ctakes.core.fsm.machine.
>
>I believe drug ner uses DateParser in org.apache.ctakes.core.util to
>interpret the date annotations. So you might need to modify both DateFSM
>and DateParser.
>
>
>
>
>
>
>
>On Tue, Sep 19, 2017 at 11:20 AM, Gandhi Rajan Natarajan <
>gandhi.natara...@arisglobal.com> wrote:
>
>
>
>> Hi Sean,
>
>>
>
>> Thanks again for the detailed and prompt response. We were able to run
>
>> the piper GUI as per your advice. But in the output (The patient
>
>> started study treatment of Thalomid 200mg ( days 1 - 21 ) , and
>
>> Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the
>
>> treatment of hepatocellular carcinoma.), we were not able to find
>
>> superscript-1 as you mentioned earlier but could find superscript-2, 3
>
>> etc.  We guess we are missing out on something as we could not find
>
>> co-references for "200mg". Should we add anymore piper for this?
>
>>
>
>> Also the change mentioned in the thread -
>>https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apach
>>e=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=PZ241CwYZ3Asza
>>TEBtM2wl3EcIjNNNeKX8q7N_mt-aI=dcOOtQZqb8EmJvtHt6ZTmNCVTatQDcVv8Pta43hSd
>>0s=xElCOx2UASgWtuWUmL3KouME2Jivc5P_7UaHxzdROBw= .
>
>> org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+
>
>> xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E is required
>
>> for the drug-ner module to identify drug-ner annotations.
>
>>
>
>> 1) We also have a requirement to identify the patient names and sex
>
>> available in narrative texts. Please let us know how to achieve this,
>
>> as cTAKES is not identifying the proper nouns or their relationship to
>> the patient.
>
>> Eg. "This male patient named Tom Hardy aged 35 years is participating
>
>> in a Non-IND study"
>
>>
>
>> 2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or
>
>> 06 / 07 / 02 or 27Aug2002 as in the below example. Please let us know
>
>> how to enhance the system to identify such date patterns.
>
>> E.g " On 20Aug02, the investigator noted that this patient was
>
>> suffering worsening fatigue 

RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-20 Thread Gandhi Rajan Natarajan
Hi James & Sean,

Thanks for your support.

Regarding point 1, we don't have any database or metadata from which to get the 
name or sex information. Is it not possible to achieve this in cTAKES by any 
other means? If not, what other approach would be feasible to implement 
alongside cTAKES? We need this information very much for our requirement.
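
For narratives that follow a fixed phrasing such as "This male patient named Tom Hardy aged 35 years", one possible stopgap outside cTAKES is a small pattern-based pass. The sketch below is only an illustration under that assumption: the names `PATIENT_PATTERN` and `extract_patient` are hypothetical, it is not part of cTAKES, and it will miss any wording that deviates from the pattern.

```python
import re

# Hypothetical stand-alone sketch; not part of cTAKES and not a general
# named-entity recognizer. It only handles the phrasing
# "<sex> patient named <Name> [aged <N> years]".
PATIENT_PATTERN = re.compile(
    r"\b(?P<sex>male|female)\s+patient\s+named\s+"
    r"(?P<name>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"      # one or more capitalized words
    r"(?:\s+aged\s+(?P<age>\d{1,3})\s+years)?"       # optional age clause
)


def extract_patient(text):
    """Return a dict with sex, name and age (age may be None), or None if no match."""
    match = PATIENT_PATTERN.search(text)
    if match is None:
        return None
    age = match.group("age")
    return {
        "sex": match.group("sex"),
        "name": match.group("name"),
        "age": int(age) if age else None,
    }
```

The pattern is deliberately case-sensitive so that only capitalized words are taken as name parts. Free-text narratives with varied wording would need a proper named-entity recognizer instead.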

Regarding point 2, I will check what you have suggested. But isn't date 
analysis part of the temporal module? Do you mean that if we use the drug-ner 
module, ContextDependentTokenizerAnnotator will be overridden for date 
identification? Also, while using the piper GUI to run the analysis, we saw the 
following message in the console:
21 Sep 2017 00:08:04  INFO EventTimeRelationAnnotator - Starting processing ...
Null value found in Feature(, )

Could someone explain this error and how to overcome it?


Regards,
Gandhi


-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com]
Sent: Wednesday, September 20, 2017 8:41 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]

1) I would typically not use cTAKES for extracting patient names or sex. Is 
there any database or metadata that you can get that information from?

2) Dates are found by the ContextDependentTokenizerAnnotator, which uses  
DateFSM.java in package org.apache.ctakes.core.fsm.machine.
I believe drug ner uses DateParser in org.apache.ctakes.core.util to interpret 
the date annotations. So you might need to modify both DateFSM and DateParser.



On Tue, Sep 19, 2017 at 11:20 AM, Gandhi Rajan Natarajan < 
gandhi.natara...@arisglobal.com> wrote:

> Hi Sean,
>
> Thanks again for the detailed and prompt response. We were able to run
> the piper GUI as per your advice. But in the output (The patient
> started study treatment of Thalomid 200mg ( days 1 - 21 ) , and
> Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the
> treatment of hepatocellular carcinoma.), we were not able to find
> superscript-1 as you mentioned earlier but could find superscript-2, 3
> etc.  We guess we are missing out on something as we could not find
> co-references for "200mg". Should we add anything more to the piper file
> for this?
>
> Also the change mentioned in the thread -
> http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E
> is required for the drug-ner module to identify drug-ner annotations.
>
> 1) We also have a requirement to identify the patient names and sex
> available in narrative texts. Please let us know how to achieve this, as
> cTAKES is not identifying the proper nouns or their relationship to the
> patient.
> Eg. "This male patient named Tom Hardy aged 35 years is participating
> in a Non-IND study"
>
> 2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or
> 06 / 07 / 02 or 27Aug2002 as in the below example. Please let us know
> how to enhance the system to identify such date patterns.
> E.g " On 20Aug02, the investigator noted that this patient was
> suffering worsening fatigue and got tired getting out of his chair"
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, September 18, 2017 10:02 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL]
>
> Hi Gandhi,
>
> > So in this case will we be able to see drug attributes in the output XML?
> As long as you have the DrugMentionAnnotator in your pipeline you
> should be able to find drug attributes in the xml output file.
>
> > we also saw that some code changes need to be made to use the drug-ner
> > module. Is that still valid?
> As far as I know there aren't any necessary code changes to get drug
> ner running.  However, I do not normally use drugner so I can't say for
> certain.
>
> > Also you mentioned that the drug-ner module is out of date
> It can still be used and will produce annotations.  All that I meant
> was that there may not be many people out there using it.  It is not
> part of the default pipeline.
>
> > You also mentioned that when you ran the sentence, the date was
> > identified. Where and how exactly did you run it so that we can check
> > the same?
> I run the following in a piper file because I am interested in a lot
> of modules (I added drugner just for you):
>
> // Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
> load AdvancedTokenizerPipeline.piper
> add ContextDependentTokenizerAnnotator
> add POSTagger
> // Chunkers
> load ChunkerSubPipe.piper
> // Default fast dictionary lookup
> load DictionarySubPipe.piper
> add org

RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-20 Thread Finan, Sean
Thanks James!

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com] 
Sent: Wednesday, September 20, 2017 11:11 AM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]

1) I would typically not use cTAKES for extracting patient names or sex. Is
there any database or metadata that you can get that information from?

2) Dates are found by the ContextDependentTokenizerAnnotator, which uses
 DateFSM.java in package org.apache.ctakes.core.fsm.machine.
I believe drug ner uses DateParser in org.apache.ctakes.core.util to
interpret the date annotations. So you might need to modify both DateFSM
and DateParser.



On Tue, Sep 19, 2017 at 11:20 AM, Gandhi Rajan Natarajan <
gandhi.natara...@arisglobal.com> wrote:

> Hi Sean,
>
> Thanks again for the detailed and prompt response. We were able to run the
> piper GUI as per your advice. But in the output (The patient started study
> treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 (
> days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular
> carcinoma.), we were not able to find superscript-1 as you mentioned
> earlier but could find superscript-2, 3 etc.  We guess we are missing out
> on something as we could not find co-references for "200mg". Should we add
> anything more to the piper file for this?
>
> Also the change mentioned in the thread -
> http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E
> is required for the drug-ner module to identify drug-ner annotations.
>
> 1) We also have a requirement to identify the patient names and sex
> available in narrative texts. Please let us know how to achieve this, as
> cTAKES is not identifying the proper nouns or their relationship to the
> patient.
> Eg. "This male patient named Tom Hardy aged 35 years is participating in a
> Non-IND study"
>
> 2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06
> / 07 / 02 or 27Aug2002 as in the below example. Please let us know how to
> enhance the system to identify such date patterns.
> E.g " On 20Aug02, the investigator noted that this patient was suffering
> worsening fatigue and got tired getting out of his chair"
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, September 18, 2017 10:02 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
>
> Hi Gandhi,
>
> > So in this case will we be able to see drug attributes in the output XML?
> As long as you have the DrugMentionAnnotator in your pipeline you should
> be able to find drug attributes in the xml output file.
>
> > we also saw that some code changes need to be made to use the drug-ner
> > module. Is that still valid?
> As far as I know there aren't any necessary code changes to get drug ner
> running.  However, I do not normally use drugner so I can't say for certain.
>
> > Also you mentioned that the drug-ner module is out of date
> It can still be used and will produce annotations.  All that I meant was
> that there may not be many people out there using it.  It is not part of
> the default pipeline.
>
> > You also mentioned that when you ran the sentence, the date was
> > identified. Where and how exactly did you run it so that we can check the
> > same?
> I run the following in a piper file because I am interested in a lot of
> modules (I added drugner just for you):
>
> // Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
> load AdvancedTokenizerPipeline.piper
> add ContextDependentTokenizerAnnotator
> add POSTagger
> // Chunkers
> load ChunkerSubPipe.piper
> // Default fast dictionary lookup
> load DictionarySubPipe.piper
> add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
> // Cleartk Entity Attributes
> load AttributeCleartkSubPipe.piper
> // Relations
> load RelationSubPipe.piper
> // Temporal
> load TemporalSubPipe.piper
> // Coreferences
> load CorefSubPipe.piper
> // Html output
> add pretty.html.HtmlTextWriter
>
> For information on piper files, see 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_=DwIBaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=0yix0VPTznOGjqsXh9VcGDn5yF7xI1Y2BJFHROP03xQ=wyfCwy3BJ3yNagHr2eeHoApGRGrL26VxnlflMEJ1QuA=
>  
> con

Re: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-20 Thread James Masanz
1) I would typically not use cTAKES for extracting patient names or sex. Is
there any database or metadata that you can get that information from?

2) Dates are found by the ContextDependentTokenizerAnnotator, which uses
 DateFSM.java in package org.apache.ctakes.core.fsm.machine.
I believe drug ner uses DateParser in org.apache.ctakes.core.util to
interpret the date annotations. So you might need to modify both DateFSM
and DateParser.
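
As a way to prototype the extra date shapes from this thread (20Aug02, 20/Aug/02, 06 / 07 / 02, 27Aug2002) before touching DateFSM or DateParser, a stand-alone regular expression can be tried first. The sketch below is an assumption-laden illustration in Python: `DATE_PATTERN` and `find_dates` are hypothetical names, and it does not reproduce how the cTAKES finite state machine works.

```python
import re

# Hypothetical, stand-alone illustration; this is NOT the cTAKES DateFSM or
# DateParser. It only recognizes the date shapes quoted in this thread.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"

DATE_PATTERN = re.compile(
    r"\b(?:"
    # day, optional slash, month abbreviation, optional slash, 2- or 4-digit year
    rf"\d{{1,2}}\s*/?\s*{MONTH}\s*/?\s*(?:\d{{4}}|\d{{2}})"
    r"|"
    # numeric day/month/year with optional spaces around the slashes
    r"\d{1,2}\s*/\s*\d{1,2}\s*/\s*(?:\d{4}|\d{2})"
    r")\b",
    re.IGNORECASE,
)


def find_dates(text):
    """Return every substring of text that matches one of the date shapes."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]
```

Running `find_dates` over the example narratives is a cheap way to check which shapes are worth encoding before changing the Java classes. It does not decide whether 06 / 07 / 02 is day-first or month-first, so that interpretation would still have to be handled separately.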



On Tue, Sep 19, 2017 at 11:20 AM, Gandhi Rajan Natarajan <
gandhi.natara...@arisglobal.com> wrote:

> Hi Sean,
>
> Thanks again for the detailed and prompt response. We were able to run the
> piper GUI as per your advice. But in the output (The patient started study
> treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 (
> days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular
> carcinoma.), we were not able to find superscript-1 as you mentioned
> earlier but could find superscript-2, 3 etc.  We guess we are missing out
> on something as we could not find co-references for "200mg". Should we add
> anything more to the piper file for this?
>
> Also the change mentioned in the thread -
> http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E
> is required for the drug-ner module to identify drug-ner annotations.
>
> 1) We also have a requirement to identify the patient names and sex
> available in narrative texts. Please let us know how to achieve this, as
> cTAKES is not identifying the proper nouns or their relationship to the
> patient.
> Eg. "This male patient named Tom Hardy aged 35 years is participating in a
> Non-IND study"
>
> 2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06
> / 07 / 02 or 27Aug2002 as in the below example. Please let us know how to
> enhance the system to identify such date patterns.
> E.g " On 20Aug02, the investigator noted that this patient was suffering
> worsening fatigue and got tired getting out of his chair"
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, September 18, 2017 10:02 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
>
> Hi Gandhi,
>
> > So in this case will we be able to see drug attributes in the output XML?
> As long as you have the DrugMentionAnnotator in your pipeline you should
> be able to find drug attributes in the xml output file.
>
> > we also saw that some code changes need to be made to use the drug-ner
> > module. Is that still valid?
> As far as I know there aren't any necessary code changes to get drug ner
> running.  However, I do not normally use drugner so I can't say for certain.
>
> > Also you mentioned that the drug-ner module is out of date
> It can still be used and will produce annotations.  All that I meant was
> that there may not be many people out there using it.  It is not part of
> the default pipeline.
>
> > You also mentioned that when you ran the sentence, the date was
> > identified. Where and how exactly did you run it so that we can check the
> > same?
> I run the following in a piper file because I am interested in a lot of
> modules (I added drugner just for you):
>
> // Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
> load AdvancedTokenizerPipeline.piper
> add ContextDependentTokenizerAnnotator
> add POSTagger
> // Chunkers
> load ChunkerSubPipe.piper
> // Default fast dictionary lookup
> load DictionarySubPipe.piper
> add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
> // Cleartk Entity Attributes
> load AttributeCleartkSubPipe.piper
> // Relations
> load RelationSubPipe.piper
> // Temporal
> load TemporalSubPipe.piper
> // Coreferences
> load CorefSubPipe.piper
> // Html output
> add pretty.html.HtmlTextWriter
>
> For information on piper files, see
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> I run it in my IDE with:
> org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper -i org/apache/ctakes/examples/notes -o  --user  --pass
> You can run it from the command line by substituting
> "org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with
> "bin/runPiperFile".
> You can also run it through a ctakes 4.01 (trunk) gui.  See
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
>
> > I'm not able to see any clickable option in HTML output
> You must have the HtmlTextWriter at the end of your pipeline to produce
> html files.  To keep the xml file output, place "add FileTree

RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-20 Thread Finan, Sean
Hi Gandhi,

No problem.

Can anybody else out there field at least some of this today?  I may not get to 
it until tomorrow.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Wednesday, September 20, 2017 9:53 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Thanks for the response Sean. Your help is really appreciated.

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 20, 2017 6:20 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

I don't have time to go through all of this right now, but I will try to get to 
it soon.

Make sure that you are running the latest version in trunk.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Wednesday, September 20, 2017 7:03 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi, could someone please help me with the queries below?

Regards,
Gandhi

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, September 19, 2017 8:51 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the 
piper GUI as per your advice. But in the output (The patient started study 
treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 
1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular 
carcinoma.), we were not able to find superscript-1 as you mentioned earlier 
but could find superscript-2, 3 etc.  We guess we are missing out on something 
as we could not find co-references for "200mg". Should we add anything more to 
the piper file for this?

Also the change mentioned in the thread - 
https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Duser_201403.mbox_-253CCAL6WimrJ-5Fmm1-2BXyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg-40mail.gmail.com-253E=DwIFAg=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0=GzhvIkBu4cgyzYN9n6VLe2rz4sJhJzMxDcWyB0BkqAc=
  is required for the drug-ner module to identify drug-ner annotations.

1) We also have a requirement to identify the patient names and sex available 
in narrative texts. Please let us know how to achieve this, as cTAKES is not 
identifying the proper nouns or their relationship to the patient.
Eg. "This male patient named Tom Hardy aged 35 years is participating in a 
Non-IND study"

2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 
/ 02 or 27Aug2002 as in the below example. Please let us know how to enhance 
the system to identify such date patterns.
E.g " On 20Aug02, the investigator noted that this patient was suffering 
worsening fatigue and got tired getting out of his chair"

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 18, 2017 10:02 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

> So in this case will we be able to see drug attributes in the output XML?
As long as you have the DrugMentionAnnotator in your pipeline you should be 
able to find drug attributes in the xml output file.

> we also saw that some code changes need to be made to use the drug-ner
> module. Is that still valid?
As far as I know there aren't any necessary code changes to get drug ner 
running.  However, I do not normally use drugner so I can't say for certain.

> Also you mentioned that the drug-ner module is out of date
It can still be used and will produce annotations.  All that I meant was that 
there may not be many people out there using it.  It is not part of the default 
pipeline.

> You also mentioned that when you ran the sentence, the date was identified.
> Where and how exactly did you run it so that we can check the same?
I run the following in a piper file because I am interested in a lot of modules 
(I added drugner just for you):

// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
// Html output
add pretty.html.HtmlTextWriter

For information on piper files, see 
htt

RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-20 Thread Gandhi Rajan Natarajan
Thanks for the response Sean. Your help is really appreciated.

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 20, 2017 6:20 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

I don't have time to go through all of this right now, but I will try to get to 
it soon.

Make sure that you are running the latest version in trunk.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Wednesday, September 20, 2017 7:03 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi, could someone please help me with the queries below?

Regards,
Gandhi

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, September 19, 2017 8:51 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the 
piper GUI as per your advice. But in the output (The patient started study 
treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 
1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular 
carcinoma.), we were not able to find superscript-1 as you mentioned earlier 
but could find superscript-2, 3 etc.  We guess we are missing out on something 
as we could not find co-references for "200mg". Should we add anything more to 
the piper file for this?

Also the change mentioned in the thread - 
https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Duser_201403.mbox_-253CCAL6WimrJ-5Fmm1-2BXyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg-40mail.gmail.com-253E=DwIFAg=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0=GzhvIkBu4cgyzYN9n6VLe2rz4sJhJzMxDcWyB0BkqAc=
  is required for the drug-ner module to identify drug-ner annotations.

1) We also have a requirement to identify the patient names and sex available 
in narrative texts. Please let us know how to achieve this, as cTAKES is not 
identifying the proper nouns or their relationship to the patient.
Eg. "This male patient named Tom Hardy aged 35 years is participating in a 
Non-IND study"

2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 
/ 02 or 27Aug2002 as in the below example. Please let us know how to enhance 
the system to identify such date patterns.
E.g " On 20Aug02, the investigator noted that this patient was suffering 
worsening fatigue and got tired getting out of his chair"

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 18, 2017 10:02 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

> So in this case will we be able to see drug attributes in the output XML?
As long as you have the DrugMentionAnnotator in your pipeline you should be 
able to find drug attributes in the xml output file.

> we also saw that some code changes need to be made to use the drug-ner
> module. Is that still valid?
As far as I know there aren't any necessary code changes to get drug ner 
running.  However, I do not normally use drugner so I can't say for certain.

> Also you mentioned that the drug-ner module is out of date
It can still be used and will produce annotations.  All that I meant was that 
there may not be many people out there using it.  It is not part of the default 
pipeline.

> You also mentioned that when you ran the sentence, the date was identified.
> Where and how exactly did you run it so that we can check the same?
I run the following in a piper file because I am interested in a lot of modules 
(I added drugner just for you):

// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
// Html output
add pretty.html.HtmlTextWriter

For information on piper files, see 
https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Piper-2BFiles=DwIFAg=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0=9ueuHYwEywok8byBXEkVjmTWiChmaIY3ryB4Pi6ajRo=
I run it in my IDE with:
org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper 
-i org/ap

RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-20 Thread Finan, Sean
Hi Gandhi,

I don't have time to go through all of this right now, but I will try to get to 
it soon.  

Make sure that you are running the latest version in trunk.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Wednesday, September 20, 2017 7:03 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi, could someone please help me with the queries below?

Regards,
Gandhi

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, September 19, 2017 8:51 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the 
piper GUI as per your advice. But in the output (The patient started study 
treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 
1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular 
carcinoma.), we were not able to find superscript-1 as you mentioned earlier 
but could find superscript-2, 3 etc.  We guess we are missing out on something 
as we could not find co-references for "200mg". Should we add anything more to 
the piper file for this?

Also the change mentioned in the thread - 
https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Duser_201403.mbox_-253CCAL6WimrJ-5Fmm1-2BXyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg-40mail.gmail.com-253E=DwIFAg=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0=GzhvIkBu4cgyzYN9n6VLe2rz4sJhJzMxDcWyB0BkqAc=
  is required for the drug-ner module to identify drug-ner annotations.

1) We also have a requirement to identify the patient names and sex available 
in narrative texts. Please let us know how to achieve this, as cTAKES is not 
identifying the proper nouns or their relationship to the patient.
Eg. "This male patient named Tom Hardy aged 35 years is participating in a 
Non-IND study"

2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 
/ 02 or 27Aug2002 as in the below example. Please let us know how to enhance 
the system to identify such date patterns.
E.g " On 20Aug02, the investigator noted that this patient was suffering 
worsening fatigue and got tired getting out of his chair"

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 18, 2017 10:02 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

> So in this case will we be able to see drug attributes in the output XML?
As long as you have the DrugMentionAnnotator in your pipeline you should be 
able to find drug attributes in the xml output file.

> we also saw that some code changes need to be made to use the drug-ner
> module. Is that still valid?
As far as I know there aren't any necessary code changes to get drug ner 
running.  However, I do not normally use drugner so I can't say for certain.

> Also you mentioned that the drug-ner module is out of date
It can still be used and will produce annotations.  All that I meant was that 
there may not be many people out there using it.  It is not part of the default 
pipeline.

> You also mentioned that when you ran the sentence, the date was identified.
> Where and how exactly did you run it so that we can check the same?
I run the following in a piper file because I am interested in a lot of modules 
(I added drugner just for you):

// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
// Html output
add pretty.html.HtmlTextWriter

For information on piper files, see 
https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Piper-2BFiles=DwIFAg=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0=9ueuHYwEywok8byBXEkVjmTWiChmaIY3ryB4Pi6ajRo=
 
I run it in my IDE with:
org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper -i org/apache/ctakes/examples/notes -o  --user  --pass
You can run it from the command line by substituting 
"org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with 
"bin/runPiperFile".
You can also run it through a ctakes 4.01 (trunk) gui.  See 
https://urldefense.proofpoint.com/v2/url?u=

RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-20 Thread Gandhi Rajan Natarajan
Hi, could someone please help me with the queries below?

Regards,
Gandhi

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, September 19, 2017 8:51 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the 
piper GUI as per your advice. But in the output (The patient started study 
treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 
1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular 
carcinoma.), we were not able to find superscript-1 as you mentioned earlier 
but could find superscript-2, 3 etc.  We guess we are missing out on something 
as we could not find co-references for "200mg". Should we add anything more to 
the piper file for this?

Also the change mentioned in the thread - 
http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3ccal6wimrj_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E
 is required for the drug-ner module to identify drug-ner annotations.

1) We also have a requirement to identify the patient names and sex available 
in narrative texts. Please let us know how to achieve this, as cTAKES is not 
identifying the proper nouns or their relationship to the patient.
Eg. "This male patient named Tom Hardy aged 35 years is participating in a 
Non-IND study"

2) cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 
/ 02 or 27Aug2002 as in the below example. Please let us know how to enhance 
the system to identify such date patterns.
E.g " On 20Aug02, the investigator noted that this patient was suffering 
worsening fatigue and got tired getting out of his chair"

Regards,
Gandhi



RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-19 Thread Gandhi Rajan Natarajan
Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the 
piper GUI as per your advice. But in the output for the sentence (The patient 
started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 
mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of 
hepatocellular carcinoma.), we were not able to find the superscript 1 that you 
mentioned earlier, but we could find superscripts 2, 3, etc.  We guess we are 
missing something, as we could not find coreferences for "200mg". Should we 
add any more piper files for this?

Also, the change mentioned in the thread - 
http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3ccal6wimrj_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E
 is required for the drug-ner module to identify drug-ner annotations.

1) We also need to identify the patient's name and sex in narrative texts. 
Please let us know how to achieve this, as cTAKES is not identifying the proper 
nouns or their relationship with the patient.
E.g. "This male patient named Tom Hardy aged 35 years is participating in a 
Non-IND study"

2) cTAKES is unable to identify dates such as 20Aug02, 20/Aug/02, 06 / 07 / 02, 
or 27Aug2002, as in the example below. Please let us know how to enhance the 
system to identify such date patterns.
E.g. "On 20Aug02, the investigator noted that this patient was suffering 
worsening fatigue and got tired getting out of his chair"
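
One possible starting point for the date patterns listed in question 2, 
sketched with plain java.util.regex rather than any cTAKES API. The class and 
method names (DatePatternSketch, findDates) are hypothetical, only three-letter 
English month abbreviations are assumed, and wiring the matches into a cTAKES 
annotator is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cTAKES: finds compact clinical date strings.
final class DatePatternSketch {
    // Three-letter English month abbreviations (assumption: no full month names).
    private static final String MONTH =
            "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";
    // Four-digit or two-digit year.
    private static final String YEAR = "(?:\\d{4}|\\d{2})";
    // Optional separator around a month name: 20Aug02, 20/Aug/02, 20-Aug-02.
    private static final String OPT_SEP = "\\s*[/-]?\\s*";
    // Mandatory slash for the all-numeric form: 06/07/02, 06 / 07 / 02.
    private static final String SLASH = "\\s*/\\s*";

    private static final Pattern DATE = Pattern.compile(
            "\\b\\d{1,2}(?:" + OPT_SEP + MONTH + OPT_SEP
                    + "|" + SLASH + "\\d{1,2}" + SLASH + ")" + YEAR + "\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns every substring of the text that looks like one of the date forms.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDates(
                "On 20Aug02, the investigator noted that this patient was tired"));
    }
}
```

The sketch only locates candidate strings. It does not decide whether a value 
such as 06/07/02 is day-first or month-first, and it does not expand two-digit 
years, so any normalization would need rules of your own.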

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 18, 2017 10:02 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

> So in this case, will we be able to see drug attributes in the output XML?
As long as you have the DrugMentionAnnotator in your pipeline you should be 
able to find drug attributes in the xml output file.

> we also saw that some code changes need to be made to use the drug-ner 
> module. Is that still valid?
As far as I know there aren't any necessary code changes to get drug ner 
running.  However, I do not normally use drugner so I can't say for certain.

> Also, you mentioned that the drug-ner module is out of date
It can still be used and will produce annotations.  All that I meant was that 
there may not be many people out there using it.  It is not part of the default 
pipeline.

> You also mentioned that when you ran the sentence, the date was identified. 
> Where and how exactly did you run it, so that we can check the same?
I run the following in a piper file because I am interested in a lot of modules 
(I added drugner just for you):

// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), 
// Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
// Html output
add pretty.html.HtmlTextWriter

For information on piper files, see 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
I run it in my IDE with:
org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper 
-i org/apache/ctakes/examples/notes -o  --user  --pass 
You can run it from the command line by substituting 
"org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with 
"bin/runPiperFile".
You can also run it through a cTAKES 4.0.1 (trunk) GUI.  See 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI

> I'm not able to see any clickable option in HTML output
You must have the HtmlTextWriter at the end of your pipeline to produce html 
files.  To keep the xml file output, place "add FileTreeXmiWriter" at the end 
of the piper.

> Apologies for so many questions
No worries, we are happy to have your interest!

Sean



RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-18 Thread Finan, Sean
Hi Gandhi,

> So in this case, will we be able to see drug attributes in the output XML?
As long as you have the DrugMentionAnnotator in your pipeline you should be 
able to find drug attributes in the xml output file.

> we also saw that some code changes need to be made to use the drug-ner 
> module. Is that still valid?
As far as I know there aren't any necessary code changes to get drug ner 
running.  However, I do not normally use drugner so I can't say for certain.

> Also, you mentioned that the drug-ner module is out of date
It can still be used and will produce annotations.  All that I meant was that 
there may not be many people out there using it.  It is not part of the default 
pipeline.

> You also mentioned that when you ran the sentence, the date was identified. 
> Where and how exactly did you run it, so that we can check the same?
I run the following in a piper file because I am interested in a lot of modules 
(I added drugner just for you):

// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), 
// Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
// Html output
add pretty.html.HtmlTextWriter

For information on piper files, see 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
I run it in my IDE with:
org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p .piper 
-i org/apache/ctakes/examples/notes -o  --user  --pass 

You can run it from the command line by substituting 
"org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with 
"bin/runPiperFile".
You can also run it through a cTAKES 4.0.1 (trunk) GUI.  See 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI

> I'm not able to see any clickable option in HTML output
You must have the HtmlTextWriter at the end of your pipeline to produce html 
files.  To keep the xml file output, place "add FileTreeXmiWriter" at the end 
of the piper.
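
A minimal sketch of how the end of the piper file might then look with both 
writers in place. It assumes the short name FileTreeXmiWriter resolves exactly 
as written in the sentence above; that has not been verified here.

```
// Html output
add pretty.html.HtmlTextWriter
// XMI (xml) output, added last as described above
add FileTreeXmiWriter
```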

> Apologies for so many questions
No worries, we are happy to have your interest!

Sean


-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Saturday, September 16, 2017 7:01 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the prompt response. We appreciate your input on adding 
DrugMentionAnnotator. Actually, we are relying on the pretty printer output 
just to understand the analysis. Our logic to extract disorders and findings is 
based on the XML file generated by 
https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/DemoServlet.java
So in this case, will we be able to see drug attributes in the output XML?

In one of the old posts 
(http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+XyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg@mail.gmail.com%3E)
we also saw that some code changes need to be made to use the drug-ner module. 
Is that still valid? Also, you mentioned that the drug-ner module is out of 
date. Does that mean it cannot be used, or that it may not provide accurate 
analysis? Also, what changes need to be made to bring it up to date? We can try 
them if you can assist.

You also mentioned that when you ran the sentence, the date was identified. 
Where and how exactly did you run it, so that we can check the same? Also, 
regarding your explanation of coreference, I'm not able to see any clickable 
option in the HTML output, so we wanted to understand how we can run and check 
that too.

Apologies for so many questions; we are just a week into NLP and cTAKES. 
Thanks in advance.

Regards,
Gandhi


RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-16 Thread Gandhi Rajan Natarajan
Hi Sean,

Thanks again for the prompt response. We appreciate your input on adding 
DrugMentionAnnotator. Actually, we are relying on the pretty printer output 
just to understand the analysis. Our logic to extract disorders and findings is 
based on the XML file generated by 
https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/DemoServlet.java
So in this case, will we be able to see drug attributes in the output XML?

In one of the old posts 
(http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3ccal6wimrj_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E
 ) we also saw that some code changes need to be made to use the drug-ner 
module. Is that still valid? Also, you mentioned that the drug-ner module is 
out of date. Does that mean it cannot be used, or that it may not provide 
accurate analysis? Also, what changes need to be made to bring it up to date? 
We can try them if you can assist.

You also mentioned that when you ran the sentence, the date was identified. 
Where and how exactly did you run it, so that we can check the same? Also, 
regarding your explanation of coreference, I'm not able to see any clickable 
option in the HTML output, so we wanted to understand how we can run and check 
that too.

Apologies for so many questions; we are just a week into NLP and cTAKES. 
Thanks in advance.

Regards,
Gandhi

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you are not the named addressee you should not disseminate, distribute or copy 
this e-mail. Please notify the sender or system manager by email immediately if 
you have received this e-mail by mistake and delete this e-mail from your 
system. If you are not the intended recipient you are notified that disclosing, 
copying, distributing or taking any action in reliance on the contents of this 
information is strictly prohibited and against the law.


RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

2017-09-15 Thread Finan, Sean
Hi Gandhi,   (Hi Tim, find below the best coref chain I have ever seen),

Unfortunately, it looks like the drug-ner module has not been kept up-to-date.  
I just checked the cpe xml files and they contain invalid pointers.  Anyway, 
you should be able to add the DrugMentionAnnotator by using:

AggregateBuilder (code):
aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription( DrugMentionAnnotator.class ) );

Piper file:
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator

Unfortunately, the drug attribute types all extend the type Annotation.   The 
PrettyTextWriter that you are using only marks IdentifiedAnnotation subtypes, 
so you will not see the drug attributes without writing some extra code.  On 
that matter, I recommend that you use HtmlTextWriter for output as it provides 
more information in a nicer format - though still not drug ner attributes.  
One nice feature is the markup of coreferences.  Using your example sentence:
"The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and 
Epirubicin , 20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the 
treatment of hepatocellular carcinoma." 
It marks a superscript '1' (coreference chain #1) after "200mg" and "carcinoma" 
because Tim's excellent coreference model connected:
"study treatment of Thalomid 200mg"  with "the treatment of hepatocellular 
carcinoma"!
If you click one of the superscript "1"s it will display the coreference chain 
in the margin.
I am still working on that writer in my spare time, so if you have suggestions 
please let me know.

As for the missing times, I don't know what you are witnessing.  When I run 
your sentence I get the times:
"days"
"days 1,8"
"06/07/02"(contains treatment)
The "days" aren't perfect, but the "06/07/02" date and its "contains treatment" 
relation are pretty good.

Sean


-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Friday, September 15, 2017 12:40 PM
To: dev@ctakes.apache.org
Subject: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi All,

We are using the pipeline code as mentioned in 
https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/Pipeline.java
for the cTAKES web application we are building. But in our case, the 
measurements and quantities are identified as events as shown below:

SENTENCE:  The patient started study treatment of Thalomid 200mg (days 1-21), 
and Epirubicin, 20 mg /m2  (days 1, 8, and 15) on 06/07/02 for the treatment of 
hepatocellular carcinoma.
   DT NN VBD NN NN IN NN NNS NNS CC NNP NNS NNS NNS CC IN IN DT NN IN JJ NN
   Event  Procedure  Drug  Event  Drug  Procedure  Disorder
   C0087111  C0723668  C0014582  C0087111  C0007097

   Disorder
   C2239176

From googling, we found that we need to use DrugMentionAnnotator to identify 
measurements and quantities. Are we right? If so, how do we enable 
DrugMentionAnnotator in our code? Could someone provide a sample code snippet 
and help us out on this?

Also, the dates are not getting identified in our case, as we get the following 
error in our console even after using the latest temporal resources (model.jar) 
as per Sean's suggestion:

"Null value found in Feature(, ) from [Feature(, 
), Feature(, )"

Could someone throw some light on this as well?

Thanks in advance.

Regards,
Gandhi
