building cTAKES (discussion transferred from CTAKES-445)

2017-10-03 Thread James Masanz
A question was asked within JIRA issue CTAKES-445 about building cTAKES
that is more general than the topic of CTAKES-445, so I'm transferring it
to this mailing list. It started with the following question:

how someone is able to provide complete Apache cTakes 4.0 binaries @
http://archive.apache.org/dist/ctakes/ctakes-4.0.0/apache-ctakes-4.0.0-bin.tar.gz
while we struggle to build it from official Apache repository because of
issues like this one [CTAKES-445]


If you are trying to build a binary of cTAKES, I suggest you follow the
instructions from the cTAKES 4.0 Developer Install Guide to get a copy of
cTAKES from trunk, but when checking out the source, be sure to specify the
revision you are interested in. By checking out from trunk, you will get
pom files that have a SNAPSHOT version.

Then use the command line version of maven to do something like the
following:
    mvn clean install -DskipTests=true
You should find the binaries have been built somewhere under
ctakes-distribution.

-- James


RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Excellent, thanks

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com] 
Sent: Tuesday, October 03, 2017 12:35 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

FWIW, I started taking a look at the patch. (It's in code that I'm not that
familiar with, so a quick glance isn't sufficient for me.)
I did a search in UMLS for m2 in the terminologies commonly used by cTAKES
to see if adding m2 could result in marking something as a measurement when
it's not - and I did find many terms in the UMLS that contain m2. There are
plenty of other measurement abbreviations that also appear within other
terms, so it's not a showstopper - but is a consideration.

I haven't tested the patch yet to see if the way the patch is implemented -
checking for 2 tokens - avoids that issue.  Not sure if I'll get a chance
to look more this week. If you end up picking this up, Sean, at
least you know what I've done.

-- James



Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread James Masanz
FWIW, I started taking a look at the patch. (It's in code that I'm not that
familiar with, so a quick glance isn't sufficient for me.)
I did a search in UMLS for m2 in the terminologies commonly used by cTAKES
to see if adding m2 could result in marking something as a measurement when
it's not - and I did find many terms in the UMLS that contain m2. There are
plenty of other measurement abbreviations that also appear within other
terms, so it's not a showstopper - but is a consideration.

I haven't tested the patch yet to see if the way the patch is implemented -
checking for 2 tokens - avoids that issue.  Not sure if I'll get a chance
to look more this week. If you end up picking this up, Sean, at
least you know what I've done.

-- James
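
To make the false-positive concern concrete, here is a minimal Java sketch of
a two-token check. It is not the patch attached to CTAKES-459 and not the real
MeasurementFSM: the class name, the unit list, and the assumption that the
input is already split on whitespace are all hypothetical. It only shows why
requiring a number directly before the unit keeps a bare "M2" inside another
term from counting as a measurement.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnitContextSketch {
    // Hypothetical unit list for illustration; the vocabulary used by the
    // real MeasurementFSM is not shown in this thread.
    private static final Set<String> UNITS =
            new HashSet<>(Arrays.asList("mg", "ml", "m2", "mg/m2"));

    // True for plain integers and decimals such as "20" or "2.5".
    public static boolean isNumber(String token) {
        return token.matches("\\d+(\\.\\d+)?");
    }

    // Counts (number, unit) pairs: a unit only counts when the token
    // immediately before it is a number.
    public static int countMeasurements(List<String> tokens) {
        int count = 0;
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String unit = tokens.get(i + 1).toLowerCase();
            if (isNumber(tokens.get(i)) && UNITS.contains(unit)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "20 mg/m2" is a number followed by a unit, so it is counted.
        System.out.println(countMeasurements(
                Arrays.asList("dose", "of", "20", "mg/m2", "daily")));
        // "M2" is in the unit list but has no number before it, so it is not.
        System.out.println(countMeasurements(
                Arrays.asList("influenza", "M2", "protein")));
    }
}
```

A term such as "stage 2 m2" would still match under this rule, so a check like
this narrows the false positives rather than eliminating them.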


On Tue, Oct 3, 2017 at 12:25 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Ctakes is a purely volunteer effort, so there are never any guarantees ...
> If nobody looks at the value and unit jira and patch this week then I will
> try to get to it asap.
>
> Thanks for letting us use your example note!
>
> Sean

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Hi Gandhi, 
I have one discovery pertaining to the coref items so far.
Your first coreference (#1) is not appearing in the html because it consists 
only of a "generic" item: "this patient".
Coreference: This patient , This patient , This patient , this patient , this 
patient , this patient , this patient
This is a bug in the html writer that I will need to fix.
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, could someone give us some valid inputs
to test the same? Is someone else going to review this again?

Regards,
Gandhi

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a
coreference patch. I don't know this FSM code that well, so I am not an expert.
My biggest concern at a glance is that these additions help find more true
positives (as in your examples); can we verify that they won't create false
positives?

Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake from my side that I
> didn't send the complete text. Did you mean that with the text I sent,
> the co-reference superscript-1 will be lost?
>
> Also as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 for
> measurement FSM changes and attached the modified file changes. Could
> someone have a look and let us know your thoughts please?
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
>
> >
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:
> https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> Creating patches:
> https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Hi Gandhi,

Ctakes is a purely volunteer effort, so there are never any guarantees ...
If nobody looks at the value and unit jira and patch this week then I will try 
to get to it asap.

Thanks for letting us use your example note!

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, October 03, 2017 12:21 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
[SUSPICIOUS]

Hi Sean,

Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459  be
looked up by someone as Tim mentioned?

The paragraph we sent earlier can be in the example notes provided the protocol
number is masked/modified.

Regards,
Gandhi


RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Gandhi Rajan Natarajan
Hi Sean,

Will this JIRA issue - https://issues.apache.org/jira/browse/CTAKES-459  be 
looked up by someone as Tim mentioned?

The paragraph we sent earlier can be in the example notes provided the protocol 
number is masked/modified.

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Tuesday, October 03, 2017 7:27 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

Thank you for asking.  There is no action item for you concerning the 
coreference output that you see.   However, if you would like to help the 
community understand how the module works (input and output), maybe you could 
do something like run the pipeline on your original sentence, then that 
sentence plus another (before), then that sentence plus another (after) ... and 
see how the output changes with the input.  If you take screenshots or 
something then we could put them on the wiki.  Also, would you mind if the 
paragraph you sent became one of the example notes in ctakes?  That means that 
it would be redistributed with the code.

Sean


Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

2017-10-03 Thread Jeff Headley
That's great Sean. Thanks for all the help.

On Tue, Oct 3, 2017 at 9:37 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> You can find all kinds of background information on the web with a search
> like "nlp tokenization".  You can look at 
> org.apache.ctakes.gui.dictionary.util.TextTokenizer
> in the ctakes-gui module to see how the dictionary creator does it.  You
> can run .getTokenizedText( text ) to get a tokenized string or .getTokens(
> text ) to get a list of words.  Apparently I was lazy and didn't write
> javadocs ...
>
> Sean
>
> -Original Message-
> From: Jeff Headley [mailto:jeffun...@gmail.com]
> Sent: Tuesday, October 03, 2017 9:19 AM
> To: dev@ctakes.apache.org
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> Thanks Sean. Not quite, sorry for the confusion. We keep the default
> dictionary hsqldb. We just empty the CUI_TERMS, RXNORM, PREFTERM, and TUI
> tables and move over data from a sql server db. I don't seem to recall
> doing anything with a tcount column. I'll have to check our code tonight.
> That could very well be it. So maybe the old ctakes had a bug and this
> should not have been working to begin with. Got anywhere I could read about
> the tokenizing rules and calculating the tcount value? Or maybe a java
> class I could look at?
>
> Jeff
>
>
> On Tue, Oct 3, 2017 at 9:07 AM, Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Ok, let me see if I understand your current setup:
> >
> > Ctakes 4.0 fast lookup,
> > Dictionary configuration file points to an sql server, Sql server uses
> > cui_terms  (cui, rword, rindex, tcount, text) and perhaps other
> > secondary tables ...
> >
> > Now that I write out the column names, I have a thought.  Is it
> > possible that for some term the number in tcount does not match the
> > number of non-whitespace 'words' in the text column?  If those numbers
> > are off then you will have problems similar to the one that you are
> > seeing.
> > If you are populating your own table you need to make sure that the
> > text is being properly tokenized.  For instance, the term "alpha-beta"
> > should have text "alpha - beta" with tcount 3.  There are some
> > exceptions to the dash-separation rule and a few oddities.
> >
> > Sean
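
The tcount rule described above can be sketched in Java as follows. This is
only an illustration under stated assumptions: it is not the cTAKES
TextTokenizer, the class and method names are hypothetical, and it separates
every dash, so it ignores the exceptions and oddities Sean mentions. It shows
how "alpha-beta" becomes the text "alpha - beta" with tcount 3, and how a
stored row could be checked for a tcount that disagrees with its text.

```java
import java.util.ArrayList;
import java.util.List;

public class TcountSketch {
    // Splits a term on whitespace and puts every dash into its own token.
    // Hypothetical simplification of the dash-separation rule.
    public static List<String> tokens(String term) {
        List<String> out = new ArrayList<>();
        for (String word : term.trim().split("\\s+")) {
            StringBuilder current = new StringBuilder();
            for (char c : word.toCharArray()) {
                if (c == '-') {
                    if (current.length() > 0) {
                        out.add(current.toString());
                        current.setLength(0);
                    }
                    out.add("-");
                } else {
                    current.append(c);
                }
            }
            if (current.length() > 0) {
                out.add(current.toString());
            }
        }
        return out;
    }

    // Value for the text column: tokens joined by single spaces.
    public static String tokenizedText(String term) {
        return String.join(" ", tokens(term));
    }

    // Value for the tcount column: number of tokens in the text column.
    public static int tcount(String term) {
        return tokens(term).size();
    }

    // Checks an existing row: does tcount equal the number of
    // whitespace-separated words in the stored text?
    public static boolean isConsistent(String storedText, int storedTcount) {
        String trimmed = storedText.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return words == storedTcount;
    }

    public static void main(String[] args) {
        System.out.println(tokenizedText("alpha-beta") + " -> tcount "
                + tcount("alpha-beta"));
    }
}
```

Running a check like isConsistent over every row copied in from the SQL Server
source would be one way to find rows whose tcount is out of step with the text.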
> >
> > -Original Message-
> > From: Jeff Headley [mailto:jeffun...@gmail.com]
> > Sent: Tuesday, October 03, 2017 8:52 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> >
> > I updated our pom to use the same hsqldb version as what I saw in the
> > ctakes lib folder. The data coming in is from a SQL Server database.
> >
> > On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi Jeff,
> > >
> > > I don't think that a custom dictionary should cause a null pointer
> > > exception on that line unless you have an odd null character in text
> > > or something of that ilk.
> > >
> > > One thing that changed in ctakes 4.0 is the version of hsqldb that
> > > is being used for the dictionary database.  I don’t know if that has
> > > anything to do with your problem, but it may be causing others.
> > > What is the source of your custom dictionary?  There may be a better
> > > way to populate a database.
> > >
> > > Sean
> > >
> > > -Original Message-
> > > From: Jeff Headley [mailto:jeffun...@gmail.com]
> > > Sent: Tuesday, October 03, 2017 12:53 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator
> > > [EXTERNAL]
> > >
> > > Thank you Sean. That helped to figure out what we did. Not quite
> > > sure where we went wrong but now at least we know the cause. So a
> > > long time ago in our project using ctakes, we emptied out the tables
> > > CUI_TERMS, RXNORM, PREFTERM, and TUI and then loaded them with the
> > > values we wanted. Worked great. Now in the new version the
> > > /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> > > engine seems to be using
> > > /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> > > and that seems to be where things went sideways. If I don't mess
> > > with the db and keep the original, no errors.
> > >
> > > So somewhere in this if statement at line 102 in
> > > DefaultJCASTermAnnotator:
> > > if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
> > >   || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() ) ) {
> > >
> > > It's expecting to not ever have a null and I suspect we are leaving
> > > something null somewhere that really shouldn't have nulls. If it's
> > > obvious where I've gone wrong, the assistance would be appreciated.
> > > Otherwise, I'll get it figured out eventually. I suspect it's
> > > possibly because we never did anything with the SNOMEDCT_US in the
> > > prior version.
> > > On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean <
> > > 

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Thanks Tim!  I was looking for that one but couldn't find it.

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Tuesday, October 03, 2017 10:03 AM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

Here's the most recent publication, which describes the system in
ctakes 4.0 and later:
http://www.sciencedirect.com/science/article/pii/S1532046417300850
Tim

On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote:

> > 
> > With the changes in Input, the co-reference between all the
> > entities should still be preserved right?
> No.  One of the experts can better explain this, but the coreference
> module works with "best match" chains.  With one sentence of text,
> term (Markable) A may have a best match with term B.  As soon as you
> add more text, you introduce the possibility that term A will have a
> better best match with C and/or D, and the previous match to B will
> be deemed less accurate and dropped.
> In your case the coreference A - B seems to be lost in favor of one
> using internal term A', and that is a little strange.  It could be
> that overlapping markables are being discarded?  I will try to look
> into this really quickly.
> 
> You can look at some publications on coref if you search the
> web.  The one that probably best applies to the current coref module
> (Tim, Dima, is this true?) is
> https://www.aclweb.org/anthology/W12-2409
> 
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Tuesday, October 03, 2017 4:18 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> Hi Sean, I still have some doubts on this. If I run the piper file
> with the complete text I sent earlier, I could see only superscript -
> 4 for Thalomid and the co-reference of this to  "treatment of
> hepatocellular carcinoma" is still lost. Also I don’t see any
> superscript with number-1 too. With the changes in Input, the co-
> reference between all the entities should still be preserved right?
> Do we have any more info or doc on this co-reference module to
> understand its complexity better?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 8:36 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> Hi Tim,
> 
> The coreference question (just a question) was for a different item
> altogether.  Sorry for any confusion.  The reason that I CC:d you ...
> 
> From Gandhi:
> > 
> > Interestingly even I was able to generate [Sean's coref output]
> > using  piper GUI by  having only that single line - " The patient
> > started study treatment of Thalomid 200mg (days 1-21), and
> > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> > treatment of hepatocellular carcinoma. " in the input file.
> > But when I change the input file content with the following
> > lines:   [Full paragraph (below), single-sentence in middle]  The
> > co-reference superscript is lost by then.
> Sean's answer:
> > 
> > Ctakes is a system with many moving parts.  Things that precede or
> > follow your original example sentence will change the evaluation of
> > that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> > 
> > Tim can correct me here, but maybe the coreference module ranked
> > the links between "thalomid" as much higher than the rank between
> > "study treatment of thalomid 200mg" and "the treatment of
> > hepatocellular carcinoma" and discarded the encapsulating treatment
> > texts from markables?  It is probably more complex than that.
> Sean
> 
> "This patient is participating in a Non-IND study; Protocol CG-
> 000424: "Phase I/II of Thalidomide and Epirubicin in Patients with
> Unresectable or Metastatic Hepatocellular Carcinoma".Information has
> been received from the investigator regarding an 82 year-old male
> patient who had gastrointestinal bleeding 

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Alexandru Zbarcea
This is very informative. Thank you Tim

Alex

On Oct 3, 2017 10:06, "Miller, Timothy" <
timothy.mil...@childrens.harvard.edu> wrote:

> Here's the most recent publication, which describes the system in
> ctakes 4.0 and later:
> http://www.sciencedirect.com/science/article/pii/S1532046417300850
> Tim
>
> On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote:
> > >
> > > With the changes in Input, the co-reference between all the
> > > entities should still be preserved right?
> > No.  One of the experts can better explain this, but the coreference
> > module works with "best match" chains.  With one sentence of text,
> > term (Markable) A may have a best match with term B.  As soon as you
> > add more text, you introduce the possibility that term A will have a
> > better best match with C and/or D, and the previous match to B will
> > be deemed less accurate and dropped.
> > In your case the coreference A - B seems to be lost in favor of one
> > using internal term A', and that is a little strange.  It could be
> > that overlapping markables are being discarded?  I will try to look
> > into this really quickly.
> >
> > You can look at some publications on coref if you search the
> > web.  The one that probably best applies to the current coref module
> > (Tim, Dima, is this true?) is
> > https://www.aclweb.org/anthology/W12-2409
> >
> > Sean
> >
> > -Original Message-
> > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> >
> > Sent: Tuesday, October 03, 2017 4:18 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Enabling drugner pipeline and identifying dates
> > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> >
> > Hi Sean, I still have some doubts on this. If I run the piper file
> > with the complete text I sent earlier, I could see only superscript -
> > 4 for Thalomid and the co-reference of this to  "treatment of
> > hepatocellular carcinoma" is still lost. Also I don’t see any
> > superscript with number-1 too. With the changes in Input, the co-
> > reference between all the entities should still be preserved right?
> > Do we have any more info or doc on this co-reference module to
> > understand its complexity better?
> >
> > Regards,
> > Gandhi
> >
> >
> > -Original Message-
> > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> > Sent: Monday, October 02, 2017 8:36 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: Enabling drugner pipeline and identifying dates
> > [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> >
> > Hi Tim,
> >
> > The coreference question (just a question) was for a different item
> > altogether.  Sorry for any confusion.  The reason that I CC:d you ...
> >
> > From Gandhi:
> > >
> > > Interestingly even I was able to generate [Sean's coref output]
> > > using  piper GUI by  having only that single line - " The patient
> > > started study treatment of Thalomid 200mg (days 1-21), and
> > > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> > > treatment of hepatocellular carcinoma. " in the input file.
> > > But when I change the input file content with the following
> > > lines:   [Full paragraph (below), single-sentence in middle]  The
> > > co-reference superscript is lost by then.
> > Sean's answer:
> > >
> > > Ctakes is a system with many moving parts.  Things that precede or
> > > follow your original example sentence will change the evaluation of
> > > that sentence.
> > With the pipeline you are using and the full note, you should see a
> > number (mine is 4) next to the first "thalomid" in the original
> > example sentence.  If you click that number you should see (to the
> > right) 4 instances of "thalomid".
> > >
> > > Tim can correct me here, but maybe the coreference module ranked
> > > the links between "thalomid" as much higher than the rank between
> > > "study treatment of thalomid 200mg" and "the treatment of
> > > hepatocellular carcinoma" and discarded the encapsulating treatment
> > > texts from markables?  It is probably more complex than that.
> > Sean
> >
> > "This patient is participating in a Non-IND study; Protocol CG-
> > 000424: "Phase I/II of Thalidomide and Epirubicin in Patients with
> > Unresectable or Metastatic Hepatocellular Carcinoma".Information has
> > been received from the investigator regarding an 82 year-old male
> > patient who had gastrointestinal bleeding while on Thalomid,
> > Epirubicin, and Coumadin. He had a past medical history of
> > diverticulosis in 03/02 and a right atrial clot from intraventricular
> > catheter (IVC) for which he was started on Coumadin. During the
> > hospitalization for a right atrial clot in 03/02 hepatocellular
> > carcinoma was first noted and he was referred to an oncologist.  The
> > 

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Miller, Timothy
Here's the most recent publication, which describes the system in
ctakes 4.0 and later:
http://www.sciencedirect.com/science/article/pii/S1532046417300850
Tim

On Tue, 2017-10-03 at 13:52 +, Finan, Sean wrote:
> > 
> > With the changes in Input, the co-reference between all the
> > entities should still be preserved right?
> No.  One of the experts can better explain this, but the coreference
> module works with "best match" chains.  With one sentence of text,
> term (Markable) A may have a best match with term B.  As soon as you
> add more text, you introduce the possibility that term A will have a
> better best match with C and/or D, and the previous match to B will
> be deemed less accurate and dropped.  
> In your case the coreference A - B seems to be lost in favor of one
> using internal term A', and that is a little strange.  It could be
> that overlapping markables are being discarded?  I will try to look
> into this really quickly.
> 
> You can look at some publications on coref if you search the
> web.  The one that probably best applies to the current coref module
> (Tim, Dima, is this true?) is
> https://www.aclweb.org/anthology/W12-2409
> 
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
>  
> Sent: Tuesday, October 03, 2017 4:18 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> Hi Sean, I still have some doubts on this. If I run the piper file
> with the complete text I sent earlier, I could see only superscript -
> 4 for Thalomid and the co-reference of this to  "treatment of
> hepatocellular carcinoma" is still lost. Also I don’t see any
> superscript with number-1 too. With the changes in Input, the co-
> reference between all the entities should still be preserved right?
> Do we have any more info or doc on this co-reference module to
> understand its complexity better?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 8:36 PM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> Hi Tim,
> 
> The coreference question (just a question) was for a different item
> altogether.  Sorry for any confusion.  The reason that I CC:d you ...
> 
> From Gandhi:
> > 
> > Interestingly even I was able to generate [Sean's coref output]
> > using  piper GUI by  having only that single line - " The patient
> > started study treatment of Thalomid 200mg (days 1-21), and
> > Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> > treatment of hepatocellular carcinoma. " in the input file.
> > But when I change the input file content with the following
> > lines:   [Full paragraph (below), single-sentence in middle]  The
> > co-reference superscript is lost by then.
> Sean's answer:
> > 
> > Ctakes is a system with many moving parts.  Things that precede or
> > follow your original example sentence will change the evaluation of
> > that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> > 
> > Tim can correct me here, but maybe the coreference module ranked
> > the links between "thalomid" as much higher than the rank between
> > "study treatment of thalomid 200mg" and "the treatment of
> > hepatocellular carcinoma" and discarded the encapsulating treatment
> > texts from markables?  It is probably more complex than that.
> Sean
> 
> "This patient is participating in a Non-IND study; Protocol CG-
> 000424: "Phase I/II of Thalidomide and Epirubicin in Patients with
> Unresectable or Metastatic Hepatocellular Carcinoma".Information has
> been received from the investigator regarding an 82 year-old male
> patient who had gastrointestinal bleeding while on Thalomid,
> Epirubicin, and Coumadin. He had a past medical history of
> diverticulosis in 03/02 and a right atrial clot from intraventricular
> catheter (IVC) for which he was started on Coumadin. During the
> hospitalization for a right atrial clot in 03/02 hepatocellular
> carcinoma was first noted and he was referred to an oncologist.  The
> patient started study treatment of Thalomid 200mg (days 1-21), and
> Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> treatment of hepatocellular carcinoma.  He was concomitantly
> receiving Cardura, Ambien (for insomnia), Megace, Coumadin, and
> Oxycodone. This patient presented to the emergency room with 

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
Hi Gandhi,

Thank you for asking.  There is no action item for you concerning the 
coreference output that you see.   However, if you would like to help the 
community understand how the module works (input and output), maybe you could 
do something like run the pipeline on your original sentence, then that 
sentence plus another (before), then that sentence plus another (after) ... and 
see how the output changes with the input.  If you take screenshots or 
something then we could put them on the wiki.  Also, would you mind if the 
paragraph you sent became one of the example notes in ctakes?  That means that 
it would be redistributed with the code.

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Tuesday, October 03, 2017 4:26 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Tim/Sean,

Is this an action item on us? If yes, could someone give us some valid inputs 
to test the same? Is someone else going to review this again?

Regards,
Gandhi


-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:06 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

My bad, I didn't read too closely and thought this was going to be a 
coreference patch. I don't know this FSM code that well, so I am not an expert. 
My biggest concern at a glance: these additions help find more true 
positives (as in your examples), but can we verify that they won't create 
false positives?

Tim

On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:

> Hi Sean,
>
> Thanks again for the response. I guess it's a mistake on my side that I
> didn't send the complete text. Did you mean that with the text I sent,
> the co-reference superscript-1 will be lost?
>
> Also as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 for
> measurement FSM changes and attached the modified file changes. Could
> someone have a look and let us know your thoughts please?
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy 
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>
> Hi Gandhi,
>
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
>
> >
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:
> https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> 

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Finan, Sean
> With the changes in Input, the co-reference between all the entities should 
> still be preserved right?
No.  One of the experts can better explain this, but the coreference module 
works with "best match" chains.  With one sentence of text, term (Markable) A 
may have a best match with term B.  As soon as you add more text, you introduce 
the possibility that term A will have a better best match with C and/or D, and 
the previous match to B will be deemed less accurate and dropped.  
In your case the coreference A - B seems to be lost in favor of one using 
internal term A', and that is a little strange.  It could be that overlapping 
markables are being discarded?  I will try to look into this really quickly.
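
To make the "best match" idea above concrete, here is a toy sketch in Java.
It is only an illustration of picking the highest-scoring candidate for a
markable, not the cTAKES coreference code; the class name, the method name
and the scores are all made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration only; not part of cTAKES. */
final class BestMatchSketch {

    /** Returns the candidate with the highest score, or null if there are none. */
    static String bestAntecedent(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                bestScore = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical scores for markable A; the numbers are invented.
        Map<String, Double> candidatesForA = new LinkedHashMap<>();
        candidatesForA.put("B", 0.6);
        System.out.println(bestAntecedent(candidatesForA));

        // More text adds another candidate, which may outrank B.
        candidatesForA.put("C", 0.8);
        System.out.println(bestAntecedent(candidatesForA));
    }
}
```

With only B available, A links to B.  Once C is added with a higher score,
the A - B link is no longer the best one, which is the kind of change
described above when more text is added to the input.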

You can look at some publications on coref if you search the web.  The one that 
probably best applies to the current coref module (Tim, Dima, is this true?) is
https://www.aclweb.org/anthology/W12-2409

Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Tuesday, October 03, 2017 4:18 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

Hi Sean, I still have some doubts on this. If I run the piper file with the 
complete text I sent earlier, I could see only superscript - 4 for Thalomid and 
the co-reference of this to  "treatment of hepatocellular carcinoma" is still 
lost. Also I don’t see any superscript with number-1 too. With the changes in 
Input, the co-reference between all the entities should still be preserved 
right? Do we have any more info or doc on this co-reference module to 
understand its complexity better?

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:36 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

Hi Tim,

The coreference question (just a question) was for a different item altogether. 
 Sorry for any confusion.  The reason that I CC:d you ...

From Gandhi:
> Interestingly even I was able to generate [Sean's coref output] using  piper 
> GUI by  having only that single line - " The patient started study treatment 
> of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) 
> on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input 
> file.
>But when I change the input file content with the following lines:   [Full 
>paragraph (below), single-sentence in middle]  The co-reference superscript is 
>lost by then.

Sean's answer:
> Ctakes is a system with many moving parts.  Things that precede or follow 
> your original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".
>Tim can correct me here, but maybe the coreference module ranked the links 
>between "thalomid" as much higher than the rank between "study treatment of 
>thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
>the encapsulating treatment texts from markables?  It is probably more complex 
>than that.

Sean

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilateral lower extremity edema 
with dyspnea on exertion.  On 07/13/02, he had a chest X-ray PA and lateral 
done that showed no evidence of acute pneumonia or congestive heart failure.  
On 07/14/02, he underwent  an ultrasound which was negative for 

RE: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

2017-10-03 Thread Finan, Sean
You can find all kinds of background information on the web with a search like 
"nlp tokenization".  You can look at 
org.apache.ctakes.gui.dictionary.util.TextTokenizer in the ctakes-gui module to 
see how the dictionary creator does it.  You can run .getTokenizedText( text ) 
to get a tokenized string or .getTokens( text ) to get a list of words.  
Apparently I was lazy and didn't write javadocs ...
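
As a rough idea of what the text and tcount columns need to look like, here
is a minimal sketch.  It assumes a much simpler rule than the real
TextTokenizer (split on whitespace and make every dash its own token, with
none of the exceptions), and TcountSketch and its methods are hypothetical
names, not cTAKES classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of cTAKES; simplified tokenizing rule. */
final class TcountSketch {

    /** Splits a term on whitespace, treating every '-' as its own token. */
    static List<String> tokens(String term) {
        List<String> out = new ArrayList<>();
        // Pad each dash with spaces, then split on runs of whitespace.
        for (String piece : term.replace("-", " - ").trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    /** Space-joined tokens: the form assumed here for the text column. */
    static String tokenizedText(String term) {
        return String.join(" ", tokens(term));
    }

    /** Number of tokens: the value assumed here for the tcount column. */
    static int tcount(String term) {
        return tokens(term).size();
    }

    /** Counts the whitespace-separated words in an already stored text value. */
    static int wordCount(String text) {
        String trimmed = text == null ? "" : text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** True when a stored row's tcount matches the words in its text. */
    static boolean rowIsConsistent(String text, int tcount) {
        return wordCount(text) == tcount;
    }

    public static void main(String[] args) {
        System.out.println(tokenizedText("alpha-beta"));
        System.out.println(tcount("alpha-beta"));
    }
}
```

Something like rowIsConsistent could be run over each row of a custom
cui_terms table before loading it, to flag rows where tcount and text
disagree.  For real dictionaries the TextTokenizer class mentioned above is
the better reference.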

Sean

-Original Message-
From: Jeff Headley [mailto:jeffun...@gmail.com] 
Sent: Tuesday, October 03, 2017 9:19 AM
To: dev@ctakes.apache.org
Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

Thanks Sean. Not quite, sorry for the confusion. We keep the default dictionary 
hsqldb. We just empty the CUI_TERMS, RXNORM, PREFTERM, and TUI tables and move 
over data from a sql server db. I don't seem to recall doing anything with a 
tcount column. I'll have to check our code tonight.
That could very well be it. So maybe the old ctakes had a bug and this should 
not have been working to begin with. Got anywhere I could read about the 
tokenizing rules and calculating the tcount value? Or maybe a java class I 
could look at?

Jeff


On Tue, Oct 3, 2017 at 9:07 AM, Finan, Sean < sean.fi...@childrens.harvard.edu> 
wrote:

> Ok, let me see if I understand your current setup:
>
> Ctakes 4.0 fast lookup,
> Dictionary configuration file points to an sql server,
> Sql server uses cui_terms  (cui, rword, rindex, tcount, text) and perhaps
> other secondary tables
> ...
>
> Now that I write out the column names, I have a thought.  Is it 
> possible that for some term the number in tcount does not match the 
> number of non-whitespace 'words' in the text column?  If those numbers 
> are off then you will have problems similar to the one that you are seeing.
> If you are populating your own table you need to make sure that the 
> text is being properly tokenized.  For instance, the term "alpha-beta" 
> should have text "alpha - beta" with tcount 3.  There are some 
> exceptions to the dash-separation rule and a few oddities.
>
> Sean
>
> -Original Message-
> From: Jeff Headley [mailto:jeffun...@gmail.com]
> Sent: Tuesday, October 03, 2017 8:52 AM
> To: dev@ctakes.apache.org
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> I updated our pom to use the same hsqldb version as what I saw in the 
> ctakes lib folder. The data coming in is from a SQL Server database.
>
> On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > I don't think that a custom dictionary should cause a null pointer 
> > exception on that line unless you have an odd null character in text 
> > or something of that ilk.
> >
> > One thing that changed in ctakes 4.0 is the version of hsqldb that 
> > is being used for the dictionary database.  I don’t know if that has 
> > anything to do with your problem, but it may be causing others.
> > What is the source of your custom dictionary?  There may be a better 
> > way to populate a database.
> >
> > Sean
> >
> > -Original Message-
> > From: Jeff Headley [mailto:jeffun...@gmail.com]
> > Sent: Tuesday, October 03, 2017 12:53 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator 
> > [EXTERNAL]
> >
> > Thank you Sean. That helped to figure out what we did. Not quite 
> > sure where we went wrong but now at least we know the cause. So a 
> > long time ago in our project using ctakes, we emptied out the tables 
> > CUI_TERMS, RXNORM, PREFTERM, and TUI and then loaded them with the 
> > values we wanted. Worked great. Now in the new version the
> > /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> > engine seems to be using
> > /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> > and that seems to be where things went sideways. If I don't mess 
> > with the db and keep the original, no errors.
> >
> > So somewhere in this if statement at line 102 in
> > DefaultJCASTermAnnotator:
> > if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
> >   || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() ) ) {
> >
> > It's expecting to not ever have a null and I suspect we are leaving 
> > something null somewhere that really shouldn't have nulls. If it's 
> > obvious where I've gone wrong, the assistance would be appreciated.
> > Otherwise, I'll get it figured out eventually. I suspect it's 
> > possibly because we never did anything with the SNOMEDCT_US in the prior 
> > version.
> >
> > On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean < 
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi Jeff,
> > >
> > > I have no problem running on your example "DIDANOSINE, 250MG (PO 
> > > Capsule Delayed Release)" or any other text.
> > >
> > > I don't know how you  are running ctakes through
> > com.clientproject.ctakes.
> > > 

Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

2017-10-03 Thread Jeff Headley
Thanks Sean. Not quite, sorry for the confusion. We keep the default
dictionary hsqldb. We just empty the CUI_TERMS, RXNORM, PREFTERM, and TUI
tables and move over data from a sql server db. I don't seem to recall
doing anything with a tcount column. I'll have to check our code tonight.
That could very well be it. So maybe the old ctakes had a bug and this
should not have been working to begin with. Got anywhere I could read about
the tokenizing rules and calculating the tcount value? Or maybe a java
class I could look at?

Jeff


On Tue, Oct 3, 2017 at 9:07 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Ok, let me see if I understand your current setup:
>
> Ctakes 4.0 fast lookup,
> Dictionary configuration file points to an sql server,
> Sql server uses cui_terms  (cui, rword, rindex, tcount, text) and perhaps
> other secondary tables
> ...
>
> Now that I write out the column names, I have a thought.  Is it possible
> that for some term the number in tcount does not match the number of
> non-whitespace 'words' in the text column?  If those numbers are off then
> you will have problems similar to the one that you are seeing.
> If you are populating your own table you need to make sure that the text
> is being properly tokenized.  For instance, the term "alpha-beta" should
> have text "alpha - beta" with tcount 3.  There are some exceptions to the
> dash-separation rule and a few oddities.
>
> Sean
>
> -Original Message-
> From: Jeff Headley [mailto:jeffun...@gmail.com]
> Sent: Tuesday, October 03, 2017 8:52 AM
> To: dev@ctakes.apache.org
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> I updated our pom to use the same hsqldb version as what I saw in the
> ctakes lib folder. The data coming in is from a SQL Server database.
>
> On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > I don't think that a custom dictionary should cause a null pointer
> > exception on that line unless you have an odd null character in text
> > or something of that ilk.
> >
> > One thing that changed in ctakes 4.0 is the version of hsqldb that is
> > being used for the dictionary database.  I don’t know if that has
> > anything to do with your problem, but it may be causing others.
> > What is the source of your custom dictionary?  There may be a better
> > way to populate a database.
> >
> > Sean
> >
> > -Original Message-
> > From: Jeff Headley [mailto:jeffun...@gmail.com]
> > Sent: Tuesday, October 03, 2017 12:53 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> >
> > Thank you Sean. That helped to figure out what we did. Not quite sure
> > where we went wrong but now at least we know the cause. So a long time
> > ago in our project using ctakes, we emptied out the tables CUI_TERMS,
> > RXNORM, PREFTERM, and TUI and then loaded them with the values we
> > wanted. Worked great. Now in the new version the
> > /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> > engine seems to be using
> > /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> > and that seems to be where things went sideways. If I don't mess with
> > the db and keep the original, no errors.
> >
> > So somewhere in this if statement at line 102 in DefaultJCASTermAnnotator:
> > if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
> >   || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() )
> > ) {
> >
> > It's expecting to not ever have a null and I suspect we are leaving
> > something null somewhere that really shouldn't have nulls. If it's
> > obvious where I've gone wrong, the assistance would be appreciated.
> > Otherwise, I'll get it figured out eventually. I suspect it's possibly
> > because we never did anything with the SNOMEDCT_US in the prior version.
> >
> > On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi Jeff,
> > >
> > > I have no problem running on your example "DIDANOSINE, 250MG (PO
> > > Capsule Delayed Release)" or any other text.
> > >
> > > I don't know how you are running ctakes through
> > > com.clientproject.ctakes.processors.CommandLineProcessor, so I don't
> > > know how closely the standard pipeline approximates yours.
> > >
> > > Sean
> > >
> > > -Original Message-
> > > From: Jeff Headley [mailto:jeffun...@gmail.com]
> > > Sent: Sunday, October 01, 2017 11:31 PM
> > > To: dev@ctakes.apache.org
> > > Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> > >
> > > After upgrading our project to version 4, we are getting an NPE from
> > > cTAKES.
> > > The text that was being processed was DIDANOSINE, 250MG (PO Capsule
> > > Delayed Release), though it seems to be happening to us no matter
> > > what text we submit.  The stack trace is below. Any help would be
> > > 

RE: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

2017-10-03 Thread Finan, Sean
Ok, let me see if I understand your current setup:

Ctakes 4.0 fast lookup,
Dictionary configuration file points to an sql server,
Sql server uses cui_terms  (cui, rword, rindex, tcount, text) and perhaps other 
secondary tables
...

Now that I write out the column names, I have a thought.  Is it possible that 
for some term the number in tcount does not match the number of non-whitespace 
'words' in the text column?  If those numbers are off then you will have 
problems similar to the one that you are seeing.
If you are populating your own table you need to make sure that the text is 
being properly tokenized.  For instance, the term "alpha-beta" should have text 
"alpha - beta" with tcount 3.  There are some exceptions to the 
dash-separation rule and a few oddities.
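
A minimal sketch of the consistency check described above, assuming the simplest
form of the rule: every dash is padded with spaces and tcount must equal the
number of whitespace-separated tokens in the text column. The class and method
names (TcountCheck, spaceOutDashes, countTokens, isConsistent) are hypothetical
and are not part of cTAKES; the sketch ignores the exceptions and oddities
mentioned above, so it is not a substitute for the real tokenizer.

```java
public class TcountCheck {

    // Hypothetical helper: pad every dash with spaces, then collapse runs of
    // whitespace. The real dash-separation rule has exceptions not handled here.
    public static String spaceOutDashes(String term) {
        return term.replace("-", " - ").trim().replaceAll("\\s+", " ");
    }

    // Number of non-whitespace "words" in a text value.
    public static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // A row is considered consistent when text is present and tcount matches
    // the number of tokens actually found in it.
    public static boolean isConsistent(String text, int tcount) {
        return text != null && countTokens(text) == tcount;
    }

    public static void main(String[] args) {
        String text = spaceOutDashes("alpha-beta");
        System.out.println(text + " has " + countTokens(text) + " tokens");
        System.out.println("tcount 2 consistent? " + isConsistent(text, 2));
    }
}
```

Running a check like this over every row before loading a custom cui_terms
table would flag rows whose tcount and text disagree.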

Sean

-Original Message-
From: Jeff Headley [mailto:jeffun...@gmail.com] 
Sent: Tuesday, October 03, 2017 8:52 AM
To: dev@ctakes.apache.org
Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

I updated our pom to use the same hsqldb version as what I saw in the ctakes 
lib folder. The data coming in is from a SQL Server database.

On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean < sean.fi...@childrens.harvard.edu> 
wrote:

> Hi Jeff,
>
> I don't think that a custom dictionary should cause a null pointer 
> exception on that line unless you have an odd null character in text 
> or something of that ilk.
>
> One thing that changed in ctakes 4.0 is the version of hsqldb that is 
> being used for the dictionary database.  I don’t know if that has 
> anything to do with your problem, but it may be causing others.
> What is the source of your custom dictionary?  There may be a better 
> way to populate a database.
>
> Sean
>
> -Original Message-
> From: Jeff Headley [mailto:jeffun...@gmail.com]
> Sent: Tuesday, October 03, 2017 12:53 AM
> To: dev@ctakes.apache.org
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> Thank you Sean. That helped to figure out what we did. Not quite sure 
> where we went wrong but now at least we know the cause. So a long time 
> ago in our project using ctakes, we emptied out the tables CUI_TERMS, 
> RXNORM, PREFTERM, and TUI and then loaded them with the values we 
> wanted. Worked great. Now in the new version the
> /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> engine seems to be using
> /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> and that seems to be where things went sideways. If I don't mess with 
> the db and keep the original, no errors.
>
> So somewhere in this if statement at line 102 in DefaultJCASTermAnnotator:
> if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
>   || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() )
> ) {
>
> It's expecting to not ever have a null and I suspect we are leaving 
> something null somewhere that really shouldn't have nulls. If it's 
> obvious where I've gone wrong, the assistance would be appreciated. 
> Otherwise, I'll get it figured out eventually. I suspect it's possibly 
> because we never did anything with the SNOMEDCT_US in the prior version.
>
> On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > I have no problem running on your example "DIDANOSINE, 250MG (PO 
> > Capsule Delayed Release)" or any other text.
> >
> > I don't know how you are running ctakes through
> > com.clientproject.ctakes.processors.CommandLineProcessor, so I don't
> > know how closely the standard pipeline approximates yours.
> >
> > Sean
> >
> > -Original Message-
> > From: Jeff Headley [mailto:jeffun...@gmail.com]
> > Sent: Sunday, October 01, 2017 11:31 PM
> > To: dev@ctakes.apache.org
> > Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> >
> > After upgrading our project to version 4, we are getting an NPE from
> > cTAKES.
> > The text that was being processed was DIDANOSINE, 250MG (PO Capsule
> > Delayed Release), though it seems to be happening to us no matter
> > what text we submit.  The stack trace is below. Any help would be
> > appreciated as I'm at a loss as to what we might be doing wrong if
> > this is not a bug in cTAKES.
> >
> > Thank you,
> > Jeff
> >
> > Oct 01, 2017 11:10:16 PM
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl
> > processAndOutputNewCASes(273)
> > SEVERE: Exception occurred
> > org.apache.uima.analysis_engine.AnalysisEngineProcessException:
> > Annotator processing failed.
> > at
> > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.
> > callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:412)
> > at
> > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.
> > processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
> > at
> > org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.
> > 

Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

2017-10-03 Thread Jeff Headley
I updated our pom to use the same hsqldb version as what I saw in the
ctakes lib folder. The data coming in is from a SQL Server database.

On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> I don't think that a custom dictionary should cause a null pointer
> exception on that line unless you have an odd null character in text or
> something of that ilk.
>
> One thing that changed in ctakes 4.0 is the version of hsqldb that is
> being used for the dictionary database.  I don’t know if that has anything
> to do with your problem, but it may be causing others.
> What is the source of your custom dictionary?  There may be a better way
> to populate a database.
>
> Sean
>
> -Original Message-
> From: Jeff Headley [mailto:jeffun...@gmail.com]
> Sent: Tuesday, October 03, 2017 12:53 AM
> To: dev@ctakes.apache.org
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> Thank you Sean. That helped to figure out what we did. Not quite sure
> where we went wrong but now at least we know the cause. So a long time ago
> in our project using ctakes, we emptied out the tables CUI_TERMS, RXNORM,
> PREFTERM, and TUI and then loaded them with the values we wanted. Worked
> great. Now in the new version the
> /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> engine seems to be using
> /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> and that seems to be where things went sideways. If I don't mess with the
> db and keep the original, no errors.
>
> So somewhere in this if statement at line 102 in DefaultJCASTermAnnotator:
> if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
>   || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() )
> ) {
>
> It's expecting to not ever have a null and I suspect we are leaving
> something null somewhere that really shouldn't have nulls. If it's obvious
> where I've gone wrong, the assistance would be appreciated. Otherwise, I'll
> get it figured out eventually. I suspect it's possibly because we never did
> anything with the SNOMEDCT_US in the prior version.
>
> On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > I have no problem running on your example "DIDANOSINE, 250MG (PO
> > Capsule Delayed Release)" or any other text.
> >
> > I don't know how you are running ctakes through
> > com.clientproject.ctakes.processors.CommandLineProcessor, so I don't
> > know how closely the standard pipeline approximates yours.
> >
> > Sean
> >
> > -Original Message-
> > From: Jeff Headley [mailto:jeffun...@gmail.com]
> > Sent: Sunday, October 01, 2017 11:31 PM
> > To: dev@ctakes.apache.org
> > Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> >
> > After upgrading our project to version 4, we are getting an NPE from
> > cTAKES.
> > The text that was being processed was DIDANOSINE, 250MG (PO Capsule
> > Delayed Release), though it seems to be happening to us no matter what
> > text we submit.  The stack trace is below. Any help would be
> > appreciated as I'm at a loss as to what we might be doing wrong if this
> > is not a bug in cTAKES.
> >
> > Thank you,
> > Jeff
> >
> > Oct 01, 2017 11:10:16 PM
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl
> > processAndOutputNewCASes(273)
> > SEVERE: Exception occurred
> > org.apache.uima.analysis_engine.AnalysisEngineProcessException:
> > Annotator processing failed.
> > at
> > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.
> > callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:412)
> > at
> > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.
> > processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
> > at
> > org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.
> > processUntilNextOutputCas(ASB_impl.java:570)
> > at
> > org.apache.uima.analysis_engine.asb.impl.ASB_impl$
> > AggregateCasIterator.<init>(ASB_impl.java:412)
> > at
> > org.apache.uima.analysis_engine.asb.impl.ASB_impl.
> > process(ASB_impl.java:344)
> > at
> > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.
> > processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
> > at
> > org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(
> > AnalysisEngineImplBase.java:269)
> > at
> > org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(
> > AnalysisEngineImplBase.java:284)
> > at
> > com.clientproject.ctakes.processors.CommandLineProcessor.processLine(
> > CommandLineProcessor.java:163)
> > at
> > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.
> > java:1374)
> > at
> > java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.
> > java:580)
> > at
> > com.clientproject.ctakes.processors.CommandLineProcessor.run(
> > CommandLineProcessor.java:114)
> > at 

RE: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

2017-10-03 Thread Finan, Sean
Hi Jeff,

I don't think that a custom dictionary should cause a null pointer exception on 
that line unless you have an odd null character in text or something of that 
ilk.

One thing that changed in ctakes 4.0 is the version of hsqldb that is being 
used for the dictionary database.  I don’t know if that has anything to do with 
your problem, but it may be causing others.
What is the source of your custom dictionary?  There may be a better way to 
populate a database.

Sean

-Original Message-
From: Jeff Headley [mailto:jeffun...@gmail.com] 
Sent: Tuesday, October 03, 2017 12:53 AM
To: dev@ctakes.apache.org
Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

Thank you Sean. That helped to figure out what we did. Not quite sure where we 
went wrong but now at least we know the cause. So a long time ago in our 
project using ctakes, we emptied out the tables CUI_TERMS, RXNORM, PREFTERM, 
and TUI and then loaded them with the values we wanted. Worked great. Now in 
the new version the 
/desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
engine seems to be
using 
/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
and that seems to be where things went sideways. If I don't mess with the db 
and keep the original, no errors.

So somewhere in this if statement at line 102 in DefaultJCASTermAnnotator:
if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
  || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() )
) {

It's expecting to not ever have a null and I suspect we are leaving something 
null somewhere that really shouldn't have nulls. If it's obvious where I've 
gone wrong, the assistance would be appreciated. Otherwise, I'll get it figured 
out eventually. I suspect it's possibly because we never did anything with the 
SNOMEDCT_US in the prior version.
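
To illustrate how a null could reach that comparison, here is a small sketch.
It rests on an assumption that is not confirmed in this thread: that the
dictionary term's tokens are held in an array sized by tcount and filled by
splitting the text column on spaces, so a tcount larger than the real token
count leaves trailing entries null. The names NullTokenDemo, splitToCount,
unsafeMatch and safeMatch are hypothetical and are not cTAKES code.

```java
public class NullTokenDemo {

    // Assumed behaviour (hypothetical): allocate tcount slots, then fill them
    // from the space-separated text. Slots beyond the real tokens stay null.
    public static String[] splitToCount(String text, int tcount) {
        String[] tokens = new String[tcount];
        String[] parts = text.trim().split(" ");
        for (int i = 0; i < tcount && i < parts.length; i++) {
            tokens[i] = parts[i];
        }
        return tokens;
    }

    // Same shape as the quoted comparison: throws a NullPointerException
    // when the dictionary-side token is null.
    public static boolean unsafeMatch(String hitToken, String docToken) {
        return hitToken.equals(docToken);
    }

    // Null-tolerant variant: a missing dictionary token never matches.
    public static boolean safeMatch(String hitToken, String docToken) {
        return hitToken != null && hitToken.equals(docToken);
    }

    public static void main(String[] args) {
        // Two real tokens, but the row claims tcount = 3.
        String[] hitTokens = splitToCount("didanosine 250mg", 3);
        System.out.println("safe match: " + safeMatch(hitTokens[2], "capsule"));
        try {
            unsafeMatch(hitTokens[2], "capsule");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on a null dictionary token");
        }
    }
}
```

Under that assumption the null guard only hides the symptom; the underlying
fix would still be to make the table rows consistent.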

On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean < 
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> I have no problem running on your example "DIDANOSINE, 250MG (PO 
> Capsule Delayed Release)" or any other text.
>
> I don't know how you are running ctakes through
> com.clientproject.ctakes.processors.CommandLineProcessor, so I don't know
> how closely the standard pipeline approximates yours.
>
> Sean
>
> -Original Message-
> From: Jeff Headley [mailto:jeffun...@gmail.com]
> Sent: Sunday, October 01, 2017 11:31 PM
> To: dev@ctakes.apache.org
> Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> After upgrading our project to version 4, we are getting an NPE from cTAKES.
> The text that was being processed was DIDANOSINE, 250MG (PO Capsule 
> Delayed Release), though it seems to be happening to us no matter what 
> text we submit.  The stack trace is below. Any help would be 
> appreciated as I'm at a loss as to what we might be doing wrong if this is 
> not a bug in cTAKES.
>
> Thank you,
> Jeff
>
> Oct 01, 2017 11:10:16 PM
> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl
> processAndOutputNewCASes(273)
> SEVERE: Exception occurred
> org.apache.uima.analysis_engine.AnalysisEngineProcessException: 
> Annotator processing failed.
> at
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.
> callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:412)
> at
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.
> processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
> at
> org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.
> processUntilNextOutputCas(ASB_impl.java:570)
> at
> org.apache.uima.analysis_engine.asb.impl.ASB_impl$
> AggregateCasIterator.<init>(ASB_impl.java:412)
> at
> org.apache.uima.analysis_engine.asb.impl.ASB_impl.
> process(ASB_impl.java:344)
> at
> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.
> processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
> at
> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(
> AnalysisEngineImplBase.java:269)
> at
> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(
> AnalysisEngineImplBase.java:284)
> at
> com.clientproject.ctakes.processors.CommandLineProcessor.processLine(
> CommandLineProcessor.java:163)
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.
> java:1374)
> at
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.
> java:580)
> at
> com.clientproject.ctakes.processors.CommandLineProcessor.run(
> CommandLineProcessor.java:114)
> at com.clientproject.ctakes.App.main(App.java:109)
> Caused by: java.lang.NullPointerException
> at
> org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.
> isTermMatch(DefaultJCasTermAnnotator.java:102)
> at
> org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.
> findTerms(DefaultJCasTermAnnotator.java:79)
> at
> org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.
> 

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Gandhi Rajan Natarajan
Hi Sean, I still have some doubts about this. If I run the piper file with the 
complete text I sent earlier, I see only superscript 4 for Thalomid, and 
the co-reference from it to "treatment of hepatocellular carcinoma" is still 
lost. I also don't see any superscript with the number 1. Even with the changes 
to the input, the co-reference between all the entities should still be 
preserved, right? Is there any more info or documentation on the co-reference 
module that would help us understand its complexity better?

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, October 02, 2017 8:36 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

Hi Tim,

The coreference question (just a question) was for a different item altogether.
Sorry for any confusion.  The reason that I CC:d you ...

From Gandhi:
> Interestingly even I was able to generate [Sean's coref output] using  piper 
> GUI by  having only that single line - " The patient started study treatment 
> of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) 
> on 06/07/02 for the treatment of hepatocellular carcinoma. " in the input 
> file.
> But when I change the input file content with the following lines:   [Full 
> paragraph (below), single-sentence in middle]  The co-reference superscript is 
> lost by then.

Sean's answer:
> Ctakes is a system with many moving parts.  Things that precede or follow 
> your original example sentence will change the evaluation of that sentence.
> With the pipeline you are using and the full note, you should see a number 
> (mine is 4) next to the first "thalomid" in the original example sentence.  If 
> you click that number you should see (to the right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the links 
> between "thalomid" as much higher than the rank between "study treatment of 
> thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
> the encapsulating treatment texts from markables?  It is probably more complex 
> than that.

Sean

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilateral lower extremity edema 
with dyspnea on exertion.  On 07/13/02, he had a chest X-ray PA and lateral 
done that showed no evidence of acute pneumonia or congestive heart failure.  
On 07/14/02, he underwent  an ultrasound which was negative for deep vein 
thrombosis. This patient did not take Thalomid on the day of his admittance to 
the hospital, but resumed treatment shortly after with no return of symptoms. 
On 07/15/02, he was discharged in stable condition. There have been no further 
reports of bleeding at this time. Thedoctor has assessed the hematochezia as 
related to Coumadin treatment and previously diagnosed diverticulosis, and not 
to protocol therapy with Thalomid and Epirubicin.Additional information 
received from the investigator on 27Aug02 reveals that this male patient began 
on 07Jun02 two cycles of therapy with Thalidomide and Epirubicin.  His post 
cycle two computed tomography scans revealed increase in size of liver lesion 
with development of multiple new satellite nodules.  On 29Jul02, the 
investigator removed this patient from protocol for progressive disease and 
recommended hospice care.  After seeking a second opinion from two other 
institutions, this patient was admitted to hospice on 05Aug02.  On 20Aug02, the 
investigator noted that this patient was suffering worsening fatigue and got 
tired getting out of his chair.  On 25Aug02, this patient died due to disease 
progression.  The