Re: Continuous Integration for cTAKES

2017-09-29 Thread Alexandru Zbarcea
Hi,

I have created the following jobs:

   - cTAKES-trunk-Java-1.8
   - cTAKES-trunk-Java-1.9

As you can see, both of them fail, and this is the expected behavior.

The previous job (i.e. ctakes-trunk-compiletest) used the default Jenkins
behavior, which uses: -Dmaven.test.failure.ignore=true.
As a consequence, each build was reported as stable. Another consequence
was that the tests were miscounted (34 tests instead of 70 tests). I
left the old jobs untouched; the community should choose which jobs it
prefers to use and how it would like to proceed.

Are there any features that the community is looking for in the Jenkins
integration (e.g. Jenkinsfile, javadocs, support for 3.x and 4.x, etc.)?

Alex

On Thu, Sep 28, 2017 at 7:43 PM, Alexandru Zbarcea  wrote:

> Thank you James for your feedback,
>
> Alex
>
> On Thu, Sep 28, 2017 at 7:05 PM, James Masanz 
> wrote:
>
>> Thanks Alex, that's great!
>>
>> -- James
>>
>> On Thu, Sep 28, 2017 at 5:40 PM, Alexandru Zbarcea 
>> wrote:
>>
>> > Hi,
>> >
>> > I have created a new Jira issue [1] regarding CI/CD, where I have
>> > started to work on improvements to the current pipeline.
>> >
>> > I have helped Apache Infra in the past and I have enough karma to help
>> > this project with the right automation.
>> >
>> > I look forward to your feedback and advice, and maybe we can collect in
>> > CTAKES-458 [1] those things that would improve the cTAKES build and
>> > release process.
>> >
>> > Alex
>> >
>> > [1] - https://issues.apache.org/jira/browse/CTAKES-458
>> >
>>
>
>


Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-09-29 Thread Miller, Timothy
It is a very busy time for me but this is on my todo list. Don't be
afraid to ping in a week or so if you don't hear anything.

Tim

On Fri, 2017-09-29 at 14:04 +, Finan, Sean wrote:
> Hi Gandhi,
> > 
> > Did you mean that with the text I sent, the co-reference
> > superscript-1 will be lost?
> Yes.  Well, to be more clear, the coreference that was resolved as #1
> in your original sentence alone will be lost.  However, there are
> eight or nine coreference chains discovered in your full paragraph,
> and one of those will have superscript 1s.
> 
> > 
> > Could someone have a look and let us know your thoughts, please?
> Thank you for creating the jira and the patch.  I am sure that
> somebody will take a look.
> 
> Thanks,
> Sean
> 
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
>  
> Sent: Friday, September 29, 2017 2:25 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Sean,
> 
> Thanks again for the response. I guess it's a mistake on my side that
> I didn't send the complete text. Did you mean that with the text I
> sent, the co-reference superscript-1 will be lost?
> 
> Also, as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 for the measurement
> FSM changes and attached the modified file. Could someone have a look
> and let us know your thoughts, please?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy 
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Gandhi,
> 
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
> 
> > 
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:
> https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> Creating patches:
> https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
> Attaching files:
> https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html
> 
> I don't know if you have a jira account and permissions for the
> ctakes project.  An administrator may need to set that up for you.
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> 

RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-29 Thread Finan, Sean
Hi Gandhi,
> Did you mean that with the text I sent, the co-reference superscript-1 will 
> be lost?
Yes.  Well, to be more clear, the coreference that was resolved as #1 in your 
original sentence alone will be lost.  However, there are eight or nine 
coreference chains discovered in your full paragraph, and one of those will 
have superscript 1s.

> Could someone have a look and let us know your thoughts, please?
Thank you for creating the jira and the patch.  I am sure that somebody will 
take a look.

Thanks,
Sean


-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: Friday, September 29, 2017 2:25 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks again for the response. I guess it's a mistake on my side that I didn't 
send the complete text. Did you mean that with the text I sent, the 
co-reference superscript-1 will be lost?

Also, as per your advice, we have created an issue - 
https://issues.apache.org/jira/browse/CTAKES-459 for the measurement FSM 
changes and attached the modified file. Could someone have a look and let us 
know your thoughts, please?

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 28, 2017 8:21 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy 
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

I don't recall you sending me that entire snippet of text.  I think that I only 
had your single example sentence.
You have discovered one of the quirks of software: "change the data, change the 
result."
Ctakes is a system with many moving parts.  Things that precede or follow your 
original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".
Tim can correct me here, but maybe the coreference module ranked the links 
between "thalomid" as much higher than the rank between "study treatment of 
thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
the encapsulating treatment texts from markables?  It is probably more complex 
than that.

> we have also made some code changes in MeasurementFSM.java to identify 
> certain measurements like '20 mg/m2' which was not identified out of the box. 
>  Should we send the code changes to you so that you can consider the same to 
> be productized ? Please advise."

I don't know if you've noticed the recent emails on the dev list involving 
Alexandru Zbarcea.  Alex has been creating or commenting on Jira items and 
attaching code for  fixes and enhancements.  This is a widely used process and 
is fairly easy to follow.   I think that the following links are relevant:
Working with issues:  
https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
Creating patches:   
https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
Attaching files:   
https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

I don't know if you have a jira account and permissions for the ctakes project. 
 An administrator may need to set that up for you.

Thanks,
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Thursday, September 28, 2017 4:09 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks for the response. I was able to see the co-reference superscript using 
the html file that you sent. Interestingly even I was able to generate the 
sample HTML using  piper GUI by  having only that single line - " The patient 
started study 

RE: Clarity on ICD10 Dictionary and Concept Matching [EXTERNAL]

2017-09-29 Thread Finan, Sean
Hi Matthew,

It is possible to both
1. Edit the primary hsql dictionary, and
2. Add secondary dictionaries to your ctakes pipeline.

You can find information on the web related to editing an hsql database.  There 
is some random info on the web regarding the creation and use of bsv files for 
dictionaries, like here 
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201510.mbox/%3cf8bb8fbaf4d64d04814732dd3f846...@chexmail1a.chboston.org%3E

I am in the process of putting something on the wiki describing the best way to 
create and use a simple bsv dictionary.  What I really need to do is snag a 
couple of days and make a series of videos.
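
Until that wiki page exists, here is a minimal sketch of what reading a simple bsv (bar-separated values) dictionary might look like. It assumes rows of the form CODE|TUI|text; the class name, the field names, and the placeholder code X0000001 are made up for illustration and are not cTAKES API. Check the column layout against the dictionary module you actually use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch only: reads bar-separated rows of the ASSUMED form CODE|TUI|text.
// The class, field names, and sample rows are hypothetical, not cTAKES API.
final class BsvDictionarySketch {

    // One dictionary row: a concept code, a semantic type, and a synonym.
    static final class Entry {
        final String code;
        final String tui;
        final String text;

        Entry(String code, String tui, String text) {
            this.code = code;
            this.tui = tui;
            this.text = text;
        }
    }

    // Parses rows, skipping blank lines, "//" comments,
    // and rows with fewer than three columns.
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("//")) {
                continue;
            }
            String[] cols = trimmed.split("\\|");
            if (cols.length < 3) {
                continue;
            }
            // Lower-case the synonym so that lookups can be case-insensitive.
            entries.add(new Entry(cols[0].trim(), cols[1].trim(),
                    cols[2].trim().toLowerCase(Locale.ROOT)));
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "// placeholder code; two synonyms for the same concept",
                "X0000001|T047|Type 2 diabetes mellitus",
                "X0000001|T047|type 2 diabetes");
        for (Entry e : parse(rows)) {
            System.out.println(e.code + " " + e.tui + " " + e.text);
        }
    }
}
```

Listing a shorter synonym on its own row, as in the second sample row, is one way a hand-edited dictionary could match "type 2 diabetes" without the full term.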

Sean

-Original Message-
From: Matthew Vita [mailto:matthewvit...@gmail.com] 
Sent: Friday, September 29, 2017 1:47 AM
To: dev@ctakes.apache.org
Subject: Clarity on ICD10 Dictionary and Concept Matching [EXTERNAL]

Hi Sean,

If you recall earlier this month, I noted that I had to type the full "type
2 diabetes mellitus" to get a match with my ICD10 dictionary:
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201709.mbox/%3CCAOV_6RL8rpg0AirHx4w+UAc6d+OABukfgsOz5ZvaoRF2D8V0hg@mail.gmail.com%3E
(I made the dictionary myself and posted my solution to YouTube:
https://www.youtube.com/watch?v=4aOnafv-NQs )

I appreciate the response you gave, pointing me to the "type 2 diabetes"
direct matches in MEDLINEPLUS, MSH, NCI, CHV, etc.

Those make sense to me, but I worry that users are typically familiar only with
ICD10 and won't understand the other coding schemes. Can you provide details on 
improving the ICD10 dictionary concept matches? In the worst case, can I update 
the dictionary by hand to improve certain common concept matches?

Thanks,

Matthew Vita
www.matthewvita.com


RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-09-29 Thread Gandhi Rajan Natarajan
Hi Sean,

Thanks again for the response. I guess it's a mistake on my side that I didn't 
send the complete text. Did you mean that with the text I sent, the 
co-reference superscript-1 will be lost?

Also, as per your advice, we have created an issue - 
https://issues.apache.org/jira/browse/CTAKES-459 for the measurement FSM 
changes and attached the modified file. Could someone have a look and let us 
know your thoughts, please?
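
For readers following the thread, the kind of measurement at issue ('20 mg/m2') can be illustrated with a small sketch. It uses a plain regular expression, not the token-based finite state machine in MeasurementFSM.java, and it is not the patch attached to CTAKES-459. The class name, method name, and the short unit list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a regex stand-in for measurement detection.
// cTAKES itself uses a token-based FSM; the names and units here are hypothetical.
final class MeasurementSketch {

    // A number, optional whitespace, a unit, and an optional "/unit" suffix
    // such as "/m2" (per square metre of body surface area).
    private static final Pattern MEASUREMENT = Pattern.compile(
            "\\b\\d+(?:\\.\\d+)?\\s*(?:mcg|mg|ml|g)(?:/(?:m2|kg|ml))?\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns every measurement-like span found in the text, in order.
    static List<String> findMeasurements(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MEASUREMENT.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "The patient started study treatment of Thalomid 200mg "
                + "(days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02.";
        for (String measurement : findMeasurements(sentence)) {
            System.out.println(measurement);
        }
    }
}
```

The optional "/unit" group is what lets a compound unit such as mg/m2 be kept as one span instead of stopping at "mg".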

Regards,
Gandhi


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 28, 2017 8:21 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy 
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

I don't recall you sending me that entire snippet of text.  I think that I only 
had your single example sentence.
You have discovered one of the quirks of software: "change the data, change the 
result."
Ctakes is a system with many moving parts.  Things that precede or follow your 
original example sentence will change the evaluation of that sentence.
With the pipeline you are using and the full note, you should see a number 
(mine is 4) next to the first "thalomid" in the original example sentence.  If 
you click that number you should see (to the right) 4 instances of "thalomid".
Tim can correct me here, but maybe the coreference module ranked the links 
between "thalomid" as much higher than the rank between "study treatment of 
thalomid 200mg" and "the treatment of hepatocellular carcinoma" and discarded 
the encapsulating treatment texts from markables?  It is probably more complex 
than that.

> we have also made some code changes in MeasurementFSM.java to identify 
> certain measurements like '20 mg/m2' which was not identified out of the box. 
>  Should we send the code changes to you so that you can consider the same to 
> be productized ? Please advise."

I don't know if you've noticed the recent emails on the dev list involving 
Alexandru Zbarcea.  Alex has been creating or commenting on Jira items and 
attaching code for  fixes and enhancements.  This is a widely used process and 
is fairly easy to follow.   I think that the following links are relevant:
Working with issues:  
https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
Creating patches:   
https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
Attaching files:   
https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

I don't know if you have a jira account and permissions for the ctakes project. 
 An administrator may need to set that up for you.

Thanks,
Sean

-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Thursday, September 28, 2017 4:09 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] 
[SUSPICIOUS]

Hi Sean,

Thanks for the response. I was able to see the co-reference superscript using 
the html file that you sent. Interestingly even I was able to generate the 
sample HTML using  piper GUI by  having only that single line - " The patient 
started study treatment of Thalomid 200mg (days 1-21), and Epirubicin, 20 mg/m2 
(days 1, 8, and 15) on 06/07/02 for the treatment of hepatocellular carcinoma. 
" in the input file.

But when I change the input file content with the following lines:

"This patient is participating in a Non-IND study; Protocol CG-000424: "Phase 
I/II of Thalidomide and Epirubicin in Patients with Unresectable or Metastatic 
Hepatocellular Carcinoma".Information has been received from the investigator 
regarding an 82 year-old male patient who had gastrointestinal bleeding while 
on Thalomid, Epirubicin, and Coumadin. He had a past medical history of 
diverticulosis in 03/02 and a right atrial clot from intraventricular catheter 
(IVC) for which he was started on Coumadin. During the hospitalization for a 
right atrial clot in 03/02 hepatocellular carcinoma was first noted and he was 
referred to an oncologist.  The patient started study treatment of Thalomid 
200mg (days 1-21), and Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for 
the treatment of hepatocellular carcinoma.  He was concomitantly receiving 
Cardura, Ambien (for insomnia), Megace, Coumadin, and Oxycodone. This patient 
presented to the emergency room with the chief complaint of hematochezia. He 
reported noticing bright red blood and small clots mixed in with his stool. On 
07/13/02, he was admitted due to gastrointestinal bleed.  The physician ordered 
2 large bore intravenous lines and planned to transfuse for hematocrit less 
than 30%. Due to the  INR (international normalized ratio) level of 3.0, 
Coumadin was held. He was also noted to have bilateral lower extremity edema 
with dyspnea on exertion.  On 07/13/02, he had a chest X-ray PA and lateral 
done