Re: cTAKES 4.0rc3 - custom dictionary needs rebuilding?

2017-04-23 Thread James Masanz
I've added a section[1] to the User Install Guide describing how to convert
an HSQLDB database from hsqldb 1.8 to 2.3.4 for cTAKES 4.0.

[1]
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+User+Install+Guide#cTAKES4.0UserInstallGuide-ConvertDictionariesYou'vePreviouslyCreatedtobeCompatiblewithcTAKES4.0
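
For anyone reading this before the guide, the conversion discussed in this
thread can be sketched roughly as below. It is a sketch under assumptions,
not the guide's exact procedure: DB_DIR and DB_NAME are hypothetical
placeholders for your own dictionary, the helper names are made up for
illustration, and the "readonly=true" line format is assumed. Note that an
HSQLDB file URL names the base name of the database files inside the
directory, not the directory alone.

```shell
#!/bin/sh
# Sketch only: DB_DIR and DB_NAME are hypothetical placeholders.
DB_DIR="${DB_DIR:-./my_custom_dictionary}"
DB_NAME="${DB_NAME:-my_custom_dictionary}"

# Remove a "readonly=true" line from an HSQLDB .properties file,
# keeping a .bak copy of the original next to it.
strip_readonly() {
    props="$1"
    cp "$props" "$props.bak"
    sed '/^readonly[[:space:]]*=[[:space:]]*true[[:space:]]*$/d' "$props.bak" > "$props"
}

# Print the standalone (file) JDBC URL: directory plus file base name.
jdbc_url() {
    printf 'jdbc:hsqldb:file:%s/%s\n' "$1" "$2"
}

if [ -f "$DB_DIR/$DB_NAME.properties" ]; then
    strip_readonly "$DB_DIR/$DB_NAME.properties"
fi
echo "Connect with: $(jdbc_url "$DB_DIR" "$DB_NAME")"

# Next, done by hand:
#   java -jar hsqldb-2.3.4.jar
# connect as "HSQL Database Engine Standalone" using the URL printed above,
# then execute SHUTDOWN so the files are rewritten in the 2.3.4 format.
```

Working on a copy of the dictionary directory is safer, since the conversion
rewrites the database files in place.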

On Sun, Apr 23, 2017 at 10:03 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Thank you James - great explanation and information.
>
> Hi David,
> There is a new version of the dictionary creator gui distributed with
> ctakes 4.0.  It is 99% the same as the version in sandbox.  One important
> difference is that it now produces a database in hsqldb 2.3.4 - compatible
> with ctakes 4.0.
> Another difference (important to you) is that there is an exclusion list
> in a data file that lists cui / term combinations that can be excluded.  By
> default the medications "toDAY" and "ToMORROW" are in that list.  It is
> obviously done to prevent the frequent false positives that you spoke of in
> your AMIA presentation.  For non-vet use these exclusions are pretty
> valid.  Since you are interested in a vet dictionary, if you use the new
> dictionary creator you can decide whether or not you want them to be
> included.
>
> Sean
>
> -Original Message-
> From: James Masanz [mailto:masanz.ja...@gmail.com]
> Sent: Saturday, April 22, 2017 11:32 PM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES 4.0rc3 - custom dictionary needs rebuilding?
>
> Dave,
>
> It's an oversight that we didn't document that. Thanks for pointing that
> out! I'll  update the documentation tomorrow and post again with proper
> details of the command to use.
>
> The short answer is that hsqldb jar includes a GUI. You use the 2.3.4
> version of that jar, which is included with the cTAKES 4.0 convenience
> binary, to open your database. Then you close (shutdown) your DB and it
> gets converted to 2.3.4 for you. But if the .properties file for your DB
> indicates it's readonly, you need to have edited the properties file first
> to remove that. That gives you the idea. I'll write it up properly tomorrow.
>
> Thanks for testing,
> -- James
>
>
> *Sent from my phone.*
>
> On Apr 22, 2017 9:39 PM, "David Kincaid"  wrote:
>
> I finally had some time this weekend to give 4.0.0rc3 a run. The install
> and configuration instructions worked perfectly on my Linux laptop and I
> was able to use CVD to run the clinical pipeline successfully against some
> of the clinical notes I have here.
>
> However, I ran into a problem when I tried running my own customized
> pipeline that includes a custom dictionary that I create (to include the
> SNOMED veterinary extension). I get an exception while it's loading that
> the dictionary is not the correct version. So, I assume I just need to
> recreate that custom database? Or is there a migration utility from the old
> database version to the new version?
>
> I don't remember seeing mention of this, but certainly could have missed
> it. If it's not in the release notes or install/upgrade instructions it
> probably should be there.
>
> Thanks to everyone for pulling this release together and getting it out
> the door!
>
> - Dave
>


Re: cTAKES 4.0rc3 - custom dictionary needs rebuilding?

2017-04-23 Thread David Kincaid
Sorry about this. Actually no, it's not working. I just created a new blank
database apparently in a new directory. Still having the original problem
trying to open the DB in HSQL GUI. Any ideas?

- Dave

On Sun, Apr 23, 2017 at 2:23 PM, David Kincaid 
wrote:

> I got it figured out. I was using the wrong URL path for the database.
> Working great now!
>
> - Dave
>
> On Sun, Apr 23, 2017 at 12:31 PM, David Kincaid 
> wrote:
>
>> Thanks for the reply, James. I'm not very familiar with HSQLDB, so I may
>> be doing it wrong, but I launched the GUI using "java
>> -jar hsqldb-2.3.4.jar" and then tried to connect to the DB by changing the
>> "Type" to "HSQL Database Engine Standalone" and then setting the URL to
>> point to my custom dictionary directory. It throws a SQLException that says
>> "wrong database file version". Is there some other step I'm missing?
>>
>> - Dave
>>
>> On Sat, Apr 22, 2017 at 10:31 PM, James Masanz 
>> wrote:
>>
>>> Dave,
>>>
>>> It's an oversight that we didn't document that. Thanks for pointing that
>>> out! I'll  update the documentation tomorrow and post again with proper
>>> details of the command to use.
>>>
>>> The short answer is that hsqldb jar includes a GUI. You use the 2.3.4
>>> version of that jar, which is included with the cTAKES 4.0 convenience
>>> binary, to open your database. Then you close (shutdown) your DB and it
>>> gets converted to 2.3.4 for you. But if the .properties file for your DB
>>> indicates it's readonly, you need to have edited the properties file
>>> first
>>> to remove that. That gives you the idea. I'll write it up properly
>>> tomorrow.
>>>
>>> Thanks for testing,
>>> -- James
>>>
>>>
>>> *Sent from my phone.*
>>>
>>> On Apr 22, 2017 9:39 PM, "David Kincaid"  wrote:
>>>
>>> I finally had some time this weekend to give 4.0.0rc3 a run. The install
>>> and configuration instructions worked perfectly on my Linux laptop and I
>>> was able to use CVD to run the clinical pipeline successfully against
>>> some
>>> of the clinical notes I have here.
>>>
>>> However, I ran into a problem when I tried running my own customized
>>> pipeline that includes a custom dictionary that I create (to include the
>>> SNOMED veterinary extension). I get an exception while it's loading that
>>> the dictionary is not the correct version. So, I assume I just need to
>>> recreate that custom database? Or is there a migration utility from the
>>> old
>>> database version to the new version?
>>>
>>> I don't remember seeing mention of this, but certainly could have missed
>>> it. If it's not in the release notes or install/upgrade instructions it
>>> probably should be there.
>>>
>>> Thanks to everyone for pulling this release together and getting it out
>>> the
>>> door!
>>>
>>> - Dave
>>>
>>
>>
>


RE: Docker

2017-04-23 Thread Finan, Sean
Keep in mind one very important thing:

You need to be very careful about redistribution of a UMLS database.  Many 
years ago ctakes had to get special permission to post a copy on sourceforge.  
As you all know, use of that distribution requires a UMLS username and password 
check on every ctakes launch.  This was also a requirement placed upon ctakes 
by the NLM per the agreement.

Public distribution of Oracle Java in a docker container is technically 
illegal, but in the beginning a lot of people were not reading EULA info and 
went smooth criminal.  Strange but true.  Now people know to use OpenJDK.  I 
have not contacted the NLM regarding docker and the UMLS.  Has anybody else out 
there?  If so please let us know.

For a private container, inclusion of the dictionary is fine (we have one at 
Harvard).  Otherwise there are ways to use / copy S3 files at runtime; you 
would just need to document a static location for the database, etc.
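
As a rough illustration of the copy-at-runtime idea, a container entrypoint
could fetch the dictionary only when it is missing from a documented static
location. This is a sketch under assumptions: DICT_DIR and DICT_S3_URI are
hypothetical names (not cTAKES settings), the function names are made up,
and it assumes the AWS CLI is installed in the image with credentials
supplied at runtime.

```shell
#!/bin/sh
# Sketch only: DICT_DIR and DICT_S3_URI are hypothetical, not cTAKES settings.
DICT_DIR="${DICT_DIR:-/opt/ctakes-dictionary}"

# Succeeds when the directory exists and contains at least one entry.
dictionary_present() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Copy everything under an S3 prefix into the local directory.
fetch_dictionary() {
    mkdir -p "$2" && aws s3 cp --recursive "$1" "$2"
}

# Download only when nothing is there yet and a source was configured.
if ! dictionary_present "$DICT_DIR" && [ -n "${DICT_S3_URI:-}" ]; then
    fetch_dictionary "$DICT_S3_URI" "$DICT_DIR"
fi
```

This keeps the dictionary out of the published image, so the image itself
does not redistribute UMLS content.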

Sean

-Original Message-
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com] 
Sent: Sunday, April 23, 2017 5:56 AM
To: dev@ctakes.apache.org
Subject: Re: Docker

Dockerizing ctakes as a build was useful at one time for sure.

If running as a microservice, remember that the size of the image is 
problematic; you don't want it on lots of different nodes if using something 
like kubernetes.

Also remember to run with explicit -Xmx args so that the JVM's default heap 
guess stays within the container's cgroup memory limit; otherwise you'll get 
OOMEs.

> On Apr 23, 2017, at 4:38 AM, Oleg Tikhonov  wrote:
> 
> I've tried to create service from
> https://hub.docker.com/r/llin/docker_apache_ctakes/~/dockerfile/, without
> success.
> 
> However Docker file looks as follows:
> 
> FROM java:7
> ADD http://mirror.softaculous.com/apache/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz
> ADD https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/ytex/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
> RUN tar -xzf apache-ctakes-3.2.2-bin.tar.gz
> RUN ln -s /apache-ctakes-3.2.2 /apache-ctakes
> RUN mkdir temp
> RUN unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -d temp/
> RUN cp -a temp/lib/. /apache-ctakes/lib/
> RUN rm apache-ctakes-3.2.2-bin.tar.gz
> RUN rm ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
> RUN rm -r temp
> 
> Hope it helps.
> 
> 
> 
> 
>> On Sun, Apr 23, 2017 at 8:00 AM, Oleg Tikhonov  wrote:
>> 
>> Here is an output
>> 
>> *tmills/ctakes-as*                      cTAKES and UIMA-AS binaries with a few scr...   0
>> *jayunit100/ctakes-example-image-mvn*                                                   0
>> *llin/docker_apache_ctakes*             Docker image for apache ctakes                  0   [OK]
>> 
>> 0 means stars/rating.
>> [OK] means automated.
>> 
>> 
>> 
>> 
>> 
>>> On Sun, Apr 23, 2017 at 7:50 AM, Oleg Tikhonov  wrote:
>>> 
>>> Hi,
>>> have you tried:
>>> docker search ctakes ?
>>> 
>>> If anybody had done that and put it in the repository, you would see it.
>>> 
>>> Oleg
>>> 
>>> On Sun, Apr 23, 2017 at 1:58 AM, John Travis Green <
>>> john.travis.gr...@gmail.com> wrote:
>>> 
>>>> Has anyone dockerized ctakes? If so do you mind sending the Dockerfile,
>>>> thanks! John Green
>>>> 
>>> 
>> 


RE: cTAKES 4.0rc3 - custom dictionary needs rebuilding?

2017-04-23 Thread Finan, Sean
Thank you James - great explanation and information.

Hi David,
There is a new version of the dictionary creator gui distributed with ctakes 
4.0.  It is 99% the same as the version in sandbox.  One important difference 
is that it now produces a database in hsqldb 2.3.4 - compatible with ctakes 
4.0.  
Another difference (important to you) is that there is an exclusion list in a 
data file that lists cui / term combinations that can be excluded.  By default 
the medications "toDAY" and "ToMORROW" are in that list.  It is obviously done 
to prevent the frequent false positives that you spoke of in your AMIA 
presentation.  For non-vet use these exclusions are pretty valid.  Since you 
are interested in a vet dictionary, if you use the new dictionary creator you 
can decide whether or not you want them to be included.

Sean

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com] 
Sent: Saturday, April 22, 2017 11:32 PM
To: dev@ctakes.apache.org
Subject: Re: cTAKES 4.0rc3 - custom dictionary needs rebuilding?

Dave,

It's an oversight that we didn't document that. Thanks for pointing that out! 
I'll  update the documentation tomorrow and post again with proper details of 
the command to use.

The short answer is that the hsqldb jar includes a GUI. You use the 2.3.4 
version of that jar, which is included with the cTAKES 4.0 convenience binary, 
to open your database. Then you close (shutdown) your DB and it gets converted 
to 2.3.4 for you. But if the .properties file for your DB indicates it's 
readonly, you need to edit the properties file first to remove that setting. 
That gives you the idea. I'll write it up properly tomorrow.

Thanks for testing,
-- James


*Sent from my phone.*

On Apr 22, 2017 9:39 PM, "David Kincaid"  wrote:

I finally had some time this weekend to give 4.0.0rc3 a run. The install and 
configuration instructions worked perfectly on my Linux laptop and I was able 
to use CVD to run the clinical pipeline successfully against some of the 
clinical notes I have here.

However, I ran into a problem when I tried running my own customized pipeline 
that includes a custom dictionary that I create (to include the SNOMED 
veterinary extension). I get an exception while it's loading that the 
dictionary is not the correct version. So, I assume I just need to recreate 
that custom database? Or is there a migration utility from the old database 
version to the new version?

I don't remember seeing mention of this, but certainly could have missed it. If 
it's not in the release notes or install/upgrade instructions it probably 
should be there.

Thanks to everyone for pulling this release together and getting it out the 
door!

- Dave


Re: Docker

2017-04-23 Thread Jay Vyas
Dockerizing ctakes as a build was useful at one time for sure.

If running as a microservice, remember that the size of the image is 
problematic; you don't want it on lots of different nodes if using something 
like kubernetes.

Also remember to run with explicit -Xmx args so that the JVM's default heap 
guess stays within the container's cgroup memory limit; otherwise you'll get 
OOMEs.
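
To make the -Xmx point concrete, here is a minimal sketch that derives an
explicit heap size from the container memory limit and prints the matching
docker command. The 3/4 ratio is only a rule-of-thumb assumption to leave
room for non-heap JVM memory, "my-ctakes-image" is a hypothetical image
name, and whether the image's launch script reads JAVA_OPTS is also an
assumption.

```shell
#!/bin/sh
# Sketch only: the ratio, image name and JAVA_OPTS handling are assumptions.

# Given a container limit in MB, print a heap size in MB (3/4 of the limit).
heap_mb_for() {
    echo $(( $1 * 3 / 4 ))
}

LIMIT_MB="${LIMIT_MB:-4096}"
JAVA_OPTS="-Xmx$(heap_mb_for "$LIMIT_MB")m"

# Print the command instead of running it, so it can be reviewed first.
echo "docker run --memory=${LIMIT_MB}m -e JAVA_OPTS=$JAVA_OPTS my-ctakes-image"
```

The container limit (--memory) and the heap limit (-Xmx) are set together,
so the JVM never sizes its heap from the host's total memory.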

> On Apr 23, 2017, at 4:38 AM, Oleg Tikhonov  wrote:
> 
> I've tried to create service from
> https://hub.docker.com/r/llin/docker_apache_ctakes/~/dockerfile/, without
> success.
> 
> However Docker file looks as follows:
> 
> FROM java:7
> ADD http://mirror.softaculous.com/apache/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz
> ADD https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/ytex/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
> RUN tar -xzf apache-ctakes-3.2.2-bin.tar.gz
> RUN ln -s /apache-ctakes-3.2.2 /apache-ctakes
> RUN mkdir temp
> RUN unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -d temp/
> RUN cp -a temp/lib/. /apache-ctakes/lib/
> RUN rm apache-ctakes-3.2.2-bin.tar.gz
> RUN rm ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
> RUN rm -r temp
> 
> Hope it helps.
> 
> 
> 
> 
>> On Sun, Apr 23, 2017 at 8:00 AM, Oleg Tikhonov  wrote:
>> 
>> Here is an output
>> 
>> *tmills/ctakes-as*                      cTAKES and UIMA-AS binaries with a few scr...   0
>> *jayunit100/ctakes-example-image-mvn*                                                   0
>> *llin/docker_apache_ctakes*             Docker image for apache ctakes                  0   [OK]
>> 
>> 0 means stars/rating.
>> [OK] means automated.
>> 
>> 
>> 
>> 
>> 
>>> On Sun, Apr 23, 2017 at 7:50 AM, Oleg Tikhonov  wrote:
>>> 
>>> Hi,
>>> have you tried:
>>> docker search ctakes ?
>>> 
>>> If anybody had done that and put it in the repository, you would see it.
>>> 
>>> Oleg
>>> 
>>> On Sun, Apr 23, 2017 at 1:58 AM, John Travis Green <
>>> john.travis.gr...@gmail.com> wrote:
>>> 
>>>> Has anyone dockerized ctakes? If so do you mind sending the Dockerfile,
>>>> thanks! John Green
>>>> 
>>> 
>> 


Re: Docker

2017-04-23 Thread Oleg Tikhonov
I've tried to create service from
https://hub.docker.com/r/llin/docker_apache_ctakes/~/dockerfile/, without
success.

However Docker file looks as follows:

FROM java:7
ADD http://mirror.softaculous.com/apache/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz
ADD https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/ytex/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
RUN tar -xzf apache-ctakes-3.2.2-bin.tar.gz
RUN ln -s /apache-ctakes-3.2.2 /apache-ctakes
RUN mkdir temp
RUN unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -d temp/
RUN cp -a temp/lib/. /apache-ctakes/lib/
RUN rm apache-ctakes-3.2.2-bin.tar.gz
RUN rm ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
RUN rm -r temp

Hope it helps.




On Sun, Apr 23, 2017 at 8:00 AM, Oleg Tikhonov  wrote:

> Here is an output
>
> *tmills/ctakes-as*                      cTAKES and UIMA-AS binaries with a few scr...   0
> *jayunit100/ctakes-example-image-mvn*                                                   0
> *llin/docker_apache_ctakes*             Docker image for apache ctakes                  0   [OK]
>
> 0 means stars/rating.
> [OK] means automated.
>
>
>
>
>
> On Sun, Apr 23, 2017 at 7:50 AM, Oleg Tikhonov  wrote:
>
>> Hi,
>> have you tried:
>> docker search ctakes ?
>>
>> If anybody had done that and put it in the repository, you would see it.
>>
>> Oleg
>>
>> On Sun, Apr 23, 2017 at 1:58 AM, John Travis Green <
>> john.travis.gr...@gmail.com> wrote:
>>
>>> Has anyone dockerized ctakes? If so do you mind sending the Dockerfile,
>>> thanks! John Green
>>>
>>>
>>>
>>>
>>
>