Hi John,

The dockerized ctakes that we use is very specialized.  It builds from the 
latest trunk, grabs and unpacks a custom dictionary, copies files to and from a 
secure s3 area, uses a custom cas consumer, sets up an s3 tool, and preps 
input and output directories.  So, I have never actually created a "pure 
ctakes" container.  Below I've just imagined what one might look like, so 
treat both Dockerfiles as sketches ...

# I would recommend that you use a very simple parent like
FROM openjdk:8

# Create a ctakes home directory
ENV CTAKES_DIR /ctakes_4/
RUN mkdir -p $CTAKES_DIR

# Download the ctakes 4.0 binary to a local directory and add it to your
# container (the ADD command also untars).
ADD localDir/apache-ctakes-4.0.0-bin.tar.gz $CTAKES_DIR
# Also download and unzip the ctakes sno_rx_16ab dictionary, or copy in your
# custom dictionary.

# Run the default clinical pipeline.
WORKDIR $CTAKES_DIR/apache-ctakes-4.0.0/
# Use ENTRYPOINT to specify the command.  Invoking the script with sh directly
# (rather than "sh -c") lets the CMD values below reach it as arguments.
ENTRYPOINT ["sh","bin/runClinicalPipeline.sh"]
# Use CMD to specify default run settings; they can be overridden at run time.
CMD ["-i","inputDir","--xmiOut","outputDir","--user","umlsUsername","--pass","umlsPassword"]
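
Since inputDir and outputDir above are paths inside the container, the host 
directories have to be mounted at run time.  A minimal sketch of a helper for 
that follows.  The image tag "ctakes-local", the mount points /data/in and 
/data/out, and the function name are my own placeholders, and it assumes the 
image's ENTRYPOINT forwards its arguments to the pipeline script.

```shell
#!/bin/sh
# Hypothetical helper: run a locally built ctakes image with host directories
# mounted.  "ctakes-local", /data/in and /data/out are placeholder names.
run_ctakes() {
  in_dir=$1      # host directory holding the input notes
  out_dir=$2     # host directory that receives the XMI output
  umls_user=$3
  umls_pass=$4
  # Arguments after the image name replace the CMD defaults.
  docker run --rm \
    -v "$in_dir":/data/in:ro \
    -v "$out_dir":/data/out \
    ctakes-local \
    -i /data/in --xmiOut /data/out \
    --user "$umls_user" --pass "$umls_pass"
}
```

After building with something like `docker build -t ctakes-local .`, it could be 
called as `run_ctakes "$PWD/notes" "$PWD/xmi" umlsUsername umlsPassword`.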



If you want to build and run the latest from trunk it would be something like 
this:

#Build from the Java 8 maven image (includes JDK 8 and Maven)
FROM maven:3.3-jdk-8

# We need subversion to check out ctakes trunk.
# Docker best practice: run apt-get update and apt-get install in the same RUN call.
RUN apt-get update && apt-get install -y \
    subversion \
 && rm -rf /var/lib/apt/lists/*

# Container Directory for ctakes source.
ENV CTAKES_SRC_DIR /ctakes_src/trunk/

# Create directories.
RUN mkdir -p $CTAKES_SRC_DIR

# Install ctakes.
WORKDIR $CTAKES_SRC_DIR/..
# Check out ctakes trunk
RUN svn checkout https://svn.apache.org/repos/asf/ctakes/trunk/
WORKDIR $CTAKES_SRC_DIR
RUN mvn -DskipTests compile

# Run ctakes ------ in my container I run a script that does more than run
# ctakes.  Below is basically a copy and reformatted paste.
#  mvn exec:exec -Dexec.executable="java" -Dexec.args="-cp %classpath -Xmx4G
#  org.apache.ctakes.custom.pipeline.CustomRunner $DEID_TXT_DIR/$patient_num/
#  $I2B2_BSV_DIR/$patient_num/"

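# Use ENTRYPOINT to specify the command.  In exec form the CMD values are
# appended after the last element, so a small "sh -c" wrapper is used here to
# place them inside -Dexec.args via $*.  The trailing "sh" only fills $0.
ENTRYPOINT ["sh","-c","mvn exec:exec -Dexec.executable=java -Dexec.args=\"-cp %classpath -Xmx4G org.apache.ctakes.core.pipeline.PiperFileRunner $*\"","sh"]

# Use CMD to specify default run settings.
CMD ["-i","inputDir","--xmiOut","outputDir","--user","umlsUserName","--pass","umlsPassword"]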





________________________________________
From: John Travis Green <[email protected]>
Sent: Sunday, April 23, 2017 8:59 PM
To: [email protected]
Subject: RE: Docker

Sean: if you have it dockerized at Harvard, can you share the setup files? 
Regarding legalities, this is for in-house work, not redistribution. I work for 
the federal government, and to deploy with our security restrictions it's very 
convenient to have it deployed as a docker instance.  I'm a very strong advocate 
in the DoD regarding ctakes use. I don't know of anyone else in my position 
pushing it. We're looking at some major uses regarding accessibility of legacy 
data (recall the DoD is transitioning to Cerner, but we have a lot of data that 
will still need to be accessible, not least to physician researchers). But 
ctakes is difficult to deploy on DoD systems because of our security 
requirements. If we can containerize it, then it is more likely that we will 
use it.  Thanks, John



Keep in mind one very important thing:



You need to be very careful about redistribution of a umls database.  Many 
years ago ctakes had to get special permission to post a copy on sourceforge.  
As you all know, use of that distribution requires a umls username and password 
check per-ctakes launch.  This was also a requirement placed upon ctakes by the 
nlm per the agreement.



Public distribution of Oracle Java in a docker container is technically 
illegal, but in the beginning a lot of people were not reading eula info and 
went smooth criminal.  Strange but true.  Now people know to use OpenJDK.  I 
have not contacted the nlm regarding docker and the umls.  Has anybody else out 
there?  If so please let us know.



For a private container inclusion of the dictionary is fine (we have one at 
harvard).   Otherwise there are ways to use / copy s3 files at runtime, you 
would just need to document a static location for the database, etc. etc.
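
One way the runtime copy could look is sketched below.  It assumes the AWS CLI 
is installed in the container and has credentials available; the function 
name, the bucket URI and the destination directory are placeholders, not 
anything from our actual setup.

```shell
#!/bin/sh
# Hypothetical helper: copy a dictionary from s3 at container start, unless a
# non-empty copy is already present (e.g. from a mounted volume).
fetch_dictionary() {
  s3_uri=$1      # e.g. s3://example-bucket/dictionary (placeholder)
  dest_dir=$2    # the documented static location inside the container
  if [ -d "$dest_dir" ] && [ -n "$(ls -A "$dest_dir")" ]; then
    echo "dictionary already present in $dest_dir"
    return 0
  fi
  mkdir -p "$dest_dir"
  aws s3 cp "$s3_uri" "$dest_dir" --recursive
}
```

An entrypoint script would call this before launching ctakes, so the image 
itself never contains the umls-derived files.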



Sean



-----Original Message-----

From: Jay Vyas [mailto:[email protected]]
Sent: Sunday, April 23, 2017 5:56 AM

To: [email protected]

Subject: Re: Docker



Dockerizing ctakes as a build was useful at one time for sure.



If running as a microservice, remember the size of the image is problematic; 
you don't want it on lots of different nodes if using something like kubernetes.



Also remember to run with explicit Xmx args so that cgroups don't constrain 
the jvm memory guess; otherwise you'll get OOME errors.
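
As a rough sketch of that idea, an entrypoint script could derive the Xmx 
value from the container's memory limit instead of hard-coding it.  The 75% 
ratio and both function names are my own illustrative choices.

```shell
#!/bin/sh
# Sketch only: the 75% ratio and the function names are illustrative.

# Print a heap size in MB equal to 75% of a memory limit given in bytes.
heap_mb_from_limit() {
  echo $(( $1 / 1024 / 1024 * 3 / 4 ))
}

# Print a -Xmx flag for a limit in bytes.  Fails on non-numeric input such as
# "max", which cgroup v2 reports when no limit is set.
heap_flag_from_limit() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
  esac
  echo "-Xmx$(heap_mb_from_limit "$1")m"
}
```

The limit can be read from /sys/fs/cgroup/memory/memory.limit_in_bytes 
(cgroup v1) or /sys/fs/cgroup/memory.max (cgroup v2), and the resulting flag 
exported through JAVA_TOOL_OPTIONS, which the jvm picks up at startup.  Note 
that cgroup v1 reports a very large number when no limit is set, in which case 
a fixed -Xmx is the simpler choice.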



> On Apr 23, 2017, at 4:38 AM, Oleg Tikhonov <[email protected]> wrote:

>
> I've tried to create a service from
> https://hub.docker.com/r/llin/docker_apache_ctakes/~/dockerfile/ , without
> success.
>
> However, the Dockerfile looks as follows:

>
> FROM java:7

> ADD http://mirror.softaculous.com/apache/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz

> ADD https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/ytex/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip

> RUN tar -xzf apache-ctakes-3.2.2-bin.tar.gz

> RUN ln -s /apache-ctakes-3.2.2 /apache-ctakes

> RUN mkdir temp

> RUN unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -d temp/

> RUN cp -a temp/lib/. /apache-ctakes/lib/

> RUN rm apache-ctakes-3.2.2-bin.tar.gz

> RUN rm ctakes-ytex-lib-3.1.2-SNAPSHOT.zip

> RUN rm -r temp

>
> Hope it helps.

>
>
>
>
>> On Sun, Apr 23, 2017 at 8:00 AM, Oleg Tikhonov <[email protected]> wrote:

>>
>> Here is the output:
>>
>> tmills/ctakes-as                      cTAKES and UIMA-AS binaries with a few scr...   0
>> jayunit100/ctakes-example-image-mvn                                                   0
>> llin/docker_apache_ctakes             Docker image for apache ctakes                  0   [OK]
>>
>> 0 means stars/rating.
>> [OK] means automated.

>>
>>
>>
>>
>>
>>> On Sun, Apr 23, 2017 at 7:50 AM, Oleg Tikhonov <[email protected]> wrote:

>>>
>>> Hi,
>>> did you try:
>>> docker search ctakes ?
>>>
>>> If anybody had done that and put it in the repository, you would see it.

>>>
>>> Oleg

>>>
>>> On Sun, Apr 23, 2017 at 1:58 AM, John Travis Green <

>>> [email protected]> wrote:

>>>
>>>> Has anyone dockerized ctakes? If so do you mind sending the Dockerfile,

>>>> thanks! John Green

>>>>
>>>>
>>>>
>>>>
>>>
>>
