Hi John,

The dockerized ctakes that we use is very specialized. It builds from the latest trunk, grabs and unpacks a custom dictionary, copies files to and from a secure S3 area, uses a custom CAS consumer, etc. It also sets up an S3 tool and preps input and output directories. So I have never actually created a "pure ctakes" container. Below I've just kind of imagined what it might look like:
# I would recommend that you use a very simple parent, like:
FROM openjdk:8

# Create a ctakes home directory.
ENV CTAKES_DIR /ctakes_4/
RUN mkdir -p $CTAKES_DIR

# Download the ctakes 4.0 binary to a local directory and add it to your container (ADD also untars a local tar.gz).
ADD localDir/apache-ctakes-4.0.0-bin.tar.gz $CTAKES_DIR

# Also download and unzip the ctakes sno_rx_16ab dictionary, or copy in your custom dictionary.

# Run the default clinical pipeline.
WORKDIR $CTAKES_DIR/apache-ctakes-4.0.0/

# Use ENTRYPOINT to specify the command. Use the exec form without "sh -c":
# with "sh -c", the CMD values become shell positional parameters and never reach the script.
ENTRYPOINT ["bin/runClinicalPipeline.sh"]

# Use CMD to specify default run settings (override them on the docker run command line).
CMD ["-i","inputDir","--xmiOut","outputDir","--user","umlsUsername","--pass","umlsPassword"]

If you want to build and run the latest from trunk, it would be something like this:

# Build from the Java 8 Maven image (includes JDK 8 and Maven).
FROM maven:3.3-jdk-8

# We need Subversion to check out ctakes trunk.
# Docker best practice: run apt-get update and apt-get install in the same RUN call.
RUN apt-get update && apt-get install -y \
    subversion \
 && rm -rf /var/lib/apt/lists/*

# Container directory for ctakes source.
ENV CTAKES_SRC_DIR /ctakes_src/trunk/

# Create directories.
RUN mkdir -p $CTAKES_SRC_DIR

# Install ctakes.
WORKDIR $CTAKES_SRC_DIR/..

# Check out ctakes trunk.
RUN svn checkout https://svn.apache.org/repos/asf/ctakes/trunk/
WORKDIR $CTAKES_SRC_DIR
RUN mvn -DskipTests compile

# Run ctakes. In my container I run a script that does more than run ctakes;
# below is basically a copy-and-reformatted paste of its command:
# mvn exec:exec -Dexec.executable="java" -Dexec.args="-cp %classpath -Xmx4G org.apache.ctakes.custom.pipeline.CustomRunner $DEID_TXT_DIR/$patient_num/ $I2B2_BSV_DIR/$patient_num/"

# Use ENTRYPOINT to specify the command. The exec form runs no shell, so escaped quotes would reach mvn literally
# and CMD values would not land inside exec.args; "sh -c" with $* appends the CMD values to the java arguments.
ENTRYPOINT ["sh","-c","mvn exec:exec -Dexec.executable=java -Dexec.args=\"-cp %classpath -Xmx4G org.apache.ctakes.core.pipeline.PiperFileRunner $*\"","--"]

# Use CMD to specify default run settings.
CMD ["-i","inputDir","--xmiOut","outputDir","--user","umlsUserName","--pass","umlsPassword"]

________________________________________
From: John Travis Green <[email protected]>
Sent: Sunday, April 23, 2017 8:59 PM
To: [email protected]
Subject: RE: Docker

Sean: if you have it dockerized at Harvard, can you share the setup files? Regarding legalities, this is for in-house work, not redistribution. I work for the federal government, and to deploy within our security restrictions it's very convenient to have it deployed as a Docker instance. I'm a very strong advocate of ctakes use in the DoD. I don't know of anyone else in my position pushing it. We're looking at some major uses involving accessibility of legacy data (recall that the DoD is transitioning to Cerner, but we have a lot of data that will still need to be accessible, not least to physician researchers). But ctakes is difficult to deploy on DoD systems because of our security requirements. If we can containerize it, we are more likely to use it.

Thanks,
John

Keep in mind one very important thing: you need to be very careful about redistribution of a UMLS database. Many years ago ctakes had to get special permission to post a copy on SourceForge. As you all know, use of that distribution requires a UMLS username and password check on every ctakes launch. This was also a requirement placed upon ctakes by the NLM per the agreement.

Public distribution of Oracle Java in a Docker container is technically illegal, but in the beginning a lot of people were not reading the EULA and went smooth criminal. Strange but true. Now people know to use OpenJDK.

I have not contacted the NLM regarding Docker and the UMLS. Has anybody else out there? If so, please let us know.

For a private container, including the dictionary is fine (we have one at Harvard). Otherwise, there are ways to use or copy S3 files at runtime; you would just need to document a static location for the database, etc.
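One way to picture the "static location" idea is a minimal sketch that supplies the dictionary with a bind mount at run time instead of an S3 copy, so no UMLS-derived data is baked into the image. It assumes the pipeline configuration is pointed at that directory; the CTAKES_DICT_DIR name and the /ctakes_dict/ path are hypothetical, not part of the ctakes distribution.

```dockerfile
# Hypothetical static location for the dictionary; nothing is copied into the image here.
ENV CTAKES_DICT_DIR /ctakes_dict/
# Declare the directory as a mount point so the data is supplied when the container starts.
VOLUME $CTAKES_DICT_DIR
```

At run time the host directory holding the licensed dictionary would be mounted at that path, for example: docker run -v /secure/dict:/ctakes_dict ctakes-image (where /secure/dict and ctakes-image are placeholders). The documented path is the only contract between the image and whoever provides the data.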
Sean

-----Original Message-----
From: Jay Vyas [mailto:[email protected]]
Sent: Sunday, April 23, 2017 5:56 AM
To: [email protected]
Subject: Re: Docker

Dockerizing ctakes as a build was useful at one time, for sure. If running it as a microservice, remember that the size of the image is problematic; you don't want it on lots of different nodes if using something like Kubernetes. Also make sure you run with -Xmx args so that cgroups don't constrain the JVM's memory guess; otherwise you'll get OOME errors.

> On Apr 23, 2017, at 4:38 AM, Oleg Tikhonov <[email protected]> wrote:
>
> I've tried to create a service from
> https://hub.docker.com/r/llin/docker_apache_ctakes/~/dockerfile/
> without success.
>
> However, the Dockerfile looks as follows:
>
> FROM java:7
> ADD http://mirror.softaculous.com/apache/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz /
> ADD https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/ytex/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip /
> RUN tar -xzf apache-ctakes-3.2.2-bin.tar.gz
> RUN ln -s /apache-ctakes-3.2.2 /apache-ctakes
> RUN mkdir temp
> RUN unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -d temp/
> RUN cp -a temp/lib/. /apache-ctakes/lib/
> RUN rm apache-ctakes-3.2.2-bin.tar.gz
> RUN rm ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
> RUN rm -r temp
>
> Hope it helps.
>
>> On Sun, Apr 23, 2017 at 8:00 AM, Oleg Tikhonov <[email protected]> wrote:
>>
>> Here is the output:
>>
>> tmills/ctakes-as                      cTAKES and UIMA-AS binaries with a few scr...   0
>> jayunit100/ctakes-example-image-mvn                                                   0
>> llin/docker_apache_ctakes             Docker image for apache ctakes                  0   [OK]
>>
>> 0 - means stars/rating
>> OK - means automated.
>>
>>> On Sun, Apr 23, 2017 at 7:50 AM, Oleg Tikhonov <[email protected]> wrote:
>>>
>>> Hi,
>>> did you try:
>>> docker search ctakes ?
>>>
>>> If anybody did that and put it in the repository, you would see it.
>>>
>>> Oleg
>>>
>>> On Sun, Apr 23, 2017 at 1:58 AM, John Travis Green <[email protected]> wrote:
>>>
>>>> Has anyone dockerized ctakes? If so, do you mind sending the Dockerfile?
>>>> Thanks! John Green
