Re: Docker
Hi John,

The dockerized ctakes that we use is very specialized. It builds from the latest trunk, grabs and unpacks a custom dictionary, copies files to and from a secure s3 area, uses a custom cas consumer, sets up an s3 tool, preps input and output directories, etc. So I have never actually created a "pure ctakes" container. Below I've just imagined what it might look like ...

# I would recommend that you use a very simple parent like
FROM openjdk:8

# Create a ctakes home directory
ENV CTAKES_DIR /ctakes_4/
RUN mkdir -p $CTAKES_DIR

# Download the ctakes 4.0 binary to a local directory and add it to your container (the ADD command also untars)
ADD localDir/apache-ctakes-4.0.0-bin.tar.gz $CTAKES_DIR

# Also download and unzip the ctakes sno_rx_16ab dictionary or copy in your custom dictionary

# Run the default clinical pipeline
WORKDIR $CTAKES_DIR/apache-ctakes-4.0.0/

# Use ENTRYPOINT to specify the command. The script is called directly rather than
# through "sh -c", because "sh -c" would not pass the CMD arguments on to the script.
ENTRYPOINT ["bin/runClinicalPipeline.sh"]

# Use CMD to specify default run settings.
CMD ["-i","inputDir","--xmiOut","outputDir","--user","umlsUsername","--pass","umlsPassword"]

If you want to build and run the latest from trunk it would be something like this:

# Build from the Java 8 maven image (includes JDK 8 and Maven)
FROM maven:3.3-jdk-8

# We need subversion to check out ctakes trunk.
# Docker best practice is to run apt-get update and apt-get install in the same RUN call.
RUN apt-get update && apt-get install -y \
    subversion \
    && rm -rf /var/lib/apt/lists/*

# Container directory for ctakes source.
ENV CTAKES_SRC_DIR /ctakes_src/trunk/

# Create directories.
RUN mkdir -p $CTAKES_SRC_DIR

# Install ctakes.
WORKDIR $CTAKES_SRC_DIR/..

# Check out ctakes trunk
RUN svn checkout https://svn.apache.org/repos/asf/ctakes/trunk/

WORKDIR $CTAKES_SRC_DIR
RUN mvn -DskipTests compile

# Run ctakes -- in my container I run a script that does more than run ctakes. Below is basically a copied and reformatted paste.
# mvn exec:exec -Dexec.executable="java" -Dexec.args="-cp %classpath -Xmx4G org.apache.ctakes.custom.pipeline.CustomRunner $DEID_TXT_DIR/$patient_num/ $I2B2_BSV_DIR/$patient_num/"

# Use ENTRYPOINT to specify the command. In exec (JSON array) form there is no shell,
# so the property values are not wrapped in escaped quotes.
ENTRYPOINT ["mvn","exec:exec","-Dexec.executable=java","-Dexec.args=-cp %classpath -Xmx4G org.apache.ctakes.core.pipeline.PiperFileRunner"]

# Use CMD to specify default run settings. Note that Docker appends these to the mvn
# command line, not to -Dexec.args, so they still need to be routed into exec.args
# before they can reach PiperFileRunner.
CMD ["-i","inputDir","--xmiOut","outputDir","--user","umlsUserName","--pass","umlsPassword"]

From: John Travis Green <john.travis.gr...@gmail.com>
Sent: Sunday, April 23, 2017 8:59 PM
To: dev@ctakes.apache.org
Subject: RE: Docker

Sean: if you have it dockerized at harvard can you share the setup files? Regarding legalities, this is for in-house work, not redistribution. I work for the federal government, and to deploy with our security restrictions it's very convenient to have it deployed as a docker instance. I'm a very strong advocate in the dod regarding ctakes use. I don't know of anyone else in my position pushing it. We're looking at some major uses regarding accessibility of legacy data (recall the dod is transitioning to cerner, but we have a lot of data that will still need accessibility, not the least of which are physician researchers). But ctakes is difficult to deploy on dod systems because of our security requirements. If we can containerize it then it will make it more likely we use it.

Thanks,
John

Keep in mind one very important thing:

You need to be very careful about redistribution of a umls database. Many years ago ctakes had to get special permission to post a copy on sourceforge. As you all know, use of that distribution requires a umls username and password check per ctakes launch. This was also a requirement placed upon ctakes by the nlm per the agreement.

Public distribution of Oracle Java in a docker container is technically illegal, but in the beginning a lot of people were not reading eula info and went smooth criminal. Strange but true. Now people know to use OpenJDK.

I have not contacted the nlm regarding docker and the umls. Has anybody else out there? If so please let us know.

For a private container, inclusion of the dictionary is fine (we have one at harvard). Otherwise there are ways to use / copy s3 files at runtime; you would just need to document a static location for the database, etc.

Sean

-----Original Message-----
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com]
Sent: Sunday, April 23, 2017 5:56 AM
To: dev@ctakes.apache.org
Subject: Re: Docker

Dockerizing ctakes as a build was useful at one t
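A note on the trunk-build Dockerfile earlier in this message: with an exec-form ENTRYPOINT, Docker appends the CMD items to the mvn command line, so they never reach PiperFileRunner. One possible workaround, sketched here as an assumption and not as the setup actually used at harvard, is a small "sh -c" wrapper that splices the CMD items into -Dexec.args through $*. The image tag, checkout URL, class name and option names come from the message; the simplified /ctakes_src paths are a shorthand, and whether exec:exec resolves correctly from the trunk root is untested here.

```dockerfile
# Sketch only: same trunk build as in the message, with a wrapper entrypoint.
FROM maven:3.3-jdk-8

RUN apt-get update && apt-get install -y subversion \
    && rm -rf /var/lib/apt/lists/*

# Check out and compile ctakes trunk.
WORKDIR /ctakes_src
RUN svn checkout https://svn.apache.org/repos/asf/ctakes/trunk/

WORKDIR /ctakes_src/trunk
RUN mvn -DskipTests compile

# sh -c runs the command string. The trailing "--" becomes $0 and the CMD items
# become $1, $2, ... so $* expands to them inside -Dexec.args at container start.
ENTRYPOINT ["sh", "-c", "exec mvn exec:exec -Dexec.executable=java \"-Dexec.args=-cp %classpath -Xmx4G org.apache.ctakes.core.pipeline.PiperFileRunner $*\"", "--"]

# Default run settings; the values are placeholders from the original message.
CMD ["-i", "inputDir", "--xmiOut", "outputDir", "--user", "umlsUsername", "--pass", "umlsPassword"]
```

Anything given after the image name on "docker run" replaces the CMD defaults. Because $* joins the arguments with spaces, a value that itself contains a space would be split into two arguments.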
Re: Docker
One of those that Oleg found is my github repo, which is in its very early stages: https://github.com/tmills/ctakes-docker

It can create 2 docker images: one for a UIMA AS queue server, and another that downloads ctakes, installs the dictionary, and starts a basic concept extraction server with a UIMA AS descriptor. There is a sample environment variables file where you need to enter your UMLS credentials. It is a big image, but because it is a simple pipeline it can run with a smaller memory footprint. This project just started, so the reader end is underdeveloped; I've just been pointing to it from a CVD with a remote descriptor.

Tim

On Sun, 2017-04-23 at 18:59 -0600, John Travis Green wrote:
> Sean: if you have it dockerized at harvard can you share the setup
> files? Regarding legalities, this is for in-house work, not
> redistribution. I work for the federal government, and to deploy with
> our security restrictions it's very convenient to have it deployed as
> a docker instance. I'm a very strong advocate in the dod regarding
> ctakes use. I don't know of anyone else in my position pushing it.
> We're looking at some major uses regarding accessibility of legacy
> data (recall the dod is transitioning to cerner, but we have a lot of
> data that will still need accessibility, not the least of which are
> physician researchers). But ctakes is difficult to deploy on dod
> systems because of our security requirements. If we can containerize
> it then it will make it more likely we use it. Thanks, John
>
> > Keep in mind one very important thing:
> >
> > You need to be very careful about redistribution of a umls
> > database. Many years ago ctakes had to get special permission to
> > post a copy on sourceforge. As you all know, use of that
> > distribution requires a umls username and password check per ctakes
> > launch. This was also a requirement placed upon ctakes by the nlm
> > per the agreement.
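The environment-file approach Tim describes can also be applied to a plain binary-distribution image, so that UMLS credentials are supplied at run time and never baked into a layer. The following is a minimal sketch and not the contents of the ctakes-docker repo: the variable names UMLS_USER and UMLS_PASS and the /input and /output paths are hypothetical, while the script name and its -i, --xmiOut, --user and --pass options are the ones used elsewhere in this thread.

```dockerfile
# Sketch only: credentials come from the container environment at run time.
FROM openjdk:8

ENV CTAKES_DIR=/ctakes_4/

# A local tar.gz given to ADD is unpacked into the destination directory.
ADD localDir/apache-ctakes-4.0.0-bin.tar.gz $CTAKES_DIR
WORKDIR $CTAKES_DIR/apache-ctakes-4.0.0/

# Docker does not substitute variables in ENTRYPOINT, so sh expands
# $UMLS_USER and $UMLS_PASS (hypothetical names) when the container starts.
# The trailing "--" becomes $0 and the CMD items are passed on through "$@".
ENTRYPOINT ["sh", "-c", "exec bin/runClinicalPipeline.sh --user \"$UMLS_USER\" --pass \"$UMLS_PASS\" \"$@\"", "--"]

# Default input and output locations (assumed paths, typically bind-mounted).
CMD ["-i", "/input", "--xmiOut", "/output"]
```

The container would then be started with something like "docker run --env-file umls.env -v ...", where umls.env is a hypothetical file holding one UMLS_USER=... line and one UMLS_PASS=... line and is kept out of version control.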
RE: Docker
Keep in mind one very important thing:

You need to be very careful about redistribution of a umls database. Many years ago ctakes had to get special permission to post a copy on sourceforge. As you all know, use of that distribution requires a umls username and password check per ctakes launch. This was also a requirement placed upon ctakes by the nlm per the agreement.

Public distribution of Oracle Java in a docker container is technically illegal, but in the beginning a lot of people were not reading eula info and went smooth criminal. Strange but true. Now people know to use OpenJDK.

I have not contacted the nlm regarding docker and the umls. Has anybody else out there? If so please let us know.

For a private container, inclusion of the dictionary is fine (we have one at harvard). Otherwise there are ways to use / copy s3 files at runtime; you would just need to document a static location for the database, etc.

Sean

-----Original Message-----
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com]
Sent: Sunday, April 23, 2017 5:56 AM
To: dev@ctakes.apache.org
Subject: Re: Docker

Dockerizing ctakes as a build was useful at one time for sure.

If running as a microservice, remember the size of the image is problematic; you don't want it on lots of different nodes if using something like kubernetes.

Also remember to run with -Xmx args so that the jvm's default memory guess does not exceed the cgroup memory limit; otherwise you'll get OOME errors.

> On Apr 23, 2017, at 4:38 AM, Oleg Tikhonov <o...@apache.org> wrote:
>
> I've tried to create service from
> https://hub.docker.com/r/llin/docker_apache_ctakes/~/dockerfile/, without
> success.
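Jay's point about heap size can be made concrete. Older Java 8 builds size the default maximum heap from the host's memory and not from the container's cgroup limit, so a container started with a memory limit can be killed, or fail with out-of-memory errors, once the heap grows past that limit. A minimal sketch, assuming a 4 GB container limit and a 3 GB heap (both figures are illustrative and not from the thread):

```dockerfile
# Sketch only: pin the JVM heap below the container memory limit.
FROM openjdk:8

# JAVA_TOOL_OPTIONS is read by the HotSpot JVM at startup, so every java
# process in the container gets an explicit heap ceiling.
# 3g is an assumed value; leave headroom below the container limit for
# metaspace, thread stacks and other non-heap memory.
ENV JAVA_TOOL_OPTIONS="-Xmx3g"

# Sanity check: prints the JVM version, and the JVM reports the options it picked up.
ENTRYPOINT ["java", "-version"]
```

Run with a matching limit, for example "docker run --memory=4g <image>". In a real ctakes image the ENTRYPOINT would be the pipeline command, and the ENV line is the only part that matters here.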
Re: Docker
I've tried to create service from https://hub.docker.com/r/llin/docker_apache_ctakes/~/dockerfile/, without success.

However, the Docker file looks as follows:

FROM java:7
ADD http://mirror.softaculous.com/apache/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz
ADD https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/ytex/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
RUN tar -xzf apache-ctakes-3.2.2-bin.tar.gz
RUN ln -s /apache-ctakes-3.2.2 /apache-ctakes
RUN mkdir temp
RUN unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -d temp/
RUN cp -a temp/lib/. /apache-ctakes/lib/
RUN rm apache-ctakes-3.2.2-bin.tar.gz
RUN rm ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
RUN rm -r temp

Hope it helps.

On Sun, Apr 23, 2017 at 8:00 AM, Oleg Tikhonov <o...@apache.org> wrote:

> Here is an output
>
> tmills/ctakes-as                      cTAKES and UIMA-AS binaries with a few scr...   0
> jayunit100/ctakes-example-image-mvn                                                   0
> llin/docker_apache_ctakes             Docker image for apache ctakes                  0   [OK]
>
> 0 - means stars/rating
> OK - means, automated.
>
> On Sun, Apr 23, 2017 at 7:50 AM, Oleg Tikhonov <o...@apache.org> wrote:
>
>> Hi,
>> did you tried:
>> docker search ctakes ?
>>
>> If any body did that, and put in the repository, you could have see it.
>>
>> Oleg
>>
>> On Sun, Apr 23, 2017 at 1:58 AM, John Travis Green <
>> john.travis.gr...@gmail.com> wrote:
>>
>>> Has anyone dockerized ctakes? If so do you mind sending the Dockerfile,
>>> thanks! John Green
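One likely reason the Dockerfile in this message fails to build is that its ADD instructions name a source but no destination. The following is an untested cleanup sketch, not the image author's fix. Its assumptions: /tmp/ as the download destination, openjdk:8 as the base in place of java:7, and unzip installed through apt. The URLs are the ones given in 2017 and may no longer resolve.

```dockerfile
# Sketch only: the same steps with explicit ADD destinations and a single RUN layer.
FROM openjdk:8

# Files added from a URL are downloaded as-is; unlike local archives they are not unpacked.
ADD http://mirror.softaculous.com/apache/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz /tmp/
ADD https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/ytex/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip /tmp/

# Unpack ctakes under /, link it to /apache-ctakes, and copy the ytex libraries into lib/.
RUN apt-get update && apt-get install -y unzip \
    && rm -rf /var/lib/apt/lists/* \
    && tar -xzf /tmp/apache-ctakes-3.2.2-bin.tar.gz -C / \
    && ln -s /apache-ctakes-3.2.2 /apache-ctakes \
    && unzip /tmp/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -d /tmp/ytex \
    && cp -a /tmp/ytex/lib/. /apache-ctakes/lib/ \
    && rm -rf /tmp/apache-ctakes-3.2.2-bin.tar.gz /tmp/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip /tmp/ytex
```

Deleting the archives in the RUN step tidies the final filesystem but does not shrink the image, because the two ADD layers still contain them. Downloading with curl inside the same RUN would avoid that.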
Re: Docker
Here is the output:

tmills/ctakes-as                      cTAKES and UIMA-AS binaries with a few scr...   0
jayunit100/ctakes-example-image-mvn                                                   0
llin/docker_apache_ctakes             Docker image for apache ctakes                  0   [OK]

0 means stars/rating.
OK means automated.

On Sun, Apr 23, 2017 at 7:50 AM, Oleg Tikhonov <o...@apache.org> wrote:

> Hi,
> did you try:
> docker search ctakes ?
>
> If anybody did that and put it in the repository, you would see it.
>
> Oleg
>
> On Sun, Apr 23, 2017 at 1:58 AM, John Travis Green <
> john.travis.gr...@gmail.com> wrote:
>
>> Has anyone dockerized ctakes? If so do you mind sending the Dockerfile,
>> thanks! John Green
Re: Docker
Hi,
did you try:
docker search ctakes ?

If anybody did that and put it in the repository, you would see it.

Oleg

On Sun, Apr 23, 2017 at 1:58 AM, John Travis Green <john.travis.gr...@gmail.com> wrote:

> Has anyone dockerized ctakes? If so do you mind sending the Dockerfile,
> thanks! John Green