Thank you, Thejan!

-----Original Message-----
From: Thejan Wijesinghe [mailto:thejan.k.wijesin...@gmail.com]
Sent: Wednesday, May 31, 2017 5:40 PM
To: dev@tika.apache.org
Subject: Re: experiences with Tika in Docker

Hi Tim,

I've used tika-server in Docker, but only as a single instance. Yes, Docker's ability to limit a container's memory and CPU usage on the host machine is great. It gives us a lot of flexibility: we can enforce hard or soft memory limits, and we can even control how many of the host machine's CPU cycles the container gets. Yes, it also limits the risks from arbitrary code execution and XXE vulnerabilities.

I already asked Prof. Chris Mattmann about officially moving to Docker Hub. He said I need to send an email to Apache Infra asking about this. Unfortunately, I haven't found the time to write that email yet.

We already have multiple Dockerfiles in Tika: the Dockerfile in tika-server, InceptionRestDockerfile, InceptionVideoRestDockerfile, and Im2txtRestDockerfile (PR #180, for image captioning).

Part of my GSoC project is to unify the existing REST services, such as object recognition and image captioning. My idea is to unify all of those REST services so that the user can start or terminate any REST service, and see its statistics, through a web-based GUI. I expect to use a combination of nginx (as the reverse proxy server) and Docker to make it work. So obviously we will see Docker much more often in Tika.

+1 for your thought of looking into hardening tika-server with the help of Docker.

best,
ThejanW

On Thu, Jun 1, 2017 at 1:03 AM, Allison, Timothy B. <talli...@mitre.org> wrote:
> Dave Meikle, Tom and All,
>
> How many of us are using Tika in Docker? If so, how exactly are
> you using it? Single instance, swarm, Kubernetes, something else?
> People fear I/O hit with tika-server...what are your experiences?
> I really like the ability to limit the number of CPUs in the Docker
> container. If a single doc causes multithreaded gc to go nuts, that
> won't kill an entire machine. This also cleanly limits the risk from
> XXE or arbitrary code execution, right?
>
> If this is one of the ways of the future for big data, we might want
> to look into hardening tika-server (OOMs, timeouts). What do you all think?
>
> Cheers,
>
> Tim
>
> Timothy B. Allison, Ph.D.
> Principal Artificial Intelligence Engineer Group Lead K83E/Human
> Language Technology The MITRE Corporation
> 7515 Colshire Drive, McLean, VA 22102
> 703-983-2473 (phone); 703-983-1379 (fax)
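
For readers who want to try the memory and CPU limits discussed in this thread, here is a minimal Python sketch that assembles a `docker run` command line with a hard memory limit (`--memory`), a soft memory limit (`--memory-reservation`), and a CPU cap (`--cpus`, which needs Docker 1.13 or later). Several things here are assumptions and not taken from the thread: the helper name `tika_docker_command`, the image name `example/tika-server`, the limit values, and the assumption that the server inside the image listens on port 9998 (tika-server's default).

```python
import shlex


def tika_docker_command(image, memory="1g", memory_reservation="512m",
                        cpus=1.0, port=9998):
    """Build a `docker run` argument list that caps memory and CPU.

    `image` is a placeholder; no official image name is implied.
    """
    return [
        "docker", "run", "--rm",
        "--memory", memory,                          # hard memory limit
        "--memory-reservation", memory_reservation,  # soft memory limit
        "--cpus", str(cpus),                         # cap on CPU time
        "-p", f"{port}:{port}",                      # publish the server port
        image,
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(tika_docker_command("example/tika-server")))
```

The function only builds the argument list, so the limits can be reviewed or logged before anything is started; passing the list to `subprocess.run` would launch the container.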