Thank you, Thejan!

-----Original Message-----
From: Thejan Wijesinghe [mailto:thejan.k.wijesin...@gmail.com] 
Sent: Wednesday, May 31, 2017 5:40 PM
To: dev@tika.apache.org
Subject: Re: experiences with Tika in Docker

Hi Tim,

I've used tika-server in Docker, but only as a single instance. Yes, Docker's 
ability to limit a container's memory and CPU usage on the host machine is 
great. It gives us a lot of flexibility: we can enforce hard or soft memory 
limits, and we can even control the container's share of the host machine's CPU 
cycles. Yes, it also limits the risk from arbitrary code execution and XXE 
vulnerabilities. I already asked Prof. Chris Mattmann about officially moving to 
Docker Hub. He said I need to send a mail to Apache infra asking about this. 
Unfortunately, I still haven't found the time to write that mail.
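
As a rough illustration of the hard/soft memory limits and CPU caps mentioned 
above, here is a minimal sketch that only builds a `docker run` command line. 
The image name `example/tika-server` and the limit values are placeholders I 
made up, not an official image or recommended settings; `--memory`, 
`--memory-reservation` and `--cpus` are standard `docker run` options, and 9998 
is the default tika-server port.

```python
import shlex


def docker_run_args(image, memory="2g", memory_reservation="1g",
                    cpus="2", port=9998):
    """Build a `docker run` argument list that caps memory and CPU.

    The image name and the default limit values are illustrative
    assumptions, not settings taken from the Tika project.
    """
    return [
        "docker", "run", "--rm",
        "--memory", memory,                          # hard memory limit
        "--memory-reservation", memory_reservation,  # soft memory limit
        "--cpus", cpus,                              # cap on CPU usage
        "-p", f"{port}:{port}",                      # publish the server port
        image,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so no Docker daemon is needed.
    print(shlex.join(docker_run_args("example/tika-server")))
```

Printing the command instead of executing it keeps the sketch safe to try on a 
machine without Docker; the list could be passed to `subprocess.run` as-is.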

We already have multiple Dockerfiles in Tika: the Dockerfile in tika-server, 
InceptionRestDockerfile, InceptionVideoRestDockerfile, and Im2txtRestDockerfile 
(PR #180, for image captioning).

Part of my GSoC project is to unify the existing REST services, such as object 
recognition and image captioning. My idea is to bring them together so that the 
user can start or terminate any REST service and see its statistics through a 
web-based GUI. I expect to use a combination of nginx (as the reverse proxy 
server) and Docker to make it work. So we will obviously see Docker much more 
often in Tika.

+1 for your thought of looking into hardening tika-server with the help of 
Docker.

best,
ThejanW

On Thu, Jun 1, 2017 at 1:03 AM, Allison, Timothy B. <talli...@mitre.org>
wrote:

> Dave Meikle, Tom and All,
>
>     How many of us are using Tika in Docker?  If so, how exactly are 
> you using it?  Single instance, swarm, Kubernetes, something else?  
> People fear I/O hit with tika-server...what are your experiences?
> I really like the ability to limit the number of CPUs in the Docker 
> container.  If a single doc causes multithreaded gc to go nuts, that 
> won't kill an entire machine.  This also cleanly limits the risk from 
> XXE or arbitrary code execution, right?
>
> If this is one of the ways of the future for big data, we might want 
> to look into hardening tika-server (OOMs, timeouts).  What do you all think?
>
>         Cheers,
>
>                 Tim
>
> Timothy B. Allison, Ph.D.
> Principal Artificial Intelligence Engineer Group Lead K83E/Human 
> Language Technology The MITRE Corporation
> 7515 Colshire Drive, McLean, VA  22102
> 703-983-2473 (phone); 703-983-1379 (fax)
>
>
