Yes, some of us have been developing an elastic scaling stack for Tika server 
that does just that with AWS. We don’t have it ready to push upstream yet.


Cheers,

Chris

 

 

From: Eric Pugh <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, April 16, 2020 at 7:09 AM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] Re: Issue with > 200% CPU after bulk usage

 

Does anyone have a good example of combining Tika with some sort of pool of 
Docker containers?   I think a lot of folks treat their Tika server like a pet, 
not like a cow.  
https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/ 

 

I wonder if we could ship some “recipes” that describe how to deploy a pool of 
Tikas. For example: if a Tika instance runs at over 200% CPU for 1 hour, kill 
it and start the next.
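
A recipe along those lines might look roughly like the sketch below. It is only an illustration under some assumptions, not a tested deployment: it assumes each Tika server runs in its own Docker container, that the `docker` CLI is on the path, and that restarting the container in place is an acceptable stand-in for "kill it and start the next". The names `CpuWatchdog`, `container_cpu_pct`, `watch` and the container name `tika-1` are made up for this example.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuWatchdog:
    """Flags a container whose CPU stays above a threshold for a full window."""
    threshold_pct: float = 200.0   # from the thread: "over 200%"
    window_s: float = 3600.0       # from the thread: "for 1 hour"
    over_since: Optional[float] = None

    def observe(self, cpu_pct: float, now: float) -> bool:
        """Record one sample; return True once the container should be replaced."""
        if cpu_pct <= self.threshold_pct:
            self.over_since = None      # dipped below the threshold: reset the clock
            return False
        if self.over_since is None:
            self.over_since = now       # first sample above the threshold
        return now - self.over_since >= self.window_s


def container_cpu_pct(name: str) -> float:
    """Read the current CPU percentage of one container via `docker stats`."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.CPUPerc}}", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return float(out.strip().rstrip("%"))


def watch(name: str, poll_s: float = 60.0) -> None:
    """Poll one container forever and restart it when the watchdog fires."""
    dog = CpuWatchdog()
    while True:
        if dog.observe(container_cpu_pct(name), time.monotonic()):
            # Assumption: an in-place restart is good enough for this sketch.
            subprocess.run(["docker", "restart", name], check=True)
            dog = CpuWatchdog()         # fresh clock for the restarted container
        time.sleep(poll_s)
```

Calling `watch("tika-1")` would poll that hypothetical container once a minute. A single sample at or below the threshold resets the clock, so only a sustained hour above 200% triggers the restart; in a real pool an orchestrator's health checks would probably be a better fit than a hand-rolled loop.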

 

 

 

On Apr 16, 2020, at 9:40 AM, Nick Burch <[email protected]> wrote:

On Wed, 15 Apr 2020, [email protected] wrote:

I have encountered an issue with Tika running locally on a box: the Java 
runtime goes up to over 200% CPU after running a bulk load of documents over a 
couple of days (more than 3 million documents).

Can you do a thread dump to show what the JVM is doing?

https://access.redhat.com/solutions/18178

Nick

 

_______________________

Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com | 
My Free/Busy <http://tinyurl.com/eric-cal>

Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
       

This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.

 

 
