Yes, some of us have been developing an Elastic scaling stack for Tika server…
That does just that with AWS. We don’t have it ready to push upstream yet.

Cheers,
Chris

From: Eric Pugh <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, April 16, 2020 at 7:09 AM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] Re: Issue with > 200% CPU after bulk usage

Does anyone have a good example of combining Tika with some sort of pool of Docker containers? I think a lot of folks treat their Tika server like a pet, not like a cow.

https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/

I wonder if we could ship some “recipes” that describe how to deploy a pool of Tikas. If a Tika has been running at over 200% CPU for 1 hour, kill it and start the next.

> On Apr 16, 2020, at 9:40 AM, Nick Burch <[email protected]> wrote:
>
> On Wed, 15 Apr 2020, [email protected] wrote:
>> I have encountered an issue with Tika running locally on a box: the Java
>> runtime goes up to over 200% CPU after running a bulk load of documents
>> over a couple of days (more than 3 million documents).
>
> Can you do a thread dump to show what the JVM is doing?
> https://access.redhat.com/solutions/18178
>
> Nick

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
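
The "over 200% for 1 hour, kill it and start the next" recipe floated in this thread could be sketched roughly as below. This is only an illustration of the decision rule, not anything shipped with Tika: the function name `should_recycle`, the sample format, and the idea of feeding it periodic CPU readings are all assumptions. How the readings are collected and how the container is actually replaced is left to whatever orchestrator is in use.

```python
from typing import Iterable, Tuple

# Thresholds taken from the thread; both are meant to be tuned.
CPU_LIMIT_PERCENT = 200.0
MAX_HOT_SECONDS = 3600.0


def should_recycle(
    samples: Iterable[Tuple[float, float]],
    cpu_limit: float = CPU_LIMIT_PERCENT,
    max_hot_seconds: float = MAX_HOT_SECONDS,
) -> bool:
    """Decide whether a (hypothetical) Tika worker should be replaced.

    `samples` is an iterable of (timestamp_seconds, cpu_percent) pairs,
    oldest first. Returns True only if every reading in the most recent
    streak is above `cpu_limit` and that streak spans at least
    `max_hot_seconds`.
    """
    hot_since = None  # timestamp at which the current hot streak began
    last_ts = None    # timestamp of the newest sample seen

    for ts, cpu in samples:
        last_ts = ts
        if cpu > cpu_limit:
            if hot_since is None:
                hot_since = ts
        else:
            # Any reading at or below the limit resets the streak.
            hot_since = None

    if hot_since is None or last_ts is None:
        return False
    return last_ts - hot_since >= max_hot_seconds
```

A supervisor loop would call this on a rolling window of readings for each worker and, when it returns True, stop that worker and start a fresh one. Taking a thread dump first, as suggested above, would keep the evidence for debugging.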
