Hi Eric,

Will take a look.

On a related note, I've created a new repo: https://github.com/apache/tika-docker

Based on the PRs and issues on LogicalSpark docker-tikaserver, I'm thinking I'll create an updated Dockerfile using what you've added here and look to publish builds to Docker Hub from that. What do you think?

Cheers,
Dave

On Wed, 8 Jan 2020 at 03:16, Eric Pugh <ep...@opensourceconnections.com> wrote:

> Hi all, I’ve gone ahead and added the -spawnChild property as a default
> when running Tika Server as a service. I’d love some eyes on the PR, and
> if this looks good, get it committed.
>
> Feedback welcome!
>
> Eric
>
> > On Dec 17, 2019, at 12:53 PM, Eric Pugh <ep...@opensourceconnections.com> wrote:
> >
> > Cool.
> >
> > It’s the auto run that I really need, and the other part that I don’t
> > think I’ve tackled properly is the managing of logs…
> >
> > I’m going to check with my project to see if they support Snap packages.
> >
> > Eric
> >
> >> On Dec 16, 2019, at 5:10 PM, Tom Barber <t...@spicule.co.uk> wrote:
> >>
> >> Just saw this fly by and FYI: on Linux systems that support Snap
> >> packages (Ubuntu/Debian/Arch/Fedora etc.) you can `snap install tika-server`.
> >> It doesn’t yet auto-run, I don’t believe, but you can just run `tika-server.run`,
> >> and adding an init script wouldn’t take 5 minutes.
> >>
> >> Tom
> >>
> >> On 16 December 2019 at 18:42:55, Eric Pugh (ep...@opensourceconnections.com) wrote:
> >>
> >>> Hi folks!
> >>>
> >>> I’ve got a mostly completed PR for having install scripts for Tika
> >>> Server, and I’m hoping a committer will take a look at the PR and give
> >>> feedback (and ideally commit in time for 1.24!)
> >>>
> >>> A couple of things:
> >>>
> >>> 1) This was completely influenced by
> >>> <https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#service-installation-script>,
> >>> in fact I started with the Solr scripts.
> >>>
> >>> 2) I’ve deleted all the Solr specific aspects (I think), however there
> >>> may still be more to delete.
> >>>
> >>> 3) This requires a change to how we release Tika: previously we shipped
> >>> tika-app.jar, tika-eval.jar, and tika-server.jar, and now, I think, we
> >>> want to add the tika-server-bin.tgz and tika-server-bin.zip binary
> >>> distributions.
> >>>
> >>> I’m happy to start writing accompanying “how to deploy Tika Server”
> >>> docs if this PR looks good! Or, please give input and I’ll make the updates.
> >>>
> >>> Eric
> >>>
> >>> > On Dec 12, 2019, at 2:39 PM, Eric Pugh <ep...@opensourceconnections.com> wrote:
> >>> >
> >>> > I’ve created this JIRA to track this work:
> >>> > https://issues.apache.org/jira/browse/TIKA-3010
> >>> >
> >>> > And a WIP PR is at https://github.com/apache/tika/pull/305
> >>> >
> >>> > My thought is to put something together that mimics how we deploy
> >>> > Solr, and see how that works. I have a need for an install process that a
> >>> > general IT person can follow, who isn’t a Tika expert or a Docker user.
> >>> >
> >>> >> On Dec 4, 2019, at 12:28 PM, Chris Mattmann <mattm...@apache.org> wrote:
> >>> >>
> >>> >> Thanks for bringing this conversation up, Eric.
> >>> >>
> >>> >> Historically, if you look over the last 5 years, I think what you
> >>> >> are asking below has sort of already become the de facto
> >>> >> truth. Most people are in fact using Tika Server, whether they are
> >>> >> individual devs, govvies, commercial folk and the like.
> >>> >> Big, small and medium projects. Evidenced by the expansion of Tika
> >>> >> APIs into pretty much every PL I know and use actively today.
> >>> >>
> >>> >> Given that, we probably should update the main website docs to make
> >>> >> this more prominent. The Tika Server docs on the
> >>> >> wiki are pretty darn good. But they don’t get prime real estate.
> >>> >> Would be wonderful if someone wants to update the
> >>> >> website to make it more prominent.
> >>> >>
> >>> >> The downstream Tika Python lib that I maintain has tons of activity,
> >>> >> is used by 350+ projects, and relies solely
> >>> >> on Tika-Server. My recommendation to the Solr folks (having created
> >>> >> SOLR-7633) from the 2014 DARPA MEMEX days was to
> >>> >> move towards a Tika Server based SolrCell dep, and that’s the right
> >>> >> way to go IMO.
> >>> >>
> >>> >> Chris
> >>> >>
> >>> >> From: Eric Pugh <ep...@opensourceconnections.com>
> >>> >> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
> >>> >> Date: Wednesday, December 4, 2019 at 12:24 PM
> >>> >> To: "tika-...@apache.org" <tika-...@apache.org>
> >>> >> Subject: [EXTERNAL] Do we have a community supported approach for
> >>> >> deploying Tika Server in production?
> >>> >>
> >>> >> Hi all - Hoping this is a reasonable Tika-dev versus Tika-user
> >>> >> question!
> >>> >>
> >>> >> Over in Solr land there has been renewed discussion about
> >>> >> streamlining what Solr is....
> >>> >>
> >>> >> In regards to rich content extraction and the Tika project, it
> >>> >> seems like the two ideas that continue to preserve the existing behavior
> >>> >> are:
> >>> >>
> >>> >> 1) To convert the ExtractingRequestHandler into a Package (Plugin)
> >>> >> for Solr. This slims down the standard Solr download, and *might* make it
> >>> >> easier to update the version of Tika + dependent jars used?
> >>> >>
> >>> >> 2) The second approach is to instead require Tika-Server to be
> >>> >> running (https://issues.apache.org/jira/browse/SOLR-7633) and just have Solr
> >>> >> delegate the call to Tika-Server.
> >>> >>
> >>> >> I was thinking about why I like option 1 better than 2, and I think
> >>> >> it boils down to how mature the IT organization I am working with is. Some
> >>> >> IT organizations have large dev-ops teams, and are working at major scale,
> >>> >> and managing a fleet of Tika-Server on Kubernetes with Load Balancer
> >>> >> dynamically scaling up and down is simple and second nature! However, many
> >>> >> organizations aren’t like that.
> >>> >>
> >>> >> So I guess what I’m asking is do we have a reasonable supported
> >>> >> approach for deploying Tika Server for non-tika savvy organizations? I’m
> >>> >> thinking about Solr, and specifically the fact that Solr has a well defined
> >>> >> set of Service Installation scripts. When I follow the directions in
> >>> >> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production
> >>> >> I can feel confident that when the server is rebooted, then Solr will come
> >>> >> back up! Plus there is log rotation and all the rest.
> >>> >>
> >>> >> In contrast, when I look at the Tika website, specifically the
> >>> >> https://tika.apache.org/1.22/gettingstarted.htm page, the message is
> >>> >> to run Tika as a command line application, or embedded in your
> >>> >> application.
> >>> >>
> >>> >> I’m wondering if Tika-Server needs to be made more prominent, and
> >>> >> treated as the “primary method of interacting with Tika”? Do we need as a
> >>> >> community to focus more on Tika-Server? In our getting started
> >>> >> documentation, in our usage documentation, and in our examples?
> >>> >>
> >>> >> Do we need to create the equivalent of the Service Installation
> >>> >> scripts for Tika-Server?
> >>> >>
> >>> >> Wanted to stoke the discussion!
> >>> >>
> >>> >> Eric
> >>> >>
> >>> >> _______________________
> >>> >>
> >>> >> Eric Pugh | Founder & CEO | OpenSource Connections, LLC |
> >>> >> 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy
> >>> >> <http://tinyurl.com/eric-cal>
> >>> >>
> >>> >> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> >>> >> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >>> >>
> >>> >> This e-mail and all contents, including attachments, is considered
> >>> >> to be Company Confidential unless explicitly stated otherwise, regardless
> >>> >> of whether attachments are marked as such.
> >>
> >> Spicule Limited is registered in England & Wales. Company Number:
> >> 09954122. Registered office: First Floor, Telecom House, 125-135 Preston
> >> Road, Brighton, England, BN1 6AF. VAT No. 251478891.
> >>
> >> All engagements are subject to Spicule Terms and Conditions of
> >> Business. This email and its contents are intended solely for the
> >> individual to whom it is addressed and may contain information that is
> >> confidential, privileged or otherwise protected from disclosure,
> >> distributing or copying. Any views or opinions presented in this email are
> >> solely those of the author and do not necessarily represent those of
> >> Spicule Limited. The company accepts no liability for any damage caused by
> >> any virus transmitted by this email. If you have received this message in
> >> error, please notify us immediately by reply email before deleting it from
> >> your system. Service of legal notice cannot be effected on Spicule Limited
> >> by email.