Following this thread, should we deprecate/remove the Tika Docker support that 
is in the Tika-server project?

The `mvn dockerfile:build` command now relies on a plugin that is no longer 
supported, according to https://github.com/spotify/dockerfile-maven, and it 
seems like the Tika-docker project is really the right place for this!

I’m thinking that this might help reduce the footprint of things we need to 
support.

> On Jan 9, 2020, at 12:08 AM, Chris Mattmann <mattm...@apache.org> wrote:
> 
> +1
> 
> 
> 
> Note there is also a USC tika dockers repo where I put the data science stuff 
> too:
> 
> 
> 
> http://github.com/USCDataScience/tika-dockers
> 
> 
> 
> I’ll continue to push DL and ML Tika stuff there.
> 
> Cheers,
> 
> Chris
> 
> 
> 
> 
> 
> 
> 
> 
> 
> From: Dave Meikle <dmei...@apache.org>
> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
> Date: Wednesday, January 8, 2020 at 2:18 PM
> To: "<dev@tika.apache.org>" <dev@tika.apache.org>
> Subject: Re: [EXTERNAL] Do we have a community supported approach for 
> deploying Tika Server in production?
> 
> 
> 
> Hi Eric,
> 
> 
> 
> Will take a look. On a related note, I've created a new repo:
> 
> https://github.com/apache/tika-docker
> 
> 
> 
> Based on the PRs and issues on LogicalSpark's docker-tikaserver, I'm
> thinking I'll create an updated Dockerfile using what you've added here
> and look to publish builds to Docker Hub from that.
> 
> 
> 
> What do you think?
> 
> 
> 
> Cheers,
> 
> Dave
> 
> 
> 
> 
> 
> 
> 
> On Wed, 8 Jan 2020 at 03:16, Eric Pugh <ep...@opensourceconnections.com> wrote:
> 
> 
> 
> Hi all, I’ve gone ahead and added the -spawnChild property as a default
> when running Tika Server as a service. I’d love some eyes on the PR, and
> if this looks good, get it committed.
> 
> 
> 
> Feedback welcome!
> 
> 
> 
> Eric
> 
> 
> 
> 
> 
> 
> 
>> On Dec 17, 2019, at 12:53 PM, Eric Pugh <ep...@opensourceconnections.com> wrote:
> 
>> 
> 
>> Cool.
> 
>> 
> 
>> It’s the auto run that I really need, and the other part that I don’t
>> think I’ve tackled properly is the managing of logs…
> 
>> 
> 
>> I’m going to check with my project to see if they support Snap packages.
> 
>> 
> 
>> Eric
> 
>> 
> 
>> 
> 
>>> On Dec 16, 2019, at 5:10 PM, Tom Barber <t...@spicule.co.uk> wrote:
>>> 
>>> Just saw this fly by and FYI: on Linux systems that support Snap
>>> packages (Ubuntu/Debian/Arch/Fedora etc.) you can `snap install tika-server`.
>>> It doesn’t yet auto-run, I don’t believe, but you can just run `tika-server.run`,
>>> and adding an init script wouldn’t take 5 minutes.
> 
>>> 
> 
>>> Tom
> 
>>> 
> 
>>> On 16 December 2019 at 18:42:55, Eric Pugh (ep...@opensourceconnections.com) wrote:
> 
>>> 
> 
>>>> Hi folks!
> 
>>>> 
> 
>>>> I’ve got a mostly completed PR for having install scripts for Tika
>>>> Server, and I’m hoping a committer will take a look at the PR, and give
>>>> feedback (and ideally commit in time for 1.24!)
> 
>>>> 
> 
>>>> A couple of things:
> 
>>>> 
> 
>>>> 1) This was completely influenced by
>>>> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#service-installation-script,
>>>> in fact I started with the Solr scripts.
> 
>>>> 
> 
>>>> 2) I’ve deleted all the Solr specific aspects (I think), however there
>>>> may still be more to delete.
> 
>>>> 
> 
>>>> 3) This requires a change to how we release Tika: previously we shipped
>>>> tika-app.jar, Tika-eval.jar, and Tika-server.jar, and now, I think, we
>>>> want to add the tika-server-bin.tgz and tika-server-bin.zip binary
>>>> distributions.
> 
>>>> 
> 
>>>> I’m happy to start writing accompanying “how to deploy Tika Server”
>>>> docs if this PR looks good! Or, please give input and I’ll make the updates.
> 
>>>> 
> 
>>>> Eric
> 
>>>> 
> 
>>>> 
> 
>>>>> On Dec 12, 2019, at 2:39 PM, Eric Pugh <ep...@opensourceconnections.com> wrote:
> 
>>>>> 
> 
>>>>> I’ve created this JIRA to track this work:
>>>>> https://issues.apache.org/jira/browse/TIKA-3010
>>>>> 
>>>>> And a WIP PR is at https://github.com/apache/tika/pull/305
> 
>>>>> 
> 
>>>>> My thought is to put something together that mimics how we deploy
>>>>> Solr, and see how that works. I have a need for an install process that a
>>>>> general IT person can follow, who isn’t a Tika expert or a Docker user.
> 
>>>>> 
> 
>>>>> 
> 
>>>>> 
> 
>>>>> 
> 
>>>>>> On Dec 4, 2019, at 12:28 PM, Chris Mattmann <mattm...@apache.org> wrote:
> 
>>>>>> 
> 
>>>>>> Thanks for bringing this conversation up Eric.
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> Historically if you look over the last 5 years, I think what you
>>>>>> are asking below has sort of already become the de facto
>>>>>> truth. Most people are in fact using Tika server, whether they are
>>>>>> individual devs, govvies, commercial folk and the like.
>>>>>> 
>>>>>> Big, small and medium projects. Evidenced by the expansion of Tika
>>>>>> APIs into pretty much every PL I know of and actively use today.
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> Given that, we probably should update the main website docs to make
>>>>>> this more prominent. The tika server docs on the
>>>>>> wiki are pretty darn good. But they don’t get prime real estate.
>>>>>> Would be wonderful if someone wants to update the
>>>>>> website to make it more prominent.
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> The downstream Tika Python lib that I maintain has tons of activity,
>>>>>> is used by 350+ projects, and relies solely
>>>>>> on Tika-Server. My recommendation to the Solr folks (having created
>>>>>> SOLR-7633) from the 2014 DARPA MEMEX days was to
>>>>>> move towards a Tika Server based SolrCell dep, and that’s the right
>>>>>> way to go IMO.
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> Chris
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> From: Eric Pugh <ep...@opensourceconnections.com>
>>>>>> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>>>>>> Date: Wednesday, December 4, 2019 at 12:24 PM
>>>>>> To: "tika-...@apache.org" <tika-...@apache.org>
>>>>>> Subject: [EXTERNAL] Do we have a community supported approach for
>>>>>> deploying Tika Server in production?
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> Hi all - Hoping this is a reasonable Tika-dev versus Tika-user
>>>>>> question!
>>>>>> 
>>>>>> Over in Solr land there has been renewed discussion about
>>>>>> streamlining what Solr is....
>>>>>> 
>>>>>> In regards to rich content extraction and the Tika project, it
>>>>>> seems like the two ideas that continue to preserve the existing behavior
>>>>>> are:
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 1) To convert the ExtractingRequestHandler into a Package (Plugin)
>>>>>> for Solr. This slims down the standard Solr download, and *might* make it
>>>>>> easier to update the version of Tika + dependent jars used?
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 2) The second approach is to instead require Tika-Server to be
>>>>>> running (https://issues.apache.org/jira/browse/SOLR-7633) and just have Solr
>>>>>> delegate the call to Tika-Server.
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> I was thinking about why I like option 1 better than 2, and I think
>>>>>> it boils down to how mature the IT organization I am working with is. Some
>>>>>> IT organizations have large dev-ops teams, and are working at major scale,
>>>>>> and managing a fleet of Tika-Server on Kubernetes with Load Balancer
>>>>>> dynamically scaling up and down is simple and second nature! However, many
>>>>>> organizations aren’t like that.
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> So I guess what I’m asking is do we have a reasonable supported
>>>>>> approach for deploying Tika Server for non-tika savvy organizations? I’m
>>>>>> thinking about Solr, and specifically the fact that Solr has a well defined
>>>>>> set of Service Installation scripts. When I follow the directions in
>>>>>> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#taking-solr-to-production
>>>>>> I can feel confident that when the server is rebooted, then Solr will come
>>>>>> back up! Plus there is log rotation and all the rest.
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> In contrast, when I look at the Tika website, specifically the
>>>>>> https://tika.apache.org/1.22/gettingstarted.htm page, the message is
>>>>>> to run Tika as a command line application, or embedded in your
>>>>>> application.
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> I’m wondering if Tika-Server needs to be made more prominent, and
>>>>>> treated as the “primary method of interacting with Tika”? Do we need as a
>>>>>> community to focus more on Tika-Server? In our getting started
>>>>>> documentation, in our usage documentation, and in our examples?
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> Do we need to create the equivalent of the Service Installation
>>>>>> scripts for Tika-Server?
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> Wanted to stoke the discussion!
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> Eric
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> 
> 
>>>>>> _______________________
>>>>>> 
>>>>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC |
>>>>>> 434.466.1467 | http://www.opensourceconnections.com |
>>>>>> My Free/Busy <http://tinyurl.com/eric-cal>
>>>>>> 
>>>>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>>>>>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>>>>> 
>>>>>> This e-mail and all contents, including attachments, is considered
>>>>>> to be Company Confidential unless explicitly stated otherwise, regardless
>>>>>> of whether attachments are marked as such.
> 
>>>>> 
> 
> 
>>>>> 
> 
>>>> 
> 
> 
>>>> 
> 
>>> 
> 
>>> Spicule Limited is registered in England & Wales. Company Number:
>>> 09954122. Registered office: First Floor, Telecom House, 125-135 Preston
>>> Road, Brighton, England, BN1 6AF. VAT No. 251478891.
>>> 
>>> All engagements are subject to Spicule Terms and Conditions of
>>> Business. This email and its contents are intended solely for the
>>> individual to whom it is addressed and may contain information that is
>>> confidential, privileged or otherwise protected from disclosure,
>>> distributing or copying. Any views or opinions presented in this email are
>>> solely those of the author and do not necessarily represent those of
>>> Spicule Limited. The company accepts no liability for any damage caused by
>>> any virus transmitted by this email. If you have received this message in
>>> error, please notify us immediately by reply email before deleting it from
>>> your system. Service of legal notice cannot be effected on Spicule Limited
>>> by email.
> 
>>> 
> 
>> 
> 
> 
>> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
    
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.
