[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190681#comment-17190681
 ] 

Markus Kalkbrenner edited comment on SOLR-13973 at 9/4/20, 11:18 AM:
---------------------------------------------------------------------

{quote}Perhaps even a simple Tika integration in SolrJ would make sense, making 
it super simple to do the extraction on client side, which is probably what 
most users should consider anyway.
{quote}
As maintainer of Solarium, the major PHP client for Solr, and of the 
Solr-Drupal integration, I know that there are users and Solr service providers 
who rely on the ExtractionHandler and the out-of-the-box experience that 
[~AndrewGr] described. Even though I understand your motivation as a developer, 
moving the workflow to the client side will put a significant workload on 
other developers, even if you add Tika support to SolrJ.
The number of people who use Solr in combination with a different 
programming language may well be higher than the number of Java projects that 
use SolrJ.

There are more than 58,000 active Drupal installations using Solr as their 
search backend today:
 [https://www.drupal.org/project/usage/search_api_solr]
 [https://www.drupal.org/project/usage/apachesolr]

GitHub lists 895 repositories that directly depend on the PHP Solarium library:
 [https://github.com/solariumphp/solarium/network/dependents]

These include packages from other PHP frameworks like Symfony, Laravel, TYPO3, 
WordPress, ...

Nearly 200,000 Composer-based build processes of PHP projects pulled the 
Solarium library within the last 30 days:
 [https://packagist.org/packages/solarium/solarium/stats#major/all]

Certainly, only a few of these installations will use Tika indirectly via 
the extraction handler. But it won't be an easy task for them to add a 
standalone Tika server to their stack. I know a lot of hosting providers that 
don't yet offer it to their customers.

I'm not saying that you shouldn't deprecate the embedded Tika at all. But take 
careful steps and be aware that the community of Solr users might be much 
larger than you think because of the out-of-the-box solutions that exist, 
especially in the PHP world.

BTW, SOLR-14768 was detected automatically by the integration tests of the 
Solarium library and also by the automated integration tests of the Search 
API Solr Drupal module!

 


> Deprecate Tika
> --------------
>
>                 Key: SOLR-13973
>                 URL: https://issues.apache.org/jira/browse/SOLR-13973
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 8.7
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third-party packages and install 
> them via the package manager.
> The plan is just to throw warnings in the logs and add deprecation notes to 
> the reference guide for now. Removal can be done in 9.0.



