[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190681#comment-17190681 ]
Markus Kalkbrenner edited comment on SOLR-13973 at 9/4/20, 11:18 AM:
---------------------------------------------------------------------

{quote}Perhaps even a simple Tika integration in SolrJ would make sense, making it super simple to do the extraction on client side, which is probably what most users should consider anyway.
{quote}

As the maintainer of Solarium, the main PHP client for Solr, and of the Solr Drupal integration, I know that there are users and Solr service providers who rely on the ExtractionHandler and on the out-of-the-box experience [~AndrewGr] described.

Even if I understand your motivation as a developer, moving the workflow to the client side will put a significant workload on other developers, even if you add Tika support to SolrJ. The number of people who use Solr from other programming languages may well be higher than the number of Java projects that use SolrJ.

There are more than 58,000 active Drupal installations using Solr as their search backend today:
[https://www.drupal.org/project/usage/search_api_solr]
[https://www.drupal.org/project/usage/apachesolr]

GitHub lists 895 repositories that directly depend on the PHP Solarium library:
[https://github.com/solariumphp/solarium/network/dependents]
These include packages from other PHP frameworks like Symfony, Laravel, TYPO3, WordPress, ...

Nearly 200,000 Composer-based build processes of PHP projects pulled the Solarium library within the last 30 days:
[https://packagist.org/packages/solarium/solarium/stats#major/all]

Granted, only a few of all these installations will use Tika indirectly via the extraction handler. But it won't be an easy task for them to add a standalone Tika server to their stack. I know many hosting providers that don't offer one to their customers yet.

I'm not saying that you shouldn't deprecate the embedded Tika at all.
But take careful steps, and be aware that the community of Solr users might be much larger than you think, due to the out-of-the-box solutions that exist, especially in the PHP world.

BTW, SOLR-14768 was detected automatically by the integration tests of the Solarium library and also by the integration tests of the Search API Solr Drupal module!

> Deprecate Tika
> --------------
>
>                 Key: SOLR-13973
>                 URL: https://issues.apache.org/jira/browse/SOLR-13973
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 8.7
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, it should be possible to bring them into third-party packages and install them via the package manager.
> The plan is to just throw warnings in logs and add deprecation notes in the reference guide for now. Removal can be done in 9.0.