[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158722#comment-17158722 ]
Jan Høydahl commented on SOLR-13973:
------------------------------------

+1 to the "modern way", i.e. Tika Server. But there are many ways Tika could be integrated: as an ExtractingRequestHandler, as an UpdateRequestProcessor, as a standalone server, etc. So there could be more than one package here to suit different needs.

One could be a 'solr-cell-package' that walks like SolrCell and quacks like SolrCell, but delegates the extraction to a TikaServer. Another could be an 'attachment-processor' that reads a base64-encoded field in a SolrInputDocument, sends it to TikaServer for extraction, and writes the text to another field. The input field could alternatively be a file system path, an S3 location or another URI.

So I think a quick survey of user needs makes sense. Perhaps even a simple Tika integration in SolrJ would make sense, making it super simple to do the extraction on the client side, which is probably what most users should consider anyway.


> Deprecate Tika
> --------------
>
>                 Key: SOLR-13973
>                 URL: https://issues.apache.org/jira/browse/SOLR-13973
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 8.7
>
>
> Solr's primary responsibility should be to focus on search and scalability.
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these,
> it should be possible to bring them into third-party packages and install
> them via the package manager.
> The plan is just to throw warnings in the logs and add deprecation notes in
> the reference guide for now. Removal can be done in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
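
For illustration only, the 'attachment-processor' idea from the comment (read a base64-encoded field, send it to a Tika Server, write the text to another field) might look roughly like the sketch below. This is not code from Solr or from the issue. It is written in Python with a plain dict standing in for a SolrInputDocument. The field names `attachment_b64` and `attachment_text`, the function names, and the server URL are all hypothetical; the URL assumes a Tika Server on its default port 9998 that accepts a PUT to `/tika` and returns plain text when `Accept: text/plain` is sent.

```python
import base64
import urllib.request
from typing import Any, Callable, Dict

# Assumption: a Tika Server is running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def extract_with_tika_server(data: bytes, url: str = TIKA_URL) -> str:
    """PUT the raw bytes to the Tika Server and return the extracted text."""
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def process_attachment(
    doc: Dict[str, Any],
    source_field: str = "attachment_b64",   # hypothetical field name
    target_field: str = "attachment_text",  # hypothetical field name
    extract: Callable[[bytes], str] = extract_with_tika_server,
) -> Dict[str, Any]:
    """Return a copy of doc with the decoded attachment's text added.

    Documents without the source field are returned unchanged.
    """
    encoded = doc.get(source_field)
    if encoded is None:
        return doc
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[target_field] = extract(base64.b64decode(encoded))
    return enriched
```

The extractor is passed in as a parameter so that the same function could be pointed at a file system path reader, an S3 fetcher or another URI handler, as the comment suggests, and so that it can be exercised without a running server.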