[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158722#comment-17158722
 ] 

Jan Høydahl commented on SOLR-13973:
------------------------------------

+1 to "modern way", i.e. Tika Server. But there are many ways Tika could be 
integrated - as an ExtractingRequestHandler, as an UpdateRequestProcessor, as a 
standalone server etc.

So there could be more than one package here to suit different needs? One could 
be a 'solr-cell-package' that walks like SolrCell and quacks like SolrCell, but 
delegates the extraction to a Tika Server. Another could be an 
'attachment-processor' that reads a base64-encoded field in a 
SolrInputDocument, sends it to Tika Server for extraction, and writes the text 
to another field. The input field could alternatively be a file system path, an 
S3 location or another URI.
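
To make the 'attachment-processor' idea a bit more concrete, here is a rough 
sketch in plain Java. It is only an illustration under assumptions: the class 
name, method names and field names are made up, a Map stands in for a 
SolrInputDocument, and none of this is an existing Solr API. The only real 
interface it relies on is Tika Server's PUT /tika endpoint, which returns plain 
text when called with "Accept: text/plain".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the 'attachment-processor' idea; not a Solr class. */
class AttachmentProcessorSketch {

    /** Sends raw bytes to a Tika Server and returns the extracted plain text. */
    static String extractWithTikaServer(String tikaBaseUrl, byte[] content) {
        // Tika Server extracts text from a document PUT to its /tika endpoint.
        HttpRequest request = HttpRequest.newBuilder(URI.create(tikaBaseUrl + "/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException(
                        "Tika Server returned HTTP " + response.statusCode());
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while calling Tika Server", e);
        }
    }

    /**
     * Returns a copy of the document in which the base64 content of
     * sourceField has been decoded, passed to the extractor, and the
     * resulting text stored in targetField. The Map is an assumed
     * stand-in for a SolrInputDocument.
     */
    static Map<String, Object> process(Map<String, Object> doc,
                                       String sourceField,
                                       String targetField,
                                       Function<byte[], String> extractor) {
        Map<String, Object> result = new HashMap<>(doc);
        Object encoded = result.get(sourceField);
        if (encoded instanceof String) {
            byte[] raw = Base64.getDecoder().decode((String) encoded);
            result.put(targetField, extractor.apply(raw));
        }
        return result;
    }
}
```

With a Tika Server listening on its default port 9998, a call could look like 
process(doc, "attachment_b64", "content_txt", bytes -> 
extractWithTikaServer("http://localhost:9998", bytes)). Passing the extractor in 
as a function keeps the field handling testable without a running server, and 
would let the same step accept a path- or URI-based fetcher instead.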

So I think a quick survey of user needs makes sense. Perhaps even a simple Tika 
integration in SolrJ would make sense, making it super simple to do the 
extraction on the client side, which is probably what most users should 
consider anyway.

> Deprecate Tika
> --------------
>
>                 Key: SOLR-13973
>                 URL: https://issues.apache.org/jira/browse/SOLR-13973
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: 8.7
>
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third-party packages and install 
> them via the package manager.
> The plan is just to throw warnings in logs and add deprecation notes in the 
> reference guide for now. Removal can be done in 9.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

